Dynamic difficulty with reinforcement learning

By Jelmar Riedeman

Contents

  • 1. The goal
  • 2. The research
    • 2.1. Reinforcement learning algorithms
    • 2.2. Unity frameworks
    • 2.3. External frameworks
      • 2.3.1. Python frameworks
      • 2.3.2. C# frameworks
  • 3. Making the game and algorithm
  • 4. The results
  • 5. Sources

1. The goal

My goal is to create a Pacman game in Unity where the ghosts' skill scales with the skill of the player.

2. The research

First, I researched whether any machine learning frameworks already exist for Unity.

2.1. Reinforcement learning algorithms

Reinforcement learning is a type of machine learning in which an agent takes actions based on the environment it is in and gets rewarded or punished accordingly.

(What Is Reinforcement Learning?, 2022)
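At its core this is a loop: the agent observes the state of the environment, picks an action, and receives a reward it can learn from. A minimal sketch of that loop in C# (the interface and type names are my own, for illustration only, not from any specific framework):

public interface IAgent
{
    int ChooseAction(int state);
    void Learn(int state, int action, float reward, int nextState);
}

public interface IEnvironment
{
    int State { get; }
    bool Done { get; }
    float Step(int action); // applies the action; a negative reward is a punishment
}

public static class TrainingLoop
{
    public static void RunEpisode(IAgent agent, IEnvironment env)
    {
        while (!env.Done)
        {
            int state = env.State;
            int action = agent.ChooseAction(state);
            float reward = env.Step(action);
            agent.Learn(state, action, reward, env.State);
        }
    }
}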

There are multiple types of reinforcement learning.
So to figure out which one we need, we will first need to look at the requirements of the project.

There are model-based and model-free algorithms.
Model-based algorithms try to learn a model of the environment (the map) and plan the best action from it.
Model-free algorithms instead use their previous experience in the current state to choose new actions.
Because model-free algorithms act on their current state rather than first building a model of the whole map, they can adapt faster to the player's actions, which is exactly what dynamic difficulty needs.

There is one more question to answer before we can pick a reinforcement learning algorithm, because there are two types of model-free reinforcement learning.
Policy-based algorithms try to find a formula (a policy) that can handle every situation.
Q-learning algorithms take actions based on the state they are currently in.
Because the ghosts in Pacman are not that smart, and I still want them to learn from the player, I chose a Q-learning algorithm with only a few states.

(Moni, 2021)
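For Pacman this means each ghost keeps a Q-table: one row per state, with one learned value per possible action. A minimal sketch of such a table with epsilon-greedy action selection (the class and the epsilon value are my own assumptions for the example, not the project's exact code):

public class QTableAgent
{
    private readonly float[][] qTable;
    private readonly System.Random random = new System.Random();
    private readonly float epsilon = 0.1f; // exploration rate; illustrative value

    public QTableAgent(int stateCount, int actionCount)
    {
        qTable = new float[stateCount][];
        for (int s = 0; s < stateCount; s++)
            qTable[s] = new float[actionCount];
    }

    // Epsilon-greedy: usually take the best known action, sometimes explore.
    public int ChooseAction(int state)
    {
        if (random.NextDouble() < epsilon)
            return random.Next(qTable[state].Length);

        int best = 0;
        for (int a = 1; a < qTable[state].Length; a++)
            if (qTable[state][a] > qTable[state][best])
                best = a;
        return best;
    }
}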

2.2. Unity frameworks

I found one framework for Unity called Unity ML-Agents.

The framework has a few pros and cons.
The pros: once it is set up it is easy to use, and it learns by playing the game.
The cons: the framework can only learn while the local training server is running, and it cannot be used on computers with older CPUs.

(Unity ML-Agents Toolkit, n.d.)

2.3. External frameworks

I researched two kinds of external frameworks:
Python frameworks, because most machine learning frameworks are made for Python.
And C# frameworks, because Unity is made in C#, which makes integration easier.

2.3.1. Python frameworks

One of the reasons to use Python is that most machine learning frameworks are written in Python.

I looked at the three biggest frameworks to figure out which one to use.
Keras:
This framework has the most optimization options.

(Keras Team, n.d.)

Tensorflow:
This one is the easiest to use.

(Introduction to RL and Deep Q Networks  |  TensorFlow Agents, n.d.)

Pytorch:
A framework that is in between ease of use and advanced optimization.

(Reinforcement Learning (DQN) Tutorial — PyTorch Tutorials 1.13.0+cu117 Documentation, n.d.)

Sadly, Python support in Unity is still quite new. Because of this, Python only works in the Unity Editor and not while playing the game.

(Python Scripting | Python Scripting | 7.0.0-pre.1, n.d.)

2.3.2. C# frameworks

Unity is made in C#, so I also researched whether any reinforcement learning frameworks exist for this language.

There are also multiple frameworks for C#, so I looked at the three biggest.
Numpy.NET:
This is the one with the most optimization options.

(GitHub - SciSharp/Numpy.NET: C#/F# Bindings for NumPy - a Fundamental Library for Scientific Computing, Machine Learning and AI, n.d.)

Keras.NET & TensorFlow.NET:
Keras.NET and TensorFlow.NET both say they use each other's APIs, and both say TensorFlow.NET is the better of the two.
Both also claim to be the same as their Python counterparts, only written in C#.

(GitHub - SciSharp/Keras.NET: Keras.NET Is a High-level Neural Networks API for C# and F#, With Python Binding and Capable of Running on Top of TensorFlow, CNTK, or Theano., n.d.)

(GitHub - SciSharp/TensorFlow.NET: .NET Standard Bindings for Google’s TensorFlow for Developing, Training and Deploying Machine Learning Models in C# and F#., n.d.)

These frameworks require a .NET console application to be installed, which means they cannot be used directly inside Unity. They can, however, talk to Unity the way a network application would. There is also very little documentation for these frameworks, which makes them hard to use.
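As an illustration of what that network communication could look like, here is a minimal TCP sketch (the port number and the text protocol are assumptions I made up for the example): the console application listens, and Unity would connect as a client to ask for an action.

using System.Net;
using System.Net.Sockets;
using System.Text;

// Sketch of a .NET console application a Unity game could talk to over TCP.
class Program
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Loopback, 9000);
        listener.Start();
        using var client = listener.AcceptTcpClient();
        using var stream = client.GetStream();

        // Read the game state sent by Unity.
        var buffer = new byte[1024];
        int read = stream.Read(buffer, 0, buffer.Length);
        string request = Encoding.UTF8.GetString(buffer, 0, read);

        // A real server would run the model on the request here
        // and send back its prediction.
        byte[] reply = Encoding.UTF8.GetBytes("action:0\n");
        stream.Write(reply, 0, reply.Length);
    }
}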

3. Making the game and algorithm

I did not make the Pacman game myself; I got it from Zigurous (n.d.).

To make the Q-learning algorithm I followed the tutorial by Learn Python with Rune (2021).
The tutorial is made for a different game and in a different programming language, so I had to change quite a few things.
The first thing I had to do was decide which states the ghost can be in. The state is essentially the ghost's equivalent of our vision of the screen. I decided the ghost could see: which directions it can move in, whether Pacman has eaten a power-up and can currently eat the ghosts, and how far away Pacman is (bucketed into a distance of 1 to 5, or more than 5).
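A sketch of how those observations could be packed into a single state index (the parameter names and bucket sizes are my own illustration; the real project may encode them differently):

// Packs the ghost's observations into one Q-table row index.
private int EncodeState(bool canGoUp, bool canGoDown, bool canGoLeft, bool canGoRight,
                        bool pacmanPoweredUp, int distanceBucket) // 0 = within 5 tiles, 1 = farther
{
    int state = 0;
    state = state * 2 + (canGoUp ? 1 : 0);
    state = state * 2 + (canGoDown ? 1 : 0);
    state = state * 2 + (canGoLeft ? 1 : 0);
    state = state * 2 + (canGoRight ? 1 : 0);
    state = state * 2 + (pacmanPoweredUp ? 1 : 0);
    state = state * 2 + distanceBucket;
    return state; // 2^6 = 64 possible states in this simplified encoding
}

With the state defined, the Q-table update itself looks like this: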

private void ApplyLearning()
{
    // Highest Q-value reachable from the new state.
    float newStateMax = qTable[state][GetBestActionIndex()];

    // Q-learning update: blend the old estimate for the previous state and
    // action with the reward plus the discounted best value of the new state.
    qTable[oldState][action] = (1 - alpha) * qTable[oldState][action] + alpha * (reward + gamma * newStateMax);

    oldState = state;
    reward = 0;
}

The ghost gets rewarded or punished accordingly and updates its Q-table. I configured the formula so that the ghost prioritizes long-term rewards over short-term rewards. This is important because the state is not guaranteed to change when the ghost takes an action. The long-term weight should not be too high though, or the ghost will wait too long before eating Pacman.
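The exact rewards are a tuning choice. A sketch of how they could be assigned (the events and numbers below are illustrative assumptions, not the values used in the project; reward is the field the update above consumes):

// Illustrative reward shaping for a ghost.
private enum GhostEvent { CaughtPacman, EatenByPacman, MovedCloserToPacman }

private void OnGhostEvent(GhostEvent e)
{
    switch (e)
    {
        case GhostEvent.CaughtPacman:
            reward += 10f;  // the main goal
            break;
        case GhostEvent.EatenByPacman:
            reward -= 10f;  // punished for being eaten during a power-up
            break;
        case GhostEvent.MovedCloserToPacman:
            reward += 0.1f; // small nudge toward chasing Pacman
            break;
    }
}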

// The Q-table needs one row for every combination of observations, so every
// extra observation the ghost can make multiplies the number of states.
int numberOfStates = 0;
numberOfStates += maxActionsPerOption * maxActionsPerOption * maxActionsPerOption * maxActionsPerOption * distancePrecicion * distancePrecicion * numberOfGhostStates;
numberOfStates += maxActionsPerOption * maxActionsPerOption * maxActionsPerOption * distancePrecicion * distancePrecicion * numberOfGhostStates;
numberOfStates += maxActionsPerOption * maxActionsPerOption * distancePrecicion * distancePrecicion * numberOfGhostStates;
numberOfStates += maxActionsPerOption * distancePrecicion * distancePrecicion * numberOfGhostStates;
numberOfStates += distancePrecicion * distancePrecicion * numberOfGhostStates;
numberOfStates += distancePrecicion * numberOfGhostStates;
numberOfStates += numberOfGhostStates;

When this was all done I had to make a save function in the game to save the Q-tables. For that I followed the tutorial from Brackeys (2018).
Because the ghosts need to train for a while, I made it so that the ghost Q-tables are saved after every round.
To not slow down the game, I used another thread to save the Q-tables while the ghosts continued training, without impacting performance.
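A sketch of saving a Q-table on a background thread (the helper class, file name, and binary layout are my own choices for the example; the Brackeys tutorial covers the saving itself, not the threading):

using System.IO;
using System.Threading;
using UnityEngine;

public static class QTableSaver
{
    public static void SaveAsync(float[][] qTable, string ghostName)
    {
        // Snapshot the table first so the ghost can keep training on the original.
        float[][] snapshot = new float[qTable.Length][];
        for (int s = 0; s < qTable.Length; s++)
            snapshot[s] = (float[])qTable[s].Clone();

        // persistentDataPath must be read on the main thread, before the thread starts.
        string path = Path.Combine(Application.persistentDataPath, ghostName + ".qtable");

        new Thread(() =>
        {
            using var writer = new BinaryWriter(File.Open(path, FileMode.Create));
            writer.Write(snapshot.Length);
            foreach (float[] row in snapshot)
            {
                writer.Write(row.Length);
                foreach (float q in row)
                    writer.Write(q);
            }
        }).Start();
    }
}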

With all this done, there was a working Pacman game with learning ghosts.

4. The results

In the end everything worked; however, it did not work as envisioned.

The ghosts were able to learn, but before training they acted randomly, and even after training they were not very smart.
I tried to correct this by increasing the number of states (the number of things the ghost could see), but a Q-table grows exponentially with each added observation (one extra binary observation already doubles it), and because I used 4 ghosts the tables quickly became too large.

I also tried giving the ghost a view of part of the map around it instead of Pacman's position, but with this approach the ghosts could see even less of their surroundings while the Q-table (the number of states) became huge.

This all went wrong because I decided the ghosts in Pacman did not have to be that smart and chose to use a Q-table. I should have used a policy-based algorithm instead of Q-learning. That way the ghosts could have become smarter, and I could simply have stopped training them once they were smart enough. Making things dumber is, after all, much easier than making things smart.

5. Sources

What is reinforcement learning? (2022, August 22). deepsense.ai. https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/

Moni, R. (2021, December 7). Reinforcement Learning algorithms — an intuitive overview. Medium. https://smartlabai.medium.com/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc

Unity ML-Agents Toolkit. (n.d.). GitHub. https://github.com/Unity-Technologies/ml-agents

Keras Team. (n.d.). Keras documentation: Actor Critic Method. https://keras.io/examples/rl/actor_critic_cartpole/

Introduction to RL and Deep Q Networks  |  TensorFlow Agents. (n.d.). TensorFlow. https://www.tensorflow.org/agents/tutorials/0_intro_rl

Reinforcement Learning (DQN) Tutorial — PyTorch Tutorials 1.13.0+cu117 documentation. (n.d.). https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

Python Scripting | Python Scripting | 7.0.0-pre.1. (n.d.). https://docs.unity3d.com/Packages/com.unity.scripting.python@7.0/manual/index.html

GitHub - SciSharp/Numpy.NET: C#/F# bindings for NumPy - a fundamental library for scientific computing, machine learning and AI. (n.d.). GitHub. https://github.com/SciSharp/Numpy.NET

GitHub - SciSharp/Keras.NET: Keras.NET is a high-level neural networks API for C# and F#, with Python Binding and capable of running on top of TensorFlow, CNTK, or Theano. (n.d.). GitHub. https://github.com/SciSharp/Keras.NET

GitHub - SciSharp/TensorFlow.NET: .NET Standard bindings for Google’s TensorFlow for developing, training and deploying Machine Learning models in C# and F#. (n.d.). GitHub. https://github.com/SciSharp/TensorFlow.NET

Zigurous. (n.d.). GitHub - zigurous/unity-pacman-tutorial: Learn to make Pacman in Unity. GitHub. https://github.com/zigurous/unity-pacman-tutorial

Learn Python with Rune. (2021, July 26). Reinforcement Learning from Scratch | Learn Python | 8h Full Python Course | Lesson 16 (0-16) [Video]. YouTube. https://www.youtube.com/watch?v=y4LEVVE2mV8

Brackeys. (2018, December 2). SAVE & LOAD SYSTEM in Unity [Video]. YouTube. https://www.youtube.com/watch?v=XOjd_qU2Ido
