Compare commits

...

2 commits
main ... colab

Author        SHA1        Message                                           Date
Jake Walker   8b968b1676  Remove solutions                                  2024-06-17 13:34:20 +01:00
Jake Walker   3e562267fb  Update notebooks for Google Colab compatibility   2024-06-17 11:11:20 +01:00
6 changed files with 32 additions and 859 deletions


@@ -64,30 +64,8 @@
"metadata": {},
"outputs": [],
"source": [
"num_train_samples = 50000\n",
"\n",
"x_train = np.empty((num_train_samples, 3, 32, 32), dtype=\"uint8\")\n",
"y_train = np.empty((num_train_samples,), dtype=\"uint8\")\n",
"\n",
"for i in range(1, 6):\n",
" file_path = os.path.join(\"cifar-10-batches-py\", f\"data_batch_{i}\")\n",
" (\n",
" x_train[(i - 1) * 10000 : i * 10000, :, :, :],\n",
" y_train[(i - 1) * 10000 : i * 10000],\n",
" ) = load_batch(file_path)\n",
"\n",
"file_path = os.path.join(\"cifar-10-batches-py\", \"test_batch\")\n",
"x_test, y_test = load_batch(file_path)\n",
"\n",
"y_train = np.reshape(y_train, (len(y_train), 1))\n",
"y_test = np.reshape(y_test, (len(y_test), 1))\n",
"\n",
"if backend.image_data_format() == \"channels_last\":\n",
" x_train = x_train.transpose(0, 2, 3, 1)\n",
" x_test = x_test.transpose(0, 2, 3, 1)\n",
"\n",
"x_test = x_test.astype(x_train.dtype)\n",
"y_test = y_test.astype(y_train.dtype)"
"from keras.datasets import cifar10\n",
"(x_train, y_train), (x_test, y_test) = cifar10.load_data()"
]
},
{
@@ -151,8 +129,8 @@
"metadata": {},
"outputs": [],
"source": [
"y_train_one_hot = keras.src.utils.numerical_utils.to_categorical(y_train, 10)\n",
"y_test_one_hot = keras.src.utils.numerical_utils.to_categorical(y_test, 10)"
"y_train_one_hot = keras.utils.to_categorical(y_train, 10)\n",
"y_test_one_hot = keras.utils.to_categorical(y_test, 10)"
]
},
{
@@ -278,6 +256,15 @@
"Let's try and feed a picture of a cat to the model, and see what it thinks... As a reminder, the model hasn't been trained on pictures of cats."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget -O cat.jpg https://git.subspace.solutions/cads/ai-lesson-resources/media/branch/main/cat.jpg"
]
},
{
"cell_type": "code",
"execution_count": null,


@@ -1,359 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image Classification\n",
"\n",
"Simple image classification using the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html).\n",
"\n",
"The CIFAR-10 dataset has 60,000 32x32 colour images in 10 classes (6,000 per class). These are split into 50,000 training images and 10,000 testing images.\n",
"\n",
"Here are the classes:\n",
"1. Airplane\n",
"2. Car\n",
"3. Bird\n",
"4. Cat\n",
"5. Deer\n",
"6. Dog\n",
"7. Frog\n",
"8. Horse\n",
"9. Ship\n",
"10. Truck"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import keras\n",
"import numpy as np\n",
"import os\n",
"from keras.src.datasets.cifar import load_batch\n",
"from keras import backend\n",
"from skimage.transform import resize\n",
"\n",
"classes = [\n",
" \"airplane\",\n",
" \"car\",\n",
" \"bird\",\n",
" \"cat\",\n",
" \"deer\",\n",
" \"dog\",\n",
" \"frog\",\n",
" \"horse\",\n",
" \"ship\",\n",
" \"truck\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the dataset 💿"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_train_samples = 50000\n",
"\n",
"x_train = np.empty((num_train_samples, 3, 32, 32), dtype=\"uint8\")\n",
"y_train = np.empty((num_train_samples,), dtype=\"uint8\")\n",
"\n",
"for i in range(1, 6):\n",
" file_path = os.path.join(\"cifar-10-batches-py\", f\"data_batch_{i}\")\n",
" (\n",
" x_train[(i - 1) * 10000 : i * 10000, :, :, :],\n",
" y_train[(i - 1) * 10000 : i * 10000],\n",
" ) = load_batch(file_path)\n",
"\n",
"file_path = os.path.join(\"cifar-10-batches-py\", \"test_batch\")\n",
"x_test, y_test = load_batch(file_path)\n",
"\n",
"y_train = np.reshape(y_train, (len(y_train), 1))\n",
"y_test = np.reshape(y_test, (len(y_test), 1))\n",
"\n",
"if backend.image_data_format() == \"channels_last\":\n",
" x_train = x_train.transpose(0, 2, 3, 1)\n",
" x_test = x_test.transpose(0, 2, 3, 1)\n",
"\n",
"x_test = x_test.astype(x_train.dtype)\n",
"y_test = y_test.astype(y_train.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring 🔎"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x_train.shape)\n",
"print(y_train.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`x_train` is the actual images in the dataset. You can see they are 32x32 and the 3 is for red, green and blue values.\n",
"`y_train` is the category for each image, this is just a single number between 0 and 9."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_train[1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(x_train[1])\n",
"print(y_train[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Processing 🫧\n",
"\n",
"Our neural network works with decimal numbers between 0 and 1, so we need to convert the categories into 0s and 1s. We take an array of 0s and set a 1 for the category.\n",
"\n",
"For example, the number 2 would get encoded to `[0, 0, 1, ...]`."
]
},
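As a minimal sketch, assuming the `keras` import from the top of the notebook, this is what the one-hot encoding of the example label looks like (class index 2 is "bird" in the class list above):

```python
import keras

# Illustrative only: the single label 2 ("bird") becomes a length-10 vector
# with a 1 at position 2 and 0s everywhere else.
print(keras.utils.to_categorical([2], 10))
# -> [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]
```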
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_train_one_hot = keras.src.utils.numerical_utils.to_categorical(y_train, 10)\n",
"y_test_one_hot = keras.src.utils.numerical_utils.to_categorical(y_test, 10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# task: can you print out the one hot encoded label for the truck above?\n",
"print(y_train_one_hot[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the moment each pixel is represented by a number from 0 to 255. We also need to convert these to be between 0 and 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_train = x_train.astype(\"float32\")\n",
"x_test = x_test.astype(\"float32\")\n",
"x_train = x_train / 255\n",
"x_test = x_test / 255"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build and Train CNN 🔨"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D\n",
"\n",
"model = Sequential()\n",
"model.add(\n",
" Conv2D(32, (3, 3), activation=\"relu\", padding=\"same\", input_shape=(32, 32, 3))\n",
")\n",
"model.add(MaxPooling2D(pool_size=(2, 2)))\n",
"model.add(Dropout(0.25))\n",
"model.add(Conv2D(64, (3, 3), activation=\"relu\", padding=\"same\"))\n",
"model.add(MaxPooling2D(pool_size=(2, 2)))\n",
"model.add(Dropout(0.25))\n",
"model.add(Flatten())\n",
"model.add(Dense(512, activation=\"relu\"))\n",
"model.add(Dropout(0.5))\n",
"model.add(Dense(10, activation=\"softmax\"))\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hist = model.fit(\n",
" x_train, y_train_one_hot, batch_size=32, epochs=1, validation_split=0.2\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate 🧪"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(x_test, y_test_one_hot)[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"~50% accuracy... not great"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What about for something it's not been trained on?\n",
"\n",
"Let's try and feed a picture of a cat to the model, and see what it thinks... As a reminder, the model hasn't been trained on pictures of cats."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat = plt.imread(\"cat.jpg\")\n",
"cat_resized = resize(cat, (32, 32, 3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(cat_resized)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"probabilities = model.predict(\n",
" np.array(\n",
" [\n",
" cat_resized,\n",
" ]\n",
" )\n",
")\n",
"probabilities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"index = np.argsort(probabilities[0, :])\n",
"print(f\"Most likely: {classes[index[9]]}, probability={probabilities[0,index[9]]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Challenges 🏆\n",
"\n",
"- Try adding in some more layers to the neural network, adding a second `Conv2D` layer under both of the existing ones.\n",
"- Try increasing the number of `epochs` when training.\n",
"- Save/load your model with `model.save('mymodel.h5')` and `keras.models.load_model('mymodel.h5')`."
]
}
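A minimal sketch of the first and last challenges, assuming the notebook's `x_train` and `y_train_one_hot` arrays are already prepared; the extra `Conv2D` layers and the epoch count are illustrative choices rather than a recommended architecture:

```python
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

# Same model as above, with a second Conv2D layer under each existing one.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation="relu", padding="same"))
model.add(Conv2D(64, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train for more epochs than the single epoch used above.
model.fit(x_train, y_train_one_hot, batch_size=32, epochs=10, validation_split=0.2)

# Save the trained model to disk and load it back.
model.save("mymodel.h5")
model = keras.models.load_model("mymodel.h5")
```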
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -20,6 +20,16 @@
"## Import the packages 📦"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!apt-get update && apt-get install -y build-essential cmake swig\n",
"!pip install stable-baselines3\\[extra\\]==2.3.2 gymnasium\\[box2d\\]==0.29.1"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -407,6 +417,15 @@
"\n",
"Is moon landing too boring for you? Try to **change the environment**, why not use MountainCar-v0, CartPole-v1 or CarRacing-v0? Check how they work [using the gym documentation](https://www.gymlibrary.dev/) and have fun 🎉."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget -O ppo-LunarLander-v2-good.zip https://git.subspace.solutions/cads/ai-lesson-resources/media/branch/main/ppo-LunarLander-v2-good.zip"
]
}
],
"metadata": {


@@ -1,446 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "njb_ProuHiOe"
},
"source": [
"# Intro to Reinforcement Learning\n",
"\n",
"This notebook is modified from [Unit 1 of Hugging Face's Deep RL course](https://github.com/huggingface/deep-rl-class/blob/main/notebooks/unit1/unit1.ipynb), it has been simplified."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wrgpVFqyENVf"
},
"source": [
"## Import the packages 📦"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cygWLPGsEQ0m"
},
"outputs": [],
"source": [
"from stable_baselines3 import PPO\n",
"from stable_baselines3.common.env_util import make_vec_env\n",
"from stable_baselines3.common.evaluation import evaluate_policy\n",
"from stable_baselines3.common.monitor import Monitor"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-TzNN0bQ_j-3"
},
"source": [
"At each step:\n",
"- Our Agent receives a **state (S0)** from the **Environment** — we receive the first frame of our game (Environment).\n",
"- Based on that **state (S0),** the Agent takes an **action (A0)** — our Agent will move to the right.\n",
"- The environment transitions to a **new** **state (S1)** — new frame.\n",
"- The environment gives some **reward (R1)** to the Agent — were not dead *(Positive Reward +1)*.\n",
"\n",
"\n",
"With Gymnasium:\n",
"\n",
"1⃣ We create our environment using `gymnasium.make()`\n",
"\n",
"2⃣ We reset the environment to its initial state with `observation = env.reset()`\n",
"\n",
"At each step:\n",
"\n",
"3⃣ Get an action using our model (in our example we take a random action)\n",
"\n",
"4⃣ Using `env.step(action)`, we perform this action in the environment and get\n",
"- `observation`: The new state (st+1)\n",
"- `reward`: The reward we get after executing the action\n",
"- `terminated`: Indicates if the episode terminated (agent reach the terminal state)\n",
"- `truncated`: Introduced with this new version, it indicates a timelimit or if an agent go out of bounds of the environment for instance.\n",
"- `info`: A dictionary that provides additional information (depends on the environment).\n",
"\n",
"For more explanations check this 👉 https://gymnasium.farama.org/api/env/#gymnasium.Env.step\n",
"\n",
"If the episode is terminated:\n",
"- We reset the environment to its initial state with `observation = env.reset()`\n",
"\n",
"**Let's look at an example!** Make sure to read the code\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "w7vOFlpA_ONz"
},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"\n",
"# First, we create our environment called LunarLander-v2\n",
"env = gym.make(\"LunarLander-v2\")\n",
"\n",
"# Then we reset this environment\n",
"observation, info = env.reset()\n",
"\n",
"for _ in range(20):\n",
" # Take a random action\n",
" action = env.action_space.sample()\n",
" print(\"Action taken:\", action)\n",
"\n",
" # Do this action in the environment and get\n",
" # next_state, reward, terminated, truncated and info\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
"\n",
" # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
" if terminated or truncated:\n",
" # Reset the environment\n",
" print(\"Environment is reset\")\n",
" observation, info = env.reset()\n",
"\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XIrKGGSlENZB"
},
"source": [
"## Create the LunarLander environment 🌛 and understand how it works\n",
"\n",
"### [The environment 🎮](https://gymnasium.farama.org/environments/box2d/lunar_lander/)\n",
"\n",
"In this first tutorial, were going to train our agent, a [Lunar Lander](https://gymnasium.farama.org/environments/box2d/lunar_lander/), **to land correctly on the moon**. To do that, the agent needs to learn **to adapt its speed and position (horizontal, vertical, and angular) to land correctly.**\n",
"\n",
"---\n",
"\n",
"\n",
"💡 A good habit when you start to use an environment is to check its documentation\n",
"\n",
"👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/\n",
"\n",
"---\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "poLBgRocF9aT"
},
"source": [
"Let's see what the Environment looks like:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZNPG0g_UGCfh"
},
"outputs": [],
"source": [
"# We create our environment with gym.make(\"<name_of_the_environment>\")\n",
"env = gym.make(\"LunarLander-v2\")\n",
"env.reset()\n",
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"Observation Space Shape\", env.observation_space.shape)\n",
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2MXc15qFE0M9"
},
"source": [
"We see with `Observation Space Shape (8,)` that the observation is a vector of size 8, where each value contains different information about the lander:\n",
"- Horizontal pad coordinate (x)\n",
"- Vertical pad coordinate (y)\n",
"- Horizontal speed (x)\n",
"- Vertical speed (y)\n",
"- Angle\n",
"- Angular speed\n",
"- If the left leg contact point has touched the land (boolean)\n",
"- If the right leg contact point has touched the land (boolean)\n"
]
},
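As a rough illustration, assuming the `env` created in the previous cell, the eight values can be unpacked into named variables; the names below are just descriptive labels for the fields listed above:

```python
# Illustrative unpacking of a single LunarLander observation vector.
observation, info = env.reset()
x, y, vx, vy, angle, angular_velocity, left_leg, right_leg = observation
print(f"position=({x:.2f}, {y:.2f}), velocity=({vx:.2f}, {vy:.2f})")
print(f"angle={angle:.2f}, angular velocity={angular_velocity:.2f}")
print(f"leg contact: left={bool(left_leg)}, right={bool(right_leg)}")
```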
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "We5WqOBGLoSm"
},
"outputs": [],
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"Action Space Shape\", env.action_space.n)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MyxXwkI2Magx"
},
"source": [
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮:\n",
"\n",
"- Action 0: Do nothing,\n",
"- Action 1: Fire left orientation engine,\n",
"- Action 2: Fire the main engine,\n",
"- Action 3: Fire right orientation engine.\n",
"\n",
"Reward function (the function that will gives a reward at each timestep) 💰:\n",
"\n",
"After every step a reward is granted. The total reward of an episode is the **sum of the rewards for all the steps within that episode**.\n",
"\n",
"For each step, the reward:\n",
"\n",
"- Is increased/decreased the closer/further the lander is to the landing pad.\n",
"- Is increased/decreased the slower/faster the lander is moving.\n",
"- Is decreased the more the lander is tilted (angle not horizontal).\n",
"- Is increased by 10 points for each leg that is in contact with the ground.\n",
"- Is decreased by 0.03 points each frame a side engine is firing.\n",
"- Is decreased by 0.3 points each frame the main engine is firing.\n",
"\n",
"The episode receive an **additional reward of -100 or +100 points for crashing or landing safely respectively.**\n",
"\n",
"An episode is **considered a solution if it scores at least 200 points.**"
]
},
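A minimal sketch of the point above that the episode return is simply the per-step rewards summed up, using random actions (so the score will usually be far below the 200-point threshold):

```python
import gymnasium as gym

env = gym.make("LunarLander-v2")
observation, info = env.reset()

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random actions, purely for illustration
    observation, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Episode return: {episode_return:.1f} (>= 200 would count as solved)")
env.close()
```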
{
"cell_type": "markdown",
"metadata": {
"id": "dFD9RAFjG8aq"
},
"source": [
"#### Vectorized Environment\n",
"\n",
"- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) of 16 environments, this way, **we'll have more diverse experiences during the training.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "99hqQ_etEy1N"
},
"outputs": [],
"source": [
"# Create the environment\n",
"env = make_vec_env(\"LunarLander-v2\", n_envs=16)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VgrE86r5E5IK"
},
"source": [
"## Create the Model 🤖\n",
"- We have studied our environment and we understood the problem: **being able to land the Lunar Lander to the Landing Pad correctly by controlling left, right and main orientation engine**. Now let's build the algorithm we're going to use to solve this Problem 🚀.\n",
"\n",
"- To do so, we're going to use our first Deep RL library, [Stable Baselines3 (SB3)](https://stable-baselines3.readthedocs.io/en/master/).\n",
"\n",
"- SB3 is a set of **reliable implementations of reinforcement learning algorithms in PyTorch**.\n",
"\n",
"---\n",
"\n",
"💡 A good habit when using a new library is to dive first on the documentation: https://stable-baselines3.readthedocs.io/en/master/ and then try some tutorials.\n",
"\n",
"----"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HLlClRW37Q7e"
},
"source": [
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/sb3.png\" alt=\"Stable Baselines3\">"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HV4yiUM_9_Ka"
},
"source": [
"To solve this problem, we're going to use SB3 **PPO**. [PPO (aka Proximal Policy Optimization) is one of the SOTA (state of the art) Deep Reinforcement Learning algorithms that you'll study during this course](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#example%5D).\n",
"\n",
"PPO is a combination of:\n",
"- *Value-based reinforcement learning method*: learning an action-value function that will tell us the **most valuable action to take given a state and action**.\n",
"- *Policy-based reinforcement learning method*: learning a policy that will **give us a probability distribution over actions**."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5qL_4HeIOrEJ"
},
"source": [
"Stable-Baselines3 is easy to set up:\n",
"\n",
"1⃣ You **create your environment** (in our case it was done above)\n",
"\n",
"2⃣ You define the **model you want to use and instantiate this model** `model = PPO(\"MlpPolicy\")`\n",
"\n",
"3⃣ You **train the agent** with `model.learn` and define the number of training timesteps\n",
"\n",
"```\n",
"# Create environment\n",
"env = gym.make('LunarLander-v2')\n",
"\n",
"# Instantiate the agent\n",
"model = PPO('MlpPolicy', env, verbose=1)\n",
"# Train the agent\n",
"model.learn(total_timesteps=int(2e5))\n",
"```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nxI6hT1GE4-A"
},
"outputs": [],
"source": [
"# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,\n",
"# if we had frames as input we would use CnnPolicy\n",
"model = PPO(\n",
" policy=\"MlpPolicy\",\n",
" env=env,\n",
" n_steps=1024,\n",
" batch_size=64,\n",
" n_epochs=4,\n",
" gamma=0.999,\n",
" gae_lambda=0.98,\n",
" ent_coef=0.01,\n",
" verbose=1,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ClJJk88yoBUi"
},
"source": [
"## Train the PPO agent 🏃\n",
"- Let's train our agent for 1,000,000 timesteps, don't forget to use GPU on Colab. It will take approximately ~20min, but you can use fewer timesteps if you just want to try it out.\n",
"- During the training, take a ☕ break you deserved it 🤗"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qKnYkNiVp89p"
},
"outputs": [],
"source": [
"# TODO: Train it for 10,000 timesteps\n",
"model.learn(total_timesteps=5_000_000)\n",
"\n",
"# TODO: Specify file name for model and save the model to file\n",
"model_name = \"ppo-LunarLander-v2\"\n",
"model.save(model_name)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BY_HuedOoISR"
},
"source": [
"## Evaluate the agent 📈\n",
"- Remember to wrap the environment in a [Monitor](https://stable-baselines3.readthedocs.io/en/master/common/monitor.html).\n",
"- Now that our Lunar Lander agent is trained 🚀, we need to **check its performance**.\n",
"- Stable-Baselines3 provides a method to do that: `evaluate_policy`.\n",
"- To fill that part you need to [check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#basic-usage-training-saving-loading)\n",
"- In the next step, we'll see **how to automatically evaluate and share your agent to compete in a leaderboard, but for now let's do it ourselves**\n",
"\n",
"\n",
"💡 When you evaluate your agent, you should not use your training environment but create an evaluation environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yRpno0glsADy"
},
"outputs": [],
"source": [
"# TODO: Evaluate the agent\n",
"# Create a new environment for evaluation\n",
"eval_env = Monitor(gym.make(\"LunarLander-v2\"))\n",
"\n",
"# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
"mean_reward, std_reward = evaluate_policy(\n",
" model, eval_env, n_eval_episodes=10, deterministic=True\n",
")\n",
"\n",
"# Print the results\n",
"print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "reBhoODwcXfr"
},
"source": [
"- In my case, I got a mean reward is `200.20 +/- 20.80` after training for 1 million steps, which means that our lunar lander agent is ready to land on the moon 🌛🥳."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BQAwLnYFPk-s"
},
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things by your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!\n",
"\n",
"Can you beat your neighbour's mean reward?\n",
"\n",
"Here are some ideas to achieve so:\n",
"* Train more steps\n",
"* Try different hyperparameters for `PPO`. You can see them at https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#parameters.\n",
"* Check the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) and try another model such as DQN.\n",
"\n",
"Is moon landing too boring for you? Try to **change the environment**, why not use MountainCar-v0, CartPole-v1 or CarRacing-v0? Check how they work [using the gym documentation](https://www.gymlibrary.dev/) and have fun 🎉."
]
}
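The sketch referenced in the challenges above: a minimal example, assuming `stable-baselines3` and `gymnasium` are installed as earlier in the notebook, of swapping in DQN and a different environment (CartPole-v1); the timestep budget is an illustrative guess rather than a tuned value:

```python
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Train DQN on CartPole-v1 instead of PPO on LunarLander-v2.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Evaluate on a separate, Monitor-wrapped environment, as above.
eval_env = Monitor(gym.make("CartPole-v1"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```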
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -1,24 +0,0 @@
FROM docker.io/library/ubuntu:22.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
&& apt-get install -y --no-install-recommends apt-utils build-essential g++ curl cmake zlib1g-dev libjpeg-dev xvfb xorg-dev libboost-all-dev libsdl2-dev swig python3 python3-dev python3-future python3-pip python3-setuptools python3-wheel python3-tk libatlas-base-dev cython3 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip \
&& python3 -m pip install jupyterlab keras==3.3.3 matplotlib==3.9.0 numpy==1.26.4 tensorflow==2.16.1 scikit-image==0.22.0 \
&& python3 -m pip install "gymnasium[box2d]==0.29.1" "stable-baselines3[extra]==2.3.2"
RUN apt-get update && apt-get install -y wget
WORKDIR /work
COPY . /work
RUN /work/download-data.sh \
&& rm /work/*_solutions.ipynb
ENV DEBIAN_FRONTEND teletype
CMD xvfb-run -s "-screen 0 1400x900x24" \
/usr/local/bin/jupyter lab --port 8888 --ip=0.0.0.0 --allow-root


@@ -1,4 +0,0 @@
#!/bin/bash
wget -O cifar-10-python.tar.gz https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xvf cifar-10-python.tar.gz
rm cifar-10-python.tar.gz