Remove solutions
parent
3e562267fb
commit
8b968b1676
2 changed files with 0 additions and 805 deletions
|
@ -1,359 +0,0 @@
|
||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"# Image Classification\n",
|
|
||||||
"\n",
|
|
||||||
"Simple image classification using the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html).\n",
|
|
||||||
"\n",
|
|
||||||
"The CIFAR-10 dataset has 60,000 32x32 colour images in 10 classes (6,000 per class). These are split into 50,000 training images and 10,000 testing images.\n",
|
|
||||||
"\n",
|
|
||||||
"Here are the classes:\n",
|
|
||||||
"1. Airplane\n",
|
|
||||||
"2. Car\n",
|
|
||||||
"3. Bird\n",
|
|
||||||
"4. Cat\n",
|
|
||||||
"5. Deer\n",
|
|
||||||
"6. Dog\n",
|
|
||||||
"7. Frog\n",
|
|
||||||
"8. Horse\n",
|
|
||||||
"9. Ship\n",
|
|
||||||
"10. Truck"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import matplotlib.pyplot as plt\n",
|
|
||||||
"import keras\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import os\n",
|
|
||||||
"from keras.src.datasets.cifar import load_batch\n",
|
|
||||||
"from keras import backend\n",
|
|
||||||
"from skimage.transform import resize\n",
|
|
||||||
"\n",
|
|
||||||
"classes = [\n",
|
|
||||||
" \"airplane\",\n",
|
|
||||||
" \"car\",\n",
|
|
||||||
" \"bird\",\n",
|
|
||||||
" \"cat\",\n",
|
|
||||||
" \"deer\",\n",
|
|
||||||
" \"dog\",\n",
|
|
||||||
" \"frog\",\n",
|
|
||||||
" \"horse\",\n",
|
|
||||||
" \"ship\",\n",
|
|
||||||
" \"truck\",\n",
|
|
||||||
"]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Load the dataset 💿"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"num_train_samples = 50000\n",
|
|
||||||
"\n",
|
|
||||||
"x_train = np.empty((num_train_samples, 3, 32, 32), dtype=\"uint8\")\n",
|
|
||||||
"y_train = np.empty((num_train_samples,), dtype=\"uint8\")\n",
|
|
||||||
"\n",
|
|
||||||
"for i in range(1, 6):\n",
|
|
||||||
" file_path = os.path.join(\"cifar-10-batches-py\", f\"data_batch_{i}\")\n",
|
|
||||||
" (\n",
|
|
||||||
" x_train[(i - 1) * 10000 : i * 10000, :, :, :],\n",
|
|
||||||
" y_train[(i - 1) * 10000 : i * 10000],\n",
|
|
||||||
" ) = load_batch(file_path)\n",
|
|
||||||
"\n",
|
|
||||||
"file_path = os.path.join(\"cifar-10-batches-py\", \"test_batch\")\n",
|
|
||||||
"x_test, y_test = load_batch(file_path)\n",
|
|
||||||
"\n",
|
|
||||||
"y_train = np.reshape(y_train, (len(y_train), 1))\n",
|
|
||||||
"y_test = np.reshape(y_test, (len(y_test), 1))\n",
|
|
||||||
"\n",
|
|
||||||
"if backend.image_data_format() == \"channels_last\":\n",
|
|
||||||
" x_train = x_train.transpose(0, 2, 3, 1)\n",
|
|
||||||
" x_test = x_test.transpose(0, 2, 3, 1)\n",
|
|
||||||
"\n",
|
|
||||||
"x_test = x_test.astype(x_train.dtype)\n",
|
|
||||||
"y_test = y_test.astype(y_train.dtype)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Exploring 🔎"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"print(x_train.shape)\n",
|
|
||||||
"print(y_train.shape)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"`x_train` is the actual images in the dataset. You can see they are 32x32 and the 3 is for red, green and blue values.\n",
|
|
||||||
"`y_train` is the category for each image, this is just a single number between 0 and 9."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"x_train[1]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"plt.imshow(x_train[1])\n",
|
|
||||||
"print(y_train[1])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Processing 🫧\n",
|
|
||||||
"\n",
|
|
||||||
"Our neural network works with decimal numbers between 0 and 1, so we need to convert the categories into 0s and 1s. We take an array of 0s and set a 1 for the category.\n",
|
|
||||||
"\n",
|
|
||||||
"For example, the number 2 would get encoded to `[0, 0, 1, ...]`."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"y_train_one_hot = keras.src.utils.numerical_utils.to_categorical(y_train, 10)\n",
|
|
||||||
"y_test_one_hot = keras.src.utils.numerical_utils.to_categorical(y_test, 10)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# task: can you print out the one hot encoded label for the truck above?\n",
|
|
||||||
"print(y_train_one_hot[1])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"At the moment each pixel is represented by a number from 0 to 255. We also need to convert these to be between 0 and 1."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"x_train = x_train.astype(\"float32\")\n",
|
|
||||||
"x_test = x_test.astype(\"float32\")\n",
|
|
||||||
"x_train = x_train / 255\n",
|
|
||||||
"x_test = x_test / 255"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"x_train[0]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Build and Train CNN 🔨"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from keras.models import Sequential\n",
|
|
||||||
"from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D\n",
|
|
||||||
"\n",
|
|
||||||
"model = Sequential()\n",
|
|
||||||
"model.add(\n",
|
|
||||||
" Conv2D(32, (3, 3), activation=\"relu\", padding=\"same\", input_shape=(32, 32, 3))\n",
|
|
||||||
")\n",
|
|
||||||
"model.add(MaxPooling2D(pool_size=(2, 2)))\n",
|
|
||||||
"model.add(Dropout(0.25))\n",
|
|
||||||
"model.add(Conv2D(64, (3, 3), activation=\"relu\", padding=\"same\"))\n",
|
|
||||||
"model.add(MaxPooling2D(pool_size=(2, 2)))\n",
|
|
||||||
"model.add(Dropout(0.25))\n",
|
|
||||||
"model.add(Flatten())\n",
|
|
||||||
"model.add(Dense(512, activation=\"relu\"))\n",
|
|
||||||
"model.add(Dropout(0.5))\n",
|
|
||||||
"model.add(Dense(10, activation=\"softmax\"))\n",
|
|
||||||
"\n",
|
|
||||||
"model.summary()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"hist = model.fit(\n",
|
|
||||||
" x_train, y_train_one_hot, batch_size=32, epochs=1, validation_split=0.2\n",
|
|
||||||
")"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Evaluate 🧪"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"model.evaluate(x_test, y_test_one_hot)[1]"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"~50% accuracy... not great"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"### What about for something it's not been trained on?\n",
|
|
||||||
"\n",
|
|
||||||
"Let's try and feed a picture of a cat to the model, and see what it thinks... As a reminder, the model hasn't been trained on pictures of cats."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"cat = plt.imread(\"cat.jpg\")\n",
|
|
||||||
"cat_resized = resize(cat, (32, 32, 3))"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"plt.imshow(cat_resized)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"probabilities = model.predict(\n",
|
|
||||||
" np.array(\n",
|
|
||||||
" [\n",
|
|
||||||
" cat_resized,\n",
|
|
||||||
" ]\n",
|
|
||||||
" )\n",
|
|
||||||
")\n",
|
|
||||||
"probabilities"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"index = np.argsort(probabilities[0, :])\n",
|
|
||||||
"print(f\"Most likely: {classes[index[9]]}, probability={probabilities[0,index[9]]}\")"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"## Additional Challenges 🏆\n",
|
|
||||||
"\n",
|
|
||||||
"- Try adding in some more layers to the neural network, adding a second `Conv2D` layer under both of the existing ones.\n",
|
|
||||||
"- Try increasing the number of `epochs` when training.\n",
|
|
||||||
"- Save/load your model with `model.save('mymodel.h5')` and `keras.models.load_model('mymodel.h5')`."
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3 (ipykernel)",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python3"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.10.12"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 4
|
|
||||||
}
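The two preprocessing steps described in the image classification notebook (one-hot encoding the labels and scaling pixels to the range 0 to 1) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `one_hot` and `scale_pixels` are hypothetical, and the notebook itself relies on Keras' `to_categorical` and NumPy division rather than these functions.

```python
# Sketch of the two preprocessing steps, in plain Python for illustration only.
# The helper names below are hypothetical; the notebook uses Keras and NumPy.

NUM_CLASSES = 10  # CIFAR-10 has 10 categories


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list of 0.0s with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside 0..{num_classes - 1}")
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded


def scale_pixels(pixels):
    """Map 8-bit pixel values (0..255) to floats between 0 and 1."""
    return [value / 255 for value in pixels]


truck = one_hot(9)  # 9 is the index of "truck" in the classes list
bird = one_hot(2)   # 2 is the index of "bird"
scaled = scale_pixels([0, 51, 255])

print(truck)
print(bird)
print(scaled)
```

The label 2 ends up with its single 1 in the third position, matching the `[0, 0, 1, ...]` example in the notebook text.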
|
|
|
@ -1,446 +0,0 @@
|
||||||
{
|
|
||||||
"cells": [
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "njb_ProuHiOe"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"# Intro to Reinforcement Learning\n",
|
|
||||||
"\n",
|
|
||||||
"This notebook is modified from [Unit 1 of Hugging Face's Deep RL course](https://github.com/huggingface/deep-rl-class/blob/main/notebooks/unit1/unit1.ipynb), it has been simplified."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "wrgpVFqyENVf"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Import the packages 📦"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "cygWLPGsEQ0m"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"from stable_baselines3 import PPO\n",
|
|
||||||
"from stable_baselines3.common.env_util import make_vec_env\n",
|
|
||||||
"from stable_baselines3.common.evaluation import evaluate_policy\n",
|
|
||||||
"from stable_baselines3.common.monitor import Monitor"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "-TzNN0bQ_j-3"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"At each step:\n",
|
|
||||||
"- Our Agent receives a **state (S0)** from the **Environment** — we receive the first frame of our game (Environment).\n",
|
|
||||||
"- Based on that **state (S0),** the Agent takes an **action (A0)** — our Agent will move to the right.\n",
|
|
||||||
"- The environment transitions to a **new** **state (S1)** — new frame.\n",
|
|
||||||
"- The environment gives some **reward (R1)** to the Agent — we’re not dead *(Positive Reward +1)*.\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"With Gymnasium:\n",
|
|
||||||
"\n",
|
|
||||||
"1️⃣ We create our environment using `gymnasium.make()`\n",
|
|
||||||
"\n",
|
|
||||||
"2️⃣ We reset the environment to its initial state with `observation = env.reset()`\n",
|
|
||||||
"\n",
|
|
||||||
"At each step:\n",
|
|
||||||
"\n",
|
|
||||||
"3️⃣ Get an action using our model (in our example we take a random action)\n",
|
|
||||||
"\n",
|
|
||||||
"4️⃣ Using `env.step(action)`, we perform this action in the environment and get\n",
|
|
||||||
"- `observation`: The new state (st+1)\n",
|
|
||||||
"- `reward`: The reward we get after executing the action\n",
|
|
||||||
"- `terminated`: Indicates if the episode terminated (agent reach the terminal state)\n",
|
|
||||||
"- `truncated`: Introduced with this new version, it indicates a timelimit or if an agent go out of bounds of the environment for instance.\n",
|
|
||||||
"- `info`: A dictionary that provides additional information (depends on the environment).\n",
|
|
||||||
"\n",
|
|
||||||
"For more explanations check this 👉 https://gymnasium.farama.org/api/env/#gymnasium.Env.step\n",
|
|
||||||
"\n",
|
|
||||||
"If the episode is terminated:\n",
|
|
||||||
"- We reset the environment to its initial state with `observation = env.reset()`\n",
|
|
||||||
"\n",
|
|
||||||
"**Let's look at an example!** Make sure to read the code\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "w7vOFlpA_ONz"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"import gymnasium as gym\n",
|
|
||||||
"\n",
|
|
||||||
"# First, we create our environment called LunarLander-v2\n",
|
|
||||||
"env = gym.make(\"LunarLander-v2\")\n",
|
|
||||||
"\n",
|
|
||||||
"# Then we reset this environment\n",
|
|
||||||
"observation, info = env.reset()\n",
|
|
||||||
"\n",
|
|
||||||
"for _ in range(20):\n",
|
|
||||||
" # Take a random action\n",
|
|
||||||
" action = env.action_space.sample()\n",
|
|
||||||
" print(\"Action taken:\", action)\n",
|
|
||||||
"\n",
|
|
||||||
" # Do this action in the environment and get\n",
|
|
||||||
" # next_state, reward, terminated, truncated and info\n",
|
|
||||||
" observation, reward, terminated, truncated, info = env.step(action)\n",
|
|
||||||
"\n",
|
|
||||||
" # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
|
|
||||||
" if terminated or truncated:\n",
|
|
||||||
" # Reset the environment\n",
|
|
||||||
" print(\"Environment is reset\")\n",
|
|
||||||
" observation, info = env.reset()\n",
|
|
||||||
"\n",
|
|
||||||
"env.close()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "XIrKGGSlENZB"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Create the LunarLander environment 🌛 and understand how it works\n",
|
|
||||||
"\n",
|
|
||||||
"### [The environment 🎮](https://gymnasium.farama.org/environments/box2d/lunar_lander/)\n",
|
|
||||||
"\n",
|
|
||||||
"In this first tutorial, we’re going to train our agent, a [Lunar Lander](https://gymnasium.farama.org/environments/box2d/lunar_lander/), **to land correctly on the moon**. To do that, the agent needs to learn **to adapt its speed and position (horizontal, vertical, and angular) to land correctly.**\n",
|
|
||||||
"\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"💡 A good habit when you start to use an environment is to check its documentation\n",
|
|
||||||
"\n",
|
|
||||||
"👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/\n",
|
|
||||||
"\n",
|
|
||||||
"---\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "poLBgRocF9aT"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"Let's see what the Environment looks like:\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "ZNPG0g_UGCfh"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# We create our environment with gym.make(\"<name_of_the_environment>\")\n",
|
|
||||||
"env = gym.make(\"LunarLander-v2\")\n",
|
|
||||||
"env.reset()\n",
|
|
||||||
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
|
|
||||||
"print(\"Observation Space Shape\", env.observation_space.shape)\n",
|
|
||||||
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "2MXc15qFE0M9"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"We see with `Observation Space Shape (8,)` that the observation is a vector of size 8, where each value contains different information about the lander:\n",
|
|
||||||
"- Horizontal pad coordinate (x)\n",
|
|
||||||
"- Vertical pad coordinate (y)\n",
|
|
||||||
"- Horizontal speed (x)\n",
|
|
||||||
"- Vertical speed (y)\n",
|
|
||||||
"- Angle\n",
|
|
||||||
"- Angular speed\n",
|
|
||||||
"- If the left leg contact point has touched the land (boolean)\n",
|
|
||||||
"- If the right leg contact point has touched the land (boolean)\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "We5WqOBGLoSm"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
|
|
||||||
"print(\"Action Space Shape\", env.action_space.n)\n",
|
|
||||||
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "MyxXwkI2Magx"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮:\n",
|
|
||||||
"\n",
|
|
||||||
"- Action 0: Do nothing,\n",
|
|
||||||
"- Action 1: Fire left orientation engine,\n",
|
|
||||||
"- Action 2: Fire the main engine,\n",
|
|
||||||
"- Action 3: Fire right orientation engine.\n",
|
|
||||||
"\n",
|
|
||||||
"Reward function (the function that will gives a reward at each timestep) 💰:\n",
|
|
||||||
"\n",
|
|
||||||
"After every step a reward is granted. The total reward of an episode is the **sum of the rewards for all the steps within that episode**.\n",
|
|
||||||
"\n",
|
|
||||||
"For each step, the reward:\n",
|
|
||||||
"\n",
|
|
||||||
"- Is increased/decreased the closer/further the lander is to the landing pad.\n",
|
|
||||||
"- Is increased/decreased the slower/faster the lander is moving.\n",
|
|
||||||
"- Is decreased the more the lander is tilted (angle not horizontal).\n",
|
|
||||||
"- Is increased by 10 points for each leg that is in contact with the ground.\n",
|
|
||||||
"- Is decreased by 0.03 points each frame a side engine is firing.\n",
|
|
||||||
"- Is decreased by 0.3 points each frame the main engine is firing.\n",
|
|
||||||
"\n",
|
|
||||||
"The episode receive an **additional reward of -100 or +100 points for crashing or landing safely respectively.**\n",
|
|
||||||
"\n",
|
|
||||||
"An episode is **considered a solution if it scores at least 200 points.**"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "dFD9RAFjG8aq"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"#### Vectorized Environment\n",
|
|
||||||
"\n",
|
|
||||||
"- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) of 16 environments, this way, **we'll have more diverse experiences during the training.**"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "99hqQ_etEy1N"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# Create the environment\n",
|
|
||||||
"env = make_vec_env(\"LunarLander-v2\", n_envs=16)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "VgrE86r5E5IK"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Create the Model 🤖\n",
|
|
||||||
"- We have studied our environment and we understood the problem: **being able to land the Lunar Lander to the Landing Pad correctly by controlling left, right and main orientation engine**. Now let's build the algorithm we're going to use to solve this Problem 🚀.\n",
|
|
||||||
"\n",
|
|
||||||
"- To do so, we're going to use our first Deep RL library, [Stable Baselines3 (SB3)](https://stable-baselines3.readthedocs.io/en/master/).\n",
|
|
||||||
"\n",
|
|
||||||
"- SB3 is a set of **reliable implementations of reinforcement learning algorithms in PyTorch**.\n",
|
|
||||||
"\n",
|
|
||||||
"---\n",
|
|
||||||
"\n",
|
|
||||||
"💡 A good habit when using a new library is to dive first on the documentation: https://stable-baselines3.readthedocs.io/en/master/ and then try some tutorials.\n",
|
|
||||||
"\n",
|
|
||||||
"----"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "HLlClRW37Q7e"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/sb3.png\" alt=\"Stable Baselines3\">"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "HV4yiUM_9_Ka"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"To solve this problem, we're going to use SB3 **PPO**. [PPO (aka Proximal Policy Optimization) is one of the SOTA (state of the art) Deep Reinforcement Learning algorithms that you'll study during this course](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#example%5D).\n",
|
|
||||||
"\n",
|
|
||||||
"PPO is a combination of:\n",
|
|
||||||
"- *Value-based reinforcement learning method*: learning an action-value function that will tell us the **most valuable action to take given a state and action**.\n",
|
|
||||||
"- *Policy-based reinforcement learning method*: learning a policy that will **give us a probability distribution over actions**."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "5qL_4HeIOrEJ"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"Stable-Baselines3 is easy to set up:\n",
|
|
||||||
"\n",
|
|
||||||
"1️⃣ You **create your environment** (in our case it was done above)\n",
|
|
||||||
"\n",
|
|
||||||
"2️⃣ You define the **model you want to use and instantiate this model** `model = PPO(\"MlpPolicy\")`\n",
|
|
||||||
"\n",
|
|
||||||
"3️⃣ You **train the agent** with `model.learn` and define the number of training timesteps\n",
|
|
||||||
"\n",
|
|
||||||
"```\n",
|
|
||||||
"# Create environment\n",
|
|
||||||
"env = gym.make('LunarLander-v2')\n",
|
|
||||||
"\n",
|
|
||||||
"# Instantiate the agent\n",
|
|
||||||
"model = PPO('MlpPolicy', env, verbose=1)\n",
|
|
||||||
"# Train the agent\n",
|
|
||||||
"model.learn(total_timesteps=int(2e5))\n",
|
|
||||||
"```\n",
|
|
||||||
"\n"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "nxI6hT1GE4-A"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,\n",
|
|
||||||
"# if we had frames as input we would use CnnPolicy\n",
|
|
||||||
"model = PPO(\n",
|
|
||||||
" policy=\"MlpPolicy\",\n",
|
|
||||||
" env=env,\n",
|
|
||||||
" n_steps=1024,\n",
|
|
||||||
" batch_size=64,\n",
|
|
||||||
" n_epochs=4,\n",
|
|
||||||
" gamma=0.999,\n",
|
|
||||||
" gae_lambda=0.98,\n",
|
|
||||||
" ent_coef=0.01,\n",
|
|
||||||
" verbose=1,\n",
|
|
||||||
")"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "ClJJk88yoBUi"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Train the PPO agent 🏃\n",
|
|
||||||
"- Let's train our agent for 1,000,000 timesteps, don't forget to use GPU on Colab. It will take approximately ~20min, but you can use fewer timesteps if you just want to try it out.\n",
|
|
||||||
"- During the training, take a ☕ break you deserved it 🤗"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "qKnYkNiVp89p"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# TODO: Train it for 10,000 timesteps\n",
|
|
||||||
"model.learn(total_timesteps=5_000_000)\n",
|
|
||||||
"\n",
|
|
||||||
"# TODO: Specify file name for model and save the model to file\n",
|
|
||||||
"model_name = \"ppo-LunarLander-v2\"\n",
|
|
||||||
"model.save(model_name)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "BY_HuedOoISR"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Evaluate the agent 📈\n",
|
|
||||||
"- Remember to wrap the environment in a [Monitor](https://stable-baselines3.readthedocs.io/en/master/common/monitor.html).\n",
|
|
||||||
"- Now that our Lunar Lander agent is trained 🚀, we need to **check its performance**.\n",
|
|
||||||
"- Stable-Baselines3 provides a method to do that: `evaluate_policy`.\n",
|
|
||||||
"- To fill that part you need to [check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#basic-usage-training-saving-loading)\n",
|
|
||||||
"- In the next step, we'll see **how to automatically evaluate and share your agent to compete in a leaderboard, but for now let's do it ourselves**\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
|
||||||
"💡 When you evaluate your agent, you should not use your training environment but create an evaluation environment."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {
|
|
||||||
"id": "yRpno0glsADy"
|
|
||||||
},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"# TODO: Evaluate the agent\n",
|
|
||||||
"# Create a new environment for evaluation\n",
|
|
||||||
"eval_env = Monitor(gym.make(\"LunarLander-v2\"))\n",
|
|
||||||
"\n",
|
|
||||||
"# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
|
|
||||||
"mean_reward, std_reward = evaluate_policy(\n",
|
|
||||||
" model, eval_env, n_eval_episodes=10, deterministic=True\n",
|
|
||||||
")\n",
|
|
||||||
"\n",
|
|
||||||
"# Print the results\n",
|
|
||||||
"print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "reBhoODwcXfr"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"- In my case, I got a mean reward is `200.20 +/- 20.80` after training for 1 million steps, which means that our lunar lander agent is ready to land on the moon 🌛🥳."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {
|
|
||||||
"id": "BQAwLnYFPk-s"
|
|
||||||
},
|
|
||||||
"source": [
|
|
||||||
"## Some additional challenges 🏆\n",
|
|
||||||
"The best way to learn **is to try things by your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!\n",
|
|
||||||
"\n",
|
|
||||||
"Can you beat your neighbour's mean reward?\n",
|
|
||||||
"\n",
|
|
||||||
"Here are some ideas to achieve so:\n",
|
|
||||||
"* Train more steps\n",
|
|
||||||
"* Try different hyperparameters for `PPO`. You can see them at https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#parameters.\n",
|
|
||||||
"* Check the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) and try another model such as DQN.\n",
|
|
||||||
"\n",
|
|
||||||
"Is moon landing too boring for you? Try to **change the environment**, why not use MountainCar-v0, CartPole-v1 or CarRacing-v0? Check how they work [using the gym documentation](https://www.gymlibrary.dev/) and have fun 🎉."
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"kernelspec": {
|
|
||||||
"display_name": "Python 3 (ipykernel)",
|
|
||||||
"language": "python",
|
|
||||||
"name": "python3"
|
|
||||||
},
|
|
||||||
"language_info": {
|
|
||||||
"codemirror_mode": {
|
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.9.19"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 0
|
|
||||||
}
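The reinforcement learning notebook describes the agent-environment loop and notes that an episode's total reward is the sum of its per-step rewards. The sketch below shows that bookkeeping without installing Gymnasium. It assumes a made-up `ToyLander` class that only copies the shape of the Gymnasium API (`reset()` returning `(observation, info)` and `step()` returning five values). It is not the real LunarLander, and its physics are invented for illustration. The -0.3 engine cost and the +100 landing bonus are borrowed from the reward description in the notebook.

```python
import random


class ToyLander:
    """Hypothetical stand-in environment, NOT the real LunarLander.

    It only mirrors the shape of the Gymnasium API: reset() returns
    (observation, info) and step() returns
    (observation, reward, terminated, truncated, info).
    """

    def __init__(self, start_height=5, max_steps=20):
        self.start_height = start_height
        self.max_steps = max_steps
        self.height = start_height
        self.steps = 0

    def reset(self):
        self.height = self.start_height
        self.steps = 0
        return self.height, {}

    def step(self, action):
        # Action 0: do nothing (fall by 1). Action 1: fire the engine (hover).
        self.steps += 1
        if action == 0:
            self.height -= 1
        reward = -0.3 if action == 1 else 0.0  # firing the engine costs fuel
        terminated = self.height <= 0
        if terminated:
            reward += 100.0  # landing bonus
        # Truncation is a time limit, distinct from reaching a terminal state.
        truncated = not terminated and self.steps >= self.max_steps
        return self.height, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Play one episode and return the sum of its per-step rewards."""
    observation, info = env.reset()
    total_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward


env = ToyLander()
always_fall = run_episode(env, lambda obs: 0)
always_hover = run_episode(env, lambda obs: 1)
rng = random.Random(0)
random_policy = run_episode(env, lambda obs: rng.choice([0, 1]))

print("always fall:", always_fall)
print("always hover:", always_hover)
print("random policy:", random_policy)
```

The always-hover policy never lands, so its episode ends through `truncated` rather than `terminated`, which is why the loop checks both flags before stopping.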
|
|