{ "cells": [ { "cell_type": "markdown", "id": "84957ddd", "metadata": {}, "source": [ "# Neural Networks\n", "\n", "In this notebook we will learn to build and train an MLP in Pytorch" ] }, { "cell_type": "code", "execution_count": null, "id": "8b1ca0d9", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "import torch.utils.data as data\n", "\n", "import torchvision.transforms as transforms\n", "import torchvision.datasets as datasets\n", "\n", "from sklearn import metrics\n", "from sklearn import decomposition\n", "from sklearn import manifold\n", "from tqdm.notebook import trange, tqdm\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "import copy\n", "import random\n", "import time" ] }, { "cell_type": "markdown", "id": "103993e4", "metadata": {}, "source": [ "## Ensure reproducibility\n", "\n", "We need to set a random seed to ensure consistent results" ] }, { "cell_type": "code", "execution_count": null, "id": "98f8f004", "metadata": {}, "outputs": [], "source": [ "SEED = 1234\n", "\n", "random.seed(SEED)\n", "np.random.seed(SEED)\n", "torch.manual_seed(SEED)\n", "torch.cuda.manual_seed(SEED)\n", "torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "id": "5a998804", "metadata": {}, "source": [ "## Get the data\n", "\n", "Download the dataset - we will use `FahionMNIST`" ] }, { "cell_type": "code", "execution_count": null, "id": "7c501c3c", "metadata": {}, "outputs": [], "source": [ "ROOT = '.data'\n", "\n", "train_data = datasets.FashionMNIST(root=ROOT,\n", " train=True,\n", " download=True)" ] }, { "cell_type": "markdown", "id": "269fca16", "metadata": {}, "source": [ "## Normalize the data\n", "\n", "Recall that normalising data is important to avoid spurious biases. Here we divide the pixel vlaues by 255 to get a range of 0 - 1. We will use a `transform` to do this. A `transform` is an object that applies modifications to data as it is loaded and fed to the network for training.\n", "\n", "We set up a `train_transform` and a `test_transform` for the different datasets. We then set up the data objects `train_data` and `test_data`." ] }, { "cell_type": "code", "execution_count": null, "id": "7a33adb8", "metadata": {}, "outputs": [], "source": [ "mean = train_data.data.float().mean() / 255\n", "std = train_data.data.float().std() / 255\n", "\n", "train_transforms = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[mean], std=[std])\n", " ])\n", "\n", "test_transforms = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[mean], std=[std])\n", " ])\n", "\n", "train_data = datasets.FashionMNIST(root=ROOT,\n", " train=True,\n", " download=True,\n", " transform=train_transforms)\n", "\n", "test_data = datasets.FashionMNIST(root=ROOT,\n", " train=False,\n", " download=True,\n", " transform=test_transforms)" ] }, { "cell_type": "markdown", "id": "b27123ed", "metadata": {}, "source": [ "## Take a look at the data\n", "\n", "Its always a good idea to look a bit at the data. Here is a helper function to plot a set of the data." ] }, { "cell_type": "code", "execution_count": null, "id": "80119c21", "metadata": {}, "outputs": [], "source": [ "def plot_images(images):\n", "\n", " n_images = len(images)\n", "\n", " rows = int(np.sqrt(n_images))\n", " cols = int(np.sqrt(n_images))\n", "\n", " fig = plt.figure()\n", " for i in range(rows*cols):\n", " ax = fig.add_subplot(rows, cols, i+1)\n", " ax.imshow(images[i].view(28, 28).cpu().numpy(), cmap='bone')\n", " ax.axis('off')" ] }, { "cell_type": "code", "execution_count": null, "id": "8df26e87", "metadata": {}, "outputs": [], "source": [ "N_IMAGES = 25\n", "\n", "images = [image for image, label in [train_data[i] for i in range(N_IMAGES)]]\n", "\n", "plot_images(images)" ] }, { "cell_type": "markdown", "id": "221f3054", "metadata": {}, "source": [ "## Now get a validation set\n", "\n", "We further split up the training data into training and validation sets.\n", "Recall that the validation set is different from the test set. The validation set can be used to select hyperparameters, so is strictly part of the model selection process." ] }, { "cell_type": "code", "execution_count": null, "id": "ef4ff5bb", "metadata": {}, "outputs": [], "source": [ "VALID_RATIO = 0.9\n", "\n", "n_train_examples = int(len(train_data) * VALID_RATIO)\n", "n_valid_examples = len(train_data) - n_train_examples\n", "\n", "train_data, valid_data = data.random_split(train_data,\n", " [n_train_examples, n_valid_examples])\n", "\n", "print(f'Number of training examples: {len(train_data)}')\n", "print(f'Number of validation examples: {len(valid_data)}')\n", "print(f'Number of testing examples: {len(test_data)}')" ] }, { "cell_type": "markdown", "id": "d7703c9f", "metadata": {}, "source": [ "## 1. Build the network architecture\n", "\n", "Our first DNN will be a simple multi-layer perceptron with only one hidden layer, as shown in the following figure:\n", "\n", "\n", "![dense.jpeg](https://github.com/stfc-sciml/sciml-workshop/blob/master/course_3.0_with_solutions/markdown_pic/dnn.png?raw=1)\n", "\n", "\n", "In general, a network of this kind should include an input layer, some hidden layers and an output layer. In this example, all the layers will be `Dense` layers.\n", "\n", "\n", "### The input layer\n", "\n", "We first need to determine the dimensionality of the input layer. In this case, we flatten (using a `Flatten` layer) the images and feed them to the network. As the images are 28 $\\times$ 28 in pixels, the input size will be 784.\n", "\n", "\n", "### The hidden layers\n", "\n", "We use one hidden layer in this case and use `ReLU` as its activation function:\n", "\n", "> $R(x)=\\max(0,x)$\n", "\n", "**NOTE**: Different activation functions are used for different tasks. Remember that `ReLU` generally performs well for training a network, but it can *only* be used in the hidden layers.\n", "\n", "\n", "### The output layer\n", "\n", "We usually encode categorical data as a \"one-hot\" vector. In this case, we have a vector of length 10 on the output side, where each element corresponds to a class of apparel. Ideally, we hope the values to be either 1 or 0, with 1 for the correct class and 0 for the others, so we use `sigmoid` as the activation function for the output layer:\n", "\n", "> $S(x) = \\dfrac{1}{1 + e^{-x}}$\n", "\n", "**Note**: PyTorch combines activation functions to be applied on the output with the functions which calculate the loss, also known as error or cost, of a neural network. This is done for numerical stability. So we do not explicitly declare the `Sigmoid` functions, rather this is done by the loss function later." ] }, { "cell_type": "markdown", "id": "099b28c6", "metadata": {}, "source": [ "## Set up the network\n", "\n", "In `pytorch` we build networks as a class. The example below is the minimal format for setting up a network in `pytorch`\n", "\n", "* Declare the class - it should be a subclass of the `nn.Module` class from `pytorch`\n", "* Define what inputs it takes upon declaration - in this case `input_dim` and `output_dim`\n", "* `super` makes sure it inherits attributes from `nn.Module`\n", "* We then define the different types of layers that we will use in this case three different linear layers\n", "* Then we define a method `forward` which is what gets called when data is passed through the network, this basically moves the data `x` through the layers\n", "\n", "```python\n", "class MLP(nn.Module):\n", " def __init__(self, input_dim, output_dim):\n", " super().__init__()\n", "\n", " self.input_fc = nn.Linear(input_dim, 250) # Here we set the first hidden layer to have 250 neurons\n", " self.hidden_fc = nn.Linear(250, 100) # The second hidden layer has 100 neurons\n", " self.output_fc = nn.Linear(100, output_dim)# The output layer size is declared when setting up the network\n", "\n", " def forward(self, x): \n", "\n", " batch_size = x.shape[0]\n", " x = x.view(batch_size, -1)\n", " h_1 = F.relu(self.input_fc(x)) # First pass through the Linear neuron, then pass through a ReLU\n", " h_2 = F.relu(self.hidden_fc(h_1)) # First pass through the Linear neuron, then pass through a ReLU\n", " y_pred = self.output_fc(h_2) # First pass through the Linear neuron, don't declare activation for the output layer, that is implicit in the loss function\n", "\n", " return y_pred\n", "```\n", "\n", "**Note** we did not set an activation function for the final layer - actually `PyTorch` does this automatically, based on the loss function that you choose." ] }, { "cell_type": "code", "execution_count": null, "id": "ca8c7f59", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "51a8543a", "metadata": {}, "source": [ "Now use this class to build a network" ] }, { "cell_type": "code", "execution_count": null, "id": "9f5cdf9b", "metadata": {}, "outputs": [], "source": [ "INPUT_DIM = 28 * 28\n", "OUTPUT_DIM = 10\n", "\n", "model = MLP(INPUT_DIM, OUTPUT_DIM)" ] }, { "cell_type": "markdown", "id": "a99b7972", "metadata": {}, "source": [ "## Training the Model\n", "\n", "Next, we'll define our optimizer. This is the algorithm we will use to update the parameters of our model with respect to the loss calculated on the data.\n", "\n", "We aren't going to go into too much detail on how neural networks are trained (see [this](http://neuralnetworksanddeeplearning.com/) article if you want to know how) but the gist is:\n", "- pass a batch of data through your model\n", "- calculate the loss of your batch by comparing your model's predictions against the actual labels\n", "- calculate the gradient of each of your parameters with respect to the loss\n", "- update each of your parameters by subtracting their gradient multiplied by a small *learning rate* parameter\n", "\n", "We use the *Adam* algorithm with the default parameters to update our model. Improved results could be obtained by searching over different optimizers and learning rates, however default Adam is usually a good starting off point. Check out [this](https://ruder.io/optimizing-gradient-descent/) article if you want to learn more about the different optimization algorithms commonly used for neural networks.\n", "\n", "Then, we define a *criterion*, PyTorch's name for a loss/cost/error function. This function will take in your model's predictions with the actual labels and then compute the loss/cost/error of your model with its current parameters.\n", "\n", "`CrossEntropyLoss` both computes the *softmax* activation function on the supplied predictions as well as the actual loss via *negative log likelihood*. \n", "\n", "Briefly, the softmax function is:\n", "\n", "$$\\text{softmax }(\\mathbf{x}) = \\frac{e^{x_i}}{\\sum_j e^{x_j}}$$ \n", "\n", "This turns out 10 dimensional output, where each element is an unbounded real number, into a probability distribution over 10 elements. That is, all values are between 0 and 1, and together they all sum to 1. \n", "\n", "Why do we turn things into a probability distribution? So we can use negative log likelihood for our loss function, as it expects probabilities. PyTorch calculates negative log likelihood for a single example via:\n", "\n", "$$\\text{negative log likelihood }(\\mathbf{\\hat{y}}, y) = -\\log \\big( \\text{softmax}(\\mathbf{\\hat{y}})[y] \\big)$$\n", "\n", "$\\mathbf{\\hat{y}}$ is the $\\mathbb{R}^{10}$ output, from our neural network, whereas $y$ is the label, an integer representing the class. The loss is the negative log of the class index of the softmax. For example:\n", "\n", "$$\\mathbf{\\hat{y}} = [5,1,1,1,1,1,1,1,1,1]$$\n", "\n", "$$\\text{softmax }(\\mathbf{\\hat{y}}) = [0.8585, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157]$$\n", "\n", "If the label was class zero, the loss would be:\n", "\n", "$$\\text{negative log likelihood }(\\mathbf{\\hat{y}}, 0) = - \\log(0.8585) = 0.153 \\dots$$\n", "\n", "If the label was class five, the loss would be:\n", "\n", "$$\\text{negative log likelihood }(\\mathbf{\\hat{y}}, 5) = - \\log(0.0157) = 4.154 \\dots$$\n", "\n", "So, intuitively, as your model's output corresponding to the correct class index increases, your loss decreases." ] }, { "cell_type": "code", "execution_count": null, "id": "a66d8a0b", "metadata": {}, "outputs": [], "source": [ "optimizer = optim.Adam(model.parameters())\n", "criterion = nn.CrossEntropyLoss()" ] }, { "cell_type": "markdown", "id": "644212a7", "metadata": {}, "source": [ "## Look for GPUs\n", "\n", "In toorch the code automatically defaults to run on cpu. You can check for avialible gpus, then move all of the code across to GPU if you like." ] }, { "cell_type": "code", "execution_count": null, "id": "259c8658", "metadata": {}, "outputs": [], "source": [ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "model = model.to(device)\n", "criterion = criterion.to(device)" ] }, { "cell_type": "code", "execution_count": null, "id": "0575156e", "metadata": {}, "outputs": [], "source": [ "def calculate_accuracy(y_pred, y):\n", " top_pred = y_pred.argmax(1, keepdim=True)\n", " correct = top_pred.eq(y.view_as(top_pred)).sum()\n", " acc = correct.float() / y.shape[0]\n", " return acc" ] }, { "cell_type": "markdown", "id": "d655752a", "metadata": {}, "source": [ "## Set up the batches\n", "\n", "We will do mini-batch gradient descent with Adam. So we can set up the batch sizes " ] }, { "cell_type": "code", "execution_count": null, "id": "ad8760e9", "metadata": {}, "outputs": [], "source": [ "BATCH_SIZE = 64\n", "\n", "train_iterator = data.DataLoader(train_data,\n", " shuffle=True,\n", " batch_size=BATCH_SIZE)\n", "\n", "valid_iterator = data.DataLoader(valid_data,\n", " batch_size=BATCH_SIZE)\n", "\n", "test_iterator = data.DataLoader(test_data,\n", " batch_size=BATCH_SIZE)" ] }, { "cell_type": "markdown", "id": "d53e1335", "metadata": {}, "source": [ "## Define a training loop\n", "\n", "This will:\n", "\n", " put our model into train mode\n", " iterate over our dataloader, returning batches of (image, label)\n", " place the batch on to our GPU, if we have one\n", " clear the gradients calculated from the last batch\n", " pass our batch of images, x, through to model to get predictions, y_pred\n", " calculate the loss between our predictions and the actual labels\n", " calculate the accuracy between our predictions and the actual labels\n", " calculate the gradients of each parameter\n", " update the parameters by taking an optimizer step\n", " update our metrics\n", "\n", "Some layers act differently when training and evaluating the model that contains them, hence why we must tell our model we are in \"training\" mode. The model we are using here does not use any of those layers, however it is good practice to get used to putting your model in training mode.\n", "\n", "```python\n", "def train(model, iterator, optimizer, criterion, device):\n", "\n", " epoch_loss = 0\n", " epoch_acc = 0\n", "\n", " model.train()\n", "\n", " for (x, y) in tqdm(iterator, desc=\"Training\", leave=False):\n", "\n", " x = x.to(device) # Move the data to the device where you want to compute\n", " y = y.to(device)\n", "\n", " optimizer.zero_grad() # Initialise the optimiser\n", "\n", " y_pred = model(x) # Obtain initial predictions\n", "\n", " loss = criterion(y_pred, y) # Calculate the loss\n", "\n", " acc = calculate_accuracy(y_pred, y)\n", "\n", " loss.backward() # Backprop the loss to update the weights\n", "\n", " optimizer.step()\n", "\n", " epoch_loss += loss.item()\n", " epoch_acc += acc.item()\n", "\n", " return epoch_loss / len(iterator), epoch_acc / len(iterator)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "036639b1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bf601a6b", "metadata": {}, "source": [ "## Setr up an evaluation loop\n", "\n", "This is very similar to the training loop, except that we do not pass the gradients back to updated the weights. " ] }, { "cell_type": "code", "execution_count": null, "id": "963eab9e", "metadata": {}, "outputs": [], "source": [ "def evaluate(model, iterator, criterion, device):\n", "\n", " epoch_loss = 0\n", " epoch_acc = 0\n", "\n", " model.eval()\n", "\n", " with torch.no_grad():\n", "\n", " for (x, y) in tqdm(iterator, desc=\"Evaluating\", leave=False):\n", "\n", " x = x.to(device)\n", " y = y.to(device)\n", "\n", " y_pred = model(x)\n", "\n", " loss = criterion(y_pred, y)\n", "\n", " acc = calculate_accuracy(y_pred, y)\n", "\n", " epoch_loss += loss.item()\n", " epoch_acc += acc.item()\n", "\n", " return epoch_loss / len(iterator), epoch_acc / len(iterator)" ] }, { "cell_type": "code", "execution_count": null, "id": "8de2a6bb", "metadata": {}, "outputs": [], "source": [ "def epoch_time(start_time, end_time):\n", " elapsed_time = end_time - start_time\n", " elapsed_mins = int(elapsed_time / 60)\n", " elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n", " return elapsed_mins, elapsed_secs" ] }, { "cell_type": "markdown", "id": "3a32f301", "metadata": {}, "source": [ "## Run the training\n", "\n", "Here we will train for 10 epochs.\n", "At the end of each epoch we check the validation loss, if it is better than the previous best, then we save the model" ] }, { "cell_type": "code", "execution_count": null, "id": "f317bd45", "metadata": {}, "outputs": [], "source": [ "EPOCHS = 10\n", "\n", "best_valid_loss = float('inf')\n", "history = []\n", "\n", "for epoch in trange(EPOCHS):\n", "\n", " start_time = time.monotonic()\n", "\n", " train_loss, train_acc = train(model, train_iterator, optimizer, criterion, device)\n", " valid_loss, valid_acc = evaluate(model, valid_iterator, criterion, device)\n", "\n", " if valid_loss < best_valid_loss:\n", " best_valid_loss = valid_loss\n", " torch.save(model.state_dict(), 'mlp-model.pt')\n", "\n", " end_time = time.monotonic()\n", "\n", " epoch_mins, epoch_secs = epoch_time(start_time, end_time)\n", " history.append({'epoch': epoch, 'epoch_time': epoch_time, \n", " 'valid_acc': valid_acc, 'train_acc': train_acc})\n", "\n", " print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')\n", " print(f'\\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')\n", " print(f'\\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')" ] }, { "cell_type": "markdown", "id": "80a70cee", "metadata": {}, "source": [ "## Plot the model training" ] }, { "cell_type": "code", "execution_count": null, "id": "e4696b74", "metadata": {}, "outputs": [], "source": [ "epochs = [x[\"epoch\"] for x in history]\n", "train_loss = [x[\"train_acc\"] for x in history]\n", "valid_loss = [x[\"valid_acc\"] for x in history]\n", "\n", "fig, ax = plt.subplots()\n", "ax.plot(epochs, train_loss, label=\"train\")\n", "ax.plot(epochs, valid_loss, label=\"valid\")\n", "ax.set(xlabel=\"Epoch\", ylabel=\"Acc.\")\n", "plt.legend()" ] }, { "cell_type": "markdown", "id": "1866af9d", "metadata": {}, "source": [ "## Try on the test set\n", "\n", "Now we can try it on the test set" ] }, { "cell_type": "code", "execution_count": null, "id": "82c39c6f", "metadata": {}, "outputs": [], "source": [ "model.load_state_dict(torch.load('mlp-model.pt'))\n", "\n", "test_loss, test_acc = evaluate(model, test_iterator, criterion, device)\n", "print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')" ] }, { "cell_type": "markdown", "id": "7b543a39", "metadata": {}, "source": [ "# Exercises\n", "\n", "1. Add rotations to the test data and see how well the model performs. Compare this to the CNNs with rotations added.\n", "You add rotations in the transforms, so declare a new transform and a new test data set:\n", "\n", "```python\n", "\n", "roated_test_transforms = transforms.Compose([\n", " transforms.RandomRotation(25, fill=(0,)),\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[mean], std=[std])\n", " ])\n", "\n", "rotated_test_data = datasets.FashionMNIST(root=ROOT,\n", " train=False,\n", " download=True,\n", " transform=roated_test_transforms)\n", "\n", "rotated_test_iterator = data.DataLoader(rotated_test_data,\n", " batch_size=BATCH_SIZE)\n", "\n", "```\n", "\n", "2. Add some regularisation using dropout:\n", "Dropout is defined like this:\n", "```\n", " # Define proportion or neurons to dropout\n", " self.dropout = nn.Dropout(0.25)\n", "```\n", "And then you add it between layers, before passing through the activation function. Add the dropout between the two hidden layers.\n", " \n", "**Suggested Answer** - if you are having trouble, you can look at the [hints notebook](solutions/hints.ipynb) for a suggestion.\n", "\n", "\n", "3. Try some simple hyperparameter tuning, change the number of neurons in the hidden layer to 64 and 256. Does it make a difference?\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ce6577c9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9f04b7f7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "29889db7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }