{ "cells": [ { "cell_type": "markdown", "id": "adea866f", "metadata": {}, "source": [ "# Convolutional Neural Networks" ] }, { "cell_type": "code", "execution_count": null, "id": "a8629fe2", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "import torch.utils.data as data\n", "\n", "import torchvision.transforms as transforms\n", "import torchvision.datasets as datasets\n", "\n", "from sklearn import metrics\n", "from sklearn import decomposition\n", "from sklearn import manifold\n", "from tqdm.notebook import trange, tqdm\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "import copy\n", "import random\n", "import time" ] }, { "cell_type": "markdown", "id": "989678b3", "metadata": {}, "source": [ "We need to set a random seed to ensure consistent results" ] }, { "cell_type": "code", "execution_count": null, "id": "c09a6aca", "metadata": {}, "outputs": [], "source": [ "SEED = 1234\n", "\n", "random.seed(SEED)\n", "np.random.seed(SEED)\n", "torch.manual_seed(SEED)\n", "torch.cuda.manual_seed(SEED)\n", "torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "id": "914530df", "metadata": {}, "source": [ "Download the dataset - we will use `FashionMNIST`" ] }, { "cell_type": "code", "execution_count": null, "id": "5a48e15b", "metadata": {}, "outputs": [], "source": [ "ROOT = '.data'\n", "\n", "train_data = datasets.FashionMNIST(root=ROOT,\n", " train=True,\n", " download=True)" ] }, { "cell_type": "markdown", "id": "deede458", "metadata": {}, "source": [ "## Normalize the data\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "16c66e48", "metadata": {}, "outputs": [], "source": [ "mean = train_data.data.float().mean() / 255\n", "std = train_data.data.float().std() / 255\n", "\n", "train_transforms = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[mean], std=[std])\n", " ])\n", "\n", "test_transforms = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[mean], std=[std])\n", " ])" ] }, { "cell_type": "code", "execution_count": null, "id": "699a59e5", "metadata": {}, "outputs": [], "source": [ "train_data = datasets.FashionMNIST(root=ROOT,\n", " train=True,\n", " download=True,\n", " transform=train_transforms)\n", "\n", "test_data = datasets.FashionMNIST(root=ROOT,\n", " train=False,\n", " download=True,\n", " transform=test_transforms)" ] }, { "cell_type": "markdown", "id": "8d0974b7", "metadata": {}, "source": [ "## Take a look at the data\n", "\n", "Its always a good idea to look a bit at the data. Here is a helper function to plot a set of the data." 
] }, { "cell_type": "code", "execution_count": null, "id": "2a68b2c9", "metadata": {}, "outputs": [], "source": [ "def plot_images(images):\n", "\n", " n_images = len(images)\n", "\n", " rows = int(np.sqrt(n_images))\n", " cols = int(np.sqrt(n_images))\n", "\n", " fig = plt.figure()\n", " for i in range(rows*cols):\n", " ax = fig.add_subplot(rows, cols, i+1)\n", " ax.imshow(images[i].view(28, 28).cpu().numpy(), cmap='bone')\n", " ax.axis('off')" ] }, { "cell_type": "code", "execution_count": null, "id": "471aa05b", "metadata": {}, "outputs": [], "source": [ "N_IMAGES = 25\n", "\n", "images = [image for image, label in [train_data[i] for i in range(N_IMAGES)]]\n", "\n", "plot_images(images)" ] }, { "cell_type": "code", "execution_count": null, "id": "e64357e1", "metadata": {}, "outputs": [], "source": [ "VALID_RATIO = 0.9\n", "\n", "n_train_examples = int(len(train_data) * VALID_RATIO)\n", "n_valid_examples = len(train_data) - n_train_examples\n", "\n", "train_data, valid_data = data.random_split(train_data,\n", " [n_train_examples, n_valid_examples])\n", "\n", "print(f'Number of training examples: {len(train_data)}')\n", "print(f'Number of validation examples: {len(valid_data)}')\n", "print(f'Number of testing examples: {len(test_data)}')" ] }, { "cell_type": "markdown", "id": "f2f5c745", "metadata": {}, "source": [ "## 1. Build the network architecture\n", "\n", "Our CNN contains two convolutional layers, each followed by a max-pooling layer and a batch-normalisation layer, and then a dense layer with dropout and finally the output layer. The network architecture is shown in the following figure ([figure source](https://mc.ai/the-convolution-parameters-calculation/)):\n", "\n", "\n", "\n", "\n", "We have understood how a convolutional filter works. A convolutional layer is simply a collection of convolutional filters whose kernel values form the trainable parameters. The most frequently used kernel sizes are 3$\\times$3, 5$\\times$5 and 7$\\times$7. The number of filters in each layer is a key network parameter governing the model size. \n", "\n", "The *max-pooling* layers are aimed for dimensionality reduction, containing no trainable parameters. It can be easily understood with the following illustration ([figure source](https://medium.com/ai-in-plain-english/pooling-layer-beginner-to-intermediate-fa0dbdce80eb)). The most common size for max pooling is 2$\\times$2.\n", "\n", "\n", "\n", "The *batch-normalisation* layers can help the CNN to converge faster and become more stable through normalisation of the input layer by re-centering and re-scaling." ] }, { "cell_type": "markdown", "id": "2facbac7", "metadata": {}, "source": [ "## Set up the network\n", "\n", "In `pytorch` we build networks as a class. 
{ "cell_type": "markdown", "id": "2facbac7", "metadata": {}, "source": [ "## Set up the network\n", "\n", "In `pytorch` we build networks as a class. The example below is the minimal format for setting up a network in `pytorch`:\n", "\n", "* Declare the class - it should be a subclass of the `nn.Module` class from `pytorch`\n", "* Define what inputs it takes upon declaration - in this case `output_dim`\n", "* `super` makes sure it inherits attributes from `nn.Module`\n", "* We then define the different types of layers that we will use - in this case, two convolutional layers and three linear layers\n", "* Finally, we define a method `forward`, which is what gets called when data is passed through the network; it moves the data `x` through the layers\n", "\n", "Below we start the class. Now complete the `forward` function - we provide the first command; after it you should:\n", "* perform a `maxpool` using a $2\times2$ filter\n", "* pass through ReLU\n", "* apply the second convolutional filter\n", "* perform a `maxpool` using a $2\times2$ filter\n", "* pass through ReLU\n", "* reshape, using: `x = x.view(x.shape[0], -1)`\n", "* pass through the first fully connected layer\n", "* pass through `ReLU`\n", "* pass through the second fully connected layer\n", "* pass through `ReLU`\n", "* pass through the third fully connected layer\n", "\n", "**Suggested Answer** - if you are having trouble, you can look at the [hints notebook](solutions/hints.ipynb) for a suggestion; one possible completion is also sketched after the class definition below." ] },
{ "cell_type": "code", "execution_count": null, "id": "5b030290", "metadata": {}, "outputs": [], "source": [ "class LeNet(nn.Module):\n", "    def __init__(self, output_dim):\n", "        super().__init__()\n", "\n", "        self.conv1 = nn.Conv2d(in_channels=1,\n", "                               out_channels=6,\n", "                               kernel_size=5)\n", "\n", "        self.conv2 = nn.Conv2d(in_channels=6,\n", "                               out_channels=16,\n", "                               kernel_size=5)\n", "\n", "        # 16 output channels, each a 4 x 4 feature map (see the shape calculation above)\n", "        self.fc_1 = nn.Linear(16 * 4 * 4, 120)\n", "        self.fc_2 = nn.Linear(120, 84)\n", "        self.fc_3 = nn.Linear(84, output_dim)\n", "\n", "    def forward(self, x):\n", "\n", "        x = self.conv1(x)\n", "        # complete the rest of the forward pass here\n" ] },
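{ "cell_type": "markdown", "id": "f7c2a8e4", "metadata": {}, "source": [ "If you want to check your work, here is one possible completion of the class, following the steps listed above. Treat it as a sketch rather than the definitive answer - the [hints notebook](solutions/hints.ipynb) has the suggested solution." ] },
{ "cell_type": "code", "execution_count": null, "id": "a3d5f9b2", "metadata": {}, "outputs": [], "source": [ "class LeNet(nn.Module):\n", "    def __init__(self, output_dim):\n", "        super().__init__()\n", "        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)\n", "        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)\n", "        self.fc_1 = nn.Linear(16 * 4 * 4, 120)\n", "        self.fc_2 = nn.Linear(120, 84)\n", "        self.fc_3 = nn.Linear(84, output_dim)\n", "\n", "    def forward(self, x):\n", "        x = self.conv1(x)\n", "        x = F.max_pool2d(x, kernel_size=2)  # 2 x 2 max pool\n", "        x = F.relu(x)\n", "        x = self.conv2(x)\n", "        x = F.max_pool2d(x, kernel_size=2)  # 2 x 2 max pool\n", "        x = F.relu(x)\n", "        x = x.view(x.shape[0], -1)          # flatten to [batch, 16 * 4 * 4]\n", "        x = F.relu(self.fc_1(x))\n", "        x = F.relu(self.fc_2(x))\n", "        x = self.fc_3(x)\n", "        return x" ] },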
{ "cell_type": "markdown", "id": "be9d5f12", "metadata": {}, "source": [ "Now use this class to build a network." ] },
{ "cell_type": "code", "execution_count": null, "id": "39d7b29b", "metadata": {}, "outputs": [], "source": [ "OUTPUT_DIM = 10\n", "\n", "model = LeNet(OUTPUT_DIM)" ] },
{ "cell_type": "markdown", "id": "1d281062", "metadata": {}, "source": [ "## Training the Model\n", "\n", "Next, we'll define our optimizer. This is the algorithm we will use to update the parameters of our model with respect to the loss calculated on the data.\n", "\n", "We aren't going to go into too much detail on how neural networks are trained (see [this](http://neuralnetworksanddeeplearning.com/) article if you want to know how) but the gist is:\n", "- pass a batch of data through your model\n", "- calculate the loss of your batch by comparing your model's predictions against the actual labels\n", "- calculate the gradient of each of your parameters with respect to the loss\n", "- update each of your parameters by subtracting their gradient multiplied by a small *learning rate* parameter\n", "\n", "We use the *Adam* algorithm with the default parameters to update our model. Improved results could be obtained by searching over different optimizers and learning rates; however, default Adam is usually a good starting point. Check out [this](https://ruder.io/optimizing-gradient-descent/) article if you want to learn more about the different optimization algorithms commonly used for neural networks.\n", "\n", "Then, we define a *criterion*, PyTorch's name for a loss/cost/error function. This function takes in your model's predictions and the actual labels and computes the loss/cost/error of your model with its current parameters.\n", "\n", "`CrossEntropyLoss` both computes the *softmax* activation function on the supplied predictions as well as the actual loss via *negative log likelihood*.\n", "\n", "Briefly, the softmax function is:\n", "\n", "$$\text{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$\n", "\n", "This turns our 10-dimensional output, where each element is an unbounded real number, into a probability distribution over the 10 classes. That is, all values are between 0 and 1, and together they sum to 1.\n", "\n", "Why do we turn things into a probability distribution? So we can use negative log likelihood for our loss function, as it expects probabilities. PyTorch calculates negative log likelihood for a single example via:\n", "\n", "$$\text{negative log likelihood }(\mathbf{\hat{y}}, y) = -\log \big( \text{softmax}(\mathbf{\hat{y}})[y] \big)$$\n", "\n", "$\mathbf{\hat{y}}$ is the $\mathbb{R}^{10}$ output from our neural network, whereas $y$ is the label, an integer representing the class. The loss is the negative log of the softmax output at the index of the correct class. For example:\n", "\n", "$$\mathbf{\hat{y}} = [5,1,1,1,1,1,1,1,1,1]$$\n", "\n", "$$\text{softmax }(\mathbf{\hat{y}}) = [0.8585, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157, 0.0157]$$\n", "\n", "If the label were class zero, the loss would be:\n", "\n", "$$\text{negative log likelihood }(\mathbf{\hat{y}}, 0) = - \log(0.8585) = 0.153 \dots$$\n", "\n", "If the label were class five, the loss would be:\n", "\n", "$$\text{negative log likelihood }(\mathbf{\hat{y}}, 5) = - \log(0.0157) = 4.154 \dots$$\n", "\n", "So, intuitively, as your model's output corresponding to the correct class index increases, your loss decreases." ] },
{ "cell_type": "code", "execution_count": null, "id": "9ad142d0", "metadata": {}, "outputs": [], "source": [ "optimizer = optim.Adam(model.parameters())\n", "criterion = nn.CrossEntropyLoss()" ] },
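{ "cell_type": "markdown", "id": "b8e4c6d0", "metadata": {}, "source": [ "We can verify the worked example above directly. `F.cross_entropy` (the functional form of `nn.CrossEntropyLoss`) combines the softmax with the negative log likelihood, so feeding it the logits $[5,1,\dots,1]$ should reproduce the two loss values. A minimal check:" ] },
{ "cell_type": "code", "execution_count": null, "id": "c9f3a1e7", "metadata": {}, "outputs": [], "source": [ "logits = torch.tensor([[5., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])\n", "\n", "print(F.softmax(logits, dim=1))                    # ~0.8585 for class 0, ~0.0157 elsewhere\n", "print(F.cross_entropy(logits, torch.tensor([0])))  # ~0.153\n", "print(F.cross_entropy(logits, torch.tensor([5])))  # ~4.154" ] },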
{ "cell_type": "markdown", "id": "16c56eed", "metadata": {}, "source": [ "## Look for GPUs\n", "\n", "In PyTorch, code runs on the CPU by default. You can check for available GPUs and, if one is present, move the model and criterion across to it." ] },
{ "cell_type": "code", "execution_count": null, "id": "09bb3d97", "metadata": {}, "outputs": [], "source": [ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "model = model.to(device)\n", "criterion = criterion.to(device)" ] },
{ "cell_type": "code", "execution_count": null, "id": "0c925838", "metadata": {}, "outputs": [], "source": [ "def calculate_accuracy(y_pred, y):\n", "    top_pred = y_pred.argmax(1, keepdim=True)  # index of the highest logit per example\n", "    correct = top_pred.eq(y.view_as(top_pred)).sum()\n", "    acc = correct.float() / y.shape[0]\n", "    return acc" ] },
{ "cell_type": "markdown", "id": "4527bf44", "metadata": {}, "source": [ "## Set up the batches\n", "\n", "We will do mini-batch gradient descent with Adam, so we set up `DataLoader`s with a fixed batch size. The training data is shuffled each epoch; the validation and test sets do not need shuffling." ] },
{ "cell_type": "code", "execution_count": null, "id": "9afbfd3e", "metadata": {}, "outputs": [], "source": [ "BATCH_SIZE = 64\n", "\n", "train_iterator = data.DataLoader(train_data,\n", "                                 shuffle=True,\n", "                                 batch_size=BATCH_SIZE)\n", "\n", "valid_iterator = data.DataLoader(valid_data,\n", "                                 batch_size=BATCH_SIZE)\n", "\n", "test_iterator = data.DataLoader(test_data,\n", "                                batch_size=BATCH_SIZE)" ] },
{ "cell_type": "markdown", "id": "d51f8889", "metadata": {}, "source": [ "We finally define our training loop.\n", "\n", "This will:\n", "\n", "* put our model into train mode\n", "* iterate over our dataloader, returning batches of (image, label)\n", "* place the batch on to our GPU, if we have one\n", "* clear the gradients calculated from the last batch\n", "* pass our batch of images, `x`, through the model to get predictions, `y_pred`\n", "* calculate the loss between our predictions and the actual labels\n", "* calculate the accuracy between our predictions and the actual labels\n", "* calculate the gradients of each parameter\n", "* update the parameters by taking an optimizer step\n", "* update our metrics\n", "\n", "Some layers act differently when training and evaluating the model that contains them, hence why we must tell our model when we are in \"training\" mode. The model we are using here does not contain any of those layers, but it is good practice to get used to putting your model in training mode.\n", "\n", "**Reuse the training/evaluation loops from the neural nets notebook** - fill them into the two cells below. If you don't have them to hand, a sketch of both functions follows.\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "ac71dae5", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": null, "id": "b06edfe6", "metadata": {}, "outputs": [], "source": [] },
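{ "cell_type": "markdown", "id": "d4b6e8f0", "metadata": {}, "source": [ "Here is a minimal sketch of `train` and `evaluate`, following the steps listed above and matching the signatures used in the training loop below. The `tqdm` progress bars are optional." ] },
{ "cell_type": "code", "execution_count": null, "id": "e1c7a5b3", "metadata": {}, "outputs": [], "source": [ "def train(model, iterator, optimizer, criterion, device):\n", "\n", "    epoch_loss = 0\n", "    epoch_acc = 0\n", "\n", "    model.train()  # put the model into training mode\n", "\n", "    for (x, y) in tqdm(iterator, desc='Training', leave=False):\n", "\n", "        x = x.to(device)\n", "        y = y.to(device)\n", "\n", "        optimizer.zero_grad()  # clear gradients from the last batch\n", "\n", "        y_pred = model(x)\n", "\n", "        loss = criterion(y_pred, y)\n", "        acc = calculate_accuracy(y_pred, y)\n", "\n", "        loss.backward()   # calculate gradients\n", "        optimizer.step()  # update parameters\n", "\n", "        epoch_loss += loss.item()\n", "        epoch_acc += acc.item()\n", "\n", "    return epoch_loss / len(iterator), epoch_acc / len(iterator)" ] },
{ "cell_type": "code", "execution_count": null, "id": "f0a2b4c6", "metadata": {}, "outputs": [], "source": [ "def evaluate(model, iterator, criterion, device):\n", "\n", "    epoch_loss = 0\n", "    epoch_acc = 0\n", "\n", "    model.eval()  # put the model into evaluation mode\n", "\n", "    with torch.no_grad():  # no gradients needed for evaluation\n", "\n", "        for (x, y) in tqdm(iterator, desc='Evaluating', leave=False):\n", "\n", "            x = x.to(device)\n", "            y = y.to(device)\n", "\n", "            y_pred = model(x)\n", "\n", "            loss = criterion(y_pred, y)\n", "            acc = calculate_accuracy(y_pred, y)\n", "\n", "            epoch_loss += loss.item()\n", "            epoch_acc += acc.item()\n", "\n", "    return epoch_loss / len(iterator), epoch_acc / len(iterator)" ] },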
{ "cell_type": "code", "execution_count": null, "id": "0bfe6785", "metadata": {}, "outputs": [], "source": [ "def epoch_time(start_time, end_time):\n", "    elapsed_time = end_time - start_time\n", "    elapsed_mins = int(elapsed_time / 60)\n", "    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))\n", "    return elapsed_mins, elapsed_secs" ] },
{ "cell_type": "code", "execution_count": null, "id": "03dc35f9", "metadata": {}, "outputs": [], "source": [ "EPOCHS = 10\n", "\n", "best_valid_loss = float('inf')\n", "history = []\n", "\n", "for epoch in trange(EPOCHS):\n", "\n", "    start_time = time.monotonic()\n", "\n", "    train_loss, train_acc = train(model, train_iterator, optimizer, criterion, device)\n", "    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion, device)\n", "\n", "    if valid_loss < best_valid_loss:\n", "        best_valid_loss = valid_loss\n", "        torch.save(model.state_dict(), 'tut1-model.pt')\n", "\n", "    end_time = time.monotonic()\n", "\n", "    epoch_mins, epoch_secs = epoch_time(start_time, end_time)\n", "\n", "    # store the elapsed seconds, not the epoch_time function itself\n", "    history.append({'epoch': epoch, 'epoch_time': end_time - start_time,\n", "                    'valid_acc': valid_acc, 'train_acc': train_acc})\n", "\n", "    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')\n", "    print(f'\\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')\n", "    print(f'\\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')" ] },
{ "cell_type": "markdown", "id": "26042e24", "metadata": {}, "source": [ "## Plot the results" ] },
{ "cell_type": "code", "execution_count": null, "id": "ad29349c", "metadata": {}, "outputs": [], "source": [ "epochs = [x[\"epoch\"] for x in history]\n", "train_accs = [x[\"train_acc\"] for x in history]\n", "valid_accs = [x[\"valid_acc\"] for x in history]\n", "\n", "fig, ax = plt.subplots()\n", "ax.plot(epochs, train_accs, label=\"train\")\n", "ax.plot(epochs, valid_accs, label=\"valid\")\n", "ax.set(xlabel=\"Epoch\", ylabel=\"Acc.\")\n", "plt.legend()" ] },
{ "cell_type": "markdown", "id": "e64e2658", "metadata": {}, "source": [ "## Try on the test set" ] },
{ "cell_type": "code", "execution_count": null, "id": "46af7137", "metadata": {}, "outputs": [], "source": [ "model.load_state_dict(torch.load('tut1-model.pt'))\n", "test_loss, test_acc = evaluate(model, test_iterator, criterion, device)" ] },
{ "cell_type": "code", "execution_count": null, "id": "4167264f", "metadata": {}, "outputs": [], "source": [ "print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')" ] },
{ "cell_type": "markdown", "id": "9b7db8ba", "metadata": {}, "source": [ "## Try out on a rotated test set\n", "\n", "1. Add rotations to the test data and see how well the model performs.\n", "\n", "You add rotations in the transforms, so declare a new transform and a new test data set:\n", "\n", "```python\n", "rotated_test_transforms = transforms.Compose([\n", "    transforms.RandomRotation(25, fill=(0,)),\n", "    transforms.ToTensor(),\n", "    transforms.Normalize(mean=[mean], std=[std])\n", "])\n", "\n", "rotated_test_data = datasets.FashionMNIST(root=ROOT,\n", "                                          train=False,\n", "                                          download=True,\n", "                                          transform=rotated_test_transforms)\n", "\n", "rotated_test_iterator = data.DataLoader(rotated_test_data,\n", "                                        batch_size=BATCH_SIZE)\n", "```\n", "\n", "2. Now go to the exercises in the DNN notebook, do the same rotations and compare the performance of the two models.\n", "\n" ] },
{ "cell_type": "markdown", "id": "fb62ff78", "metadata": {}, "source": [ "## Competition\n", "\n", "Try to get the best possible accuracy on the test set. Some things you can try:\n", "\n", "* Hyperparameter tuning - change the dense layers in the CNN\n", "* Train for longer\n", "* Change the batch size\n", "* Play with learning rates `optimizer = optim.Adam(model.parameters(), lr=)` - the default is `0.001`\n", "* Try adding dropout in the dense layers of the CNN\n", "* Try altering the convolutional filters." ] },
{ "cell_type": "code", "execution_count": null, "id": "cee07b90", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": null, "id": "fdb21a9e", "metadata": {}, "outputs": [], "source": [] } ],
"metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }