{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "683469c0", "metadata": {}, "outputs": [], "source": [ "# helpers\n", "import skimage\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')" ] }, { "cell_type": "markdown", "id": "525b363a", "metadata": {}, "source": [ "# Convolutional Filter\n", "\n", "A convolutional filter is the basic element (or neuron) of a CNN. To better understand CNN, we first learn how a convolutional filter works by hand coding it.\n", "\n", "### The kernel\n", "\n", "A convolutional filter extracts a part of the input image and inner-product it with a **kernel** to fill one pixel in the output image. The process is illustrated in the following figure. The behaviour of a convolutional filter is predominated by its kernel. *For image processing, we need to specify the kernel as an input parameter. In a CNN, however, we only specify the size of the kernels whereas their values are learnt by training.*\n", "\n", "\n", "\n", "### Padding and stride\n", "\n", "In addition to the kernel, there are some other useful parameters, such as:\n", "\n", "* **Padding**: padding zeros around the input image to preserve (or even increase) the image size, e.g., when padding = 1:\n", "\n", "\n", "\n", "* **Stride**: it controls how fast the kernel moves over the input image and thus the size of the output image, e.g., when stride = 2:\n", "\n", "" ] }, { "cell_type": "markdown", "id": "c8ff69fd", "metadata": {}, "source": [ "### Implement a convolutional filter\n", "#### Input\n", "* `input_image`: an input image with shape (nx, ny, nchannel)\n", "* `kernel`: a square matrix with shape (k, k)\n", "* `padding`: a non-negative integer\n", "* `stride`: a positive integer; to sample the right edge of the input image, it must divide (nx + padding * 2 - k), similarly for the bottom edge; it also controls the output resolution and the computational cost\n", "\n", "#### Output\n", "* `return`: an output image with shape (nx_out, ny_out, nchannel), where nx_out = (nx + padding * 2 - k) // stride + 1 and ny_out = (ny + padding * 2 - k) // stride + 1\n", "\n", "**NOTE**: For readability, the code is a dry implementation without much optimisation, so its performance is not high. Increase `stride` to speedup the processing at the cost of a downsampled output image." 
] }, { "cell_type": "code", "execution_count": null, "id": "7f18820e", "metadata": {}, "outputs": [], "source": [ "# a 2D convolutional filter\n", "def convolve2D(input_image, kernel, padding=1, stride=1):\n", "    # padding\n", "    nx = input_image.shape[0]\n", "    ny = input_image.shape[1]\n", "    nchannel = input_image.shape[2]\n", "    if padding > 0:\n", "        padded_image = np.zeros((nx + padding * 2, ny + padding * 2, nchannel))\n", "        padded_image[padding:-padding, padding:-padding, :] = input_image\n", "    else:\n", "        padded_image = input_image\n", "\n", "    # allocate output\n", "    k = kernel.shape[0]\n", "    nx_out = (nx + padding * 2 - k) // stride + 1  # must use // instead of /\n", "    ny_out = (ny + padding * 2 - k) // stride + 1\n", "    output_image = np.zeros((nx_out, ny_out, nchannel))\n", "\n", "    # compute output pixel by pixel\n", "    for ix_out in np.arange(nx_out):\n", "        for iy_out in np.arange(ny_out):\n", "            ix_in = ix_out * stride\n", "            iy_in = iy_out * stride\n", "            # the inner product\n", "            output_image[ix_out, iy_out, :] = \\\n", "                np.tensordot(kernel, padded_image[ix_in:(ix_in + k), iy_in:(iy_in + k), :], axes=2)\n", "\n", "    # clip to [0, 1]\n", "    output_image = np.maximum(output_image, 0)\n", "    output_image = np.minimum(output_image, 1)\n", "    return output_image" ] }, { "cell_type": "markdown", "id": "ddf8ab32", "metadata": {}, "source": [ "### Apply our convolutional filter\n", "\n", "Next, we load an image from `skimage.data` and apply our convolutional filter to it. Here we will use the 3 $\\times$ 3 (vertical) Sobel kernel, which is good at edge detection:\n", "\n", ">$k=\\begin{bmatrix}\n", " 1 & 0 & -1\\\\ \n", " 2 & 0 & -2\\\\\n", " 1 & 0 & -1\n", "\\end{bmatrix}$ " ] }, { "cell_type": "code", "execution_count": null, "id": "8b4f53f4", "metadata": {}, "outputs": [], "source": [ "# load some image\n", "input_image = skimage.data.coffee()\n", "input_image = input_image / 255.\n", "\n", "# print image size\n", "print('Image pixels: %d x %d' % (input_image.shape[0], input_image.shape[1]))\n", "print('Channels (RGB): %d' % (input_image.shape[2]))" ] }, { "cell_type": "code", "execution_count": null, "id": "0736d66e", "metadata": {}, "outputs": [], "source": [ "# vertical Sobel kernel\n", "kernel = np.array([\n", "    [1, 0, -1],\n", "    [2, 0, -2],\n", "    [1, 0, -1]])\n", "\n", "##################################\n", "# Also try the following kernels #\n", "##################################\n", "\n", "# # horizontal Sobel kernel\n", "# kernel = np.array([\n", "#     [1, 2, 1],\n", "#     [0, 0, 0],\n", "#     [-1, -2, -1]])\n", "\n", "# # smoothing (box blur)\n", "# kernel = np.array([\n", "#     [1, 1, 1],\n", "#     [1, 1, 1],\n", "#     [1, 1, 1]]) / 9\n", "\n", "# # sharpening\n", "# kernel = np.array([\n", "#     [0, -1, 0],\n", "#     [-1, 5, -1],\n", "#     [0, -1, 0]])\n", "\n", "\n", "#######################\n", "# Try a larger stride #\n", "#######################\n", "# do convolution\n", "output_image = convolve2D(input_image, kernel, padding=1, stride=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "c41e690c", "metadata": {}, "outputs": [], "source": [ "# plot original image\n", "plt.figure(dpi=100, figsize=(12, 4))\n", "plt.subplot(1, 2, 1)\n", "plt.imshow(input_image)\n", "plt.axis('off')\n", "plt.title('Original (%d x %d)' % (input_image.shape[0], input_image.shape[1]))\n", "\n", "# plot convolved image\n", "plt.subplot(1, 2, 2)\n", "plt.imshow(output_image)\n", "plt.axis('off')\n", "plt.title('Convolved (%d x %d)' % (output_image.shape[0], output_image.shape[1]))\n", "plt.show()" ] },
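{ "cell_type": "markdown", "id": "a1f5c2e9", "metadata": {}, "source": [ "### Cross-check with an off-the-shelf implementation (optional)\n", "\n", "As a sanity check, we can compare our hand-coded filter against a library routine. The sketch below is a minimal check assuming SciPy is available; it uses `scipy.ndimage.correlate` (rather than `convolve`) because our filter does not flip the kernel. It applies the same kernel channel by channel with zero padding and clips to [0, 1], which should reproduce `output_image` when padding = 1 and stride = 1." ] }, { "cell_type": "code", "execution_count": null, "id": "b7d2f4a8", "metadata": {}, "outputs": [], "source": [ "# optional sanity check against SciPy (assumes scipy is installed)\n", "from scipy import ndimage\n", "\n", "# correlate (no kernel flip) each channel with zero ('constant') padding,\n", "# which mirrors convolve2D with padding=1 and stride=1 for a 3 x 3 kernel\n", "reference = np.stack(\n", "    [ndimage.correlate(input_image[:, :, c], kernel, mode='constant', cval=0.0)\n", "     for c in range(input_image.shape[2])], axis=-1)\n", "\n", "# apply the same [0, 1] clipping as convolve2D\n", "reference = np.clip(reference, 0, 1)\n", "\n", "print('Matches our convolve2D output:', np.allclose(reference, output_image))" ] },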
"id": "83a1def3", "metadata": {}, "source": [ "The above results show that the Sobel kernel can depict the outline of the objects. The capability of detecting object features by associating neighboring pixels makes CNNs powerful in analysing image data.\n", "\n", "\n", "### Exercise\n", "\n", "The vertical and the horizontal Sobel kernels can be superposed to make an inclined-edge detecting kernel:\n", "\n", "$k(\\theta)=\\cos(\\theta)\\begin{bmatrix}\n", " 1 & 2 & 1\\\\ \n", " 0 & 0 & 0\\\\\n", " -1 & -2 & -1\n", "\\end{bmatrix}+\n", "\\sin(\\theta)\\begin{bmatrix}\n", " 1 & 0 & -1\\\\ \n", " 2 & 0 & -2\\\\\n", " 1 & 0 & -1\n", "\\end{bmatrix}$\n", "\n", "> where $\\theta$ is the angle from horizontal.\n", "\n", "Find a kernel to erase most of the stripes on the table.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "6854b6d8", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "alignn-2", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 }