Convolutional Filter

A convolutional filter is the basic element (or neuron) of a CNN. To better understand CNNs, we first learn how a convolutional filter works by coding one by hand.

The kernel#

A convolutional filter extracts a patch of the input image and takes its inner product with a kernel to fill one pixel in the output image. The process is illustrated in the following figure. The behaviour of a convolutional filter is determined by its kernel. In classical image processing, we specify the kernel values as an input parameter. In a CNN, however, we only specify the size of the kernels, whereas their values are learnt by training.
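As a minimal sketch of the patch-and-inner-product step described above, the snippet below computes a single output pixel. The 4 x 4 single-channel `image` is a made-up example, not data from this tutorial; the kernel is the Sobel kernel used later on this page.

```python
import numpy as np

# hypothetical 4 x 4 single-channel image, for illustration only
image = np.arange(16, dtype=float).reshape(4, 4)

# 3 x 3 kernel (the vertical Sobel kernel used later on this page)
kernel = np.array([
    [1, 0, -1],
    [2, 0, -2],
    [1, 0, -1]], dtype=float)

# extract the 3 x 3 patch whose top-left corner is at (0, 0)
patch = image[0:3, 0:3]

# inner product: multiply element-wise, then sum everything
pixel = np.sum(patch * kernel)
print(pixel)  # -8.0
```

Sliding the patch over the image and repeating this step fills the output image one pixel at a time.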

Padding and stride#

In addition to the kernel, there are some other useful parameters, such as:

Padding: the number of rings of zeros added around the input image to preserve (or even increase) the image size, e.g., when padding = 1:

Stride: the step by which the kernel moves over the input image, which also determines the size of the output image, e.g., when stride = 2:
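The effect of both parameters can be sketched on a small array. The 5 x 5 `image` below is a hypothetical example, and `np.pad` is used here only for illustration; the implementation later on this page pads by hand.

```python
import numpy as np

# hypothetical 5 x 5 single-channel image, for illustration only
image = np.arange(1, 26).reshape(5, 5)

# padding = 1: surround the image with one ring of zeros
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # (7, 7)

# stride = 2: a 3 x 3 kernel visits every second position along each axis;
# these are the row (or column) indices of the top-left corner of each patch
k, stride = 3, 2
corners = list(range(0, padded.shape[0] - k + 1, stride))
print(corners)  # [0, 2, 4]
```

With three patch positions per axis, the output image in this example is 3 x 3.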

Implement a convolutional filter#

Input#

input_image: an input image with shape (nx, ny, nchannel)
kernel: a square matrix with shape (k, k)
padding: a non-negative integer
stride: a positive integer; for the kernel to reach the last row and column of the padded image, it must divide both (nx + padding * 2 - k) and (ny + padding * 2 - k); it also controls the output resolution and the computational cost

Output#

return: an output image with shape (nx_out, ny_out, nchannel), where nx_out = (nx + padding * 2 - k) // stride + 1 and ny_out = (ny + padding * 2 - k) // stride + 1
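The output-size formula and the divisibility condition on stride can be checked with a few lines of arithmetic. The helper names `conv_output_size` and `reaches_last_pixel` and the image size of 400 are assumptions made for this sketch.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis: (n + 2 * padding - k) // stride + 1."""
    return (n + 2 * padding - k) // stride + 1


def reaches_last_pixel(n, k, padding=0, stride=1):
    """True if stride divides (n + 2 * padding - k), i.e. the last patch
    ends exactly at the edge of the padded image."""
    return (n + 2 * padding - k) % stride == 0


# hypothetical axis length of 400 pixels with a 3 x 3 kernel and padding = 1
print(conv_output_size(400, 3, padding=1, stride=1))    # 400 (size preserved)
print(conv_output_size(400, 3, padding=1, stride=2))    # 200 (downsampled)
print(reaches_last_pixel(400, 3, padding=1, stride=2))  # False: 399 is odd
print(reaches_last_pixel(400, 3, padding=1, stride=3))  # True: 399 = 3 * 133
```

When the condition fails, the floor division simply drops the positions that would run past the edge, so the last rows or columns of the padded image are not sampled.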

NOTE: For readability, the code is a plain implementation without much optimisation, so it is slow. Increasing stride speeds up the processing at the cost of a downsampled output image.

import numpy as np

# a 2D convolutional filter
def convolve2D(input_image, kernel, padding=1, stride=1):
    # padding
    nx = input_image.shape[0]
    ny = input_image.shape[1]
    nchannel = input_image.shape[2]
    if padding > 0:
        padded_image = np.zeros((nx + padding * 2, ny + padding * 2, nchannel))
        padded_image[padding:-padding, padding:-padding, :] = input_image
    else:
        padded_image = input_image
    
    # allocate output
    k = kernel.shape[0]
    nx_out = (nx + padding * 2 - k) // stride + 1 # must use // instead of /
    ny_out = (ny + padding * 2 - k) // stride + 1
    output_image = np.zeros((nx_out, ny_out, nchannel))
    
    # compute output pixel by pixel
    for ix_out in range(nx_out):
        for iy_out in range(ny_out):
            ix_in = ix_out * stride
            iy_in = iy_out * stride
            # the inner product of the kernel with the k x k patch, per channel
            patch = padded_image[ix_in:ix_in + k, iy_in:iy_in + k, :]
            output_image[ix_out, iy_out, :] = np.tensordot(kernel, patch, axes=2)
    
    # clip to [0, 1] so the result can be displayed as an image
    output_image = np.clip(output_image, 0, 1)
    return output_image
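If the pixel-by-pixel loops are too slow, one possible alternative is to let NumPy gather all patches at once. The sketch below is not part of the original tutorial: the name `convolve2d_vectorised` is hypothetical, and it assumes NumPy 1.20 or later for `sliding_window_view`. It is intended to return the same result as the loop-based function above, including the final clipping to [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve2d_vectorised(input_image, kernel, padding=1, stride=1):
    """Loop-free sketch of the same filter (hypothetical helper)."""
    k = kernel.shape[0]
    # zero-pad the two spatial axes only, not the channel axis
    padded = np.pad(input_image,
                    ((padding, padding), (padding, padding), (0, 0)))
    # all k x k windows; shape (nx_p - k + 1, ny_p - k + 1, nchannel, k, k)
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    # keep every stride-th window along each spatial axis
    windows = windows[::stride, ::stride]
    # inner product of every window with the kernel, per channel
    output = np.einsum('xycij,ij->xyc', windows, kernel)
    # clip to [0, 1], as in the loop-based version
    return np.clip(output, 0, 1)


# usage on a random hypothetical image with 3 channels
rng = np.random.default_rng(0)
image = rng.random((7, 6, 3))
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
print(convolve2d_vectorised(image, sobel, padding=1, stride=2).shape)  # (4, 3, 3)
```

The windows are views into the padded array rather than copies, so the extra memory is modest; the trade-off is that the code is harder to read than the explicit loops.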

Apply our convolutional filter#

Next, we load an image from skimage.data and apply our convolutional filter to it. Here we will use the 3 \(\times\) 3 Sobel kernel, which is good at edge detection:

\(k=\begin{bmatrix} 1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1 \end{bmatrix}\)

import matplotlib.pyplot as plt
import skimage.data

# load a sample image
input_image = skimage.data.coffee()
input_image = input_image / 255.

# print image size
print('Image pixels: %d x %d' % (input_image.shape[0], input_image.shape[1]))
print('Channels (RGB): %d' % (input_image.shape[2]))

# vertical Sobel kernel
kernel = np.array([
    [1, 0, -1],
    [2, 0, -2],
    [1, 0, -1]])

##################################
# Also try the following kernels #
##################################

# # horizontal Sobel kernel
# kernel = np.array([
#     [1, 2, 1],
#     [0, 0, 0],
#     [-1, -2, -1]])

# # smoothing
# kernel = np.array([
#     [1, 1, 1],
#     [1, 1, 1],
#     [1, 1, 1]]) / 9

# # sharpening
# kernel = np.array([
#     [0, -1, 0],
#     [-1, 5, -1],
#     [0, -1, 0]])


#######################
# Try a larger stride #
#######################
# do convolution
output_image = convolve2D(input_image, kernel, padding=1, stride=1)

# plot original image
plt.figure(dpi=100, figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(input_image)
plt.axis('off')
plt.title('Original (%d x %d)' % (input_image.shape[0], input_image.shape[1]))

# plot convolved image
plt.subplot(1, 2, 2)
plt.imshow(output_image)
plt.axis('off')
plt.title('Convolved (%d x %d)' % (output_image.shape[0], output_image.shape[1]))
plt.show()

The above results show that the Sobel kernel highlights the outlines of the objects. This ability to detect object features by relating neighbouring pixels is what makes CNNs powerful in analysing image data.

Exercise#

The vertical and the horizontal Sobel kernels can be superposed to make an inclined-edge detecting kernel:

\(k(\theta)=\cos(\theta)\begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1 \end{bmatrix}+ \sin(\theta)\begin{bmatrix} 1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1 \end{bmatrix}\)

where \(\theta\) is the angle from horizontal.
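The superposition above can be written as a small helper for experimenting with different angles. The function name `inclined_sobel_kernel` is an assumption made for this sketch, and \(\theta\) is taken in radians.

```python
import numpy as np


def inclined_sobel_kernel(theta):
    """Return cos(theta) * horizontal Sobel + sin(theta) * vertical Sobel."""
    horizontal = np.array([
        [1, 2, 1],
        [0, 0, 0],
        [-1, -2, -1]], dtype=float)
    vertical = np.array([
        [1, 0, -1],
        [2, 0, -2],
        [1, 0, -1]], dtype=float)
    return np.cos(theta) * horizontal + np.sin(theta) * vertical


# theta = 0 recovers the horizontal kernel, theta = pi / 2 the vertical one
print(inclined_sobel_kernel(0.0))
print(np.round(inclined_sobel_kernel(np.pi / 2), 12))
```

The returned array can be passed directly as the kernel argument of the convolutional filter.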

Find a kernel to erase most of the stripes on the table.