Simple Convolutional 2D Layer

In this problem, you need to implement a 2D convolutional layer in Python as a function that processes an input matrix using a specified convolutional kernel, padding, and stride.

Example:
import numpy as np
input_matrix = np.array([
    [1., 2., 3., 4., 5.],
    [6., 7., 8., 9., 10.],
    [11., 12., 13., 14., 15.],
    [16., 17., 18., 19., 20.],
    [21., 22., 23., 24., 25.],
])
kernel = np.array([
    [1., 2.],
    [3., -1.],
])
padding, stride = 0, 1
# Expected Output:
np.array([
    [ 16., 21., 26., 31.],
    [ 41., 46., 51., 56.],
    [ 66., 71., 76., 81.],
    [ 91., 96., 101., 106.],
])

Convolutional layers are widely used in computer vision tasks. Their key parameters are listed below.

Parameters:

  1. input_matrix: a 2D NumPy array representing the input data, such as an image. Each element corresponds to a pixel or a feature value in the input space. The dimensions of the input matrix are denoted $(\text{height}, \text{width})$.
  2. kernel: another 2D NumPy array representing the convolutional filter. The kernel is typically smaller than the input matrix and slides over it to perform the convolution operation. Each element of the kernel is a weight applied to the input value it currently overlaps. The kernel size is denoted $(\text{kernel}_\text{height}, \text{kernel}_\text{width})$.
  3. padding: an integer specifying how many rows and columns of zeros are added on each side of the input matrix. Padding controls the spatial dimensions of the output: it can keep the output the same size as the input, and it lets the kernel overlap edge elements as often as interior ones.
  4. stride: an integer giving the number of elements the kernel moves between consecutive positions. A stride greater than one produces a smaller output, because the kernel skips over some positions.
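To make the roles of padding and stride concrete, here is a small sketch on a toy 3x3 input. The values and variable names are illustrative only and are not part of the problem. It shows how `np.pad` grows the input and which starting offsets a 2x2 kernel would visit along one axis at stride 2.

```python
import numpy as np

# Toy 3x3 input; values chosen only for illustration.
toy_input = np.arange(1., 10.).reshape(3, 3)
padding, stride = 1, 2
kernel_size = 2  # assumed square 2x2 kernel

# One ring of zeros on every side: 3x3 becomes 5x5.
padded = np.pad(toy_input, padding, mode='constant')

# Starting offsets of the windows the kernel visits along one axis.
starts = list(range(0, padded.shape[0] - kernel_size + 1, stride))

print(padded.shape)  # (5, 5)
print(starts)        # [0, 2]
```

With two starting offsets per axis, the output for this toy case would be 2x2.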

Implementation:

  1. Padding the Input: The input matrix is padded with zeros on every side according to the `padding` value. This increases the input size, so that elements at the borders and corners, which would otherwise be covered by fewer kernel positions, contribute to as many outputs as interior elements.
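  2. Calculating Output Dimensions: The height and width of the output matrix are calculated using the formulas: \[ \text{output}_\text{height} = \left\lfloor \frac{\text{input}_\text{height, padded} - \text{kernel}_\text{height}}{\text{stride}} \right\rfloor + 1 \] \[ \text{output}_\text{width} = \left\lfloor \frac{\text{input}_\text{width, padded} - \text{kernel}_\text{width}}{\text{stride}} \right\rfloor + 1 \] where $\text{input}_\text{height, padded} = \text{input}_\text{height} + 2 \cdot \text{padding}$ and likewise for the width. The division is rounded down, so any trailing rows or columns that cannot hold a full kernel are ignored.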
  3. Performing Convolution:
    • A nested loop iterates over each position where the kernel can be placed on the padded input matrix.
    • At each position, a region of the input matrix the same size as the kernel is selected.
    • Element-wise multiplication between the kernel and the input region is performed, followed by summation to produce a single value. This value is stored in the corresponding position in the output matrix.
  4. Output: The function returns the output matrix, which contains the results of the convolution operation applied across the entire input.
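The output-size calculation from step 2 can be checked with a few lines of arithmetic. The helper name `conv_output_size` is hypothetical, introduced here only for illustration. It assumes the same padding on both sides of an axis and rounds the division down when the stride does not divide evenly.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int, stride: int) -> int:
    """Number of kernel positions along one axis (hypothetical helper)."""
    padded_size = input_size + 2 * padding
    return (padded_size - kernel_size) // stride + 1

# 5x5 input, 2x2 kernel, no padding, stride 1 (the example above): 4 positions per axis.
print(conv_output_size(5, 2, padding=0, stride=1))  # 4
# Same input with padding 1 and stride 2: (7 - 2) // 2 + 1 = 3.
print(conv_output_size(5, 2, padding=1, stride=2))  # 3
```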
import numpy as np

def simple_conv2d(input_matrix: np.ndarray, kernel: np.ndarray, padding: int, stride: int) -> np.ndarray:
    kernel_height, kernel_width = kernel.shape

    # Zero-pad `padding` rows and columns on every side of the input.
    padded_input = np.pad(input_matrix, ((padding, padding), (padding, padding)), mode='constant')
    input_height_padded, input_width_padded = padded_input.shape

    # Number of valid kernel positions along each axis (floor division).
    output_height = (input_height_padded - kernel_height) // stride + 1
    output_width = (input_width_padded - kernel_width) // stride + 1

    output_matrix = np.zeros((output_height, output_width))

    for i in range(output_height):
        for j in range(output_width):
            # Window of the padded input currently covered by the kernel.
            region = padded_input[i*stride:i*stride + kernel_height, j*stride:j*stride + kernel_width]
            # Element-wise product with the kernel, summed to a single value.
            output_matrix[i, j] = np.sum(region * kernel)

    return output_matrix
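
As a cross-check on the loop-based approach, the same computation can be sketched without explicit Python loops. This is an alternative formulation, not the reference solution: it assumes NumPy 1.20 or later for `sliding_window_view`, and the function name `conv2d_windows` is made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_windows(input_matrix, kernel, padding, stride):
    """Loop-free sketch: gather all kernel-sized windows, then reduce."""
    padded = np.pad(input_matrix, padding, mode='constant')
    # Shape (H - kh + 1, W - kw + 1, kh, kw): one window per stride-1 position.
    windows = sliding_window_view(padded, kernel.shape)
    # Keep only the positions reachable with the requested stride.
    windows = windows[::stride, ::stride]
    # Multiply each window by the kernel and sum over the window axes.
    return np.einsum('ijkl,kl->ij', windows, kernel)

# Same input and kernel as the example at the top of the page.
input_matrix = np.arange(1., 26.).reshape(5, 5)
kernel = np.array([[1., 2.], [3., -1.]])
result = conv2d_windows(input_matrix, kernel, padding=0, stride=1)
print(result)
```

For the example input this reproduces the 4x4 expected output shown earlier. Note that the kernel is applied without flipping (strictly speaking a cross-correlation), which is also what the loop above does and is the usual convention in deep learning libraries.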
