You are provided with a base `Layer` class that defines the structure of a neural network layer. Your task is to implement a subclass called `Dense`, which represents a fully connected neural network layer. The `Dense` class should extend the `Layer` class and implement the following methods:

- `__init__(self, n_units, input_shape=None)`: Accepts the number of neurons (`n_units`) and an optional input shape (`input_shape`), and sets up placeholders for the layer input, weights (`W`), biases (`w0`), and optimizers.
- `initialize(self, optimizer)`: Initializes the weights `W` using a uniform distribution with a limit of `1 / sqrt(input_shape[0])`, sets the bias `w0` to zero, and creates an optimizer for each of `W` and `w0`.
- `parameters(self)`: Returns the total number of trainable parameters in `W` and `w0`.
- `forward_pass(self, X, training=True)`: Computes the layer output by multiplying the input `X` with the weight matrix `W`, and then adding the bias `w0`.
- `backward_pass(self, accum_grad)`: Computes the gradients with respect to `W`, `w0`, and the layer input; updates `W` and `w0` via their optimizers if the layer is trainable; and returns the gradient with respect to the input.
- `output_shape(self)`: Returns the shape of the layer's output, `(self.n_units,)`.

Objective: Extend the `Layer` class by implementing the `Dense` class to ensure it functions correctly within a neural network framework.

Example Usage:

```python
# Initialize a Dense layer with 3 neurons and input shape (2,)
dense_layer = Dense(n_units=3, input_shape=(2,))

# Define a mock optimizer with a simple update rule
class MockOptimizer:
    def update(self, weights, grad):
        return weights - 0.01 * grad

optimizer = MockOptimizer()

# Initialize the Dense layer with the mock optimizer
dense_layer.initialize(optimizer)

# Perform a forward pass with sample input data
X = np.array([[1, 2]])
output = dense_layer.forward_pass(X)
print("Forward pass output:", output)

# Perform a backward pass with a sample gradient
accum_grad = np.array([[0.1, 0.2, 0.3]])
back_output = dense_layer.backward_pass(accum_grad)
print("Backward pass output:", back_output)
```

Expected Output:

```
Forward pass output: [[-0.00655782  0.01429615  0.00905812]]
Backward pass output: [[ 0.00129588  0.00953634]]
```
The Dense layer, also known as a fully connected layer, is a fundamental building block in neural networks. It connects each input neuron to each output neuron, hence the term "fully connected."
In the `initialize` method, weights are typically initialized using a uniform distribution within a certain range. For a Dense layer, a common practice is to set this range as:
$$\text{limit} = \frac{1}{\sqrt{\text{input\_shape}[0]}}$$

This initialization helps maintain a balance in the distribution of weights, preventing issues like vanishing or exploding gradients during training.
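As a minimal sketch of this scheme in NumPy (the helper name `init_dense_weights` is illustrative, not part of the framework):

```python
import math
import numpy as np

def init_dense_weights(input_dim, n_units, seed=None):
    """Sample W from U(-limit, limit) with limit = 1/sqrt(input_dim);
    start the bias row w0 at zero."""
    rng = np.random.default_rng(seed)
    limit = 1 / math.sqrt(input_dim)
    W = rng.uniform(-limit, limit, size=(input_dim, n_units))
    w0 = np.zeros((1, n_units))
    return W, w0

W, w0 = init_dense_weights(input_dim=2, n_units=3, seed=0)
print(W.shape, w0.shape)  # (2, 3) (1, 3)
```

Every entry of `W` lies within `±1/sqrt(2)` here, so early activations stay on a comparable scale regardless of how many units the layer has.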
During the forward pass, the input data \(X\) is multiplied by the weight matrix \(W\) and added to the bias \(w0\) to produce the output:
$$\text{output} = X \cdot W + w0$$

The backward pass computes the gradients of the loss function with respect to the input data, weights, and bias. If the layer is trainable, it updates the weights and biases using the optimizer's update rule:
$$W = W - \eta \cdot \text{grad}_W \qquad w0 = w0 - \eta \cdot \text{grad}_{w0}$$

where \(\eta\) is the learning rate, and \(\text{grad}_W\) and \(\text{grad}_{w0}\) are the gradients of the weights and biases, respectively.
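The forward and backward steps can be checked numerically with a short sketch in plain NumPy (variable names are illustrative; the update rule assumed here is plain SGD, matching the mock optimizer above):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[1.0, 2.0]])          # one sample, 2 features
W = rng.uniform(-0.7, 0.7, (2, 3))  # 2 inputs -> 3 units
w0 = np.zeros((1, 3))
eta = 0.01                          # learning rate

# Forward pass: output = X . W + w0
out = X @ W + w0

# Backward pass, given the accumulated gradient from the next layer
accum_grad = np.array([[0.1, 0.2, 0.3]])
grad_W = X.T @ accum_grad                        # dL/dW
grad_w0 = accum_grad.sum(axis=0, keepdims=True)  # dL/dw0
grad_input = accum_grad @ W.T                    # uses W *before* the update

# SGD-style update
W = W - eta * grad_W
w0 = w0 - eta * grad_w0

print(out.shape, grad_input.shape)  # (1, 3) (1, 2)
```

Note that the gradient passed back to the previous layer is computed with the pre-update weights; this matches the reference solution, which saves `W` before applying the optimizer.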
The shape of the output from a Dense layer is determined by the number of neurons in the layer. If a layer has `n_units` neurons, the output shape will be `(n_units,)`.
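Concretely, for a batch of inputs the batch dimension is preserved and only the feature dimension changes to `n_units` (a sketch with placeholder zero weights):

```python
import numpy as np

n_units, batch = 3, 4
X = np.ones((batch, 2))        # 4 samples, 2 features each
W = np.zeros((2, n_units))     # placeholder weights for the shape check
w0 = np.zeros((1, n_units))

out = X @ W + w0
print(out.shape)  # (4, 3): batch size x n_units
```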
Resources:
```python
import copy
import math

import numpy as np


class Dense(Layer):
    def __init__(self, n_units, input_shape=None):
        self.layer_input = None
        self.input_shape = input_shape
        self.n_units = n_units
        self.trainable = True
        self.W = None
        self.w0 = None

    def initialize(self, optimizer):
        # Uniform initialization with limit 1 / sqrt(fan_in)
        limit = 1 / math.sqrt(self.input_shape[0])
        self.W = np.random.uniform(-limit, limit, (self.input_shape[0], self.n_units))
        self.w0 = np.zeros((1, self.n_units))
        # Separate optimizer copies for the weights and the bias
        self.W_opt = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)

    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)

    def forward_pass(self, X, training=True):
        self.layer_input = X
        return X.dot(self.W) + self.w0

    def backward_pass(self, accum_grad):
        # Save W before the update: the input gradient uses the old weights
        W = self.W
        if self.trainable:
            grad_w = self.layer_input.T.dot(accum_grad)
            grad_w0 = np.sum(accum_grad, axis=0, keepdims=True)
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)
        # Gradient with respect to the layer input
        accum_grad = accum_grad.dot(W.T)
        return accum_grad

    def output_shape(self):
        return (self.n_units,)
```