Example: X = np.array([[1, 2], [3, 4], [5, 6]]) output = calculate_correlation_matrix(X) print(output) # Output: # [[1. 1.] # [1. 1.]] Reasoning: The function calculates the correlation matrix for the dataset X. In this example, the correlation between the two features is 1, indicating a perfect linear relationship.
A correlation matrix is a table showing the correlation coefficients between variables. Each cell in the table shows the correlation between two variables. The value is between -1 and 1, indicating the strength and direction of the linear relationship between the variables.
The correlation coefficient between two variables \(X\) and \(Y\) is given by:
\[ \text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y} \]Where:
In this problem, you will write a function to calculate the correlation matrix for a given dataset. The function will take in a 2D numpy array \(X\) and an optional 2D numpy array \(Y\). If \(Y\) is not provided, the function will calculate the correlation matrix of \(X\) with itself.
import numpy as np def calculate_correlation_matrix(X, Y=None): # Helper function to calculate standard deviation def calculate_std_dev(A): return np.sqrt(np.mean((A - A.mean(0))**2, axis=0)) if Y is None: Y = X n_samples = np.shape(X)[0] # Calculate the covariance matrix covariance = (1 / n_samples) * (X - X.mean(0)).T.dot(Y - Y.mean(0)) # Calculate the standard deviations std_dev_X = np.expand_dims(calculate_std_dev(X), 1) std_dev_y = np.expand_dims(calculate_std_dev(Y), 1) # Calculate the correlation matrix correlation_matrix = np.divide(covariance, std_dev_X.dot(std_dev_y.T)) return np.array(correlation_matrix, dtype=float)
There’s no video solution available yet 😔, but you can be the first to submit one at: GitHub link.