R-squared, also known as the coefficient of determination, is a measure that indicates how well the independent variables explain the variability of the dependent variable in a regression model. Your task is to implement the function r_squared(y_true, y_pred), which calculates the R-squared value given an array of true values y_true and an array of predicted values y_pred.
Example:

import numpy as np

y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
print(r_squared(y_true, y_pred))

Output: 0.989
R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. It provides insight into how well the model fits the data.
The R-squared value is calculated using the following formula:
\[ R^2 = 1 - \frac{\text{SSR}}{\text{SST}} \]

Where:
SSR is the sum of squared residuals and SST is the total sum of squares.

To calculate SSR and SST, we use the following formulas:

\[ \text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

Where:
\( y_i \) are the true values, \( \hat{y}_i \) are the predicted values, \( \bar{y} \) is the mean of the true values, and \( n \) is the number of observations.
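To make the formulas concrete, here is a minimal sketch that computes SSR, SST, and R² step by step with NumPy, using the example arrays from the problem statement:

import numpy as np

y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])

# Residuals between true and predicted values
residuals = y_true - y_pred           # [-0.1, -0.1, 0.1, -0.2, 0.2]

# Sum of squared residuals: SSR = sum((y_i - y_hat_i)^2)
ssr = np.sum(residuals ** 2)          # ≈ 0.11

# Total sum of squares: SST = sum((y_i - y_bar)^2)
y_mean = np.mean(y_true)              # 3.0
sst = np.sum((y_true - y_mean) ** 2)  # 10.0

# R-squared
r2 = 1 - ssr / sst                    # ≈ 0.989
print(round(r2, 3))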
R-squared is a key metric for evaluating how well a regression model performs. A higher R-squared value indicates a better fit for the model, meaning it can explain more variability in the data. However, it’s important to note that a high R-squared does not always imply that the model is good; it can sometimes be misleading if overfitting occurs. Therefore, it should be used in conjunction with other metrics for comprehensive model evaluation.
In this problem, you will implement a function to calculate R-squared given arrays of true and predicted values from a regression task. The results should be rounded to three decimal places.
In the solution, the implemented r_squared() function calculates R-squared by first determining SSR and SST, then applying them to compute R². It handles edge cases such as perfect predictions and situations where all true values are identical.
You can refer to this resource for more information: Coefficient of Determination.
import numpy as np

def r_squared(y_true, y_pred):
    """
    Calculate the R-squared (R²) coefficient of determination.

    Args:
        y_true (numpy.ndarray): Array of true values
        y_pred (numpy.ndarray): Array of predicted values

    Returns:
        float: R-squared value rounded to 3 decimal places
    """
    # Perfect predictions: R² is 1 by definition
    if np.array_equal(y_true, y_pred):
        return 1.0

    # Mean of the true values
    y_mean = np.mean(y_true)

    # Sum of Squared Residuals (SSR)
    ssr = np.sum((y_true - y_pred) ** 2)

    # Total Sum of Squares (SST)
    sst = np.sum((y_true - y_mean) ** 2)

    # If all true values are identical, SST is zero and R² is undefined;
    # return 0.0 rather than dividing by zero (NumPy would produce inf/nan
    # instead of raising ZeroDivisionError)
    if sst == 0:
        return 0.0

    r2 = 1 - ssr / sst
    return round(float(r2), 3)
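As a quick sanity check, here is a small usage sketch (assuming the r_squared() function above is defined) covering the edge cases mentioned earlier; the commented-out cross-check against scikit-learn's r2_score assumes scikit-learn is installed:

import numpy as np

# Perfect predictions: R² should be exactly 1.0
print(r_squared(np.array([1, 2, 3]), np.array([1, 2, 3])))        # 1.0

# All true values identical: SST is zero, so the function returns 0.0
print(r_squared(np.array([2, 2, 2]), np.array([1.9, 2.1, 2.0])))  # 0.0

# Regular case from the problem statement
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
print(r_squared(y_true, y_pred))                                  # 0.989

# Optional cross-check against scikit-learn, if it is available:
# from sklearn.metrics import r2_score
# print(round(r2_score(y_true, y_pred), 3))                       # ≈ 0.989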