
Calculate R-squared for Regression Analysis

Task: Compute the R-squared Value in Regression Analysis

R-squared, also known as the coefficient of determination, is a measure that indicates how well the independent variables explain the variability of the dependent variable in a regression model. Your task is to implement the function r_squared(y_true, y_pred) that calculates the R-squared value, given arrays of true values y_true and predicted values y_pred.

Example

import numpy as np

y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
print(r_squared(y_true, y_pred))
Output: 0.989

Understanding R-squared (R²) in Regression Analysis

R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. It provides insight into how well the model fits the data.

Mathematical Definition

The R-squared value is calculated using the following formula:

\[ R^2 = 1 - \frac{\text{SSR}}{\text{SST}} \]

Where:

  • SSR (Sum of Squared Residuals): The sum of the squares of the differences between the actual values and the predicted values.
  • SST (Total Sum of Squares): The sum of the squares of the differences between the actual values and the mean of the actual values.

Equations for SSR and SST

To calculate SSR and SST, we use the following formulas:

  • SSR: \[ \text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
  • SST: \[ \text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]

Where:

  • \( y_i \): Actual value
  • \( \hat{y}_i \): Predicted value
  • \( \bar{y} \): Mean of the actual values
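As a quick check of the formulas above, the sketch below works through SSR, SST, and R² by hand for the arrays from the problem's example. The variable names (y_bar, ssr, sst, r2) are chosen here for illustration only.

```python
import numpy as np

# Arrays taken from the example in the problem statement
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.1, 2.1, 2.9, 4.2, 4.8])

y_bar = y_true.mean()                   # mean of actual values: 3.0
ssr = np.sum((y_true - y_pred) ** 2)    # 0.01 + 0.01 + 0.01 + 0.04 + 0.04 = 0.11
sst = np.sum((y_true - y_bar) ** 2)     # 4 + 1 + 0 + 1 + 4 = 10
r2 = 1 - ssr / sst                      # 1 - 0.011 = 0.989

print(round(float(ssr), 3), float(sst), round(float(r2), 3))  # 0.11 10.0 0.989
```

The rounded result matches the expected output of 0.989 shown in the example.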

Significance of R-squared

R-squared is a key metric for evaluating how well a regression model performs. A value of 1 means the predictions match the actual values exactly, a value of 0 means the model does no better than always predicting the mean of the actual values, and a negative value means it does worse than that. A higher R-squared therefore indicates that the model explains more of the variability in the data. However, a high R-squared does not by itself show that the model is good: it can be misleading when the model overfits the data it was trained on. It should be used together with other metrics for a comprehensive model evaluation.
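The overfitting caveat can be illustrated with a small sketch. It fits a straight line and a degree-5 polynomial to six made-up points using np.polyfit, then compares the R-squared of each fit on those same points. The data and the helper name r2_score_sketch are assumptions for illustration, not part of the problem.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    """Hypothetical helper: plain 1 - SSR/SST with no rounding or edge-case handling."""
    ssr = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ssr / sst

# Six made-up points that roughly follow a straight line
x = np.arange(6, dtype=float)
y = np.array([0.1, 0.9, 2.3, 2.8, 4.2, 4.9])

# Degree 1: a straight-line fit. Degree 5: passes through all six points.
line_pred = np.polyval(np.polyfit(x, y, 1), x)
wiggly_pred = np.polyval(np.polyfit(x, y, 5), x)

r2_line = r2_score_sketch(y, line_pred)
r2_wiggly = r2_score_sketch(y, wiggly_pred)

print(f"degree 1: {r2_line:.4f}")
print(f"degree 5: {r2_wiggly:.4f}")
```

The degree-5 polynomial reaches an R-squared of essentially 1 on the points it was fitted to, because it has enough coefficients to interpolate them. That says nothing about how it would behave on new data, which is why a held-out set or additional metrics are worth checking.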

Implementing R-squared Calculation

In this problem, you will implement a function to calculate R-squared given arrays of true and predicted values from a regression task. The result should be rounded to three decimal places.

In the solution, the implemented r_squared() function calculates R-squared by first determining SSR and SST, then applying them to compute R². It handles two edge cases: perfect predictions, for which it returns 1.0 directly, and the situation where all true values are identical (so SST is zero and the ratio is undefined), for which it returns 0.0.

Reference

You can refer to this resource for more information: Coefficient of Determination.

import numpy as np

def r_squared(y_true, y_pred):
    """
    Calculate the R-squared (R²) coefficient of determination.
    
    Args:
        y_true (numpy.ndarray): Array of true values
        y_pred (numpy.ndarray): Array of predicted values
    
    Returns:
        float: R-squared value rounded to 3 decimal places
    """
    if np.array_equal(y_true, y_pred):
        return 1.0

    # Calculate mean of true values
    y_mean = np.mean(y_true)

    # Calculate Sum of Squared Residuals (SSR)
    ssr = np.sum((y_true - y_pred) ** 2)

    # Calculate Total Sum of Squares (SST)
    sst = np.sum((y_true - y_mean) ** 2)

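    # NumPy division by zero produces inf/nan with a RuntimeWarning instead of
    # raising ZeroDivisionError, so check for zero total variance explicitly.
    if sst == 0:
        return 0.0

    # Calculate R-squared
    r2 = 1 - (ssr / sst)
    return round(float(r2), 3)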
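
The short usage sketch below exercises the edge cases described above. So that the snippet runs on its own, it restates the function compactly, following the same logic as the reference solution. The test inputs are made up for illustration, and the conversion of plain lists with np.asarray is an addition here rather than part of the original signature.

```python
import numpy as np

def r_squared(y_true, y_pred):
    # Compact restatement of the reference logic; accepts lists as a convenience
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.array_equal(y_true, y_pred):
        return 1.0                      # perfect predictions
    ssr = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    if sst == 0:
        return 0.0                      # all true values identical
    return round(float(1 - ssr / sst), 3)

typical = r_squared([1, 2, 3, 4, 5], [1.1, 2.1, 2.9, 4.2, 4.8])
perfect = r_squared([1, 2, 3], [1, 2, 3])
constant = r_squared([2, 2, 2], [1, 2, 3])
worse_than_mean = r_squared([1, 2, 3], [3, 2, 1])

print(typical, perfect, constant, worse_than_mean)  # 0.989 1.0 0.0 -3.0
```

The last case shows that R-squared is not bounded below by zero: predictions that are worse than always guessing the mean give a negative value.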
