
Calculate Jaccard Index for Binary Classification

Task: Implement the Jaccard Index

Your task is to implement a function jaccard_index(y_true, y_pred) that calculates the Jaccard Index, a measure of similarity between two sets. For binary arrays, the sets are the positions labeled 1, so the index measures how well the predicted positives overlap the true positives. It is widely used to evaluate binary classification results.

The function should handle the cases where the two arrays share no positive labels and where both arrays contain only zeros. In the latter case the union is empty, and the function should return 0.

Example

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1])
print(jaccard_index(y_true, y_pred))
Output: 0.75

Understanding Jaccard Index in Classification

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a statistic used to measure the similarity between sets. In the context of binary classification, it measures the overlap between predicted and actual positive labels.

Mathematical Definition

The Jaccard Index is defined as the size of the intersection divided by the size of the union of two sets:

\[ \text{Jaccard Index} = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \]
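The two forms of the formula agree because of inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|. A minimal sketch using Python's built-in sets shows both forms side by side. The helper `jaccard_sets` is hypothetical, and it assumes (as this problem does) that two empty sets score 0.

```python
def jaccard_sets(a: set, b: set) -> float:
    """Jaccard index of two Python sets, computed with both forms of the formula."""
    if not a and not b:
        return 0.0  # assumption: empty vs. empty returns 0, matching this problem
    inter = len(a & b)
    direct = inter / len(a | b)                    # |A ∩ B| / |A ∪ B|
    via_sizes = inter / (len(a) + len(b) - inter)  # |A ∩ B| / (|A| + |B| - |A ∩ B|)
    assert direct == via_sizes  # inclusion-exclusion makes the denominators equal
    return direct

# Sets of indices where the example's labels are 1
A = {0, 2, 3, 5}  # y_true = [1, 0, 1, 1, 0, 1]
B = {0, 2, 5}     # y_pred = [1, 0, 1, 0, 0, 1]
print(jaccard_sets(A, B))  # 0.75
```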

In the context of binary classification:

  • Intersection (A ∩ B): The number of positions where both the predicted and true labels are 1 (True Positives)
  • Union (A ∪ B): The number of positions where either the predicted or true labels (or both) are 1
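Writing TP, FP, and FN for the counts of true positives, false positives, and false negatives, the union is exactly TP + FP + FN, so true negatives never affect the score:

\[ \text{Jaccard Index} = \frac{TP}{TP + FP + FN} \]

For the arrays in the problem's example, TP = 3, FP = 0, and FN = 1, giving 3 / (3 + 0 + 1) = 0.75.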

Key Properties

  • Range: The Jaccard Index always falls between 0 and 1 (inclusive)
  • Perfect Match: A value of 1 indicates identical, non-empty sets
  • No Overlap: A value of 0 indicates disjoint sets
  • Symmetry: The index is symmetric, meaning J(A,B) = J(B,A)
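As a sketch, these properties can be spot-checked on random binary arrays. The helper `jaccard` below is hypothetical and assumes, as this problem does, that two all-zero arrays score 0; the random seed and array sizes are arbitrary.

```python
import numpy as np

def jaccard(y_true, y_pred):
    """Jaccard index of two binary arrays; assumes all-zero inputs score 0.0."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    union = np.sum(t | p)
    if union == 0:
        return 0.0  # assumed convention for two empty sets
    return float(np.sum(t & p) / union)

rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.integers(0, 2, size=20)
    b = rng.integers(0, 2, size=20)
    j = jaccard(a, b)
    assert 0.0 <= j <= 1.0         # range
    assert j == jaccard(b, a)      # symmetry
    if a.any():
        assert jaccard(a, a) == 1.0  # identical non-empty sets
    if not np.any(a & b):
        assert j == 0.0            # disjoint sets (or both empty)
print("all properties hold")
```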

Example

Consider two binary vectors:

  • True labels: [1, 0, 1, 1, 0, 1]
  • Predicted labels: [1, 0, 1, 0, 0, 1]

In this case:

  • Intersection (positions where both are 1): 3
  • Union (positions where either is 1): 4
  • Jaccard Index = 3/4 = 0.75
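The same counts can be traced with NumPy boolean masks. This sketch builds the intersection and union masks for the example arrays and sums them; it uses the same elementwise comparisons as the reference solution.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1])

both = (y_true == 1) & (y_pred == 1)    # intersection mask: both labels are 1
either = (y_true == 1) | (y_pred == 1)  # union mask: at least one label is 1

print(both.astype(int))    # [1 0 1 0 0 1]
print(either.astype(int))  # [1 0 1 1 0 1]
print(both.sum(), either.sum(), both.sum() / either.sum())  # 3 4 0.75
```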

Usage in Machine Learning

The Jaccard Index is particularly useful in:

  • Evaluating clustering algorithms
  • Comparing binary classification results
  • Document similarity analysis
  • Image segmentation evaluation
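In image segmentation the index is usually called Intersection over Union (IoU), and it applies unchanged to 2D masks because the sums run over every pixel. A minimal sketch follows; the function `mask_iou` and the 4x4 masks are hypothetical, and the empty-mask result of 0 is an assumption carried over from this problem.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """IoU (Jaccard index) of two same-shape binary masks; 0.0 if both are empty."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumed convention for two empty masks
    return float(np.logical_and(a, b).sum() / union)

# Hypothetical 4x4 ground-truth and predicted masks
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1  # 2x2 square: 4 pixels
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1   # 2x3 rectangle: 6 pixels, covering all 4 truth pixels

print(mask_iou(truth, pred))  # 0.6666666666666666 (4 / 6)
```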

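When implementing the Jaccard Index, handle the edge case where both arrays contain no positive labels: the union is empty, so the formula reduces to 0/0. Conventions differ (some implementations return 1, treating two empty sets as identical); this problem defines the index as 0 in that case.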
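One way to make the convention explicit is to expose it as a parameter. This is a sketch only: the function name `jaccard_with_default` and its `empty_value` parameter are hypothetical, and the default of 0.0 follows this problem's convention.

```python
import numpy as np

def jaccard_with_default(y_true, y_pred, empty_value=0.0):
    """Jaccard index; returns `empty_value` when neither input has any positives."""
    t = np.asarray(y_true) == 1
    p = np.asarray(y_pred) == 1
    union = np.count_nonzero(t | p)
    if union == 0:
        # 0/0 is undefined: return the chosen convention instead of NaN
        return empty_value
    return np.count_nonzero(t & p) / union

zeros = np.zeros(5, dtype=int)
print(jaccard_with_default(zeros, zeros))                   # 0.0 (this problem's convention)
print(jaccard_with_default(zeros, zeros, empty_value=1.0))  # 1.0 (empty sets treated as identical)
print(jaccard_with_default([1, 1, 0], [0, 0, 1]))           # 0.0 (no overlap, non-empty union)
```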

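import numpy as np

def jaccard_index(y_true, y_pred):
    # Accept Python lists as well as NumPy arrays
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Intersection: positions where both labels are 1 (true positives)
    intersection = np.sum((y_true == 1) & (y_pred == 1))
    # Union: positions where at least one label is 1
    union = np.sum((y_true == 1) | (y_pred == 1))
    # Both arrays all zeros: avoid 0/0, which yields NaN plus a RuntimeWarning
    if union == 0:
        return 0.0
    return round(float(intersection / union), 3)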
