Calculate Cosine Similarity Between Vectors

Task: Implement Cosine Similarity

In this task, you need to implement a function cosine_similarity(v1, v2) that calculates the cosine similarity between two vectors. Cosine similarity measures the cosine of the angle between two vectors, indicating their directional similarity.

Input:

  • v1 and v2: Numpy arrays representing the input vectors.

Output:

  • A float representing the cosine similarity, rounded to three decimal places.

Constraints:

  • Both input vectors must have the same shape.
  • Input vectors cannot be empty or have zero magnitude.

Example:

Input:
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])
print(cosine_similarity(v1, v2))
Output:
1.0

Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors. It ignores the magnitudes of the vectors and depends only on the angle between them.

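\[ \cos(\theta) = \frac{\sum_{i=1}^{p} A_i B_i}{\sqrt{\sum_{i=1}^{p} A_i^2} \sqrt{\sum_{i=1}^{p} B_i^2}} \]

Here \( A \) and \( B \) are the two input vectors (v1 and v2 in the task) and \( p \) is the number of components in each.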
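
As a quick worked check of the formula, take the vectors from the task example, \( A = (1, 2, 3) \) and \( B = (2, 4, 6) \). The arithmetic below only substitutes those values into the formula above:

\[ \sum_{i=1}^{3} A_i B_i = 1 \cdot 2 + 2 \cdot 4 + 3 \cdot 6 = 28 \]

\[ \sqrt{\sum_{i=1}^{3} A_i^2} = \sqrt{1 + 4 + 9} = \sqrt{14}, \qquad \sqrt{\sum_{i=1}^{3} B_i^2} = \sqrt{4 + 16 + 36} = \sqrt{56} \]

\[ \cos(\theta) = \frac{28}{\sqrt{14} \sqrt{56}} = \frac{28}{\sqrt{784}} = \frac{28}{28} = 1 \]

This matches the expected output of 1.0: B is a positive multiple of A, so the two vectors point in the same direction.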

Implementation Steps for Cosine Similarity:

  1. Handle Input: Ensure the input vectors have the same dimensions and handle edge cases (empty or zero-magnitude vectors).
  2. Dot Product: Compute \( \sum_{i=1}^{p} A_i B_i \) for the two vectors.
  3. Magnitudes: Compute the L2 norm \( \sqrt{\sum_{i=1}^{p} A_i^2} \) of each vector.
  4. Final Result: Divide the dot product by the product of the magnitudes.
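
The steps above can be sketched roughly as follows. This is a minimal illustration, not the reference solution: the name cosine_similarity_steps is hypothetical, and it uses np.linalg.norm for the magnitudes as one possible choice. The three calls at the end show the same, perpendicular, and opposite directions.

```python
import numpy as np

def cosine_similarity_steps(a, b):
    """Hypothetical helper that mirrors the four implementation steps."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Step 1: validate the inputs (matching shapes, non-empty)
    if a.shape != b.shape:
        raise ValueError("Arrays must have the same shape")
    if a.size == 0:
        raise ValueError("Arrays cannot be empty")

    # Step 2: dot product of the flattened vectors
    dot = np.dot(a.ravel(), b.ravel())

    # Step 3: L2 norm (magnitude) of each vector
    norm_a = np.linalg.norm(a.ravel())
    norm_b = np.linalg.norm(b.ravel())
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Vectors cannot have zero magnitude")

    # Step 4: divide the dot product by the product of the magnitudes
    return round(float(dot / (norm_a * norm_b)), 3)

same = cosine_similarity_steps([1, 2, 3], [2, 4, 6])   # same direction
orthogonal = cosine_similarity_steps([1, 0], [0, 1])   # perpendicular
opposite = cosine_similarity_steps([1, 2], [-1, -2])   # opposite direction
print(same, orthogonal, opposite)
```

Rounding to three decimals also hides the small floating-point error that can leave the raw ratio slightly below 1 or above -1.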

Use Cases:

  1. Text and Image Similarity
  2. Recommendation Systems
  3. Query Matching

Pitfalls

  1. Magnitude Blindness: vector1 = (1, 1) and vector2 = (1000, 1000) have a cosine similarity of 1 despite their very different magnitudes.
  2. Sparse Data Issues: In high-dimensional spaces, where data is often sparse, cosine similarity can be less reliable, because vectors that share few non-zero components score close to 0 however related they are.
  3. Non-Negative Data Limitation: When all components are non-negative, the similarity cannot be negative, so it may fail to capture certain inverse relationships.
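
The first and third pitfalls above are easy to observe numerically. The sketch below is only illustrative: cos_sim is a hypothetical unrounded helper, and the random non-negative data (with an arbitrary seed) is an assumption made for the demonstration.

```python
import numpy as np

def cos_sim(a, b):
    """Hypothetical helper: unrounded cosine similarity of two 1D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pitfall 1: magnitude blindness.
# The vectors point the same way, so the similarity is 1 even though
# they are far apart in Euclidean distance.
small = [1, 1]
large = [1000, 1000]
magnitude_blind = round(cos_sim(small, large), 3)
distance = float(np.linalg.norm(np.subtract(large, small)))
print(magnitude_blind, distance)

# Pitfall 3: non-negative data.
# Every pairwise dot product of non-negative vectors is >= 0,
# so no pair can receive a negative similarity.
rng = np.random.default_rng(0)        # arbitrary seed, for repeatability
non_negative = rng.random((50, 5))    # 50 vectors with components in [0, 1)
lowest = min(cos_sim(u, v) for u in non_negative for v in non_negative)
print(lowest >= 0)
```

If magnitude matters for the application, a distance-based measure such as the Euclidean distance computed above may be worth comparing alongside the cosine score.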

Solution:

import numpy as np

def cosine_similarity(v1, v2):
    """Return the cosine similarity of v1 and v2, rounded to three decimals."""
    if v1.shape != v2.shape:
        raise ValueError("Arrays must have the same shape")

    if v1.size == 0:
        raise ValueError("Arrays cannot be empty")

    # Flatten so that 2D inputs are treated as single vectors
    v1_flat = v1.flatten()
    v2_flat = v2.flatten()

    # Numerator: sum of the element-wise products
    dot_product = np.dot(v1_flat, v2_flat)
    # Denominator terms: L2 norm of each vector
    magnitude1 = np.sqrt(np.sum(v1_flat**2))
    magnitude2 = np.sqrt(np.sum(v2_flat**2))

    if magnitude1 == 0 or magnitude2 == 0:
        raise ValueError("Vectors cannot have zero magnitude")

    # Convert to a plain Python float before rounding
    return round(float(dot_product / (magnitude1 * magnitude2)), 3)
