Your task is to write a Python function that implements the k-Means clustering algorithm. This function should take specific inputs and produce a list of final centroids. k-Means clustering is a method used to partition n
points into k
clusters. The goal is to group similar points together and represent each group by its center (called the centroid).
points
: A list of points, where each point is a tuple of coordinates (e.g., (x, y)
for 2D points)k
: An integer representing the number of clusters to forminitial_centroids
: A list of initial centroid points, each a tuple of coordinatesmax_iterations
: An integer representing the maximum number of iterations to performA list of the final centroids of the clusters, where each centroid is rounded to the nearest fourth decimal.
Example: input: points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)], k = 2, initial_centroids = [(1, 1), (10, 1)], max_iterations = 10 output: [(1, 2), (10, 2)] reasoning: Given the initial centroids and a maximum of 10 iterations, the points are clustered around these points, and the centroids are updated to the mean of the assigned points, resulting in the final centroids which approximate the means of the two clusters. The exact number of iterations needed may vary, but the process will stop after 10 iterations at most.
Use the provided initial_centroids
as your starting point. This step is already done for you in the input.
For each point in your dataset:
Hint: Consider creating a helper function to calculate Euclidean distance between two points.
For each cluster:
Hint: Be careful with potential empty clusters. Decide how you'll handle them (e.g., keep the previous centroid).
Repeat steps 2 and 3 until either:
max_iterations
limitHint: You might want to keep track of the previous centroids to check for significant changes.
Return the list of final centroids, ensuring each coordinate is rounded to the nearest fourth decimal.
For a visual understanding of how k-Means clustering works, check out this helpful YouTube video.
import numpy as np def euclidean_distance(a, b): return np.sqrt(((a - b) ** 2).sum(axis=1)) def k_means_clustering(points, k, initial_centroids, max_iterations): points = np.array(points) centroids = np.array(initial_centroids) for iteration in range(max_iterations): # Assign points to the nearest centroid distances = np.array([euclidean_distance(points, centroid) for centroid in centroids]) assignments = np.argmin(distances, axis=0) new_centroids = np.array([points[assignments == i].mean(axis=0) if len(points[assignments == i]) > 0 else centroids[i] for i in range(k)]) # Check for convergence if np.all(centroids == new_centroids): break centroids = new_centroids centroids = np.round(centroids,4) return [tuple(centroid) for centroid in centroids]
There’s no video solution available yet 😔, but you can be the first to submit one at: GitHub link.