Example:
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
output: (array([[5, 6], [1, 2], [7, 8], [3, 4]]), array([3, 1, 4, 2]))
Random shuffling of a dataset is a common preprocessing step in machine learning to ensure that the data is randomly distributed before training a model. This helps to avoid any potential biases that may arise from the order in which data is presented to the model.
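Here's a step-by-step method to shuffle a dataset:
1. If a seed is given, seed the random number generator so the shuffle is reproducible.
2. Create an array of row indices, one per sample in X.
3. Shuffle that index array in place.
4. Index both X and y with the shuffled indices.
Because the same index order is applied to both arrays, the correspondence between X and y is maintained after shuffling.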
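import numpy as np

def shuffle_data(X, y, seed=None):
    # Seed the global RNG only when a seed is given (0 is a valid seed)
    if seed is not None:
        np.random.seed(seed)
    # Shuffle the row indices once, then apply the same order to X and y
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    return X[idx], y[idx]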
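As a rough sketch of the same idea, the variant below uses a local NumPy `Generator` instead of seeding the global random state, and then checks that every row still carries its original label. The name `shuffle_data_rng` and the length check are assumptions added for illustration and are not part of the original solution. This variant is not expected to reproduce the order that `np.random.seed` gives for the same seed, since the two use different generators.

```python
import numpy as np

def shuffle_data_rng(X, y, seed=None):
    """Hypothetical variant of shuffle_data that uses a local Generator."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Assumption: X and y are expected to have one entry per sample
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of samples")
    # A local generator leaves NumPy's global random state untouched
    rng = np.random.default_rng(seed)
    # One permutation of the row indices, applied to both arrays
    idx = rng.permutation(X.shape[0])
    return X[idx], y[idx]

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
X_s, y_s = shuffle_data_rng(X, y, seed=0)

# Compare (row, label) pairs before and after shuffling
pairs_before = {(tuple(row), int(label)) for row, label in zip(X, y)}
pairs_after = {(tuple(row), int(label)) for row, label in zip(X_s, y_s)}
print(pairs_before == pairs_after)  # prints True
```

Calling the function twice with the same seed should give the same order, which is useful when a shuffled split has to be reproduced later.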