Example:

x = np.array([0, 1, 2, 1, 0])
output = to_categorical(x)
print(output)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]

Reasoning: Each element in the input array is transformed into a one-hot encoded vector, where the index corresponding to the value in the input array is set to 1, and all other indices are set to 0.
One-hot encoding is a method used to represent categorical variables as binary vectors. This technique is useful in machine learning when dealing with categorical data that has no ordinal relationship.
In one-hot encoding, each category is represented by a binary vector with a length equal to the number of categories. The vector has a value of 1 at the index corresponding to the category and 0 at all other indices.
For example, if you have three categories, 0, 1, and 2, the one-hot encoded vectors would be [1, 0, 0] for category 0, [0, 1, 0] for category 1, and [0, 0, 1] for category 2.
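As a quick illustration of the three-category case, the rows of an identity matrix are exactly the one-hot vectors. This is only a sketch; the names `n_categories` and `identity` are chosen here for illustration and are not part of the solution.

```python
import numpy as np

# Hypothetical illustration: three categories labeled 0, 1, 2.
n_categories = 3

# Row k of the identity matrix has a 1 at index k and 0 elsewhere,
# which is the one-hot vector for category k.
identity = np.eye(n_categories)

for category in range(n_categories):
    print(category, identity[category])
```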
This method ensures that the model does not assume any ordinal relationship between categories, which is crucial for many machine learning algorithms. The one-hot encoding process can be mathematically represented as follows:
Given a category \( x \) from a set of categories \( \{0, 1, \ldots, n-1\} \), the one-hot encoded vector \( \mathbf{v} \) has components \( v_j \), for \( j = 0, 1, \ldots, n-1 \), defined by:
\[ v_j = \begin{cases} 1 & \text{if } j = x \\ 0 & \text{otherwise} \end{cases} \]
This vector \( \mathbf{v} \) has length \( n \), the number of unique categories.
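The piecewise definition above can be transcribed almost literally into plain Python. The helper name `one_hot_vector` and the sample values are assumptions made for this sketch; it handles a single category rather than a whole array.

```python
def one_hot_vector(x, n):
    """Return the one-hot vector for category x out of n categories.

    Hypothetical helper: each position holds 1 if it equals x, else 0.
    """
    if not 0 <= x < n:
        raise ValueError("category must lie in {0, ..., n-1}")
    return [1 if position == x else 0 for position in range(n)]


# Category 2 out of 4 categories.
v = one_hot_vector(2, 4)
print(v)  # [0, 0, 1, 0]
```

The vector has exactly one nonzero entry, so its entries always sum to 1.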
import numpy as np

def to_categorical(x, n_col=None):
    # One-hot encoding of nominal values
    # If n_col is not provided, determine the number of columns from the input array
    if not n_col:
        n_col = np.amax(x) + 1
    # Initialize a matrix of zeros with shape (number of samples, n_col)
    one_hot = np.zeros((x.shape[0], n_col))
    # Set the appropriate elements to 1
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot
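The assignment step relies on NumPy integer array indexing, which pairs row `i` with column `x[i]`. The sketch below isolates that step and also shows one reason to pass the column count explicitly: when the highest class is absent from the data, inferring it from the maximum label would give too few columns. The sample labels and the value `n_col = 4` are assumptions for illustration.

```python
import numpy as np

# Sample labels; class 3 exists in this hypothetical problem but is absent here.
x = np.array([0, 2, 1])
n_col = 4  # assumed known number of classes

one_hot = np.zeros((x.shape[0], n_col))

# rows = [0, 1, 2]; indexing with (rows, x) selects the positions
# (0, 0), (1, 2), (2, 1) and sets each of them to 1.
rows = np.arange(x.shape[0])
one_hot[rows, x] = 1

print(one_hot.shape)        # (3, 4)
print(one_hot.sum(axis=1))  # [1. 1. 1.]
```

Note that the labels must be an integer array for this indexing to work; a float array of labels would raise an `IndexError`.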