## One-Hot Encoding of Nominal Values

Write a Python function to perform one-hot encoding of nominal values. The function should take a 1D NumPy array `x` of integer values and an optional integer `n_col` giving the number of columns in the one-hot encoded array. If `n_col` is not provided, it should be determined automatically from the input array.

#### Example

```python
x = np.array([0, 1, 2, 1, 0])
output = to_categorical(x)
print(output)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```

Reasoning: Each element in the input array is transformed into a one-hot encoded vector, where the index corresponding to the value in the input array is set to 1 and all other indices are set to 0.

## Understanding One-Hot Encoding

One-hot encoding is a method used to represent categorical variables as binary vectors. This technique is useful in machine learning when dealing with categorical data that has no ordinal relationship.
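To make the ordinal concern concrete, the sketch below compares how far apart categories appear under plain integer codes and under binary indicator vectors. The category names and the choice of Euclidean distance are illustrative assumptions, not part of the problem.

```python
import numpy as np

# Hypothetical nominal categories with arbitrary integer codes.
codes = {"red": 0, "green": 1, "blue": 2}

# With integer codes, "red" looks twice as far from "blue" as from "green".
d_red_green = abs(codes["red"] - codes["green"])
d_red_blue = abs(codes["red"] - codes["blue"])

# With one binary indicator vector per category, every pair is equally far apart.
vectors = {
    "red": np.array([1.0, 0.0, 0.0]),
    "green": np.array([0.0, 1.0, 0.0]),
    "blue": np.array([0.0, 0.0, 1.0]),
}
v_red_green = np.linalg.norm(vectors["red"] - vectors["green"])
v_red_blue = np.linalg.norm(vectors["red"] - vectors["blue"])

print(d_red_green, d_red_blue)              # 1 2
print(np.isclose(v_red_green, v_red_blue))  # True
```

The integer codes impose a ranking that the data never had, while the binary vectors keep all categories equidistant.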

In one-hot encoding, each category is represented by a binary vector with a length equal to the number of categories. The vector has a value of 1 at the index corresponding to the category and 0 at all other indices.

For example, if you have three categories: 0, 1, and 2, the one-hot encoded vectors would be:

- 0: \( \left[1, 0, 0\right] \)
- 1: \( \left[0, 1, 0\right] \)
- 2: \( \left[0, 0, 1\right] \)

This representation keeps a model from reading a spurious order or magnitude into the category labels, which matters for algorithms that treat inputs as numeric quantities, such as linear models and distance-based methods. The one-hot encoding process can be mathematically represented as follows:

Given a category \( x \) from a set of categories \( \{0, 1, \ldots, n-1\} \), the one-hot encoded vector \( \mathbf{v} \) has components \( v_j \) for \( j = 0, 1, \ldots, n-1 \):

\[ v_j = \begin{cases} 1 & \text{if } j = x \\ 0 & \text{otherwise} \end{cases} \]

This vector \( \mathbf{v} \) has length \( n \), the number of categories.
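The case definition above translates almost directly into code. The following is a minimal pure-Python sketch for a single category; the function name `one_hot_vector` and the range check are assumptions added for illustration.

```python
def one_hot_vector(x, n):
    """Return the length-n one-hot vector for category x in {0, ..., n-1}."""
    if not 0 <= x < n:
        raise ValueError(f"category {x} is outside 0..{n - 1}")
    # An entry is 1 exactly when its position equals the category, otherwise 0.
    return [1 if j == x else 0 for j in range(n)]


print(one_hot_vector(2, 3))       # [0, 0, 1]
print(sum(one_hot_vector(1, 4)))  # 1
```

Every vector produced this way sums to 1, since exactly one position matches the category.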

```python
import numpy as np

def to_categorical(x, n_col=None):
    """One-hot encode a 1D array of integer nominal values."""
    # If n_col is not provided, infer the number of columns from the largest value
    if n_col is None:
        n_col = np.amax(x) + 1
    # Initialize a matrix of zeros with shape (number of samples, n_col)
    one_hot = np.zeros((x.shape[0], n_col))
    # For each row, set the column matching that row's category to 1
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot
```
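The solution assumes well-formed input. Because NumPy counts negative indices from the end of an axis, a label of -1 would silently set the last column, and an `n_col` smaller than the largest label would fail with an indexing error. The sketch below is one possible defensive variant; the name `to_categorical_checked` and the specific checks are assumptions, not requirements of the problem.

```python
import numpy as np

def to_categorical_checked(x, n_col=None):
    """One-hot encode a 1D array of non-negative integers, with input checks."""
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("x must be one-dimensional")
    if not np.issubdtype(x.dtype, np.integer):
        raise TypeError("x must contain integers")
    if x.size and x.min() < 0:
        raise ValueError("x must not contain negative values")
    # Smallest width that can hold every label in x.
    needed = int(x.max()) + 1 if x.size else 0
    if n_col is None:
        n_col = needed
    elif n_col < needed:
        raise ValueError(f"n_col={n_col} is too small for label {needed - 1}")
    one_hot = np.zeros((x.shape[0], n_col))
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot


# Passing n_col explicitly reserves columns for categories absent from x.
wide = to_categorical_checked(np.array([0, 2]), n_col=4)
print(wide.shape)  # (2, 4)
```

Supplying `n_col` explicitly is useful when a batch does not contain every category, since the inferred width would otherwise vary from batch to batch.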
