
One-Hot Encoding of Nominal Values

Write a Python function that performs one-hot encoding of nominal values. The function takes a 1D NumPy array x of non-negative integer values and an optional integer n_col, the number of columns in the one-hot encoded array. If n_col is not provided, it is inferred from the input array as the largest value in x plus one.

Example

    x = np.array([0, 1, 2, 1, 0])
    output = to_categorical(x)
    print(output)
    # Output:
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]
    #  [0. 1. 0.]
    #  [1. 0. 0.]]
    
    Reasoning:
    Each element in the input array is transformed into a one-hot encoded vector,
    where the index corresponding to the value in the input array is set to 1, 
    and all other indices are set to 0.
    

Understanding One-Hot Encoding

One-hot encoding is a method used to represent categorical variables as binary vectors. This technique is useful in machine learning when dealing with categorical data that has no ordinal relationship.

In one-hot encoding, each category is represented by a binary vector with a length equal to the number of categories. The vector has a value of 1 at the index corresponding to the category and 0 at all other indices.

For example, if you have three categories: 0, 1, and 2, the one-hot encoded vectors would be:

  • 0: \( \left[1, 0, 0\right] \)
  • 1: \( \left[0, 1, 0\right] \)
  • 2: \( \left[0, 0, 1\right] \)
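
The three vectors listed above are the rows of a 3x3 identity matrix, so one way to sketch the mapping is to index into `np.eye`. This is an illustrative alternative and not the solution used later on this page; the variable names `labels`, `identity`, and `vectors` are chosen here for the example.

```python
import numpy as np

# Category labels from the list above (assumed to be integers in {0, 1, 2}).
labels = np.array([0, 1, 2])

# Row k of the 3x3 identity matrix is the one-hot vector for category k.
identity = np.eye(3)

# Integer-array indexing picks one identity row per label.
vectors = identity[labels]
print(vectors)
```

Indexing with an array of labels selects one row per label, so the same expression also encodes a longer array in which labels repeat.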

This method ensures that the model does not assume any ordinal relationship between categories, which is crucial for many machine learning algorithms. The one-hot encoding process can be mathematically represented as follows:

Given a category \( x \) from a set of categories \( \{0, 1, \ldots, n-1\} \), the one-hot encoded vector \( \mathbf{v} \) has components:

\[ v_j = \begin{cases} 1 & \text{if } j = x \\ 0 & \text{otherwise} \end{cases} \qquad j = 0, 1, \ldots, n-1 \]

This vector \( \mathbf{v} \) has length \( n \), the number of categories.
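
The piecewise definition can be read as a loop over the component positions of the vector. The following is a minimal sketch for a single category, assuming plain Python lists; `one_hot_vector` is a hypothetical helper name introduced only for this illustration.

```python
def one_hot_vector(x, n):
    """Hypothetical helper: one-hot vector of length n for a single category x."""
    # The definition only covers categories in {0, 1, ..., n-1}.
    if not 0 <= x < n:
        raise ValueError("category must lie in the range 0..n-1")
    # Component j is 1 when j equals the category, and 0 otherwise.
    return [1 if j == x else 0 for j in range(n)]


print(one_hot_vector(1, 3))
```

Exactly one component is set, so the entries of the returned vector always sum to 1.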

import numpy as np

def to_categorical(x, n_col=None):
    # One-hot encoding of nominal values
    # If n_col is not provided, infer it as the largest value in x plus one.
    # Compare against None explicitly so that only a missing argument triggers inference.
    if n_col is None:
        n_col = np.amax(x) + 1
    # Initialize a matrix of zeros with shape (number of samples, n_col)
    one_hot = np.zeros((x.shape[0], n_col))
    # For each row i, set the column given by x[i] to 1
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot
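
The optional n_col argument matters when an input array does not contain the largest category. The sketch below restates the function compactly, with an explicit `is None` check, so that the snippet runs on its own; the arrays `batch`, `inferred`, and `fixed` are made up for illustration.

```python
import numpy as np

def to_categorical(x, n_col=None):
    # Compact restatement of the solution above so this snippet is self-contained.
    if n_col is None:
        n_col = int(np.amax(x)) + 1
    one_hot = np.zeros((x.shape[0], n_col))
    one_hot[np.arange(x.shape[0]), x] = 1
    return one_hot

# Hypothetical batch drawn from three categories, where category 2 never appears.
batch = np.array([0, 1, 1, 0])

inferred = to_categorical(batch)        # width inferred from the data
fixed = to_categorical(batch, n_col=3)  # width fixed by the caller

print(inferred.shape)  # (4, 2)
print(fixed.shape)     # (4, 3)
```

Passing n_col explicitly keeps the encoded width the same across arrays, even when some of them are missing the highest categories.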
    