Back to Problems

Implement Compressed Column Sparse Matrix Format (CSC)

Task: Create a Compressed Column Sparse Matrix Representation

Your task is to implement a function that converts a dense matrix into its Compressed Column Sparse (CSC) representation. The CSC format stores only non-zero elements of the matrix and is efficient for matrices with a high number of zero elements.

Write a function compressed_col_sparse_matrix(dense_matrix) that takes in a two-dimensional list dense_matrix and returns a tuple of three lists:

  • values: List of non-zero elements, stored in column-major order.
  • row indices: List of row indices corresponding to each value in the values array.
  • column pointer: List that indicates the starting index of each column in the values array.

Example

Example:
input:
dense_matrix = [
    [0, 0, 3, 0],
    [1, 0, 0, 4],
    [0, 2, 0, 0]
]

vals, row_idx, col_ptr = compressed_col_sparse_matrix(dense_matrix)
Output: 
[1, 2, 3, 4] [1, 2, 0, 1] [0, 1, 2, 3, 4]

Understanding the Compressed Row Sparse Matrix Format

The Compressed Row Sparse, CSR, format is a data-efficient representation of sparse matrices, where most of the elements are zero. This format is particularly useful in large-scale scientific computing and machine learning applications, where memory efficiency is critical.

Concepts

A sparse matrix is a matrix that contains a large number of zero elements. Storing such matrices in their full form can be inefficient, both in terms of memory and computational resources. The CSR format addresses this problem by storing only the non-zero elements and their positions in the matrix. In the CSR format, a matrix is represented by three one-dimensional arrays:
  • Values array: Contains all the non-zero elements of the matrix, stored row by row.
  • Column indices array: Stores the column index corresponding to each value in the values array.
  • Row pointer array: Stores the cumulative number of non-zero elements in each row, allowing quick access to each row's data. This means that it points to the position within the column indices array at which the row starts.

Structure

Given a matrix: \[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 3 & 0 & 4 & 0 \\ 1 & 0 & 0 & 5 \end{bmatrix} \] The CSR representation would be:
  • Values array: [1, 2, 3, 4, 1, 5]
  • Column indices array: [0, 1, 0, 2, 0, 3]
  • Row pointer array: [0, 1, 2, 4, 6]

Explanation:

  • The values array holds the non-zero elements in the matrix, in row-major order.
  • The column indices array stores the corresponding column index of each non-zero element.
  • The row pointer array keeps track of where each row starts in the values array. For example, row 1 starts at index 0, row 2 starts at index 1, row 3 starts at index 2, within the columns indices array, and so on.

Applications

The CSR format is widely used in high-performance computing applications such as:
  • Finite element analysis (FEA)
  • Solving large sparse linear systems (e.g., in numerical simulations)
  • Machine learning algorithms (e.g., support vector machines with sparse input)
  • Graph-based algorithms where adjacency matrices are often sparse
The CSR format improves both memory efficiency and the speed of matrix operations by focusing only on non-zero elements.
def compressed_col_sparse_matrix(dense_matrix):
    vals = []
    row_idx = []
    col_ptr = [0]

    rows, cols = len(dense_matrix), len(dense_matrix[0])

    for i in range(cols):
        for j in range(rows):
            val = dense_matrix[j][i]
            if val != 0:
                vals.append(val)
                row_idx.append(j)
        col_ptr.append(len(vals))

    return vals, row_idx, col_ptr

There’s no video solution available yet 😔, but you can be the first to submit one at: GitHub link.

Your Solution

Output will be shown here.