Implement Compressed Column Sparse Matrix Format (CSC)
✔
Task: Create a Compressed Column Sparse Matrix Representation
Your task is to implement a function that converts a dense matrix into its Compressed Column Sparse (CSC) representation. The CSC format stores only non-zero elements of the matrix and is efficient for matrices with a high number of zero elements.
Write a function compressed_col_sparse_matrix(dense_matrix) that takes in a two-dimensional list dense_matrix and returns a tuple of three lists:
values: List of non-zero elements, stored in column-major order.
row indices: List of row indices corresponding to each value in the values array.
column pointer: List that indicates the starting index of each column in the values array.
Understanding the Compressed Row Sparse Matrix Format
The Compressed Row Sparse, CSR, format is a data-efficient representation of
sparse matrices, where most of the elements are zero. This format is
particularly useful in large-scale scientific computing and machine learning
applications, where memory efficiency is critical.
Concepts
A sparse matrix is a matrix that contains a large number of zero elements.
Storing such matrices in their full form can be inefficient, both in terms of
memory and computational resources. The CSR format addresses this problem by
storing only the non-zero elements and their positions in the matrix. In the CSR
format, a matrix is represented by three one-dimensional arrays:
Values array: Contains all the non-zero elements of the
matrix, stored row by row.
Column indices array: Stores the column index corresponding
to each value in the values array.
Row pointer array: Stores the cumulative number of non-zero
elements in each row, allowing quick access to each row's data. This means
that it points to the position within the column indices array at which the
row starts.
The values array holds the non-zero elements in the matrix,
in row-major order.
The column indices array stores the corresponding column
index of each non-zero element.
The row pointer array keeps track of where each row starts
in the values array. For example, row 1 starts at index 0, row 2 starts at
index 1, row 3 starts at index 2, within the columns indices array, and so
on.
Applications
The CSR format is widely used in high-performance computing applications such
as:
Finite element analysis (FEA)
Solving large sparse linear systems (e.g., in numerical simulations)
Machine learning algorithms (e.g., support vector machines with sparse
input)
Graph-based algorithms where adjacency matrices are often sparse
The CSR format improves both memory efficiency and the speed of matrix
operations by focusing only on non-zero elements.
def compressed_col_sparse_matrix(dense_matrix):
vals = []
row_idx = []
col_ptr = [0]
rows, cols = len(dense_matrix), len(dense_matrix[0])
for i in range(cols):
for j in range(rows):
val = dense_matrix[j][i]
if val != 0:
vals.append(val)
row_idx.append(j)
col_ptr.append(len(vals))
return vals, row_idx, col_ptr
There’s no video solution available yet 😔, but you can be the first to submit one at: GitHub link.