SparseMatrix

Sparse Matrix

There is no strict definition how many elements need to be zero for a matrix to be considered sparse but a common criterion is that the number of non-zero elements is roughly the number of rows or columns.

When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix.(Don’t waste your precious memory on non-sense zeros!) Specialized computers have been made for sparse matrices, as they are common in the machine learning field.

Storage strategy

Formats can be divided into two groups:

  • Those that support efficient modification, such as DOK (Dictionary of keys), LIL (List of lists), or COO (Coordinate list). These are typically used to construct the matrices.
  • Those that support efficient access and matrix operations, such as CSR (Compressed Sparse Row) or CSC (Compressed Sparse Column).

快速转置

原稀疏矩阵$m\times n$,共有非0元素$t$个,则算法复杂度为$O(n+t)$

  • 转置后行号为转置前的列号,即转置前列号在前的,转置后行号也在前
  • 转置后的行号(转置前的列号)的存储位置为所有转置前该列前所有非0元素个数和
    • 注意递归转化为迭代

根据以上信息,即可构造出转置后的压缩稀疏矩阵表

用空间$(S(n))$换时间,先从原压缩稀疏矩阵表中提取出以上信息$O(t)$,(忽略初始化时间$O(n)$),迭代法构造出可以直接使用的下标($O(n)$),再用这些信息去构造结果$O(n)$(遍历信息矩阵空间),总共$O(n+t)$

注意在使用下标的时候不要忘记++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#include <stdio.h>
#define Type int
#define N 8
#define columnLength 6
typedef struct Item{
int row ;
int col ;
Type element ;
}Item ;

typedef struct Info{
int number_of_elements_of_current_col ;
int location;
}Info;

int main(){
Item compressed [N] = {{0,0,16} , {0,3,22} , {0,5,16} , {1,1,12}, {1,2,3} ,{2,3,-8},{4,0,91} , {6,2,15}};
Info infomation [columnLength] = {{0,0} };
Item transposed [N] = {{0,0,0},};

//loop over compressed to get number of elements of current column
for (int i = 0 ; i < N ; i++){
infomation[compressed[i].col].number_of_elements_of_current_col++;
}
//loop over information to generate usable location
for (int i = 1 ; i < columnLength ; i++){
infomation[i].location = infomation[i-1].location + infomation[i-1].number_of_elements_of_current_col;
}
//loop over compressed & transposed to do the transposing
for (int i = 0 ; i < N ; i++){
int index = infomation[compressed[i].col].location++;
//++ means in case of one col contains multiple elements, move the location 1 step forward. So the output location is the last element's location instead of the first one's.


transposed[index].row = compressed[i].col;
transposed[index].col = compressed[i].row;
transposed[index].element = compressed[i].element;
}
//print out the result
for (int i = 0 ; i < columnLength ; i++) printf("%d %d\n" , infomation[i].number_of_elements_of_current_col, infomation[i].location);
for (int i = 0 ; i < N ; i++) printf("{%d,%d,%d}\n" , transposed[i].row , transposed[i].col , transposed[i].element);
}
/*
2 2
1 3
2 5
2 7
0 7
1 8
{0,0,16}
{0,4,91}
{1,1,12}
{2,1,3}
{2,6,15}
{3,0,22}
{3,2,-8}
{5,0,16}
*/