Grammar-Based Representations of Large Sparse Binary Matrices
Large sparse matrices representation is a fundamental problem in big data processing and analysis. To reduce the requirement of memory bandwidth, it is important to develop alternative compact representations of large sparse matrices, while facilitating, if possible, matrix operations.
In this work, we propose two grammar based methods to compactly represent a sparse binary matrix with the capability of random accessing an element in the matrix. The first approach combines dimension coding (proposed by Yang) with one of raster scan or Hilbert scan, where the so-called directionless grammar is applied. With the power of scanning, dimension coding’s capability of representing 1-D sparse signals can be extended to 2-D sparse matrices. This approach inherits the random accessibility of dimension coding. In the second approach, we will introduce a new concept called Context-free Bipartite Grammar (CFBG) and present a framework wherein large sparse binary matrices can be represented by CFBG. Similar to the traditional concept of Context-free Grammar (CFG), a CFBG consists of a set of production rules. Unlike CFGs, however, the right member of each production rule in a CFBG is a labeled bipartite graph with each edge labeled either as a variable or terminal symbol. As the right hand side of a production rule is an ordered edge set, CFBG is also directionless. Two bipartite grammar transforms, a Sequential D-Neighborhood Pairing Transform (SNPT) and an Iterative Pairing Transform (IPT), are further presented to convert any binary matrix into a CFBG representing it.
Experiments show that compared with popular sparse matrix storage methods such as compressed row storage and quadtree, grammar-based sparse binary matrix representations can reduce the storage requirement of sparse matrices significantly (by a factor of as much as 70).
200 University Avenue West
Waterloo, ON N2L 3G1