Statistics and Biostatistics seminar series
Elena Tuzhilina
Room: M3 3127
Statistical curve models for inferring 3D chromatin architecture
Reconstructing the three-dimensional spatial structure of chromatin is a critical task in modern computational biology, as chromatin folding plays an important role in vital cellular processes such as transcription and DNA repair. While the chromatin architecture cannot be observed directly inside the cell, some modern molecular biology methods can extract partial information about chromatin geometry. In particular, chromatin conformation capture assays, such as 3C and Hi-C, make it possible to compute so-called contact matrices, which represent the frequency of contacts between each pair of genomic loci.
The central goal of this study is to infer the chromatin 3D structure, i.e., the spatial coordinates of genomic loci, from a contact matrix. A common heuristic links the contact counts to the chromatin architecture: genomic loci that are close to each other in three-dimensional space should have a high contact frequency.
The majority of existing methods operating on contact matrices are based on multidimensional scaling and produce reconstructed 3D configurations in the form of a polygonal chain. None of them, however, directly takes advantage of the fact that the target solution should be a smooth curve: smoothness is either ignored or addressed indirectly by introducing highly non-convex penalties into the model. Such penalties typically increase the computational complexity and make the reconstruction algorithm unstable.
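To make the multidimensional scaling baseline concrete, here is a minimal sketch of classical (Torgerson) MDS applied to distances derived from a contact matrix. The power-law conversion from counts to distances, the pseudocount of 1, and the function names `contacts_to_distances` and `classical_mds` are illustrative assumptions, not taken from the talk. The output is one 3D point per locus, i.e. a polygonal chain with no smoothness constraint.

```python
import numpy as np


def contacts_to_distances(contacts, alpha=1.0):
    """Turn contact counts into distances via an ASSUMED power law.

    d_ij = (c_ij + 1) ** (-alpha): more contacts means a smaller distance.
    The pseudocount 1 keeps zero counts finite; both choices are illustrative.
    """
    c = np.asarray(contacts, dtype=float)
    d = (c + 1.0) ** (-alpha)
    np.fill_diagonal(d, 0.0)  # a locus is at distance zero from itself
    return d


def classical_mds(dist, dim=3):
    """Classical MDS: embed n points in `dim` dimensions from pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering turns squared distances into a Gram (inner-product) matrix.
    gram = -0.5 * centering @ d2 @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale  # one row of coordinates per locus
```

When the input distances are exactly Euclidean in three dimensions, this recovers the configuration up to rotation, reflection, and translation. With noisy Hi-C data the recovered chain is typically jagged, which is the limitation the abstract points to.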
In this work, we develop a novel technique that models chromatin directly as a smooth curve. The baseline method, which we call principal curve metric scaling (PCMS), combines the advantages of multidimensional scaling and smoothness penalties while being computationally efficient. We subsequently use PCMS as a building block to create more complex distribution-based models for contact matrices. In particular, we propose the Poisson metric scaling (PoisMS) technique, which assumes a Poisson distribution for the contact counts. The performance of the methods is illustrated on real Hi-C data for chromosome 20 and evaluated by means of orthogonal multiplex FISH imaging.
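One way to see how a smoothness constraint can be built into metric scaling without a non-convex penalty is to restrict the coordinates to the span of a small smooth basis. The sketch below is an illustration of that general idea under stated assumptions, and it may differ from the PCMS formulation presented in the talk: the coordinates are written as X = H @ theta, H is an orthonormalised polynomial basis (a spline basis would be a more typical choice), and the input is assumed to be a Gram-like similarity matrix. The names `smooth_basis` and `curve_metric_scaling` are hypothetical.

```python
import numpy as np


def smooth_basis(n, k):
    """Orthonormal basis of k smooth functions sampled at n points.

    ASSUMPTION: Legendre polynomials are used for brevity.
    """
    t = np.linspace(-1.0, 1.0, n)
    vander = np.polynomial.legendre.legvander(t, k - 1)  # shape (n, k)
    q, _ = np.linalg.qr(vander)  # orthonormalise the columns
    return q


def curve_metric_scaling(gram, k=8, dim=3):
    """Minimise ||gram - X X^T||_F over curves X = H @ theta.

    Because H has orthonormal columns, the problem reduces to an
    eigendecomposition of the small k x k matrix H^T gram H.
    """
    n = gram.shape[0]
    h = smooth_basis(n, k)
    small = h.T @ gram @ h
    small = 0.5 * (small + small.T)  # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(small)
    top = np.argsort(vals)[::-1][:dim]
    theta = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    return h @ theta, h  # smooth curve coordinates and the basis used
```

In this sketch the number of basis functions `k` controls smoothness: a smaller `k` gives a smoother curve, and the cost of the eigendecomposition depends on `k` rather than on the number of loci.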