Priyank Jaini, PhD candidate
David R. Cheriton School of Computer Science
At their core, many unsupervised learning models provide a compact representation of homogeneous density mixtures, but their similarities and differences are not always clearly understood. In this work, we formally establish the relationships among latent tree graphical models (including special cases such as hidden Markov models and tensorial mixture models), hierarchical tensor formats, and sum-product networks. Based on this connection, we then give a unified treatment of the exponential separation in exact representation size between deep mixture architectures and shallow ones. In contrast, for approximate representation, we show that the conditional gradient algorithm can approximate any homogeneous mixture to within epsilon accuracy by combining O(1/epsilon^2) "shallow" architectures, where the hidden constant may decrease (exponentially) with depth. Our experiments on both synthetic and real datasets confirm the benefits of depth in density estimation.
This is joint work with Pascal Poupart and Yaoliang Yu.
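
As a rough illustration of the conditional gradient idea mentioned in the abstract, the sketch below greedily builds a convex combination of simple components to approximate a target mixture. It is not the construction from the talk: the squared Euclidean loss, the discretization of densities into probability vectors, the random "atoms" standing in for shallow architectures, and the names `frank_wolfe_mixture` and `random_distribution` are all assumptions made for this example.

```python
import random


def frank_wolfe_mixture(target, atoms, num_steps):
    """Conditional gradient (Frank-Wolfe) on 0.5 * ||x - target||^2.

    `atoms` is a hypothetical stand-in for a dictionary of shallow models,
    each represented here as a probability vector on a finite grid.
    """
    current = list(atoms[0])  # start from a single atom
    weights = {0: 1.0}        # mixing weight attached to each selected atom
    errors = []
    for t in range(num_steps):
        # Gradient of the squared loss at the current iterate.
        grad = [c - y for c, y in zip(current, target)]
        # Linear minimization oracle: the atom best aligned with -grad.
        best = min(
            range(len(atoms)),
            key=lambda i: sum(g * a for g, a in zip(grad, atoms[i])),
        )
        step = 2.0 / (t + 2.0)  # standard open-loop step size
        current = [(1 - step) * c + step * a
                   for c, a in zip(current, atoms[best])]
        weights = {i: (1 - step) * w for i, w in weights.items()}
        weights[best] = weights.get(best, 0.0) + step
        errors.append(0.5 * sum((c - y) ** 2 for c, y in zip(current, target)))
    return weights, errors


def random_distribution(dim):
    """Random probability vector (illustrative data only)."""
    raw = [random.random() for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]


random.seed(0)
atoms = [random_distribution(8) for _ in range(20)]
true_mix = random_distribution(20)
# The target is itself a mixture of the atoms, so the best achievable loss is 0.
target = [sum(w * a[j] for w, a in zip(true_mix, atoms)) for j in range(8)]

weights, errors = frank_wolfe_mixture(target, atoms, 200)
print(len(weights), errors[-1])
```

Under these assumptions, the standard conditional gradient guarantee bounds the loss after step t by 4/(t + 3), because the squared distance between any two probability vectors is at most 2. Driving the distance to the target below epsilon therefore takes on the order of 1/epsilon^2 steps, each adding at most one component. This is one way a count of that order can arise; the loss and constants used in the talk may differ.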