One central research direction applies information-theoretic tools to analyze and improve deep learning.
Deep neural networks can be viewed as nonlinear information-processing systems that transform inputs into internal representations and predictions. This perspective makes it natural to ask, in information-theoretic terms, how such systems extract, concentrate, and transmit useful information.
A central concept in this work is conditional mutual information (CMI), which provides a principled way to characterize the concentration of information within each class and the separation between classes achieved by a deep neural network. CMI serves as a unifying thread across our research: it measures the quality of a model's internal representations, governs its amenability to knowledge distillation, and connects classification performance to the geometry of the learned output distribution.
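As one concrete formalization of this idea (a sketch in the spirit of this line of work, with illustrative notation rather than any particular paper's definition): let $X$ be an input, $Y$ its ground-truth label, and $\hat{Y}$ the network's soft output, which depends on $X$ only through the network. The relevant conditional mutual information can then be written as

$$
I(X;\hat{Y}\mid Y) \;=\; \mathbb{E}_{X,Y}\!\left[\, D_{\mathrm{KL}}\!\big(P_{\hat{Y}\mid X}\,\big\|\,P_{\hat{Y}\mid Y}\big) \,\right],
$$

which is small exactly when, within each class, the output distributions produced by individual inputs stay close to their class-conditional average; intra-class concentration thus corresponds to a small CMI, while inter-class separation is reflected in how far apart the class-conditional averages sit.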
Building on this perspective, our research has developed several algorithmic frameworks, including

- conditional mutual information constrained deep learning (sketched below),
- conditional mutual information minimized learning, and
- information-theoretic knowledge distillation.
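To make the flavor of these frameworks concrete, the following is a minimal PyTorch sketch (illustrative only, not the lab's actual algorithm) of a CMI-constrained objective: the usual cross-entropy loss plus a penalty that estimates $I(X;\hat{Y}\mid Y)$ within a mini-batch as the average KL divergence from each sample's softmax output to the mean output of its class. The function names and the trade-off weight `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def cmi_penalty(logits: torch.Tensor, labels: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mini-batch estimate of I(X; Y_hat | Y): average KL divergence from each
    sample's softmax output to the mean output of its class ("class centroid")."""
    probs = F.softmax(logits, dim=1)                  # per-sample output distribution P(Y_hat | X)
    total = logits.new_zeros(())
    for c in labels.unique():
        p_c = probs[labels == c]                      # outputs of the samples in class c
        centroid = p_c.mean(dim=0, keepdim=True)      # batch estimate of P(Y_hat | Y = c)
        kl = (p_c * (p_c.add(eps).log() - centroid.add(eps).log())).sum(dim=1)
        total = total + kl.sum()
    return total / labels.numel()

def cmi_constrained_loss(logits: torch.Tensor, labels: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a weighted CMI penalty; `lam` is a hypothetical trade-off weight."""
    return F.cross_entropy(logits, labels) + lam * cmi_penalty(logits, labels)
```

Minimizing the penalty pulls each sample's output toward its class centroid (concentration), while the cross-entropy term keeps the centroids of different classes apart (separation); the frameworks listed above handle this trade-off differently, for instance as an explicit constraint rather than a fixed penalty weight.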
These approaches improve accuracy, robustness, and model security — including resistance to adversarial attacks and protection of model intellectual property — while providing deeper theoretical insight into how modern learning systems process and represent information.
A broader long-term goal is to develop information-theoretic principles for understanding model complexity, including the embedding dimension of large language models. The embedding dimension is a key factor controlling both the computational cost and the representational power of modern AI systems, yet it is currently chosen largely through empirical trial and error. Bringing information-theoretic reasoning to bear on this question is one of the lab's ongoing directions.