PhD Seminar Notice - Information-Theoretic Paradigms in Deep Learning

Friday, December 8, 2023 1:00 pm - 2:00 pm EST (GMT -05:00)

Candidate: Shayan Mohajer Hamidi
Date: December 8, 2023
Time: 1:00 PM - 2:00 PM
Place: Remote Attendance
Supervisor(s): Yang, En-Hui

Abstract:

In this seminar, we delve into the integration of information-theoretic concepts into the inner workings of deep learning (DL). Our focus lies in compressing the model size of deep neural networks (DNNs) and in enhancing their performance.

Compression of DNNs: The success of DL is achieved at the expense of large model sizes and the high computational complexity of trained DNNs. To significantly compress model sizes and reduce computational complexity, we introduce information-theoretic coding ideas into the inner workings of DL, yielding a new form of DL dubbed coded deep learning (CDL). In CDL, (i) a DNN is referred to as a coded DNN; (ii) the parameter weights of a coded DNN are constrained to take values over a structured discrete space; (iii) a compression algorithm is applied to encode the discrete weights; and (iv) training a coded DNN is equivalent to solving a minimization problem over the structured discrete space, whose objective function is a linear combination of the loss function used in DL and the model size of the coded DNN, measured by the compression rate of its weights.
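For illustration only, the CDL training problem described above can be summarized as the following minimal sketch, in which the symbols are assumed notation rather than the exact formulation presented in the seminar:

\min_{w \in \mathcal{W}} \; \mathcal{L}(w) + \lambda \, R(w),

where \mathcal{W} is the structured discrete weight space, \mathcal{L} is the loss function used in standard DL, R(w) is the model size of the coded DNN measured by the compression rate (in bits) of its encoded weights, and \lambda > 0 controls the trade-off between accuracy and model size.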

Improving the DNNs’ performance: The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification DNN in the output probability distribution space of the DNN, where CMI measures the intra-class concentration of the DNN and the ratio between CMI and NCMI measures its inter-class separation. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation set are more or less inversely proportional to their NCMI values. Based on this observation, the standard DL framework is further modified to minimize the standard cross-entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem.
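As a rough sketch under assumed notation (not the exact formulation from the seminar), the CMIC-DL training problem can be written as

\min_{\theta} \; \mathrm{CE}(\theta) \quad \text{subject to} \quad \mathrm{NCMI}(\theta) \le \epsilon,

where \mathrm{CE}(\theta) denotes the standard cross-entropy loss of the DNN with parameters \theta, \mathrm{NCMI}(\theta) is its normalized conditional mutual information, and \epsilon > 0 is a prescribed threshold; the alternating learning algorithm mentioned above is proposed to solve this constrained problem.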