Candidate: Yanbing Jiang
Title: New Convolutional Neural Network Topology with Compressed Information to Enhance Accuracy for Image Classification Task
Date: September 18, 2019
Place: EIT 3145
Supervisor(s): Yang, En-Hui
Source coding and deep learning are two major branches of information processing, yet source coding itself makes significant contributions to deep learning. Source coding encodes information that can be summarised by patterns into a certain representation, without semantic consideration. Deep learning, on the other hand, uses multiple layers of representation with increasing levels of abstraction to learn patterns that cannot be summarised easily. The key to the success of deep learning is the inclusion of cascaded non-linear layers, which help the network abstract multi-level features.
Source coding, such as image compression, involves fundamental non-linear operations such as quantization and rounding. This research is inspired by the question of how the non-linearity of compression could further help deep learning, even though common sense suggests that compression usually degrades recognition ability.
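The non-linearity of quantization can be checked directly. The following is a minimal sketch, not taken from the thesis: `quantize` is a hypothetical uniform scalar quantizer with an assumed step size of 16, and the sample values are arbitrary. It shows that quantizing a sum can differ from summing quantized values, so the operation is not additive and therefore not linear.

```python
def quantize(x: float, step: float = 16.0) -> float:
    """Hypothetical uniform scalar quantizer: snap x to the nearest multiple of step."""
    return step * round(x / step)


# Arbitrary sample values chosen for illustration only.
a, b = 7.0, 6.0

# A linear map f would satisfy f(a + b) == f(a) + f(b).
sum_then_quantize = quantize(a + b)
quantize_then_sum = quantize(a) + quantize(b)

is_additive = sum_then_quantize == quantize_then_sum
print(sum_then_quantize, quantize_then_sum, is_additive)
```

Here both inputs fall below half a step and are quantized to zero, while their sum is large enough to reach the next quantization level.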
Image classification is one of the most popular tasks in deep learning. When images are compressed, for example by JPEG, human ability to recognise the objects in them deteriorates, but this is not usually the case from the machine's perspective. To improve accuracy, this thesis focuses on input processing and proposes a new Convolutional Neural Network (CNN) topology that takes in the original input along with various compressed versions of it. While JPEG-type compression is clearly friendly to humans when images are compressed at higher quality, the compression level that is friendly to machines is unknown. The proposed topology supplies compressed inputs across the range of compression levels, from the best quality to the worst, and lets the machine learn from all of the potentially useful compressed information by itself. We trained the model with the proposed block-by-block training method and were able to increase the accuracy of state-of-the-art CNNs for image classification: for the Inception V3 model, Top-1 accuracy increased by 0.374% and Top-5 accuracy by 0.304%; for the ResNet-50 V2 model, Top-1 accuracy increased by 0.39% and Top-5 accuracy by 0.228%.
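The idea of feeding the network the original image together with progressively degraded copies can be sketched as follows. This is an illustration under stated assumptions, not the thesis's pipeline: `coarse_copy` is a hypothetical stand-in that quantizes pixel values, whereas real JPEG compression applies a block DCT and quantization tables; the step sizes and the tiny 2x2 image are arbitrary, and the CNN branches and the block-by-block training method are not reproduced.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values in 0..255


def coarse_copy(image: Image, step: int) -> Image:
    """Stand-in for lossy compression (assumption): quantize each pixel to a multiple of step."""
    return [[min(255, step * round(p / step)) for p in row] for row in image]


def build_inputs(image: Image, steps: List[int]) -> List[Image]:
    """Return the original image followed by one degraded copy per step size."""
    return [image] + [coarse_copy(image, s) for s in steps]


# Arbitrary toy image; larger steps mimic lower compression quality.
image = [[12, 200], [90, 255]]
inputs = build_inputs(image, steps=[8, 32, 64])

for version in inputs:
    print(version)
```

Each element of `inputs` would correspond to one input of a multi-input network, so the model sees the full range from the untouched image to the most heavily degraded copy.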
Based on the results and observations from the experiments, we offer some insights into why the newly designed topology with compressed inputs can aid image classification. Compression may be helpful because it can highlight objects and discard interfering information for the machine. Furthermore, we believe that if this technique were applied to the state-of-the-art EfficientNet (published May 2019), the performance could be even better.
200 University Avenue West
Kitchener, ON N2L 3G1