Thursday, May 11, 2023, 11:00 am to 11:45 am EDT (GMT -04:00)
Abstract
Deep neural networks have revolutionized domains including natural language processing, image and video processing, and robotics. However, their high computational cost and the rise of unsupervised pretraining of ever-larger networks make them challenging to run in compute-constrained environments such as edge devices. Consequently, reducing model size and improving inference latency are major focuses of neural network research. This work aims to improve deep neural network efficiency in terms of inference latency, model size, and latent representation size by investigating redundant representations in neural networks. We explore this across text classification, image classification, and generative models, hypothesizing that current networks contain representational redundancy that, if removed, would improve their efficiency.
For image classification, we hypothesize that convolution kernels contain redundancy and test this by introducing additional weight sharing, preserving or improving classification performance while requiring fewer parameters. We demonstrate the benefits on the CIFAR and ImageNet datasets across a range of models.
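The abstract does not specify the exact weight-sharing scheme; one plausible form is to replace a convolution's per-output-channel filters with learned mixtures of a small shared kernel bank. The sketch below (names such as SharedKernelConv2d and num_shared are illustrative assumptions, not the presented method) shows how such a layer cuts parameters relative to a standard convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKernelConv2d(nn.Module):
    """Illustrative sketch: output-channel filters are learned mixtures of a
    small shared bank of base kernels, reducing the parameter count."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 num_shared=8, stride=1, padding=0):
        super().__init__()
        # Shared bank: num_shared base kernels instead of out_channels kernels.
        self.bank = nn.Parameter(
            torch.randn(num_shared, in_channels, kernel_size, kernel_size) * 0.01)
        # Per-output-channel mixing coefficients (out_channels x num_shared).
        self.mix = nn.Parameter(torch.randn(out_channels, num_shared) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # Build the effective weight tensor by mixing the shared bank.
        weight = torch.einsum('os,scij->ocij', self.mix, self.bank)
        return F.conv2d(x, weight, self.bias,
                        stride=self.stride, padding=self.padding)

# Example: a 64->128 channel 3x3 layer needs 73,728 weights normally; with a
# 16-kernel shared bank it needs 16*64*9 + 128*16 = 11,264 mixing/bank weights.
layer = SharedKernelConv2d(64, 128, 3, num_shared=16, padding=1)
out = layer(torch.randn(2, 64, 32, 32))
```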
In generative models, we show that the latent representation size can be reduced while preserving generated image quality through unsupervised disentanglement of shape and orientation. We introduce the affine variational autoencoder and demonstrate its effectiveness in generating 2D images and 3D voxel representations of objects.
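One way to realize shape/orientation disentanglement, sketched below under assumptions (the class name AffineVAE, the rotation-only transform, and the layer sizes are illustrative, not the thesis's exact architecture), is to predict an explicit affine parameter alongside a small shape latent, decode a canonical image, and warp it back with a spatial-transformer grid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineVAE(nn.Module):
    """Illustrative sketch: the latent is split into a small shape code and an
    explicit rotation angle; the decoder renders a canonical image that is
    then rotated back, so shape and orientation are disentangled."""

    def __init__(self, latent_dim=8, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.enc = nn.Sequential(
            nn.Flatten(), nn.Linear(img_size * img_size, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.to_angle = nn.Linear(256, 1)          # predicted orientation
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, img_size * img_size), nn.Sigmoid())

    def forward(self, x):                          # x: (N, 1, H, W)
        h = self.enc(x)
        mu, logvar, angle = self.to_mu(h), self.to_logvar(h), self.to_angle(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        canonical = self.dec(z).view(-1, 1, self.img_size, self.img_size)
        # Rotate the canonical reconstruction by the predicted angle.
        cos, sin = torch.cos(angle), torch.sin(angle)
        zeros = torch.zeros_like(cos)
        theta = torch.cat([cos, -sin, zeros, sin, cos, zeros], dim=1).view(-1, 2, 3)
        grid = F.affine_grid(theta, canonical.size(), align_corners=False)
        recon = F.grid_sample(canonical, grid, align_corners=False)
        return recon, mu, logvar, angle
```

Because orientation is carried by a single explicit angle rather than spread across the latent code, the shape latent can be made smaller without degrading reconstructions of rotated inputs.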
Lastly, we address the mismatch between transformer pretraining tasks and downstream tasks by creating task-specific networks through neural architecture search and learned downsampling. These networks achieve a superior tradeoff between inference latency and accuracy without requiring additional pretraining.
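The sketch below illustrates only the downsampling idea, not the architecture search: a token-merging step inserted between transformer stages shortens the sequence and cheapens attention. The module names (TokenDownsample, DownsampledEncoder) and the pairwise merge rule are assumptions for illustration, not the method presented in the talk.

```python
import torch
import torch.nn as nn

class TokenDownsample(nn.Module):
    """Halves the sequence length between transformer blocks with a strided
    linear projection over adjacent token pairs, trading resolution for speed."""

    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, d = x.shape
        if s % 2:                              # pad to an even length
            x = torch.cat([x, x.new_zeros(b, 1, d)], dim=1)
        pairs = x.reshape(b, -1, 2 * d)        # merge adjacent tokens
        return self.proj(pairs)

class DownsampledEncoder(nn.Module):
    """Two transformer stages with a learned downsampling step in between."""

    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.stage1 = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.stage2 = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.down = TokenDownsample(d_model)

    def forward(self, x):
        x = self.stage1(x)
        x = self.down(x)                       # shorter sequence -> cheaper attention
        return self.stage2(x)

# Example: a 128-token input is processed at full length in stage 1 and at
# 64 tokens in stage 2, roughly quartering stage-2 attention cost.
model = DownsampledEncoder()
out = model(torch.randn(2, 128, 256))
```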
Presenter
Rene Bidart, PhD candidate in Systems Design Engineering