PhD Seminar: Perceptual Quality Assessment of Compressed Videos

Wednesday, November 21, 2018, 3:00 pm EST (GMT -05:00)

Candidate: Wentao Liu

Title: Perceptual Quality Assessment of Compressed Videos

Date: November 21, 2018

Time: 3:00 PM

Place: EIT 3145

Supervisor(s): Wang, Zhou

Abstract:

Video quality assessment (VQA) aims to predict the perceptual quality of a video and is a fundamental problem in many video processing tasks, such as video compression, denoising, and super-resolution. Existing VQA methods can be classified into full-reference (FR-VQA), reduced-reference (RR-VQA), and blind (BVQA) methods, based on how much of the corresponding pristine reference is accessible when estimating a video's quality. Unlike FR-VQA and RR-VQA, which require all or part of the information from the reference video, BVQA is highly desirable when the reference video is unavailable, not of pristine quality, or temporally misaligned with the test video.

BVQA algorithms are traditionally designed as a two-stage pipeline: a feature extraction stage that computes typically hand-crafted spatial and/or temporal features, and a regression stage that operates in the feature space to predict the perceptual quality of the video. Unlike these traditional methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, so that the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process consists of two steps. In the first step, the early convolutional layers are pre-trained on the codec classification subtask to extract spatiotemporal quality-related features. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized on the two subtasks together. An additional critical design choice is the adoption of 3D convolutional layers, which yield novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods.
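At a high level, the abstract describes a shared 3D-convolutional feature extractor feeding two heads: a codec classifier and a quality regressor. The following is a minimal, hypothetical PyTorch sketch of such a multi-task structure; the layer widths, kernel sizes, clip shape, and number of codec classes are illustrative assumptions, not the actual V-MEON configuration.

```python
import torch
import torch.nn as nn

class VMEONSketch(nn.Module):
    """Illustrative multi-task BVQA network, not the published V-MEON."""

    def __init__(self, num_codecs=4):
        super().__init__()
        # Shared 3D-convolutional feature extractor: 3D kernels span the
        # two spatial dimensions and time, producing spatiotemporal features.
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global pooling over (T, H, W)
        )
        # Subtask 1: probabilistic codec-type prediction (classification).
        self.codec_head = nn.Linear(16, num_codecs)
        # Subtask 2: perceptual quality score prediction (regression).
        self.quality_head = nn.Linear(16, 1)

    def forward(self, clip):  # clip: (batch, 3, frames, height, width)
        f = self.features(clip).flatten(1)
        return self.codec_head(f), self.quality_head(f)

# Usage sketch:
model = VMEONSketch()
clip = torch.randn(2, 3, 16, 64, 64)       # (batch, channels, T, H, W)
codec_logits, quality = model(clip)
codec_probs = codec_logits.softmax(dim=1)  # probabilistic codec prediction

# Two-step training as described in the abstract (sketch only):
#   Step 1: pre-train `features` + `codec_head` with a cross-entropy loss
#           on cheap codec-type labels.
#   Step 2: initialize from step 1 and optimize the whole network on both
#           subtasks jointly, e.g.
#           loss = ce(codec_logits, codec_label) + l1(quality, mos)
```

This sketch keeps the two heads independent for clarity; the actual V-MEON model may couple the codec and quality subtasks more tightly than shown here.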