MASc Seminar Notice: "On the Design of Efficient Deep Learning Methods for Human Activity Recognition in Resource Constrained Devices" by Sheikh Nooruddin

Wednesday, March 22, 2023 10:00 am - 11:00 am EDT (GMT -04:00)

Name: Sheikh Nooruddin

Date: Wednesday, March 22nd, 2023

Time: 10:00 AM - 11:00 AM

Location: Online

Supervisor(s): Fakhri Karray, Ali Elkamel, Mark Crowley

Title: On the Design of Efficient Deep Learning Methods for Human Activity Recognition in Resource Constrained Devices

Abstract:

Human Activity Recognition (HAR) is the automatic recognition of Activities of Daily Living (ADL) from human motion data captured in various modalities by wearable and ambient sensors. Advances in deep learning, especially Convolutional Neural Networks (CNNs), have revolutionized intelligent frameworks such as HAR systems by effectively and efficiently inferring human activity from various data modalities. However, the training and inference of CNNs are often resource-intensive. Recent research has therefore focused on bringing the effectiveness of CNNs to resource-constrained edge devices through Tiny Machine Learning (TinyML). TinyML aims to optimize these models in terms of compute and memory requirements, making them suitable for always-on resource-constrained devices and reducing communication latency and network traffic in HAR frameworks. In this thesis, we first provide a benchmark for understanding these trade-offs across CNN architecture variations, training methodologies, and data modalities in the context of HAR, TinyML, and edge devices. We tested and report the performance of CNN and Depthwise Separable Convolutional Neural Network (DSCNN) models, together with two quantization methodologies, Quantization Aware Training (QAT) and Post-Training Quantization (PTQ), on five commonly used benchmark datasets containing image and time-series data: UP-Fall, Fall Detection Dataset (FDD), PAMAP2, UCI-HAR, and WISDM. We also deployed the resulting standalone applications on multiple commonly available resource-constrained edge devices and measured their inference time and power consumption.

We then focus on HAR from video data sources and propose a two-stream multi-resolution fusion architecture for the video modality. The context stream takes a resized frame as input, while the fovea stream takes the cropped center portion of the same resized frame, reducing the overall input dimensionality; due to camera bias, objects of interest are often situated at the center of the frame. We applied both PTQ and QAT to optimize these models for deployment on edge devices and evaluated them on two challenging video datasets, KTH and UCF11, performing ablation studies to validate the two-stream design. We deployed the proposed architecture on commercial resource-constrained devices and monitored inference latency and power consumption. The results indicate that the proposed architecture clearly outperforms the other single-stream models tested in this work in terms of accuracy, precision, recall, and F1 score while also reducing overall model size. The experimental results in this thesis demonstrate the effectiveness and feasibility of TinyML for HAR from multimodal data sources on edge devices.
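For readers unfamiliar with why depthwise separable convolutions suit resource-constrained devices, the following sketch (a generic illustration, not code from the thesis) compares the parameter counts of a standard convolution with its depthwise separable equivalent; the channel and kernel sizes are illustrative assumptions:

```python
def standard_conv_params(c_in, c_out, k):
    # A standard convolution learns one k x k filter per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k spatial filter per input channel.
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Example: 64 -> 128 channels with 3 x 3 kernels (bias terms omitted).
std = standard_conv_params(64, 128, 3)        # 73,728 parameters
dsc = depthwise_separable_params(64, 128, 3)  # 8,768 parameters
print(f"standard: {std}, separable: {dsc}, ratio: {std / dsc:.1f}x")
```

The roughly 8x parameter reduction at this layer size is the kind of saving that makes DSCNN variants attractive for always-on edge deployment.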
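Post-Training Quantization, as used in common TinyML toolchains, maps trained floating-point weights onto 8-bit integers after training. A minimal NumPy sketch of the affine int8 scheme (an illustration under standard assumptions, not the thesis implementation):

```python
import numpy as np

def quantize_int8(w):
    # Affine (asymmetric) quantization: real value = scale * (q - zero_point).
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float weights.
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error is bounded by roughly one quantization step (the scale), which is why PTQ often preserves accuracy while shrinking model size about 4x; QAT instead simulates this rounding during training so the network can compensate for it.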
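The context/fovea input preparation described in the abstract can be sketched as follows; the crop fraction and frame dimensions here are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def two_stream_inputs(resized, crop_frac=0.5):
    # Context stream: the full resized frame, giving a coarse global view.
    # Fovea stream: the cropped center portion of the same resized frame,
    # exploiting the camera bias that places objects of interest near the center.
    h, w = resized.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    fovea = resized[y0:y0 + ch, x0:x0 + cw]
    return resized, fovea

resized = np.zeros((128, 128, 3), dtype=np.uint8)
context, fovea = two_stream_inputs(resized)
print(context.shape, fovea.shape)  # (128, 128, 3) (64, 64, 3)
```

Because each stream processes a smaller input than the original frame, the two-stream design reduces per-frame compute while keeping both global context and center detail.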