Please note: This PhD defence will be given online.
Masoumeh Shafieinejad, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Florian Kerschbaum
“Big data” applications collect data from ever more aspects of our lives every day. This rapid transition has outpaced the development of data protection techniques and has resulted in innumerable data breaches and privacy violations. To prevent such incidents, it is important to ensure that data is protected at rest, in transit, in use, and during computation and dispersal. We investigate data protection issues in big data analysis, addressing a security or privacy problem in each phase of the data science pipeline. These phases are: i) data cleaning and preparation, ii) data management, iii) data modelling and analysis, and iv) data dissemination and visualization. In each of these phases, we either address an existing problem and propose a design that resolves it, or evaluate a current solution to a problem and analyze whether it meets the expected security/privacy goal.
Starting with privacy in data preparation, we investigate providing privacy in query analysis by leveraging differential privacy techniques. We begin this investigation with contextual outlier analysis, a challenging query that requires releasing direct information about members of the dataset. Our second contribution is in the data modelling and analysis phase. We investigate the effect of data properties and application requirements on the successful implementation of privacy techniques. In particular, we shed light on the effects of data correlation on the data protection guarantees of differential privacy. Our third contribution lies in the data management phase. We probe the problem of efficiently protecting data that is outsourced to a database management system (DBMS) provider for storage and join operations. We provide an encryption method that minimizes leakage and efficiently guarantees security for the data. Our last contribution is in the data dissemination phase. We inspect ownership/contract protection for prediction models trained on the data. We begin this inspection by evaluating backdoor-based watermarking in deep neural networks, focusing on an important and recent line of work in model ownership/contract protection.
To join this PhD defence on Zoom, please go to https://us02web.zoom.us/j/86777160539?pwd=RWlBd1pMd1VHdCtma3BVMFJJcVRLQT09.