Chang Ge, PhD candidate
David R. Cheriton School of Computer Science
Data profiling is an important task for understanding data semantics and is an essential pre-processing step in many tools. Due to privacy constraints, data is often partitioned into silos, each with different access controls. Discovering functional dependencies (FDs) usually requires access to all data partitions, because the constraints must hold on the whole dataset. Simply applying general secure multi-party computation protocols incurs high computation and communication costs.
In this work, we formulate the FD discovery problem in the secure multi-party scenario. We propose secure constructions for validating candidate FDs, and present efficient cryptographic protocols to discover FDs over distributed partitions. Experimental results show that our solution is practically efficient compared with non-secure distributed FD discovery, and can significantly outperform general-purpose multi-party computation frameworks. To the best of our knowledge, our work is the first to tackle this problem.
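To make the idea of validating a candidate FD concrete, the following is a minimal, non-secure sketch in Python. It is not the cryptographic protocol from this work: it only illustrates why an FD that holds inside every silo may still fail on the combined data. The function name `fd_holds`, the attribute names, the sample rows, and the assumption that the data is split by rows are all hypothetical and chosen for illustration.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Check whether the candidate FD lhs -> rhs holds on the given rows.

    The FD holds if no two rows agree on every lhs attribute
    while disagreeing on the rhs attribute.
    """
    seen: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs values, different rhs value: violation
        seen.setdefault(key, row[rhs])
    return True


# Hypothetical silos (illustrative data, assumed to be split by rows).
partition_a = [
    {"zip": "N2L", "city": "Waterloo"},
    {"zip": "M5S", "city": "Toronto"},
]
partition_b = [
    {"zip": "N2L", "city": "Kitchener"},
    {"zip": "K1A", "city": "Ottawa"},
]

# The candidate FD zip -> city is checked per silo and on the union.
holds_a = fd_holds(partition_a, ["zip"], "city")
holds_b = fd_holds(partition_b, ["zip"], "city")
holds_union = fd_holds(partition_a + partition_b, ["zip"], "city")

print(holds_a, holds_b, holds_union)
```

In this sketch the candidate FD passes in each silo but fails on the union, since the two silos disagree on the city for the same zip value. A check of this kind needs to see rows from both silos at once, which is what a secure protocol has to achieve without revealing the rows themselves.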
This work is to appear in VLDB 2020.