Researchers have developed a system that allows data owners to set a limit on how much of their privacy may be compromised when their personal information is analyzed.
The novel system, APEx, also lessens the burden on data scientists who traditionally have had to compromise the accuracy of their analysis in order to give their clients certain privacy guarantees.
APEx translates data scientists’ queries and accuracy bounds into appropriate private mechanisms that satisfy differential privacy, a rigorous mathematical definition of privacy guaranteeing that an observer of the output cannot determine whether any individual’s data was included in the original dataset. The chosen mechanism incurs the least privacy leakage possible and returns a noisy answer that meets the accuracy guarantee the data scientist specified beforehand.
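To make the idea concrete, here is a minimal, illustrative sketch (not APEx’s actual code) of how an accuracy guarantee can be translated into a privacy cost for one classic mechanism. For the Laplace mechanism, requiring that the error exceed `alpha` with probability at most `beta` determines the smallest privacy parameter `epsilon` the mechanism needs; the function names and the counting-query setting are assumptions for illustration.

```python
import math
import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two Exp(1) variates
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def answer_count_query(true_count, alpha, beta, sensitivity=1.0):
    """Answer a counting query with Laplace noise, using the smallest
    epsilon whose error stays within alpha with probability >= 1 - beta.
    Illustrative sketch only, not APEx's implementation."""
    # For Laplace(b): Pr[|noise| > alpha] = exp(-alpha / b).
    # Setting that to beta gives b = alpha / ln(1/beta), i.e.
    # epsilon = sensitivity * ln(1/beta) / alpha.
    epsilon = sensitivity * math.log(1.0 / beta) / alpha
    noisy_answer = true_count + laplace_noise(sensitivity / epsilon)
    return noisy_answer, epsilon
```

A tighter accuracy requirement (smaller `alpha` or `beta`) forces a larger `epsilon`, i.e. more privacy loss, which is exactly the trade-off APEx manages on the analyst’s behalf.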
“While general purpose differentially private query answering systems exist, they are not really meant to support interactive querying, and they fall short in two key aspects,” said Chang Ge, a PhD candidate in Waterloo’s David R. Cheriton School of Computer Science.
“In order to achieve high accuracy, the analyst has to be familiar with the privacy literature to understand how the system adds noise and to identify if the desired results can be achieved in the first place. And somewhat ironically, these systems do not provide any guarantees to the data analyst on the quality they really care about, namely correctness of query answers.”
APEx solves these two issues by choosing, for each input query, the differentially private mechanism that satisfies the specified accuracy guarantee with the least privacy loss. Data analysts can then reliably explore data while ensuring a provable guarantee of privacy to data owners.
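The selection step can be sketched as follows. Suppose each candidate mechanism exposes a cost function returning the smallest `epsilon` it needs to meet the accuracy target; the system then simply picks the cheapest one. The mechanism names and cost formulas below are hypothetical stand-ins, not APEx’s actual mechanism suite.

```python
import math

# Hypothetical per-mechanism privacy-cost functions: given a requirement
# of error at most alpha with probability at least 1 - beta, each returns
# the smallest epsilon that mechanism would need (formulas illustrative).
def laplace_epsilon(alpha, beta, sensitivity=1.0):
    return sensitivity * math.log(1.0 / beta) / alpha

def geometric_epsilon(alpha, beta, sensitivity=1.0):
    # A slightly looser, made-up bound for an integer-valued mechanism
    return sensitivity * math.log(2.0 / beta) / alpha

def cheapest_mechanism(alpha, beta, mechanisms):
    """Return the (name, epsilon) pair with the least privacy loss."""
    costs = {name: cost(alpha, beta) for name, cost in mechanisms.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

For example, with `alpha = 10` and `beta = 0.05`, the Laplace variant above is selected because it meets the guarantee with a smaller `epsilon`.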
In developing APEx, Ge, professor Ihab Ilyas and assistant professor Xi He of Waterloo’s Cheriton School of Computer Science, and Duke University’s associate professor Ashwin Machanavajjhala conducted a comprehensive empirical evaluation on real datasets with benchmark queries and a case study on entity resolution. They found that APEx can answer a variety of queries accurately with moderate to small privacy loss, and can support data exploration for entity resolution with high accuracy under reasonable privacy settings.
“This system could help prevent future data breaches if policymakers were to pass legislation requiring companies to implement APEx,” said Ge. “The policymaker would determine the privacy budget for a particular dataset. Once this is determined, you can just leave the rest to APEx, and customers could, in turn, be more confident that their data is protected.”
A paper detailing the new system, titled “APEx: Accuracy-Aware Differentially Private Data Exploration” and authored by Ge, Ilyas and He of Waterloo’s Faculty of Mathematics and Machanavajjhala of Duke University, is slated to be presented at the 2019 SIGMOD conference in June.