Written by: E. George McCutcheon
In the academic world, it is often easy to overlook the holistic aspects of research and of the researchers themselves. A citation list and a CV are perhaps the most common lenses through which researchers are viewed, as the pressure to publish is always very high and funding is rarely endless. The reality is that every researcher creates far more than these two metrics might suggest, and Xi He is a prime example of this. As a University of Waterloo researcher and CPI member, Xi is always cognizant of the real-world impacts of her research, as well as the influence she exerts on the developing journeys of the students she supervises.
During our chat for this article, she repeatedly referenced the need to protect the individual rights and privacy of every person who engages with the technology that surrounds us in the modern world. We may be given a choice about what we consent to in terms of privacy and data usage for certain things, but that choice is not all-encompassing, and most of society has a minimal understanding of how their data and lives are being tracked and how vulnerable that information may be to misuse. Xi He knows that her research in differential privacy can help improve the privacy, anonymity, and security of myriad complex systems, such as databases, while maintaining the ability of those systems to effectively fulfill their purpose. In doing so, she is protecting the countless individuals that these systems impact, helping to ensure that when consent is given for data to be collected, the data is as secure as possible.
Xi and her graduate student Shufan Zhang collaborated on a paper titled "DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance", which introduces the DProvDB framework. This framework offers a fine-grained privacy provenance approach for scenarios involving multiple data analysts, enabling the tracking of privacy loss attributed to each individual analyst. The paper's proposal has sparked interest and discussion within the academic community, prompting responses that reflect a growing recognition of the significance of addressing privacy concerns in multi-analyst scenarios. In short, it further refines the ability of a differentially private system to efficiently allocate resources and trust, thereby making the system more secure and effective at its role.
Xi delves into the concept of a ‘privacy budget’ as a metric for measuring the extent of privacy loss associated with disclosing sensitive data to the public. Traditionally, privacy budgets have been aggregated across all data analyses, leading to rapid depletion as more queries are executed on the sensitive data. To address this issue, a more fine-grained framework has been developed for scenarios where multiple data analysts collaborate, allowing for increased analytical capabilities while maintaining data privacy. Ongoing efforts are focused on refining the framework by expanding its applicability to complex datasets beyond single tables, incorporating join queries, and enhancing algorithms to strike a better balance between privacy preservation and data utility. The research team remains committed to optimizing the framework's performance and broadening its scope of application in data analysis and privacy protection, and Xi stresses that they want to keep improving the algorithm so they can get a better trade-off between privacy and utility.
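For readers who want a concrete picture of a privacy budget, the following is a minimal Python sketch of per-analyst budget tracking. It is an illustration under stated assumptions, not the DProvDB algorithm: it uses plain sequential composition (every query's epsilon is simply added up) with the standard Laplace mechanism for counting queries, and the names `PerAnalystLedger`, `noisy_count`, `alice`, and `bob` are hypothetical.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed an available privacy budget."""


class PerAnalystLedger:
    """Tracks privacy loss (epsilon) per analyst and overall.

    Assumption: basic sequential composition, so losses add up.
    The real framework described in the article may account differently.
    """

    def __init__(self, total_epsilon, analyst_limits):
        self.total_epsilon = total_epsilon
        self.limits = dict(analyst_limits)
        self.spent = {name: 0.0 for name in self.limits}

    def total_spent(self):
        return sum(self.spent.values())

    def charge(self, analyst, epsilon):
        """Record a query's privacy cost, or refuse if a limit would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        tolerance = 1e-12  # guards against floating-point rounding
        if self.spent[analyst] + epsilon > self.limits[analyst] + tolerance:
            raise BudgetExhausted(f"{analyst} has used up their own budget")
        if self.total_spent() + epsilon > self.total_epsilon + tolerance:
            raise BudgetExhausted("the overall budget is used up")
        self.spent[analyst] += epsilon


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, analyst, epsilon, ledger, rng):
    """Answer a counting query with epsilon-differential privacy.

    A count changes by at most 1 when one person is added or removed,
    so Laplace noise with scale 1/epsilon suffices.
    """
    ledger.charge(analyst, epsilon)  # pay before answering
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 52, 67, 29, 38]
    ledger = PerAnalystLedger(total_epsilon=1.0,
                              analyst_limits={"alice": 0.6, "bob": 0.6})
    answer = noisy_count(ages, lambda a: a >= 40, "alice", 0.5, ledger, rng)
    print("noisy count for alice:", round(answer, 2))
    print("spent so far:", ledger.spent)
```

In this sketch each analyst has their own cap in addition to the shared total, so one analyst's heavy querying cannot silently consume everyone's allowance. A smaller epsilon costs less budget but adds more noise to the answer, which is the privacy versus utility trade-off Xi describes.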
It's not like we're likely to have a solution that is 100% private and 100% secure, that's never going to happen.
When asked about her mention of the intricate balance between privacy and security in cybersecurity discussions, she acknowledges the impossibility of achieving both 100% privacy and 100% security simultaneously. Essentially, you cannot have a system that is completely secure without significantly impacting its ability to fulfill its role. Xi highlights the trade-offs in differential privacy, emphasizing the consideration of privacy, utility, and security, particularly in scenarios involving untrusted servers. She also mentions ongoing research efforts, including generating synthetic data without relying on trusted servers and implementing secure computation or homomorphic encryption to ensure privacy and security in such environments. Overall, the conversation underscores the dynamic nature of the privacy-security discourse and the need for innovative approaches to these challenges in the modern cybersecurity landscape. Xi stresses that this is a problem every researcher faces and that working together on improvements will move us incrementally towards more secure and effective systems.
Xi discusses the landscape and importance of privacy-preserving solutions for data-driven companies, highlighting the difficulty smaller companies have in affording dedicated privacy teams to develop such solutions. Professor He suggests the need for a generalized privacy-preserving system, analogous to widely available database systems like Oracle or Microsoft SQL Server, which companies can easily incorporate or even outsource. The goal is to provide a solution that can be tailored to specific requirements for accuracy, performance, and privacy.
Professor He emphasizes that Cape, the Cost Aware Privacy Engine, aims to significantly reduce the overhead required to address these privacy needs. However, she acknowledges the potential barriers to mass adoption, including concerns about profitability and the need for industry-facing informational outreach. Xi suggests that while addressing privacy needs is crucial, companies may not have allocated budgets for it; if an affordable solution that integrates with their existing systems is readily available, however, they might consider it.
Xi cites examples of larger companies like Apple and Google, which have dedicated privacy research teams, but notes that smaller companies may not have the resources for such endeavours. Professor He sees potential in integrating privacy-preserving solutions into existing database systems, so that companies can meet privacy requirements without taking on a significant financial burden. Xi also mentions other database companies exploring similar solutions, though she recognizes that some customers may not fully grasp the critical importance of privacy solutions because they offer no direct benefit in terms of immediate profit. However, she believes that as more companies adopt such solutions, the importance of privacy preservation will become more widely recognized across industries.
As an educator, Professor He acknowledges the influence that she has in shaping young minds and futures, ever mindful that the support she provides will have significant ripple effects on her students' lives and that the research they create may have even farther-reaching impacts. She emphasizes the importance of guiding and inspiring the next generation of researchers, particularly undergraduate students, and highlights the significance of incorporating research components into teaching to make the university experience more enriching and impactful. By introducing research opportunities early on and fostering a culture of sharing research experiences through seminars, graduate courses, and research presentations, educators can encourage students to explore academic research and consider it a viable career path.
Xi also emphasizes the value of collaboration and exposure to diverse research work within the university setting. Through interactions with students from various research labs and participation in activities like lunch sessions where research presentations are shared, students can broaden their understanding of different research areas and potentially collaborate on projects that lead to significant publications. By creating more opportunities for students to engage in research activities and discussions, universities can enhance the overall academic experience and cultivate a culture of curiosity and exploration among students at all levels of study.
Professor He’s students are actively engaged in a range of projects that center around the themes of privacy and security. One notable project involves developing a theoretical framework for effectively managing diverse cryptographic techniques within single database systems. The primary aim of this project is to assess the feasibility of achieving specific privacy and security assurances when employing less robust encryption methods. By investigating the efficacy and safety of these approaches, the team seeks to enhance data protection measures within database systems and advance the understanding of cryptographic protocols in real-world applications.
Another student is dedicated to exploring private data cleaning and the generation of privacy-preserving datasets. This research area covers techniques and methodologies for sanitizing sensitive data while preserving the privacy of individuals and ensuring data integrity. In a parallel effort, a separate student is focusing on the deployment of models with differential privacy guarantees. The primary objective of this project is to balance data privacy against utility throughout the machine learning pipeline. By scrutinizing the trade-offs between privacy and functionality at each stage of the process, the team aims to develop best practices for implementing differential privacy mechanisms effectively in machine learning workflows, echoing the concerns raised earlier in this article. These diverse projects collectively underscore the students' commitment to addressing pressing privacy concerns and advancing data security practices within academic research.
Xi He is a researcher and a professor who is mindful of the far-reaching aspects of both roles. That awareness not only informs her efforts but spurs them onward to achieve the best outcomes for all those who will be affected by them. It is a mindset that should be admired as much as her many achievements.
That's why I do it. Teaching is not just part of my job; I find if I teach the right course, I can make really great impacts on how research can be something interesting for students to pursue in the future, rather than just getting an industry job.