Waterloo economics series | 2024

#24-001 -- Helen Chen, Maura R. Grossman, Anindya Sen, Shu-Feng Tsao

Establishing a FAIR, CARE, and Efficient Synthetic Health Data Sharing Ecosystem for Canada

Abstract

Obtaining access to real-world health data is a significant challenge, mainly due to privacy and security
implications. Consequently, researchers and technology innovators, particularly those operating in the
health data science and AI technology development spaces, increasingly resort to synthetic health data
to bridge the data gap. High-quality synthetic data has the potential to expedite research and
development of novel technologies. However, synthetic health datasets in Canada are scarce, and no
existing synthetic health datasets conform to the Findable, Accessible, Interoperable, and Reusable (FAIR)
standards. Moreover, while federated machine learning offers the advantage of protecting patient privacy
by not requiring the exchange of source data across nodes, it has yet to be optimized in Canada’s health
research environment, and there is limited use of federated learning with synthetic health data. This paper
explores the ethical considerations and value proposition of generating and sharing synthetic health data.
Our goal is to facilitate the development of a reliable and sustainable synthetic data infrastructure that
supports the ethical, responsible, and efficient use of synthetic health data. An important contribution of
this research is the establishment of a framework that balances the social benefits of innovation from
data sharing with the social costs that occur when individual privacy is compromised. The use of synthetic
data significantly reduces the potential for individual harm and is a cost-effective means to lower
data-sharing costs. We believe that this framework will pave the way for a more robust and secure synthetic
data ecosystem, enabling the generation of valuable insights that can drive positive health outcomes for
Canadians.