Research data risk classification framework and guidelines

Developed by: Andriana Vanezi, Information Security Services, IST

Preamble by: Ian Milligan, Associate Vice-president, Research Oversight and Analysis

Last updated: May 20, 2025
 

Preamble

The University of Waterloo, through its Research Data Management (RDM) Institutional Strategy, aims to support research excellence through the provision of excellent RDM services, tools, and supports. In Canada, the Tri-Agencies argue that research data collected using public funds should be responsibly and securely managed and be—where ethical, legal, and commercial obligations allow—available for reuse by others. To this end, the agencies support the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles for research data management and stewardship, when appropriate. 

Data management, the storage, access, and preservation of data produced from a given research project, is thus a critical component of research activities. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to long-term preservation of data deliverables after the research investigation has concluded. 

In the RDM strategy, the University notes that this strategy was “relevant to all research utilizing and producing research data in all forms (including, but not limited to, digital, analogue, paper, and physical materials)—whether funded or unfunded, published or unpublished, open or restricted.” 

Researchers may have questions about whether their data are “open or restricted,” and how they should responsibly steward this information. Canadian research funders, and their institution, want to help researchers share their data (where appropriate) for the advancement of science; we hope this guide helps researchers navigate this landscape. 

Purpose

This page provides University of Waterloo (UW) researchers with a standardized framework for classifying research data. Standardizing the classification of research data establishes a mutual understanding of the associated risk levels. This shared framework, built on an institutionally accepted classification, facilitates effective communication and collaboration among impacted/interested parties. Researchers can identify and engage experts who can provide meaningful and appropriate support tailored to the specific needs associated with the data risk classification. This collaborative and standardized approach enhances the overall data governance structure, promoting a cohesive effort in safeguarding research data and maintaining compliance with regulatory standards and contractual obligations to research sponsors, when applicable.

The University of Waterloo’s Policy 46 Information Confidentiality Classification is aligned with the Freedom of Information and Protection of Privacy Act (FIPPA) to protect data used for teaching, learning, and research administration; it was not designed to classify research data. However, the research data risk classification will align with the confidentiality classification outlined in Policy 46, allowing the proposed security controls to be applied effectively across both classifications. See Guidance on Information Confidentiality Classification (Policy 46). For more details on what constitutes research data, including its definition and examples, see Research Data and Information Not Considered Research Data.

Information Systems & Technology/Information Security Services (ISS/IST), in consultation with the Research Data Management Institutional Strategy Working Group, has developed a six-step framework that gives researchers a structured approach to classifying their research data by assessing the potential harm that could arise from a compromise. Please see the full document:

Research data risk classification

The Research Data Classification System is based on two main factors: the type of data and the potential gravity of harm that could arise from a compromise. 

In cases of ambiguity in research data classification, the higher risk category should be applied. If uncertainty persists, please contact Information Security Services for guidance. For assistance with classifying human participant data, please contact the Office of Research Ethics.
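The "apply the higher category" rule can be sketched in a few lines of Python. This is only an illustration: the `RiskLevel` enum and the `resolve_ambiguity` helper are hypothetical names, not part of any University tool, and the sketch assumes the four categories are ordered Low < Medium < High < Very High as presented on this page.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The four risk categories on this page, ordered from lowest to highest."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


def resolve_ambiguity(candidates):
    """Return the highest of several plausible categories (hypothetical helper)."""
    if not candidates:
        raise ValueError("at least one candidate category is required")
    return max(candidates)


# A dataset that could plausibly be Medium or High is treated as High.
chosen = resolve_ambiguity([RiskLevel.MEDIUM, RiskLevel.HIGH])
print(chosen.name)  # HIGH
```

The sketch does not replace judgment: if uncertainty persists after applying the rule, the contacts named above remain the next step.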

Low

Research data are open or publicly available; a compromise to the integrity or availability of data would cause minimal harm to impacted/interested parties.

Examples (not exhaustive):

TYPE OF DATA

  • Open data – available in an open repository
  • Public data sources (generally unstructured)
  • Anonymized information – the information is irrevocably stripped of direct identifiers.
  • Anonymous information – the information never had identifiers associated.
  • Open-source software source code.
  • Data not subject to agreement/contracts, sovereignty, regulations, or compliance standards
  • Non-research data classified as Public as per Policy 46

POTENTIAL HARM - MILD

  • Temporary unavailability – inconvenience
  • Data corruption causing minor delays and/or small amount of rework
  • Minor reputational impact
  • No or very remote chance of data subject re-identification

Medium

Research data are typically confidential, even if they are not specifically governed by domestic or foreign laws or industry regulations; any compromise to the confidentiality, integrity or availability could lead to mild-to-moderate harm to impacted/interested parties.


Examples (not exhaustive):

TYPE OF DATA

  • Research data classified as confidential by external entities (e.g., funding agencies, corporate sponsors), agreements or contracts such as an NDA if higher-risk categories are not applicable
  • Data that is in an active research stage and not yet ready for publication or sharing
  • Unpublished software source code
  • Aggregate data (summary form)
  • Non-research data classified as Confidential as per Policy 46

POTENTIAL HARM - MILD TO MODERATE

  • Data is unavailable causing delays in research that impact timelines
  • Data corruption causing moderate delays and/or moderate amount of rework
  • Legal consequences, injunctions, fines, or penalties
  • Harm to relationships
  • Moderate reputational damage
  • Loss of competitive advantage

High

Research data are confidential and/or potentially governed by domestic or foreign laws or industry regulations; a compromise to the confidentiality, integrity or availability of data would cause significant harm to impacted/interested parties.

Examples (not exhaustive):

TYPE OF DATA

POTENTIAL HARM – SIGNIFICANT

  • Data is unavailable causing major delays in research that impact timelines
  • Data corruption causing delays
  • Significant harm to study participants – discrimination, stigmatization, reputation, psychological harm, loss of autonomy
  • Ethics review and approval questioned
  • Mandatory reporting and public disclosure
  • Fines, legal liability, and compliance issues
  • Financial costs associated with a breach
  • Loss of potential future funding
  • Significant researcher and/or university reputational damage
  • University registration with the Contract Security Program or the Controlled Goods Program revoked
  • University debarment from engaging in research involving protected government assets or information and technology subject to Canadian or U.S. export control regulations

Very High

Research data are confidential, and/or potentially governed by domestic or foreign laws or industry regulations, or Indigenous data sovereignty, and any compromise to the confidentiality, integrity or availability of data could cause serious harm to impacted/interested parties.

Examples (not exhaustive):

TYPE OF DATA

  • Human participant datasets that include:
    • Personal Health Information (PHI) with identifiers (direct or indirect)
    • Personal Information that contains high-risk identifiers that can be used to perpetrate identity theft

POTENTIAL HARM - SERIOUS

  • Severe researcher and university reputational damage
  • Data is unavailable causing serious delays in research that would seriously impact timelines
  • Data corruption would cause serious delays
  • Ethics review and approval questioned
  • Mandatory reporting and public disclosure
  • Harm to study participants – identity theft, discrimination, stigmatization, reputation, psychological harm, loss of autonomy
  • Substantial financial costs associated with a breach and possible fines
  • Harm to Indigenous communities if the data are misappropriated or misused
  • Damage to the University’s relationship-building with Indigenous communities, and reputational damage from breaking commitments to reconciliation, decolonization, and Indigenization
  • Future funding limitations to the university
  • University registration with the Contract Security Program or the Controlled Goods Program revoked
  • Fines to the individual or University from $25,000 to $2,000,000 per day of non-compliance.
  • Imprisonment for up to 10 years
  • Criminal and administrative penalties may also apply for violations of U.S. export laws and regulations
  • University debarment from engaging in research involving protected government assets or information and technology subject to Canadian or U.S. export control regulations
  • Threats to national security or public safety
  • Ethical violations affecting national security

Roles and responsibilities

Policy 46 – Information Management outlines distinct roles and responsibilities for research team members. Under this policy, the Principal Investigator or Faculty Supervisor of a research project is the information steward for the research data, while the other members of the research team are information custodians. This approach may differ in some cases, for example, when researchers engage with Indigenous communities.

Engage expertise

Identify individuals or teams within the University of Waterloo who can provide guidance on various aspects of the research project regarding data management (e.g., IT specialists, legal advisors, data management experts).

Topic

UW Contact & Resources

External Resources (non-exhaustive)

Sensitive research areas identified in Annex A of the National Security Guidelines for Research Partnerships

  • Regulated Research Areas – Export Controls, Controlled Goods
  • Sensitive Dual Use Technology
  • Big Data/Large Data Sets
  • Critical Infrastructure
  • Critical Minerals

safeguardingresearch@uwaterloo.ca

Safeguarding Research Team

Safeguarding Research

National Security Guidelines for Research Partnerships

The National Security Guidelines for Research Partnerships’ Risk Assessment Form

Sensitive research areas

National Strategy for Critical Infrastructure

The Canadian Critical Minerals Strategy

A Guide to Canada's Export Control List

The Export and Brokering Controls Handbook 

Research Security

Safeguarding Your Research

Research involving

Human Participants including Indigenous

Research subject to

Research Ethics Board (REB) review or an external REB review.

researchethics@uwaterloo.ca

Research Ethics Team

Office of Research Ethics

Guideline on securing human research participant information and data

Privacy and security research risk assessment tool 

Research with Indigenous Peoples 

Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans, 2nd edition (TCPS2).

Research Partnerships

International research collaborations and sponsors

Safeguarding Research Team

Connect with the Corporate Research Partnership Team

Research Partnerships Contracts Team

Non-profit/public sector research partnerships

Senior Manager, Knowledge Mobilization and Partnerships

National Security Guidelines for Research Partnerships

Indigenous Data

Senior Manager, Indigenous Research

Resources and Guides for Indigenous Research

The First Nations Information Governance Centre: Home  

First Nations Principles of OCAP®  

CARE Principles  

Principles of Ethical Métis Research  

National Inuit Strategy on Research  

United Nations Declaration on the Rights of Indigenous Peoples  

U.S. Indigenous Data Sovereignty Network  

Research Data Management

research data management services (RDMS) at the University of Waterloo Library

Digital Research Alliance of Canada

Security Controls

Information Security Services (ISS)

Research Project Cybersecurity Planning

Research Computing Services Directory

Security policies, standards, and guidelines

IST (Information Systems & Technology) Service Catalogue: Security

Information security for research

Guidelines for secure data exchange

Canadian Centre for Cyber Security: Top 10 IT security actions

The top 18 CIS Critical Security Controls

NIST SP 800-171 - Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations

NIST SP 800-53 - Security and Privacy Controls for Information Systems and Organizations

Threats & Exposures

Information Security Services (ISS)

Research Computing Services Directory

Safeguarding Research Team

Cyber Awareness training

Cyber Awareness website

Canadian Centre for Cyber Security

How to protect your organization from insider threats

Who are you at risk from?

Protect your research - Ontario

National Cyber Threat Assessments

Research Computing

Research computing infrastructure

Research Computing Services Directory

Digital Research Alliance of Canada: Advanced Research Computing

Faculty Computing  

Data definitions

Research data

The research data risk classification applies to all research data used or created while conducting research under the auspices of the University of Waterloo. The data may be collected and/or stored in paper or electronic form, including on mobile devices, personal computers, portable media, and online storage. These can be privately or University-owned and located on or off University premises.

The term "research data" is defined according to the definition adopted by the Tri-Agency Research Data Management Policy and the Digital Research Alliance of Canada:

Data that are used as primary sources to support technical or scientific inquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data.

Examples of research data (non-exhaustive)

  • Human participant data: Data from or about humans that are collected, obtained, and/or used as part of the research processes and outputs and/or used to answer the research question(s). Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans, 2nd edition (TCPS2) Chapter 5: Privacy and Confidentiality outlines the following categories and provides guidance for assessing the extent to which information could be used to identify an individual:
    • Directly identifying information – the information identifies a specific individual through direct identifiers
    • Indirectly identifying information – the information can reasonably be expected to identify an individual through a combination of indirect identifiers
    • Coded information – direct identifiers are removed from the information and replaced with a code.
    • Anonymized information – the information is irrevocably stripped of direct identifiers
    • Anonymous information – the information never had identifiers associated

For more information about Human Participant Data and ethical considerations in being respectful stewards of their data, please review the Guideline on securing human research participant information and data and contact the Research Ethics team (researchethics@uwaterloo.ca). 

  • Indigenous Data including any data, information, and knowledge, in any format, that impacts Indigenous Peoples, Nations, and Communities at the collective and individual levels including:
    • Data about Indigenous Resources and Environments (land, water, geology, titles, air, soil, sacred sites, territories, plants, animals, etc.)
    • Data about Indigenous Peoples as Individuals (administrative, legal, health, social, commercial, corporate, services, demographics, etc.)
    • Data about Indigenous Peoples as Collectives – Nations, Peoples, and Communities (traditional and cultural information, archives, oral histories, literature, ancestral and clan knowledge, stories, belongings, etc.)

For more information about Indigenous Data Sovereignty and considerations in being respectful stewards of Indigenous data, please contact the Indigenous Research team (Indigenous.Research@uwaterloo.ca).

Definition

Indigenous Data Sovereignty is the authority of Indigenous peoples, Nations and communities over their own data, how their data are framed, and how their data are managed. This includes sovereignty over the collection, use, control, access, possession, and sharing of these data. These rights are recognized and upheld by the United Nations Declaration on the Rights of Indigenous Peoples.

  • Open data are available to the public so that anyone can view, use, modify, and share as permitted by the license. See Creative Commons for license options. Open data are typically accompanied by a license that promotes openness and transparency. Characteristics include:
    • Accessible without barriers, such as paywalls or registration requirements.
    • Formatted in a way that is easy to use and analyze, such as CSV or JSON, and not locked into proprietary formats.
    • Usable for any purpose, including commercial use, provided the use complies with the terms of the open data license.
    • Promotes transparency and accountability in government and organizations, allowing citizens to engage with and understand information.
  • Publicly available data can be accessed by the public but may not necessarily adhere to the principles of openness. They may be available for viewing but could have restrictions on use, modification, or redistribution. Characteristics include:
    • May be some barriers, such as the need to register or agree to terms of use.
    • Limitations on how the data can be used, such as prohibiting commercial use or requiring attribution.
    • Not always provided in a user-friendly format.
    • May require permission to access or have an associated fee.
  • Regulated Data: information that is protected by local, national, or international statute or regulation mandating certain restrictions.
  • Administrative Data: Data collected from administrative systems (e.g., government or institutional databases) and commonly used in fields like health and social sciences.
  • Online Services Data: A wide range of information from online activities (like search engines and e-commerce) that can be valuable for various research inquiries.
  • Aggregate Data: Data presented in summary form, where individual responses are combined to provide overall trends or statistics, such as survey results showing the average response from a group without revealing individual data points.
  • Confidential Information: where there is an expectation that such information will not be disclosed to anyone except those people requiring the information for a legitimate purpose. Confidential information must be protected against unauthorized use (as “use of information” is defined, above) or disclosure.
  • Sensitive research areas data identified in Annex A of the National Security Guidelines for Research Partnerships
    • Research subject to the Export Control List (ECL) of the Export and Import Permits Act (EIPA). Examples include:
      • conventional weapons and dual-use goods
      • missile and rocket technology, space technology and chemical and biological weapons and agents
      • nuclear programs
    • Research in the area involving or applicable to nuclear programs that are subject to the Nuclear Non-proliferation Import and Export Control Regulations.
    • Research in areas related to goods or technology identified in the Schedule (section 35) of the Defence Production Act (known as the Controlled Goods List), which are sensitive and subject to the Controlled Goods Program; and/or technical data as defined by the Technical Data Control Regulations, also under the authority of the Defence Production Act
    • Sensitive Technology Research Areas or dual-use technologies that have both civilian and military applications. See Canada’s Policy on Sensitive Technology Research and Affiliations of Concern
    • Additional research areas data that can be considered sensitive:
      • Research involving the 31 critical minerals which are deemed critical in the Canadian Critical Minerals Strategy and play a vital role in economic security, national transition to a low-carbon economy, and strategic partnerships.
      • Research involving the 10 critical infrastructure sectors outlined in Canada’s National Strategy for Critical Infrastructure which are processes, systems, facilities, technologies, networks, assets, and services essential to the health, safety, security or economic well-being of Canadians and the effective functioning of government.
      • Big data/large datasets: Semi-structured and unstructured data in a wide variety of formats, in large volumes, and produced at high speed. “Big” data, by virtue of their volume, velocity, or variety, cannot be easily stored or analyzed with traditional methods. Things like sensors, Internet of Things (IoT) devices, and social media all create “big” data. This data can be analyzed to reveal patterns, trends, and associations, particularly concerning human behaviour and interactions. The sensitivity of these datasets depends on factors like the nature, type, and state of the information contained, as well as how the data might be utilized in the aggregate.
      • Research areas that use sensitive personal data that could be exploited by hostile state actors to harm Canada’s national and economic security. See the list of examples of sensitive data.

For more information about Sensitive research areas data and considerations in being responsible and compliant stewards of the data, please review Safeguarding Research and contact the Research Security team (safeguardingresearch@uwaterloo.ca).

Information not considered research data

Certain types of information are not classified as research data and are subject to the Freedom of Information and Protection of Privacy Act (FIPPA). This information should be classified according to Policy 46 Confidentiality Classifications. Examples of such information include:

  • Research Papers: Scholarly articles presenting findings of research, including analyses and conclusions, but not classified as raw research data.
  • Administrative Records: Documentation related to the planning and administration of research activities that do not involve the direct collection of data.
    • Cover Sheets
    • Grant Proposals
    • Data Management Plans (DMPs)
    • Cybersecurity plans
    • Equity, Diversity & Inclusion plans
  • Financial Data: Budgetary and financial documents that do not pertain directly to specific research data.
  • Personnel Records: Information related to staff and graduate students that does not involve research data collection or analysis.
  • Human participant information: Personal information collected and used to run a research study, such as contact information (names, phone numbers, email). This is not required to answer the research question(s).
  • Contractual Agreements: Agreements with external collaborators or institutions that do not include research data.
  • Correspondence: Communication related to the administration of research but not involving research data collection or analysis.
  • Policy Documents: Institutional policies and guidelines not directly related to specific research projects.
  • Data about external partners or institutions involved in the research project, including their expertise and contributions.
  • Intellectual Property (IP): Creations of the mind (e.g., inventions, trademarks) resulting from research activities, but not considered research data. See Policy 73 for more details.

Impacted/interested parties

List all individuals or groups who have a vested interest in the research data or could be harmed by a compromise to the confidentiality, integrity, or availability. Consider not only the potential harm to those directly involved, but also consider the broader implications for the research community, the university, the public, and national security. Groups to consider include but are not limited to:

  • Researchers
    • Principal Investigators, Co-Investigators, Collaborators, Graduate Students, Undergraduate Research Assistants, Post-doctoral Fellows, Staff

Map the data lifecycle

The Research Data Lifecycle acts as a roadmap for researchers, outlining key considerations for Research Data Management (RDM) at each stage.

Researchers should also consider the extent to which the data are interconnected with other systems or datasets. This assessment helps identify the potential impact of a data breach or exploit on other parts of the research ecosystem and highlights key impacted/interested parties and their roles in managing and protecting the data throughout its lifecycle.

Summary of the research data lifecycle phases

Research data lifecycle phases

Plan: organize research data for discovery, reuse, and archiving, and create a data management plan (DMP).

Create: identify, acquire, and generate research data and metadata.

Process: data are prepared for analysis through validation and cleaning.

Analyze: analyze prepared data.

Disseminate: share findings.

Preserve: transition data to an archival state.

Reuse: ensure data are discoverable and accessible for integration into new datasets.

The following elements need to be considered at each phase of the research data lifecycle and should be integrated into the DMP; the specific implementation of these elements will vary based on the assessed risk level classification of the data.

  • Store: The active and archival storage of data, with an emphasis on accessibility and security.
  • Discover: Ensuring data discoverability to facilitate accessibility and mobilization for future research use.
  • Document and Curate: Providing rich descriptions of data context.
  • Secure: Addressing consent, ethical considerations, and maintaining integrity while utilizing established security platforms and guidance from privacy and IT security services.
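Because the four elements apply at every lifecycle phase, one way to make sure none is overlooked in a DMP is to enumerate each phase and element pairing. The snippet below is a minimal sketch of that idea; the `dmp_checklist` function and the wording of the prompts are illustrative assumptions, not an official template.

```python
from itertools import product

# Phase and element names are taken from the lists above.
PHASES = ["Plan", "Create", "Process", "Analyze", "Disseminate", "Preserve", "Reuse"]
ELEMENTS = ["Store", "Discover", "Document and Curate", "Secure"]


def dmp_checklist():
    """Return one prompt per (phase, element) pair (hypothetical helper)."""
    return [
        f"{phase}: how is '{element}' addressed?"
        for phase, element in product(PHASES, ELEMENTS)
    ]


checklist = dmp_checklist()
print(len(checklist))  # 28 prompts: 7 phases x 4 elements
print(checklist[0])    # Plan: how is 'Store' addressed?
```

How each prompt is answered will depend on the risk classification assigned to the data.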

Research data classification framework

How is risk calculated?

Considering the value of research data and its appeal to malicious actors, it is essential to align handling protocols, processes, and security controls with the assigned data classification. Researchers, with support from campus experts, must identify vulnerabilities and apply targeted cybersecurity measures to ensure data protection and reliable research outcomes. These efforts are critical to maintaining public trust and preserving the integrity of the research process.

Risk assessment involves evaluating threats to research and infrastructure, data exposure, and the impact of unauthorized disclosure, modification, or inaccessibility, forming the foundation for effective risk management.

Risk = Threat × Exposure × Impact (harm)

To calculate risk effectively, researchers must assess the following key factors:

  • Threat: Any potential danger that could exploit a vulnerability to cause harm to an asset. This can include malicious actors, natural disasters, or system failures.
  • Exposure: The degree to which the data are exposed or available to threats or potential threats.
  • Impact: The potential consequences or damage that could result from a successful exploit. This includes financial losses, reputational damage, legal implications, and operational disruptions.
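The multiplicative relationship between the three factors can be illustrated with a small worked example. The framework does not prescribe a numeric scale, so the 1 to 4 ordinal ratings, the `risk_score` function, and the example datasets below are all assumptions made purely for illustration.

```python
def risk_score(threat, exposure, impact):
    """Multiply three ordinal ratings, each assumed to be an integer from 1 to 4."""
    ratings = (("threat", threat), ("exposure", exposure), ("impact", impact))
    for name, value in ratings:
        if not isinstance(value, int) or not 1 <= value <= 4:
            raise ValueError(f"{name} must be an integer from 1 to 4, got {value!r}")
    return threat * exposure * impact


# Hypothetical ratings for two datasets.
open_dataset = risk_score(threat=1, exposure=4, impact=1)    # public, low harm
health_dataset = risk_score(threat=3, exposure=2, impact=4)  # restricted, high harm
print(open_dataset, health_dataset)  # 4 24
```

Because the factors multiply, a widely exposed dataset can still score low when both threat and impact are minimal, while a well-protected dataset with severe potential harm scores high. The sketch deliberately does not map scores to the Low through Very High categories, since the framework defines no such thresholds.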

Research data classification framework

For full details, please see the full document.

  1. Determine the Scope: Identify the following:
  2. Identify & Engage Expertise: Identify internal or external experts to support risk assessment and mitigations.
  3. Identify Potential Harms & Impacts: Assess the consequences of data breaches or misuse on individuals, the institution, and society. Examples include (non-exhaustive):
    • Reputation: Damage to the credibility of researchers, institutions, or organizations.
    • Harm to Research Participants: Privacy violations, identity theft, and misinterpretation of data.
    • Harm to Indigenous Communities: Impact on trust, sovereignty, and ethical research practices.
    • Delayed Research: Disruptions in data availability causing research delays.
    • Financial Costs: High costs of remediation, investigations, and implementing security measures.
    • Loss of Funding: Reduced confidence in research data leading to funding challenges.
    • Theft & Illicit Transfer of Technology: Loss of competitive advantage and IP theft.
    • Debarment: Potential revocation of research permissions or funding due to compliance failures.
    • Criminal Liability: Potential legal consequences for non-compliance with regulations.
    • National Security: Risks to national security from foreign interference or misuse of sensitive data.
  4. Identify & Assess Threats (Actors & Attacks):
    • Identify possible threat actors (Advanced Persistent Threats - APT, Cybercriminals, Hacktivists, Insider Threats, Nation-States, Terrorist Groups, Thrill-Seekers) and evaluate their skills/capabilities, motive and opportunity.
    • Identify possible attacks. The Canadian Centre for Cyber Security provides a non-exhaustive list of common tools and techniques that are used by threat actors. MITRE ATT&CK® also provides a globally accessible knowledge base of adversary tactics and techniques based on real-world observations. Researchers can consult with their respective Information Technology administrators and ISS/IST to identify applicable attacks.
  5. Identify & Analyze Exposures: Assess the level of exposure of data to threat actors or potential threat actors, considering:
    • Data Storage and Security: Location, security controls, volume, backup, archival, compliance
    • Data Modification: User profiles, access controls, permissions, access frequency
    • Access Location: Working remotely, external institutions
    • Device Security: Endpoint protection, vulnerability management, encryption, access controls
    • Collaboration Level: Internal/external partnerships, international collaborations
    • Interfaces for Data Access: Authentication, data transfer protocols, logging
    • Data Retention: Legal requirements, destruction protocols, backup and recovery
    • Software Requirements: Licensing, vendor security, authentication methods
  6. Develop Strategies Based on Classification: Based on the assigned classification, implement tailored security measures and data management strategies to address the specific risks, ensuring appropriate levels of protection, access control, and compliance with relevant policies and regulations.
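A research team that wants to track its progress through the six steps could do so with a simple ordered list. The following is a minimal sketch under that assumption; the `next_step` helper is hypothetical, and the step names are copied from the list above.

```python
# The six framework steps, in the order given above.
STEPS = (
    "Determine the Scope",
    "Identify & Engage Expertise",
    "Identify Potential Harms & Impacts",
    "Identify & Assess Threats (Actors & Attacks)",
    "Identify & Analyze Exposures",
    "Develop Strategies Based on Classification",
)


def next_step(completed):
    """Return the first step not yet in `completed`, or None when all are done."""
    for step in STEPS:
        if step not in completed:
            return step
    return None


done = {"Determine the Scope", "Identify & Engage Expertise"}
print(next_step(done))  # Identify Potential Harms & Impacts
```

The helper always returns the earliest unfinished step, so a team that has jumped ahead is still reminded of any step it skipped.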