Guideline for researchers on securing research participants' data

Purpose

Scope

Data security procedures

Minimum security requirements for electronic data/information

Definitions

Related information

Purpose

One of the guiding principles of the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans, 2nd edition (TCPS2) is concern for welfare. Contributing factors to welfare are privacy and control of information about an individual and the treatment of human biological materials according to the desires of the person from whom the information or materials were collected. Researchers must have in place procedures for the protection of identifiable and/or confidential information obtained or collected during participation in a research study or for use in a research study. Identifiable information is any information that may reasonably be expected to identify an individual, alone or in combination with other available information.

These guidelines detail security measures for maintaining personally identifiable information for research purposes and the adoption of recommended data security plans provided by University of Waterloo Information Services and Technology (IST).

Scope

These guidelines apply to all research involving human participants conducted by or under the auspices of University of Waterloo faculty, students, and other affiliated researchers (investigators).

The pertinent data/information containing (personally) identifiable information may be (or has been) collected and/or stored in paper or electronic form. Electronic storage can include mobile devices, personal computers, portable media, and online storage which can be privately owned or university-owned, and located on university premises or elsewhere.   

These guidelines pertain to the data or information collected from or about human participants in research or for use in research and not the storage and retention of consent/agreement forms.

Definitions

Data Set

A human research data set, hereafter referred to as a data set, is a compilation of data elements collected from or about human participants in research. The terms data and information are used interchangeably such that data are not restricted to quantitative measures. Participant consent forms are not a human research data set.

Identified data set

An identified data set contains personal identifiers which can include both direct and indirect identifiers. 

Direct Identifiers

Direct identifiers point directly to a particular individual (e.g., name, address, Social Insurance Number, student identification number, hospital patient number).

Indirect Identifiers

Indirect identifiers can also point to an individual by focusing attention on unique cases, and/or in combination with other information in the data set (e.g., educational institute from which the respondent graduated and year of graduation, exact occupation held, or any combination of gender, race, job and location).

Highly Restricted Information

Specific direct identifiers or proprietary information are defined as highly restricted information in the Waterloo Information Security Policy 8. The use of this information in research is approved for university researchers subject to conditions established by the Office of Research and/or granting agency, and where appropriate, in consultation with the Information Security Officer:

  • Social Insurance Numbers
  • Bank Account Numbers
  • Credit Card Numbers
  • Driver’s License Numbers
  • Health Insurance Identification Numbers
  • Information considered itself to be controlled technology as regulated by Controlled Goods Regulations, and technical data as defined by Technical Data Control Regulations under the authority of the Defence Production Act.
  • Information related to Public Works and Government Services Canada contracts or other contracts governed by regulations of the Canadian and International Security Directorate

Restricted Information

Restricted Information data sets are those defined as Restricted Information in Policy 8 and not explicitly classified as Highly Restricted. Restricted Information data sets could include direct and/or indirect identifiers.

Sensitive Personal Information

The sensitivity of personal or community-level identifiable information is related to the potential for harm to or stigmatization of the individual or community if the information is released. Personal information that may be considered sensitive could relate to:[3]

  • Sexual attitudes, practices and orientation
  • Use of alcohol, drugs, or other addictive substances
  • Illegal activities
  • Suicide
  • Sexual abuse
  • Sexual harassment
  • An individual’s psychological well-being or mental health
  • Some types of genetic information
  • Any other information (e.g., religious affiliation) that, if released, might lead to social stigmatization or discrimination

De-Identified and Identity-Only Data Sets

If the research requires data elements/variables to be linked later to an individual's identity, the original data set should be partitioned into two data sets: a de-identified data set and an identity-only data set. The latter contains all direct identifying information absolutely necessary for future conduct of the research. An identity code, associated with and unique to each specific individual, can be included in both the de-identified and identity-only data sets. The identity code shall not offer any clue as to the identity of an individual. The code can later be used to link the identity data set elements back to the de-identified data set.
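The partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the field names in `DIRECT_IDENTIFIERS`, the function name `split_data_set`, and the sample records are hypothetical, and the identity code is generated at random so that it offers no clue as to identity.

```python
import secrets

# Hypothetical field names; replace with the direct identifiers actually collected.
DIRECT_IDENTIFIERS = ("name", "address", "student_id")


def split_data_set(records):
    """Partition records into a de-identified and an identity-only data set."""
    de_identified, identity_only = [], []
    used_codes = set()
    for record in records:
        # Random code: not derived from the name, an ID number, or enrolment order.
        code = secrets.token_hex(8)
        while code in used_codes:
            code = secrets.token_hex(8)
        used_codes.add(code)
        identity_only.append(
            {"identity_code": code,
             **{k: record[k] for k in DIRECT_IDENTIFIERS if k in record}}
        )
        de_identified.append(
            {"identity_code": code,
             **{k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}}
        )
    return de_identified, identity_only


# Made-up sample records for illustration only.
records = [
    {"name": "A. Participant", "address": "1 Example St", "student_id": "00000001", "score": 17},
    {"name": "B. Participant", "address": "2 Example St", "student_id": "00000002", "score": 23},
]
de_identified, identity_only = split_data_set(records)
```

After the split, the identity-only data set would be stored as required for identified data, separately from the de-identified data set used for analysis.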

Coded Information

Coded information has had direct identifiers in the de-identified data set replaced with an identity code and the identity-only data set links the code to the direct identifiers. Select research team members (e.g., the principal investigator) retain the identity-only data set. 

Anonymized Information

A de-identified data set is anonymized if:

  • it has been irrevocably stripped of direct identifiers,
  • the risk of re-identification of individuals through the remaining indirect identifiers is low or very low, and
  • there is no corresponding data set linking a code to identifying data.
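These guidelines do not prescribe a method for judging re-identification risk. As a rough first screen only, one common approach is to count how many records share each combination of indirect identifier values; a combination held by a single record is a unique case. The sketch below assumes hypothetical field names and sample data, and the function `smallest_group_size` is illustrative rather than an approved assessment tool.

```python
from collections import Counter


def smallest_group_size(records, indirect_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of indirect identifier values (1 means a unique case)."""
    counts = Counter(
        tuple(record.get(field) for field in indirect_identifiers)
        for record in records
    )
    return min(counts.values()) if counts else 0


# Made-up records for illustration only.
records = [
    {"gender": "F", "occupation": "nurse", "city": "Waterloo"},
    {"gender": "F", "occupation": "nurse", "city": "Waterloo"},
    {"gender": "M", "occupation": "pilot", "city": "Waterloo"},
]
k = smallest_group_size(records, ["gender", "occupation", "city"])
```

Here the third record is the only one with its combination of gender, occupation and city, so `k` is 1 and that record would warrant closer review. A small group size signals possible risk; a large one does not by itself establish that the risk is low.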

Anonymous information

A data set that never had identifiers (i.e., direct and indirect identifiers that could potentially identify an individual, alone or in combination) associated with it is anonymous information. The risk of identifying individuals is low or very low.

Secure location

A secure location is a place (e.g., office, laboratory, filing cabinet) for storing a portable medium, computer, or equipment on which data sets (coded, anonymized, or containing personal identifiers) reside. The principal (or lead) investigator has access to the secure location through lock and key (either physical or electronic keys are acceptable). Access may be provided to other parties (e.g., co-investigators, postdoctoral fellows, research assistants) with a legitimate need as disclosed in the ORE 101 application. For student research (e.g., course, Honours, Master’s, PhD), the student investigator may be the only investigator who has access to the location.

Data Encryption

Secure data encryption is the algorithmic transformation of a data set to an unrecognizable form from which the original data set or any part of it can be recovered only with knowledge of a secret decryption key.  

Password-Protected Access

A computer or server that requires a password for access is password protected. Research data/information should be stored on campus servers that meet the Waterloo IST standards for secure hosting and security guidelines. For multi-site, multi-country, or multi-investigator research projects, a non-Waterloo server solution, such as the Scholars Portal Dataverse, may be required.

Data Security Procedures

Data sets with direct identifiers and identity-only data sets shall always be stored in a secure location and in secure data-encrypted form.

Not all research data sets can be completely de-identified (e.g., an audio-recorded interview in which a participant identifies himself or herself). In this case, the original data set must be considered an identified data set and treated accordingly.

The level of security necessary for maintaining personally identifiable information is relative to the risk posed to the participant should the data be inadvertently released or released as a result of malfeasance. Sensitive personal information requires a high level of security. Identified information, where participants have explicitly consented to identification, requires a lower level of security. 

Collect the minimum identifying data needed for the conduct of the study. In the application, describe:

  • exactly what personally identifiable data elements/variables will be collected and why they are required for the proposed research, and
  • whether the data set will be split into a de-identified data set and an identity-only data set or anonymized with no identity-only data set.

De-identify data as soon as possible after collection and/or separate identifiable variables (i.e., create an identity code and destroy the raw data). For purposes of later merging the identity information with other research data, an identity code assigned by the researcher may be included in both data sets, and later used to link identity data elements back to the de-identified data set.
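As an illustration of the later linking step, the following sketch joins the two data sets on the shared identity code. The function name `link_by_identity_code`, the field names, and the sample codes are assumptions made for the example; in practice only the individuals disclosed in the application as holding the identity-only data set would run such a step.

```python
def link_by_identity_code(de_identified, identity_only):
    """Re-attach identity data to de-identified rows using the shared identity code."""
    identities = {row["identity_code"]: row for row in identity_only}
    linked = []
    for row in de_identified:
        identity = identities.get(row["identity_code"])
        if identity is None:
            # A missing code means the two data sets are out of step; stop rather than guess.
            raise KeyError(f"no identity record for code {row['identity_code']!r}")
        linked.append({**identity, **row})
    return linked


# Made-up data for illustration only.
identity_only = [
    {"identity_code": "c7f1a0", "name": "A. Participant"},
    {"identity_code": "9b2e44", "name": "B. Participant"},
]
de_identified = [
    {"identity_code": "9b2e44", "score": 23},
    {"identity_code": "c7f1a0", "score": 17},
]
linked = link_by_identity_code(de_identified, identity_only)
```

The linked result is itself an identified data set and would need the same protection as any other data set with direct identifiers.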

Minimum Security Requirements for Electronic Data/Information

Security requirements for data

| Type of information/data | Security procedures needed |
| --- | --- |
| Information/data with direct identifiers; identity-only data set | Encryption and secure location. If highly identifiable and sensitive data, also store at a high level of security (e.g., on stand-alone servers, special protection for remote electronic access) |
| Identified information, with permission for identification | Password-protected access |
| Coded or anonymized personal health information/data (PHI), as defined under PHIPA | Encryption and secure location (or as required by the providing health information custodian) |
| Coded information/data (not personal health information (PHI) as defined under PHIPA) | Password-protected access and secure location. Encryption and secure location if there is a high risk of re-identification through indirect identifiers and the information is sensitive |
| Information with only indirect identifying information, yet risk of re-identification is greater than low (not PHI as defined under PHIPA) | Password-protected access and secure location. Encryption and secure location if there is a high risk of re-identification and the information is sensitive |
| Anonymized information (not PHI as defined under PHIPA) | Password-protected access |
| Anonymous information (low or very low risk of re-identification of individuals through indirect identifiers) | Password-protected access |
| Anonymized or anonymous information with no risk of re-identification through indirect identifiers, or where the risk has been removed | Password-protected access |
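For research teams that keep a data inventory, the minimum requirements listed above could be encoded as a simple lookup. The sketch below is one possible encoding, not an official tool: the category keys, the constant names, and the single `high_risk_and_sensitive` flag are assumptions that simplify the conditions in the table, and custodian-specific requirements for PHI are not modelled.

```python
PASSWORD = "password-protected access"
LOCATION = "secure location"
ENCRYPTION = "encryption"
HIGH_SECURITY = "high level of security (e.g., stand-alone server)"

# Hypothetical category keys -> (baseline measures, measures when the data are
# high risk / highly identifiable and sensitive). Where the table gives no
# escalation, both entries are the same.
REQUIREMENTS = {
    "direct_identifiers_or_identity_only": (
        [ENCRYPTION, LOCATION], [ENCRYPTION, LOCATION, HIGH_SECURITY]),
    "identified_with_permission": ([PASSWORD], [PASSWORD]),
    "coded_or_anonymized_phi": ([ENCRYPTION, LOCATION], [ENCRYPTION, LOCATION]),
    "coded_non_phi": ([PASSWORD, LOCATION], [ENCRYPTION, LOCATION]),
    "indirect_only_risk_above_low": ([PASSWORD, LOCATION], [ENCRYPTION, LOCATION]),
    "anonymized_non_phi": ([PASSWORD], [PASSWORD]),
    "anonymous": ([PASSWORD], [PASSWORD]),
}


def minimum_requirements(category, high_risk_and_sensitive=False):
    """Return the minimum security measures for a data category."""
    baseline, escalated = REQUIREMENTS[category]
    return escalated if high_risk_and_sensitive else baseline


coded = minimum_requirements("coded_non_phi")
coded_sensitive = minimum_requirements("coded_non_phi", high_risk_and_sensitive=True)
```

An unknown category raises `KeyError`, which is deliberate: a data set that does not fit a listed category should be classified by a person rather than given a default.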

Personally identifiable information/data, particularly health information, obtained from organizations/custodians (e.g., the Canadian Institute for Health Information, a hospital, a CCAC, a long-term care facility, a U.S. federal agency) may require specific or additional security measures (e.g., location of computer server, access) as identified in the terms and conditions of an agreement.

The electronic transmission of identifiable data must use secured communication protocols approved by IST. E-mail is not a secured communication protocol.

Laptops and portable devices must be secured; they pose a significant risk for identifiable data because of the increased possibility of theft. Identifiable data collected on a laptop or portable equipment must be encrypted and de-identified as soon as possible or moved to secured, non-portable equipment.

When an identified or identity-only data set is stored in a personal, university-owned or university-maintained or other-source computer, investigators are strongly encouraged to ensure that the computer is professionally administered and managed according to Waterloo IST security standards; for example, by the faculty computer facility. Poor passwords provide no protection for identifiable data; see IST password standards.

The investigators shall determine and disclose in the application the individuals who will have legitimate access to an identified or identity-only data set, either through access to the secure location key or to the decryption key. The data security plan must include provision for recovery of a lost decryption key, to ensure that a data set cannot be permanently lost.

The Principal Investigator or Faculty Supervisor for sponsored research projects is the Information Steward for the research data/information while other research team members for the sponsored research project are Information Custodians. All are expected to be aware of and understand their role’s responsibilities as outlined in Waterloo Policy 8: Information Security. Research team members include:

  • Co-investigators
  • Students engaged in research activities
  • Research staff in custody of or involved in collection of research information
  • Technical support staff involved in the deployment, maintenance, and administration of information technology where research information is stored and/or transmitted

Locked filing cabinets and other storage containers are housed within a locked room; either physical or electronic keys are acceptable.

Investigators who collect, or collect and store, identifiable data/information should follow good data management practices, including confidentially destroying paper copies of information and protecting access to data/information while working with it.

Applications must include for review and clearance a detailed plan for data security as requested in section G, Anonymity of Participants and Confidentiality of Data, of the application form. The confidentiality of data and data security information outlined in the information-consent letter must be consistent with the information provided in section G of the application form.

Related information

All about passwords

Checklist on Disclosure Potential of Proposed Data Releases

CIHR Best Practices for Protecting Privacy in Health Research (September 2005)

Confidential shredding

Data encryption

Direct identifiers and treatment of variables that might act as indirect identifiers (PDF)

Electronic Media Disposal Guidelines

Guide to Social Science Data Preparation and Archiving

Laptop security

Privacy Analytics Risk Assessment Tools

Secure file transfer

U.S. Federal Information Security Management Act (FISMA)

University of Waterloo Policy 8: Information Security

University of Waterloo Records Management for Research

 

[1] The purpose of the Act is to: (1) provide a right of access to information under the control of institutions, and (2) protect the privacy of individuals with respect to personal information about themselves held by institutions and to provide individuals with a right of access to that information.

[2] Health Information Custodians are persons involved in delivering health care services; e.g., health care practitioners who provide health care for payment, long-term-care service providers, community care access corporations, hospitals and other facilities, pharmacies, laboratories, a medical officer of health or a board of health, the Ministry of Health and Long-Term Care and others specifically identified in the Act.

[3] CIHR Best Practices for Protecting Privacy in Health Research (September 2005), Element 2


Freedom of Information and Protection of Privacy Act, R.S.O. 1990, c. F.31

Definition of personal information:

“personal information” means recorded information about an identifiable individual, including,

(a) information relating to the race, national or ethnic origin, colour, religion, age, sex, sexual orientation or marital or family status of the individual,

(b) information relating to the education or the medical, psychiatric, psychological, criminal or employment history of the individual or information relating to financial transactions in which the individual has been involved,

(c) any identifying number, symbol or other particular assigned to the individual,

(d) the address, telephone number, fingerprints or blood type of the individual,

(e) the personal opinions or views of the individual except where they relate to another individual,

(f) correspondence sent to an institution by the individual that is implicitly or explicitly of a private or confidential nature, and replies to that correspondence that would reveal the contents of the original correspondence,

(g) the views or opinions of another individual about the individual, and

(h) the individual’s name where it appears with other personal information relating to the individual or where the disclosure of the name would reveal other personal information about the individual; (“renseignements personnels”)


Personal Health Information Protection Act, 2004, S.O. 2004, c. 3, Sched. A

Definition of personal health information:

“personal health information”, subject to subsections (3) and (4), means identifying information about an individual in oral or recorded form, if the information,

(a) relates to the physical or mental health of the individual, including information that consists of the health history of the individual’s family,

(b) relates to the providing of health care to the individual, including the identification of a person as a provider of health care to the individual,

(c) is a plan of service within the meaning of the Home Care and Community Services Act, 1994 for the individual,

(d) relates to payments or eligibility for health care, or eligibility for coverage for health care, in respect of the individual,

(e) relates to the donation by the individual of any body part or bodily substance of the individual or is derived from the testing or examination of any such body part or bodily substance,

(f) is the individual’s health number, or

(g) identifies an individual’s substitute decision-maker.