Elements of a data sharing agreement: An example

The following is an example of the elements needed in a data sharing agreement.

Secondary use of personal health data originally collected for clinical purposes

A University of Waterloo researcher wishes to contact family physicians and pharmacists to obtain data on medications prescribed over the last three years to a certain demographic cohort. These data include information originally collected and stored for clinical purposes, and the researcher would now like to access the participants' health files to extract specific pieces of data. Because this situation involves other organizations (i.e., the pharmacies and the physicians' offices) and the collection of restricted data, a data sharing agreement must be developed and signed both by the University of Waterloo and by an official at each participating organization. Pharmacies and physicians are covered by Ontario's Personal Health Information Protection Act (PHIPA) and its regulations, and the Act sets out the specific duties of the Health Information Custodian at the pharmacy or within the medical practice. The data sharing agreement ensures that both parties have considered, and are meeting, their obligations under provincial privacy statutes, Good Clinical Practice, HIPAA, and TCPS2.

The agreement or plan must contain the following elements, which may be set out in:

  • a stand-alone data management plan,
  • the research protocol, or
  • an appendix to a master research agreement.

Each element below is followed by its description and details.

  1. Project Description
  • A few paragraphs which describe the research project
  • Summarize the research question(s) for which the data will be gathered
  • Clarify who will be the sender and who will be the receiver of the data
  • Mention all organizations and people who will have access to this data, both within the University of Waterloo and at other collaborating sites
  • List these individuals by name and by role or title, with contact information
  • Include any prohibitions on secondary uses of this data (e.g., if it cannot be used for commercial gain)
  2. Data Description
  • A description of the information to be gathered, used or made available
  • The nature and scale of the data that will be generated, collected, or re-used
  • Include a description of who owns the data (e.g., participants, partnership, consortium, local communities, researchers, research institution, funder)
  • Itemize the specific data elements to be extracted from the clinical files at a field level if possible.
  • As a general rule, only the minimum amount of personal information needed to meet the research objectives should be used
  • Clarify whether data will be used at an individual or an aggregated level
  • Clarify whether data will include any personal information as legally defined by the privacy laws of the applicable jurisdiction (e.g., in Ontario, three statutes may need to be considered: PIPEDA, PHIPA, and FIPPA)
  • Under the provincial Freedom of Information and Protection of Privacy Act (FIPPA), personal information includes items such as race, national or ethnic origin, color, religion, age, marital status, education, medical, criminal or employment history, account numbers and names, the opinion of a person about another person, etc.
  • Refer to the definitions of personal information in FIPPA and personal health information in PHIPA
  • Consider whether there are specific obligations arising under Waterloo policies due to the type of information being transferred. In this example:
    • Clarify how the specific data elements will answer the research question (i.e., how each field is relevant)
    • Data that are not required to answer the research question should not be accessed or provided by the Health Information Custodian
    • Clarify how data will be verified to ensure they are accurate, complete, and kept up to date
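The field-level minimization described for this element can be illustrated with a short Python sketch. It assumes records are available as dictionaries; the field names and justifications in APPROVED_FIELDS are hypothetical placeholders, not a prescribed schema. Each approved field carries a statement of how it helps answer the research question, and any field not on the list is never copied into the extract.

```python
# Hypothetical approved fields, each paired with how it answers the research question.
APPROVED_FIELDS = {
    "study_code": "Links a participant's records without revealing identity",
    "birth_year": "Defines membership in the demographic cohort",
    "drug_name": "The medication prescribed (primary variable of interest)",
    "dispense_date": "Restricts data to the three-year study window",
}


def minimize(record, approved=APPROVED_FIELDS):
    """Return only the approved fields of a record; refuse unjustified fields."""
    unjustified = [name for name, reason in approved.items() if not reason.strip()]
    if unjustified:
        raise ValueError(f"fields lack a justification: {unjustified}")
    # Fields such as names or health card numbers are simply never copied.
    return {name: record[name] for name in approved if name in record}


if __name__ == "__main__":
    chart_entry = {
        "patient_name": "Jane Doe",      # not approved: stays with the custodian
        "health_card_no": "0000000000",  # not approved: stays with the custodian
        "study_code": "a7f3",
        "birth_year": 1950,
        "drug_name": "metformin",
        "dispense_date": "2013-05-02",
    }
    print(minimize(chart_entry))
```

Keeping the justification next to each field means the same list can be copied into the agreement and used to drive the extraction.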
  3. Existing Data
  • Data relevant to the project and a discussion of whether and how these data will be integrated
  • Search the web and data archives for similar datasets
  • Answer the questions: Why is there a need to create a new dataset? What deficiencies exist in what is currently available?
  • Identify where the data are located in the health files to be accessed
  • It is preferable for an individual within the clinical circle of care to extract the data under the supervision of the Health Information Custodian
  • In general, an anonymized or de-identified data set is preferred and should be provided to the researcher
  • In Ontario, personal information does not include information about people who have been dead for more than 30 years
  • If the data may later need to be re-linked to an individual's identity, the Information Custodian should create a partitioned data set with two parts: a de-identified data set and an identity-only data set
  • The codes created should not include any clues as to the identity of the individual; if in doubt, the US Safe Harbor standard should be used to de-identify the data set
  • If the data will cross borders, consider the requirements of the OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data
  • If an expert statistical determination will be used to de-identify the data, provide details on the methods to be used and the likely statistical residual risk of re-identification
  • If the Information Custodian does not have the resources to de-identify and partition or aggregate the data, the researcher can consider paying a member of the organization's staff to extract the data; this expense should be included in the original research budget
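Where an expert statistical determination is used, one common way to express residual re-identification risk is to group records by their quasi-identifiers and measure the size of each group (an equivalence class). The Python sketch below is an illustration under that assumption, not a method required by Waterloo or by PHIPA: it reports the worst-case risk (1 divided by the size of the smallest class, i.e., 1/k in k-anonymity terms) and the average risk across records. The field names, including fsa for the first three characters of a postal code, are hypothetical.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    if not records:
        raise ValueError("no records to assess")
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest equivalence class (1/k)."""
    return 1 / min(class_sizes(records, quasi_identifiers).values())


def average_risk(records, quasi_identifiers):
    """Mean of each record's 1/(class size), which equals classes / records."""
    return len(class_sizes(records, quasi_identifiers)) / len(records)


if __name__ == "__main__":
    extract = [
        {"age_group": "60-69", "sex": "F", "fsa": "N2L"},
        {"age_group": "60-69", "sex": "F", "fsa": "N2L"},
        {"age_group": "60-69", "sex": "M", "fsa": "N2L"},
        {"age_group": "70-79", "sex": "F", "fsa": "N2J"},
    ]
    qi = ["age_group", "sex", "fsa"]
    print(max_risk(extract, qi))      # 1.0: two records are unique
    print(average_risk(extract, qi))  # 0.75: 3 classes / 4 records
```

A worst-case risk of 1.0 means at least one record is unique on its quasi-identifiers; wider categories or suppression would be needed before release.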
  4. Format
  • List the data formats, standards, and conventions that apply to each data item
  • Justify the use of a particular format in terms of usability, longevity, and suitability for archiving
  • Identify how the data will be transferred to the researcher (e.g., memory stick, laptop, or electronic transfer using a secure protocol)
  • Ensure the participant’s file contains a notation that the data exchange has occurred
  • If the data are to be transferred electronically, Waterloo guidelines on secure data storage and transmission should be followed, including the use of recommended software such as Sendit
  • If the data are restricted under Waterloo security classifications, all transmissions and storage media must be encrypted
  • Encryption software can be downloaded for free from the Waterloo website
  • If a unique identifier or code has been created, describe how it will be generated and who will maintain the codes in case data sets used in the research need to be re-identified
  • Preferably, the data set should be partitioned into an identifiable data set and a de-identified data set, with the codes maintained by the Information Custodian
  • The Information Custodian may consider keeping the identifiable data and codes in physical or electronic formats
  • Unless the ethics application provides for the researcher to access identifiable data, any request by the researcher to re-identify the data should be accompanied by an ethics modification request
  • If identifiable data is requested, this should be included in the ethics application
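The partitioned data set and randomly generated codes described above might look like the following Python sketch. The list of direct identifiers and the use of a 16-character hexadecimal code from the standard secrets module are assumptions for illustration. Because the code is random rather than derived from a name or health card number, it contains no clue to identity, and only the custodian's identity-only set can link it back to a person.

```python
import secrets

# Hypothetical list of direct identifiers held back by the custodian.
DIRECT_IDENTIFIERS = ("name", "health_card_no", "address", "phone")


def new_code(used, nbytes=8):
    """Random hex code that reveals nothing about identity; retries on collision."""
    while True:
        code = secrets.token_hex(nbytes)
        if code not in used:
            used.add(code)
            return code


def partition(records, direct_identifiers=DIRECT_IDENTIFIERS):
    """Split records into (de-identified set, identity-only set) linked by a code.

    Only direct identifiers are moved out (masking); quasi-identifiers remain
    in the de-identified set and still need separate review.
    """
    used = set()
    deidentified, identity_only = [], []
    for rec in records:
        code = new_code(used)
        identity_only.append(
            {"code": code, **{k: v for k, v in rec.items() if k in direct_identifiers}}
        )
        deidentified.append(
            {"code": code, **{k: v for k, v in rec.items() if k not in direct_identifiers}}
        )
    return deidentified, identity_only


if __name__ == "__main__":
    source = [{"name": "Jane Doe", "health_card_no": "0000000000", "drug_name": "metformin"}]
    to_researcher, kept_by_custodian = partition(source)
    print(to_researcher)      # code plus drug_name only
    print(kept_by_custodian)  # code plus name and health card number
```

This step only masks direct identifiers; quasi-identifiers such as birth year or postal code remain in the de-identified set and still need to be reviewed as described under Security.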
  5. Metadata
  • A description of the metadata (data about data, such as the means of creation, purpose, time and date of creation, and author) to be provided along with the generated data, and a description of the metadata standards used
  • Clarify what metadata will be collected
  • Ensure metadata corresponds to Waterloo requirements and standards
  • At a minimum, the following metadata should be created and maintained for each record:
    • Unique identifier: an identifier that allows retrieval of this record, and no other, within a record-keeping system
    • Name: the title or name given to the record
    • Date of creation: the date the record is first saved or entered into the system
    • Who created the record: ideally, this element will identify the author (person or system) of the record, the individual in the position responsible for the action or decision documented by the record, and (if different) the person responsible for capturing the record. This may not always be possible, so offices should capture whatever details they can from the metadata assigned automatically by the creating application
    • What business is being conducted: this element identifies the business activity and function, based on the WatCLASS records classification system and the associated records retention information. In many cases, it may include more detailed information on the business processes and workflows required for the ongoing business use of the records
    • Security classification of the record: using the terminology provided by Waterloo Policy 8, Information Security.
    • Creating application and version: the name and version of the software application that created the record
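One way to keep the minimum metadata consistent is to give each element a fixed slot in a record structure, as in this Python sketch (Python 3.7 or later, for dataclasses). The class and attribute names are hypothetical, and the classification labels should be checked against the exact terminology of Waterloo Policy 8; only "highly restricted", "restricted", and "confidential" appear in this guideline, and "public" is an assumed lowest level.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class SecurityClass(Enum):
    """Classification labels; check the exact terms against Waterloo Policy 8."""
    PUBLIC = "public"  # assumed lowest level, not named in this guideline
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"
    HIGHLY_RESTRICTED = "highly restricted"


@dataclass(frozen=True)
class RecordMetadata:
    """Minimum metadata for one record, one attribute per element (hypothetical names)."""
    name: str                      # title or name given to the record
    created_by: str                # author (person or system)
    business_activity: str         # WatCLASS activity/function (free text here)
    security_class: SecurityClass  # Policy 8 terminology
    creating_application: str      # software that created the record
    application_version: str
    date_created: date = field(default_factory=date.today)
    unique_id: str = field(default_factory=lambda: uuid.uuid4().hex)


if __name__ == "__main__":
    meta = RecordMetadata(
        name="Medication extract, cohort A",
        created_by="Clinic data clerk",
        business_activity="Research data sharing",
        security_class=SecurityClass.RESTRICTED,
        creating_application="ExampleExport",  # hypothetical application
        application_version="1.0",
    )
    print(meta)
```

Making the class frozen means metadata cannot be silently edited after the record is created; a correction would produce a new metadata object instead.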
  6. Data Organization
  • How the data will be managed during the project including information about version control, naming conventions, etc.
  • Provide a description of how the data files should be structured and organized in order to transmit them to the researcher (e.g., are there specific unique identifiers and keys for each record, in what order should the data fields appear)
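The file structure for transmission (unique keys, field order) can be fixed in code so that every extract follows the agreed layout. This Python sketch uses the standard csv module; the column names, the record_id key, and the column order are hypothetical and would come from the agreement itself.

```python
import csv
import io

# Hypothetical agreed layout: a unique record key first, then fields in this order.
FIELD_ORDER = ["record_id", "study_code", "birth_year", "drug_name", "dispense_date"]
KEY = "record_id"


def write_transfer_csv(records, out, field_order=FIELD_ORDER, key=KEY):
    """Write a list of records in the agreed column order, rejecting duplicate keys.

    extrasaction="raise" makes DictWriter raise ValueError if a record carries
    a field that is not in the agreed layout.
    """
    keys = [rec[key] for rec in records]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate record keys")
    writer = csv.DictWriter(out, fieldnames=field_order, extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    buf = io.StringIO()  # for a real file, use open(path, "w", newline="")
    write_transfer_csv(
        [{"record_id": 1, "study_code": "a7f3", "birth_year": 1950,
          "drug_name": "metformin", "dispense_date": "2013-05-02"}],
        buf,
    )
    print(buf.getvalue())
```

Checking for duplicate keys before writing avoids leaving a half-written file behind. Missing values are written as empty fields, which is DictWriter's default.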
  7. Quality Assurance
  • Procedures for ensuring data quality during the project
  • Methods of data collection and storage can have implications for future utility, particularly with biological specimens
  • Identify how both the sender and receiver will verify that the data which has been extracted and received is of sufficient quality
  • For example, will random audits or checks be conducted?
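Two simple checks that sender and receiver could agree on are sketched below in Python: a SHA-256 checksum to confirm the received file is identical to the one sent, and a reproducible random sample of records for manual audit against the source charts. The function names and the sampling defaults (5% of records, at least 10) are hypothetical and would be set in the agreement.

```python
import hashlib
import random


def sha256_of_file(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks to handle large extracts."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_sample(record_ids, fraction=0.05, minimum=10, seed=None):
    """Random subset of record IDs to check by hand against the source files.

    The 5% / at-least-10 defaults are placeholders to be set in the agreement.
    Recording the seed lets both parties reproduce the same sample.
    """
    ids = list(record_ids)
    n = min(len(ids), max(minimum, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, n)
```

In practice the sender would compute the checksum before transfer and communicate it separately from the file; the receiver recomputes it on arrival and compares the two values. Sharing the seed lets both parties draw the same audit sample.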
  8. Storage and Backup
  • Storage methods and backup procedures for the data, including the facilities that will be used for the effective preservation and storage of the research data
  • Explain the backup schedule and process, who is responsible, and the sensitivity levels of the data
  9. Security
  • A description of technical and procedural protections for information including confidential information, and how permissions, restrictions and embargoes will be enforced
  • Personal information should be de-identified prior to being transmitted.
  • Ontario has developed a guideline on de-identification methods.
  • Both direct and quasi-identifiers should be removed from the data set prior to transmittal or sharing unless specifically approved in the ethics application
  • If the data will be masked (i.e., only the direct identifiers removed), specify the method that will be used to mask the data
  • Consider whether the key used to re-identify the data can be destroyed, effectively anonymizing the data
  • Ensure any use of highly restricted information is in line with approvals required under Waterloo Policy 8.
  • List pre-approved uses and information stewards for highly restricted information
  • Any other use of highly restricted information must receive specific approval from the Chief Information Officer under Waterloo Policy 8
  • If data will be aggregated, describe the method to be used. For example, the Treasury Board Guidance on Preparing Information Sharing Agreements Involving Personal Information recommends that federal government departments:
    • Delete table data that contains information on fewer than five people (and other table data as necessary to prevent identification based on row and column totals)
    • Combine categories
    • Review charts and graphs to ensure they do not display information on identifiable individuals
    • Thoroughly review indirect identifiers (e.g., population, age group, sex, marital status) to ensure that they cannot subsequently be linked with other information to re-identify individuals
  • At the very least, the investigators and the holder of the key to the code (i.e., the Information Custodian in most situations) should include a procedure in the agreement preventing the release of the key to anyone not party to the agreement (e.g., external agencies or organizations)
  • If there are specific circumstances under which the data may be re-identified (e.g., an adverse event or incidental finding) this should be specified in the agreement.
  • Refer to the guidelines on data security for the minimum security requirements for electronic data and information based on its category (e.g., highly restricted information and restricted health information require encryption, storage in a secure location (physically or virtually), and password-protected access)
  • Consider the use of methods such as the US Safe Harbor standard or the expert statistical determination method to de-identify data
  • Specify the detailed method(s) which will be used to de-identify the data set
  • Security breaches and incidents should be reported in accordance with the Waterloo ethics procedures
  • Identify if the data transfer will be a push or a pull as well as the frequency under which it will be supplied (e.g., batch, ad hoc requests, real time)
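The Treasury Board aggregation rules quoted above (delete table data on fewer than five people, plus other cells as needed so that row and column totals do not reveal them) can be sketched in Python as follows. The threshold of five comes from the guidance; the complementary step, hiding the smallest visible cell in any row or column with a single hidden cell, is a simple heuristic chosen for illustration, not a procedure the guidance prescribes, and it may hide more cells than an optimized method would.

```python
def suppress(table, threshold=5):
    """Hide small counts in a rectangular table of counts (None = suppressed).

    Primary suppression hides every cell below `threshold`. Complementary
    suppression then hides the smallest visible cell in any row or column that
    has exactly one hidden cell, so the hidden value cannot be recovered by
    subtracting the visible cells from a published row or column total.
    """
    n_rows, n_cols = len(table), len(table[0])
    hidden = {(i, j) for i in range(n_rows) for j in range(n_cols)
              if table[i][j] < threshold}
    lines = [[(i, j) for j in range(n_cols)] for i in range(n_rows)]
    lines += [[(i, j) for i in range(n_rows)] for j in range(n_cols)]
    changed = True
    while changed:  # repeat until no row or column has a lone hidden cell
        changed = False
        for cells in lines:
            visible = [c for c in cells if c not in hidden]
            if len(cells) - len(visible) == 1 and visible:
                hidden.add(min(visible, key=lambda c: table[c[0]][c[1]]))
                changed = True
    return [[None if (i, j) in hidden else table[i][j] for j in range(n_cols)]
            for i in range(n_rows)]


if __name__ == "__main__":
    counts = [[12, 3, 20],
              [8, 15, 9],
              [30, 6, 11]]
    for row in suppress(counts):
        print(row)
```

In the example table, a single count of 3 ends up forcing six cells to be hidden, which illustrates why combining categories is often considered alongside suppression.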
  10. Responsibility
  • List owners and stakeholders of the data and names of individuals responsible for data management, analysis, interpretation, and dissemination in the research project.
  • Consider the appointment of a data steward if appropriate (i.e., someone who will be responsible for the data but is not the actual owner of the data) - this could be a health information custodian
  • Ensure the ethics application and agreement disclose all the individuals who will have legitimate access to an identified or identity-only data set, either through access to the secure location key or to the decryption key
  • For highly restricted, restricted, or confidential data, identify an information custodian in line with Waterloo Policy 8
  • The plan should include a provision for recovery of a lost decryption key to ensure that a data set cannot be permanently lost
  • Ensure that assurances and undertakings are provided that those in receipt of the data have been properly trained to use the data and understand their responsibilities
  11. Budget
  • Include the costs for preparing data and documentation for archiving and how these costs will be paid
  • The time involved in documenting, writing metadata, and archiving is often underestimated
  • Consider equipment and personnel costs
  • If the Information Custodian is unable to extract the required data elements, provide for the expense of hiring a member of the clinical circle of care team to extract the data and prepare the file for transfer to the researcher
  12. Intellectual Property Rights
  • Entities or persons who will hold the intellectual property rights to the data and how intellectual property will be protected if necessary
  • Any copyright constraints should be described (e.g., copyrighted data collection instruments)
 
  13. Legal Requirements
  • A listing of all relevant federal, provincial, or funder requirements for data management and data sharing
  • Consider confidentiality issues
  • Address the process for violations or breaches
  • Clarify what the penalties will be if the agreement is violated
  • Indemnifications for misuse of the data
  • Warranties that the data is owned by the provider or that the provider of the data has the right to share the data
  • Warranty that the data were obtained according to applicable laws
  14. Access and Allocation
  • A description of what data will be shared, how it will be shared including access procedures, embargo periods and mechanisms for dissemination
  • Clarify if access will be open or restricted
  • A timeframe for data sharing and publishing should be provided
  • If open access is envisioned, describe who will have access to the data (e.g., wider community, scientific community, research partners, secondary researchers, students, research assistants)
  • Specify any restrictions that need to be placed on subsequent use of or access to the data
  • Specify who will analyze the data and interpret the results
  • Specify who the contact person will be to access the data
  • Specify when the agreement will end
  15. Audience
  • List potential secondary users of the data
  • List all current and future stakeholders
 
  16. Selection and Retention Periods
  • A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future
  • Ensure any sponsor requirements for data retention are explicitly mentioned (e.g., Health Canada requires data to be retained for 25 years and some documents for at least three years after completion of the clinical trial)
  • Waterloo has requirements for institutional record retention, storage, and destruction
  • The agreement should clarify if any data might form part of institutional records and need to be retained in a specific, identifiable manner
  17. Archiving and Preservation
  • Describe the procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity cease to exist
  • Consider secure disposal of data and backups
  • Researchers have a responsibility to ensure that the data storage format allows for audit and access if required
  18. Ethics and Privacy
  • A discussion of how informed consent will be handled and how privacy will be protected including any exceptional arrangements that might be needed to protect participant confidentiality and other ethical issues that may arise
  • Detail the process to be followed if a breach occurs
  • Breaches should be reported as an adverse event to the Office of Research Ethics
  19. Dissemination
  • Include the format of dissemination (e.g., publication, website), cultural and linguistic needs, principal authors, and acknowledgments
 
  20. Term and Termination
  • Describe the duration of the agreement and the procedures in place for dealing with the data once the agreement is terminated or expired
 

Acknowledgments:

This chart has been adapted from the February 2012 guideline produced by the Colorado Clinical and Translational Sciences Institute and Rocky Mountain Prevention Research Center. Used with permission, August 2014.

