Guidelines for Collecting Demographic Information from Study Participants

Why Collect Demographic Information?

By collecting demographic information, researchers can: 

  • more accurately describe their study participants to place study results in context
  • outline which groups are represented (or not) in their sample and make inferences about the generalizability of their findings to the population being studied (Dobosh, 2017)
  • document demographic diversity in perspectives or experiences related to the topic of study
  • help others replicate results (Hughes et al., 2016)

The TCPS 2 Core Principles provide an ethical framework for collecting this information. Because the collection of identity-specific information can be perceived as sensitive, demographic information must be collected in a respectful way (Region of Waterloo, 2022).

Respect for persons recognizes that individuals who participate in research do so voluntarily and that their consent is informed (TCPS 2 2022, Chapter 3).

Voluntary participation means that participants have the right to decline answering any question(s), and this includes demographic information. This right can be supported by:

  • programming questionnaires so that questions can be skipped
  • including instructions reminding participants they are free to decline answering questions
  • providing a ‘prefer not to answer’ option

Informed consent means participants are informed why the information is being collected. For example, the consent process should explain that certain demographic information is required to ensure participants in the control and intervention groups are similar.

Sample Language A: "Demographic questions (e.g., age, ethnicity, sex) are asked to describe the characteristics of participants in this study as well as to examine differences and trends across these characteristics."

Sample Language B: "You will be asked to respond to a series of demographic questions (e.g., age, ethnicity, sex) so we can further understand any individual or group differences as they relate to [insert research question]."

Concern for welfare relates to the quality of a person’s experience in all aspects of their life (TCPS 2 2022, Chapter 2). Supporting welfare includes being aware that participants’ privacy and confidentiality can be affected when some demographic questions are asked, especially when participants are asked multiple questions about their identity. This is particularly important for participants who may be underrepresented or less common in the overall study sample or study population (Frederick, 2020).

  • Consider how the information collected will be used to answer the research question. Each demographic question should have a clear purpose in terms of how it will help fulfill the research aims (Dobosh, 2017) or support justice by addressing questions of representation. For example, a question about sexual orientation may seem unnecessary in a study about recycling practices, but relevant in a study about close relationships (Hughes et al., 2016).

Justice is the obligation to treat people fairly and equitably (TCPS 2 2022, Chapter 4). Collecting demographic information can help researchers ensure their study is inclusive, so the recruitment/screening process leads to a participant group that reflects the community or population of interest (Henrich, Heine, & Norenzayan, 2010; Roberts & Mortenson, 2022). Demographics can be collected not only to gain insight into the generalizability of the results, but also to determine whether and how the results can be applied to certain communities or groups.

  • Individuals and groups should not be inappropriately excluded from research based on personal characteristics including culture, age, race, disability and so forth (TCPS 2 2022, Chapter 4).


  1. Minimize the number of direct and/or indirect identifiers being collected (e.g., photos, emails, postal codes).
  2. Inform participants how the research team plans to report, share, and secure the information they collect in a way that maintains participants’ privacy and confidentiality. 
  3. Collect only the information needed to answer the research question and describe only basic characteristics of the sample (e.g., gender, age, racial identity).

Developing Response Options

A good starting point for developing response options is to look at questions or items commonly used by other researchers in the same discipline or used by organizations that have conducted research with similar groups (Dobosh, 2017).

Using open-ended or close-ended questions

Demographic questions can be open-ended with a blank space allowing the participant to provide more information or close-ended with specific choices or categories provided by the researchers.

Consider whether the goals of collecting demographic information are descriptive versus comparative. For descriptive research that seeks simply to characterize specific individuals (e.g., participants in a focus group, case studies), open-ended questions allow maximal flexibility for self-definition. However, for comparative research that seeks to extract patterns between or across groups of people, close-ended questions provide a more objective, replicable basis for defining the groups to be compared (as well as their possible intersections, if multiple response options are permitted).

In addition, to draw responsible inferences about groups (and protect privacy) in comparative research, researchers should avoid reporting and interpreting aggregated results for very small groups (i.e., those with participant counts that fall below a certain minimum, such as 5 or 10 people).  
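The minimum-count rule above can be sketched in a few lines of Python. This is only an illustration: the function name `suppress_small_groups`, the threshold of 5, and the sample age-range responses are assumptions made for the example, not part of any cited guideline.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # assumed threshold; the text suggests a minimum such as 5 or 10


def suppress_small_groups(responses, min_size=MIN_GROUP_SIZE):
    """Count responses per category and mask any count below min_size."""
    counts = Counter(responses)
    return {
        category: (n if n >= min_size else f"<{min_size}")
        for category, n in counts.items()
    }


# Hypothetical responses to an age-range question
responses = ["18-24"] * 12 + ["25-34"] * 7 + ["65 or older"] * 2
report = suppress_small_groups(responses)
print(report)
```

If the overall sample size is also reported, a single masked count can be recovered by subtraction, so researchers may need to mask a second group or combine categories.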

Open-ended questions can:

  • allow for more inclusivity and collect information that may be missed when using close-ended response categories (Region of Waterloo, 2022)
  • identify additional categories when researchers are unsure if they have an exhaustive list of options (Davies, 2020)
  • provide flexibility when the categories will vary based on certain groups or locations.

Close-ended questions can:

  • lead to more efficient analysis and more objective aggregation for researchers, and
  • take less time and effort for participants to respond (Region of Waterloo, 2022).


  1. Maintain a balance between too few options and too many options.
  2. Provide an “Option not listed, please specify” selection if there is the potential for people to provide a response that is not listed in the options provided.
  3. Allow participants to select more than one response option if they may fit into more than one category (i.e., multi-select).
  4. Avoid using a response of “other” because it can “other” or marginalize individuals who do not fit into the normative group. Instead, use “Prefer to self-identify” or “Another option not specified” (Frederick, 2021).
  5. When precise quantification is not necessary, use ranges (e.g., age range, income range) to reduce the chance of obtaining identifiable information.
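As a rough sketch of how the tips above might translate into a close-ended question definition, the following Python class appends a self-identify option and a “Prefer not to answer” option, supports multi-select, and always accepts a skipped question. The `Question` class, its field names, and the age ranges are hypothetical and are not taken from any particular survey platform.

```python
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    options: list
    multi_select: bool = False        # allow more than one selection
    allow_self_identify: bool = True  # used instead of an "other" option
    allow_decline: bool = True        # supports voluntary participation

    def choices(self):
        """Return the listed options plus the inclusive fallback options."""
        extra = []
        if self.allow_self_identify:
            extra.append("Prefer to self-identify (please specify)")
        if self.allow_decline:
            extra.append("Prefer not to answer")
        return list(self.options) + extra

    def validate(self, selected):
        """Return True if the selection is permitted for this question."""
        if not selected:
            return True  # skipping a question is always allowed
        if len(selected) > 1 and not self.multi_select:
            return False
        allowed = self.choices()
        return all(s in allowed for s in selected)


# Ranges (assumed here) are used instead of an exact age
age_question = Question(
    prompt="What is your age?",
    options=["18-24", "25-34", "35-44", "45-54", "55-64", "65 or older"],
)
print(age_question.choices())
```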

Know your participant group

Consider the local context and be familiar with your participant group so you are knowledgeable about their preferred language. “It’s important to keep in mind that the respectfulness and inclusivity of language about a particular group must be determined by the group itself” (Canadian Public Health Association, 2019). For example, consider whether the group prefers person first language or identity first language.  

Person first language is language that prioritizes “someone’s identity and individuality above whatever other characteristic you might be describing (e.g., ‘person living with HIV’ rather than ‘HIV-infected’).” (Canadian Public Health Association, 2019). 

Identity first language involves leading with the characteristic you are describing within the population (e.g., ‘Autistic person’, ‘Autistic individual’). This type of language can be used when the individual sees the characteristic as an inseparable part of their identity (Brown, 2011).


  1. Consider pilot testing the questions you plan to use to help determine appropriate response options. “Researchers can pilot test using open-ended questions and can then develop closed-ended questions based off that pilot study that include the most common responses as answer choices” (Pew Research Center, 2021).
  2. Provide responses that are inclusive wherever possible (Canadian Public Health Association, 2019). For example, rather than "salesman" consider "salesperson" or "sales representative" and consider using "they" rather than "he/she".
  3. Examine the response options to ensure that most individuals can respond, for instance, by indicating “partner” rather than “husband/wife”.
  4. Keep up to date on language. “As societal values change over time, so does the language that is considered acceptable.” (Canadian Public Health Association, 2019).
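One way to act on the pilot-testing tip is to tally open-ended pilot answers and treat the most frequent ones as candidate close-ended options. The sketch below assumes a hypothetical employment-status question and a simple normalization (trimming whitespace and ignoring case). Real pilot data would usually also need manual review to merge differently worded answers.

```python
from collections import Counter


def propose_options(pilot_responses, top_n=5):
    """Return the top_n most common non-empty answers, ignoring case and spacing."""
    normalized = [r.strip().casefold() for r in pilot_responses if r.strip()]
    return [answer for answer, _ in Counter(normalized).most_common(top_n)]


# Hypothetical open-ended pilot answers
pilot = ["Full-time", "full-time ", "Part-time", "Student", "part-time", "Full-time", ""]
candidates = propose_options(pilot, top_n=2)
print(candidates)
```

Less common answers are not discarded by design. They are the ones a “Prefer to self-identify” option is meant to capture.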

Avoid bias

Consider randomizing response options or displaying them alphabetically. The listing order of response options can reinforce biases (More Than Numbers: A Guide toward Diversity, Equity, and Inclusion (DEI) in Demographic Data Collection, 2020). For example, when asking about race or ethnicity, listing "white" as the first response option could increase the number of people selecting this option, creating primacy bias. Primacy bias is the tendency to recall information presented at the start of a list better than information in the middle or at the end. Further, placing certain response options first can unintentionally convey that they are preferred or more common.
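A minimal sketch of both ordering strategies follows. Keeping options such as “Prefer not to answer” pinned at the end of the list is an assumption of this example rather than a recommendation from the cited guide, and the function name and placeholder category labels are hypothetical.

```python
import random


def order_options(options, pinned_last=(), mode="random", rng=None):
    """Order response options randomly or alphabetically, keeping pinned options last."""
    body = [o for o in options if o not in pinned_last]
    tail = [o for o in options if o in pinned_last]
    if mode == "alphabetical":
        body = sorted(body, key=str.casefold)
    elif mode == "random":
        (rng or random.Random()).shuffle(body)  # new order for each participant
    else:
        raise ValueError(f"unknown mode: {mode}")
    return body + tail


options = ["Category C", "Category A", "Category B", "Prefer not to answer"]
pinned = ("Prefer not to answer",)

alphabetical = order_options(options, pinned_last=pinned, mode="alphabetical")
shuffled = order_options(options, pinned_last=pinned, mode="random")
print(alphabetical)
print(shuffled)
```

Many survey platforms offer option randomization as a built-in setting, in which case no custom code is needed.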

Question Placement

Advice varies regarding the most appropriate placement of demographic questions in a questionnaire. Many researchers include demographic questions at the end to help minimize fatigue and allow participants to answer these questions after becoming more comfortable with the overall research. In some cases, placing the demographic questions at the end can avoid response bias (Lee and Schuele, 2010). For instance, if a questionnaire asks about income and ethnicity at the start, this question placement could prime participants to think their information is being evaluated based on their income and ethnicity (Lee and Schuele, 2010).

There may be reasons to include demographic questions at the beginning of a study when:

  • responses to these questions are needed to route participants (using branching) to other questions (Pew Research Center, 2021)
  • they fit most easily with other study procedures, such as collecting demographic information in a survey with other baseline measures prior to conducting an intervention or focus group.     

Sociodemographic Constructs

Race, Ethnicity, and Culture

Race and ethnicity have different meanings, so these dimensions should be assessed in separate questions if either is required (Markus, 2008). Researchers should consider whether they are interested in race, ethnicity, or something more specific related to cultural identity, such as language and/or religion. A person’s physical characteristics are shaped by biology and genes, whereas racial categories are social constructs (Woodward, 2022).

The Guidance on the Use of Standards for Race-Based and Indigenous Identity Data Collection and Health Reporting in Canada (Canadian Institute for Health Information, 2022) defines these constructs as:

Race is “a social construct used to judge and categorize people based on perceived differences in physical appearance in ways that create and maintain power differentials within social hierarchies.”

Ethnicity is “a multi-dimensional concept referring to community belonging and a shared cultural membership. It is related to socio-demographic characteristics, including language, religion, geographic origin, nationality, cultural traditions, ancestry and migration history, among others.”

Culture consists of “overt and subtle value systems, traditions and beliefs that influence our decisions and actions.”


  1. Use recruitment practices that encourage diversity where possible (Woodward, 2022), within the constraints of eligibility criteria. Specific recruitment strategies would depend on the research question and target study population, but examples could include local YMCA chapters, religious groups/organizations, community groups, libraries and community centres, fairs, museums, local health units, etc. In some situations, engaging with community partners during the design phase of the study is appropriate or ideal (Franck & Williams, 2017).
  2. Consider including an explanation of why particular categories were chosen and/or the relevant benefits of race-based data collection. This explanation can help to alleviate concerns over the collection of race-based data from people who have directly or vicariously faced discrimination (Woodward, 2022) or who have less experience with participating in survey-based research. 
  3. Report demographic comparisons with a level of caution, ideally in conjunction with members of the involved groups. Identifying the causes of inequalities due to immigration, racism, language barriers, cultural preferences, etc., is not always straightforward; therefore, it is important to be careful about making inferences (Canadian Institute for Health Information, 2020).

Sex and Gender

The Canadian Institute for Health Information’s Data Model Toolkit (2022) defines these constructs as:

Sex is a “biological concept that includes anatomy, physiology, genes, and hormones.”

Gender is a “social construct that encompasses gender identity and lived gender such as gender expression as a man, woman, both, neither or anywhere along the gender spectrum.”

Transgender is defined as “having a gender identity that is different from one’s sex assigned at birth whereas cisgendered individuals have a gender identity that is the same as their sex assigned at birth.”  


  1. Researchers should consider whether sex assigned at birth or current sex is most relevant to their research.    
  2. Ask separate questions about gender and sex if both are relevant. Sex and gender identity can be used together to recognize cisgender and transgender individuals, but gender and sex should be asked separately and worded in a way that makes clear what is being asked. 
  3. Learn more about gender. Gender is misunderstood if thought of as a binary variable (i.e., man/woman). There is a spectrum of gender identities with which people may identify, and gender identities can change over time (Heidari et al., 2016).
  4. When asking about gender, show inclusivity and include a category of “Prefer to self-identify” or “Another option not specified” (Frederick, 2021). 

Age and Date of Birth

Age is a common demographic question and can be asked in various ways: using an open field or drop-down menu for the number of years, or when less specific information is needed, in broad categories of age ranges (Canadian Institute for Health Information, 2018). In accordance with collecting the minimal amount of information necessary, researchers should collect date of birth only when there is an appropriate rationale (e.g., for developmental research on children), because birth dates are considered identifiable information especially when collected along with other demographic information. The TCPS 2 2022, Chapter 5, indicates that collection of identifiable information (date of birth rather than age) requires “more stringent protections” which involves safeguarding information and de-identifying the data as soon as possible.
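Where date of birth must be collected, one possible de-identification step is to derive age in whole years and then drop the birth date from the working dataset. The sketch below is illustrative only: the record layout and the field names `date_of_birth` and `age_years` are assumptions, and developmental studies may need a finer unit such as months.

```python
from datetime import date


def age_in_years(date_of_birth, on_date):
    """Return completed years of age on on_date."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)


def deidentify(record, on_date):
    """Return a copy of the record with date_of_birth replaced by age_years."""
    cleaned = {key: value for key, value in record.items() if key != "date_of_birth"}
    cleaned["age_years"] = age_in_years(record["date_of_birth"], on_date)
    return cleaned


# Hypothetical participant record
record = {"participant_id": "P001", "date_of_birth": date(2015, 6, 15)}
cleaned = deidentify(record, on_date=date(2023, 6, 14))
print(cleaned)
```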


Indigeneity

The current TCPS 2 definition refers to Indigenous peoples in Canada as “… persons of Indian (First Nations), Inuit, or Métis descent, regardless of where they reside and whether their names appear on an official register” (TCPS 2 2022, Glossary). Notably, this definition is narrow and does not capture all Indigenous groups or communities.


  1. Review the following resources:
  2. Collaborate with Indigenous groups because “the question and response categories for Indigenous identity should be decided in collaboration with Indigenous groups in the jurisdiction where data is being collected and respect fundamental principles of Indigenous data sovereignty (e.g., OCAP®, Inuit Qaujimajatuqangit).” (Canadian Institute for Health Information, 2020).

Need Help?

Need help with the ethics review process or interpreting the TCPS2? Contact: Research Ethics

Looking to discuss inclusivity? Contact: Research Equity


Resources

Canadian Institute for Health Information (2018). In Pursuit of Health Equity: Defining Stratifiers for Measuring Health Inequality. A Focus on Age, Sex, Gender, Income, Education and Geographic Location. Ottawa, ON: CIHI.

Canadian Institute for Health Information (2022). Race-Based and Indigenous Identity Data Collection and Health Reporting in Canada, Supplementary Report. Ottawa, ON: CIHI.

Canadian Institutes of Health Research. Institute of Gender and Health. Retrieved September 19, 2022, from

Canadian Institutes of Health Research. IGH videos and webinars. CIHR. Retrieved September 19, 2022, from

Canadian Institutes of Health Research. Learning about sex and gender – video. CIHR. Retrieved September 19, 2022, from

Canadian Institutes of Health Research (2019). Online Training Modules: Integrating Sex & Gender in Health Research

Canadian Institutes of Health Research, Natural Sciences and Engineering Research Council of Canada, and Social Sciences and Humanities Research Council of Canada, Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans—TCPS 2 (2022).

Canadian Public Health Association (2019). Language Matters: Using Respectful Language in Relation to Sexual Health, Substance Use, STBBIs and Intersecting Sources of Stigma.

Centre for Gender & Sexual Health Equity (2022). Gender & Sex In Methods & Measurement Toolkit. Retrieved September 19, 2022, from

FDA U.S. Food & Drug Administration (2016). Collection of Race and Ethnicity Data in Clinical Trials.

LGBT Foundation. Ethical research: good practice guide to researching LGBT communities and issues.

Survey: Demographic questions. (2017). The SAGE Encyclopedia of Communication Research Methods.

SumOfUs (2016). A Progressive’s Style Guide.

Women's Health, research, and sex as a biological variable. Ampersand. (2022, March 22). Retrieved August 23, 2022, from


References

Brown, L.X.Z. (2011). The significance of semantics: Person-first language: Why it matters. Autistic Hoya – Blog. Retrieved June 14, 2022, from

Canadian Institute for Health Information (2018). In Pursuit of Health Equity: Defining Stratifiers for Measuring Health Inequality.  Retrieved July 8, 2022, from

Canadian Institute for Health Information (2022). Guidance on the Use of Standards for Race-Based and Indigenous Identity Data Collection and Health Reporting in Canada. Ottawa, ON: CIHI.

Canadian Institute for Health Information (2022). CIHI Reference Data Model Toolkit. Ottawa, ON: CIHI.  Retrieved May 27, 2022, from

Canadian Institutes of Health Research, Natural Sciences and Engineering Research Council of Canada, and Social Sciences and Humanities Research Council of Canada, Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans—TCPS 2 (2022).

Canadian Public Health Association (2019). Language Matters: Using Respectful Language in Relation to Sexual Health, Substance Use, STBBIs and Intersecting Sources of Stigma.

Davies, R. S. (2020). Designing Surveys for Evaluations and Research. EdTech Books.

Dobosh, M. (2017). Survey: Demographic questions. In M. Allen (Ed.), The SAGE Encyclopedia of Communication Research Methods. Retrieved May 27, 2022, from

Franck, L. and Williams, S. (2017, October 31). Recruitment of Underrepresented Study Populations. Q&A From Session, University of California San Francisco.

Heidari, S., Babor, T.F., De Castro, P. et al. (2016). Sex and Gender Equity in Research: rationale for the SAGER guidelines and recommended use. Research Integrity and Peer Review, 1(2).

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29.

Hughes, J. L., Camden, A. A., & Yangchen, T. (2016). Rethinking and Updating Demographic Questions: Guidance to Improve Descriptions of Research Samples. Psi Chi Journal of Psychological Research, 21(3), 138–151.

Lee, M., & Schuele, C. M. (2010). Demographics. Encyclopedia of research design, 347-348.

Frederick, J. (2021). When to ask (or not ask) demographic questions (Blog Post). Ithaka S+R. Retrieved May 12, 2022, from

Frederick, J. (2020) Four strategies for crafting inclusive and effective demographic questions (Blog Post). Ithaka S+R. Retrieved June 13, 2022, from

Markus, H. R. (2008). Pride, prejudice, and ambivalence: toward a unified theory of race and ethnicity. American Psychologist, 63(8), 651.

More than numbers. More Than Numbers: A Guide Toward Diversity, Equity and Inclusion (DEI) in Data Collection | Schusterman Family Philanthropies. (n.d.). Retrieved August 10, 2022, from

Pew Research Center (2021). Writing Survey Questions. Retrieved June 16, 2022, from

Roberts, S. O., & Mortenson, E. (2022). Challenging the White = neutral framework in psychology. Perspectives on Psychological Science, 17456916221077117.

Region of Waterloo (2022). Guidelines for Demographic Questions at the Region of Waterloo. Citizen Service Guidelines.

Woodward, L. (2022). Demographic Guideline [Unpublished paper]. Department of Philosophy, University of Waterloo.

July 2023