Department seminar by Mamadou Yauck, McGill University

Monday, January 13, 2020 10:00 am - 10:00 am EST (GMT -05:00)

Sampling 'hard-to-reach' populations: recent developments

In this talk, I will present some recent methodological developments in capture-recapture methods and Respondent-Driven Sampling (RDS).

On the capture-recapture side, our work concerns the analysis of marketing data on the activation of applications (apps) on mobile devices. Each application has a hashed identification number that is specific to the device on which it is installed. A platform can record this number at each activation of the application, and activations on the same device are linked together using it. By focusing on activations that took place at a business location, one can create a capture-recapture data set about devices, or more specifically their users, that "visited" the business: the units are owners of mobile devices and the capture occasions are time intervals such as days. We propose a new algorithm for estimating the parameters of a robust design with a fairly large number of capture occasions, along with a simple parametric bootstrap variance estimator.
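
To make the data construction concrete, here is a minimal sketch of how activation records might be turned into capture histories, with one 0/1 vector per device and one entry per day. The record layout, the function name `build_capture_histories`, and the sample device identifiers are assumptions made for illustration; they are not taken from the talk, and the sketch covers only the data preparation step, not the proposed estimation algorithm.

```python
from collections import defaultdict
from datetime import date


def build_capture_histories(activations, occasions):
    """Return {device_id: [0/1, ...]} with one entry per capture occasion.

    activations: iterable of (device_id, day) pairs (hypothetical layout).
    occasions:   ordered list of days that serve as capture occasions.
    """
    position = {day: i for i, day in enumerate(occasions)}
    histories = defaultdict(lambda: [0] * len(occasions))
    for device_id, day in activations:
        # Activations outside the study window are ignored.
        if day in position:
            # Repeated activations on the same day count as a single capture.
            histories[device_id][position[day]] = 1
    return dict(histories)


# Illustrative records only: hashed device IDs and activation dates.
activations = [
    ("a1f3", date(2020, 1, 6)),
    ("a1f3", date(2020, 1, 6)),
    ("a1f3", date(2020, 1, 8)),
    ("9c2e", date(2020, 1, 7)),
]
occasions = [date(2020, 1, 6), date(2020, 1, 7), date(2020, 1, 8)]

histories = build_capture_histories(activations, occasions)
for device_id, history in histories.items():
    print(device_id, history)
```

Devices that never appear in the window get no history at all, which is the usual situation in capture-recapture: the never-captured units are exactly what the model has to account for.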

RDS is a variant of link-tracing, a sampling technique for surveying hard-to-reach communities that takes advantage of community members' social networks to reach potential participants. While the RDS sampling mechanism and the associated methods of adjusting for the sampling at the analysis stage are well documented in the statistical sciences literature, methodological focus has largely been restricted to the estimation of population means and proportions (e.g., prevalence). As a network-based sampling method, RDS faces the fundamental problem of sampling from population networks where features such as homophily and differential activity (two measures of the tendency for individuals with similar traits to share social links) are sensitive to the choice of simulation and sampling method. In this work, we (i) present strategies for simulating RDS samples with known network and sample characteristics, so as to provide a foundation from which to expand the study of RDS analyses beyond the univariate framework, and (ii) embed RDS within a causal inference framework and determine conditions under which average causal effects can be estimated. The proposed methodology will constitute a unifying approach that handles simple estimands (means and proportions) and extends naturally to the study of associational and causal questions.