Areas of research
Longitudinal data
Longitudinal data arise when individuals are assessed repeatedly over time and responses and explanatory variables of interest are recorded at each assessment. The most suitable method for analysing data from a particular study depends on the primary scientific question, but all valid methods must address the serial correlation in the responses over time. The most common methods are based on random effects models, marginal (population-averaged) models, and transitional models. Further challenges arise when the data also feature cross-sectional clustering, incomplete responses, measurement error, and other complications.
Survival and event history analysis
In many branches of science, including demography, epidemiology, medicine and engineering, a considerable amount of information is collected on the nature and timing of events of interest. In medical research these data may represent the time and nature of a variety of clinically important health-related events occurring over the course of a patient’s lifetime, together with any additional explanatory variables. In immunologic research, for example, the data may be the dates of infection with HIV, diagnosis with AIDS, various opportunistic infections, and death. In cancer research they may be the dates of diagnosis with bladder cancer, of subsequent recurrences, of metastases, or of death. Finally, in cardiovascular trials, event history data may consist of the dates and types of cardiac events (e.g. angina attacks, arrhythmias, myocardial infarction), strokes (left/right hemisphere, etc.), and thromboses. Event history analysis is concerned with the application of statistical methods to this type of data, typically with a view to one of the following objectives: i) to accurately reflect aspects of the natural history of the disease, ii) to identify risk factors for disease progression, iii) to provide measures of the effect of medical or surgical interventions, or iv) to provide a basis for prediction about the future course of the disease at the patient or population level.
Missing data and measurement error
Standard statistical analysis is often challenged by the “imperfectness” of data. Missing data and measurement error are ubiquitous in practice, and they have long been a concern in many kinds of studies, including longitudinal studies, survival analysis, clinical trials, and epidemiological studies.
Missingness or measurement error can considerably degrade the quality of inference. It is well known that ignoring these features in statistical analysis can result in seriously biased results. Research on missing data and measurement error models has been remarkably active over the past few decades, and a great number of methods have been developed for the analysis of data with these features. Despite rapid development in these areas, new challenges continue to emerge and many problems remain unsolved.
Community studies
Studies in which an intervention is given to an intact group of subjects, for example, students in a school, residents of a community, or patients in a hospital, are common in studying behavioural (non-therapeutic) interventions. Such designs are often called clustered designs and are used when it is not feasible or desirable to deliver the intervention (e.g. a school curriculum, a media campaign, or a facility design) to individual subjects, or when there is a need to prevent contamination between comparison groups. In clustered designs, individuals within the same cluster may be more similar than individuals in different clusters, or they may interact, with the result that observations in the same cluster can be correlated. Hence, the assumption of independence between the observations on different subjects that is commonly made in many statistical models may not apply in clustered designs, and failing to take this intra-cluster correlation into account can affect the analysis of differences between comparison groups. In community studies, data are often gathered at different levels; for example, data on individuals within the communities and data on the communities themselves could be relevant for modelling the outcomes of the study. Hence, multi-level models that take the structure of the design into account must be employed.
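The cost of ignoring intra-cluster correlation can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intra-cluster correlation. The sketch below is only an illustration under that equal-cluster-size assumption; the function names and the numbers in the usage lines are hypothetical and are not drawn from any study described here.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation factor for a clustered design.

    Assumes equal cluster sizes and a common intra-cluster
    correlation (icc) between 0 and 1.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_subjects: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_subjects / design_effect(cluster_size, icc)


# Hypothetical example: 40 schools of 20 students each, ICC = 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(800, 20, 0.05)
print(f"design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

Even a small intra-cluster correlation inflates variances noticeably when clusters are large, which is why an analysis that treats the subjects as independent tends to overstate precision.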
High-dimensional data
Modern high-dimensional data, such as spatial data, image data, complex structured longitudinal data, and long-sequence genetic data, have generated significant new challenges to the use of traditional statistical methods. Directly applying existing analysis methods to such data is often impossible. For instance, the very large number of biomarkers relative to fairly small sample sizes makes ordinary statistical tools inappropriate for data analysis. These challenges have prompted ever-increasing research interest and spawned a large number of new methods. The fundamental aspects of handling high-dimensional data include sensible model-building strategies, valid inference methods, and feasible computing techniques.
Causal inference
Many scientific questions take the form “Does factor A cause outcome B?” or “Does treatment A cure disease B?” We wish to establish whether a causal link exists between the factor and the outcome. The ideal way to address these questions is through a randomized controlled trial in which subjects are randomly assigned to a level of the factor of interest and their subsequent outcome is measured. In some cases, however, such a trial may be impractical or unethical. The field of causal inference is concerned with addressing causal hypotheses based on non-randomized observational data. Directly comparing two cohorts may lead to biased results when there exist risk factors or subject characteristics that affect both the type of treatment a subject received and their subsequent outcome. Advanced statistical techniques are necessary to account for the effects of these variables and to draw valid causal inferences.
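The bias from directly comparing two cohorts can be seen in a small simulation. The sketch below is purely illustrative and uses made-up parameters: a binary characteristic z raises both the chance of treatment and the outcome, while the treatment itself has no effect. Stratifying on z is used here as one simple adjustment; it stands in for, and is much simpler than, the advanced techniques referred to above.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Generate (z, a, y) rows; the true treatment effect is zero."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # confounder
        p_treat = 0.8 if z == 1 else 0.2            # z affects treatment
        a = 1 if rng.random() < p_treat else 0      # treatment received
        y = 2.0 * z + rng.gauss(0.0, 1.0)           # z affects outcome, a does not
        rows.append((z, a, y))
    return rows


def naive_difference(rows):
    """Difference in mean outcome, treated minus untreated, ignoring z."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return mean(treated) - mean(control)


def stratified_difference(rows):
    """Average of within-stratum differences, weighted by stratum size."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        stratum = [(a, y) for z, a, y in rows if z == level]
        treated = [y for a, y in stratum if a == 1]
        control = [y for a, y in stratum if a == 0]
        total += (len(stratum) / n) * (mean(treated) - mean(control))
    return total


rows = simulate()
print(f"naive: {naive_difference(rows):.2f}")
print(f"stratified on z: {stratified_difference(rows):.2f}")
```

Under these assumptions the naive comparison suggests a sizeable effect even though none exists, whereas the stratified estimate is close to zero. This adjustment works only because the single confounder is measured; in practice unmeasured characteristics remain a concern.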
Collaborative research
The Biostatistics group at the University of Waterloo has a diverse range of collaborative research projects with health research groups on and off campus. A considerable amount of our effort goes into collaborations on chronic disease prevention and cancer control with colleagues at the Propel Centre for Population Health Impact, where several of us hold affiliations.
The International Tobacco Control Policy Evaluation Project represents a major research program evaluating the effectiveness of government policy changes on smoking behaviour in many different countries around the world.
Through our partnership with the Ontario Institute for Cancer Research we are engaged in a wide range of research projects related to the evaluation of cancer therapies. Graduate internships are supported by our Biostatistics Training Initiative (BTI). Please check the BTI website for further details.
The Canadian Network and Centre for Trials Internationally (CANNeCTIN) and the associated methodology research pertaining to cardiovascular clinical trials provide links with the Population Health Research Institute, McMaster University.
Innovative methods for the assessment of neural signal processes are developed through joint work with researchers at the Center for Theoretical Neuroscience on campus at the University of Waterloo.