Matthias Schonlau



Matthias Schonlau's personal website

Research interests

Professor Schonlau's research interests include applied survey sampling and survey methodology, statistical machine learning from text data (such as answers to open-ended questions), and software implementation.

Text data from open-ended questions in surveys are frequently ignored in the practice of survey research. Yet open-ended questions are important because they do not constrain respondents’ answer choices. Where open-ended questions are necessary, sometimes multiple human coders hand-code answers into one of several categories. Automated algorithms do not achieve an overall accuracy high enough to entirely replace humans. We classify answers to open-ended questions semi-automatically: text mining handles easy-to-classify answers, and human coders handle the remainder. Expected accuracies guide the choice of the threshold that separates “easy” from “hard” answers.
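The threshold idea can be illustrated with a minimal sketch. It is not the published method: it assumes some classifier has already produced a predicted label and a confidence score for each answer, and the names `route` and `choose_threshold`, the example labels, and the scores are all hypothetical.

```python
from typing import List, Optional, Tuple

# Each prediction is (predicted_label, confidence in [0, 1]).
# The confidences are assumed to come from some text classifier.
Prediction = Tuple[str, float]


def route(predictions: List[Prediction], threshold: float):
    """Split answers into auto-coded ("easy") and human-coded ("hard") indices."""
    auto = [i for i, (_, conf) in enumerate(predictions) if conf >= threshold]
    manual = [i for i, (_, conf) in enumerate(predictions) if conf < threshold]
    return auto, manual


def choose_threshold(
    predictions: List[Prediction],
    true_labels: List[str],
    target_accuracy: float,
) -> Optional[float]:
    """Return the lowest threshold at which the auto-coded part of a
    labelled validation set reaches the target accuracy, or None."""
    # Only observed confidences are tried, so the auto-coded set is never empty.
    for t in sorted({conf for _, conf in predictions}):
        auto, _ = route(predictions, t)
        correct = sum(predictions[i][0] == true_labels[i] for i in auto)
        if correct / len(auto) >= target_accuracy:
            return t
    return None


# Hypothetical validation data: classifier output and human-assigned codes.
validation_preds = [
    ("teacher", 0.95),
    ("nurse", 0.90),
    ("teacher", 0.60),
    ("nurse", 0.55),
    ("other", 0.30),
]
validation_truth = ["teacher", "nurse", "nurse", "nurse", "teacher"]

threshold = choose_threshold(validation_preds, validation_truth, 0.9)
auto_idx, human_idx = route(validation_preds, threshold)
```

A lower target accuracy yields a lower threshold, so more answers are coded automatically and fewer are sent to human coders.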

This approach has spawned a variety of related projects, including: algorithms for automatic occupation coding (categorizing answers to the question “What is your job?” in official surveys); classification of open-ended questions that can take more than one label (equivalent to all-that-apply questions); an algorithm for semi-automatic classification of all-that-apply open-ended questions; training a learning algorithm on double-coded data, when such codes are available; and deciding whether to purposely double code the training data when there is a fixed budget for human coders.

Earlier work includes selectivity in web surveys, cross-sectional weighting in household surveys, the effect of following rules in household panels, and respondent-driven sampling.


Professor Schonlau joined the faculty in 2011. From 1999 to 2011 he was a statistician at the RAND Corporation and head of the RAND Statistical Consulting Service. He was initially located at RAND's Santa Monica (Los Angeles) headquarters and then moved to RAND's Pittsburgh office. Professor Schonlau spent the 2009/2010 academic year on sabbatical at the German Institute for Economic Research (DIW) in Berlin, Germany. The sabbatical was made possible in cooperation with the Max Planck Institute for Human Development (MPIB). From 1997 to 1999 Professor Schonlau held a joint appointment with the National Institute of Statistical Sciences and AT&T Labs - Research. He obtained his PhD from the University of Waterloo in 1997 and his master's from Queen's University in 1993. Professor Schonlau grew up in Germany.

Selected Recent Publications

  • Schonlau, M., Weidmer, B., Kapteyn, A. Recruiting an Internet Panel Using Respondent Driven Sampling. Journal of Official Statistics, June 2014, 30(2), 277-289. DOI: 10.2478/jos-2014-0018.
  • Schonlau, M., Toepoel, V. Straightlining in Web survey panels over time. Survey Research Methods, Aug 2015, 9(2), 125-137.
  • Schonlau, M., Couper, M. Semi-automated categorization of open-ended questions. Survey Research Methods, Aug 2016, 10(2), 143-152.
  • McLauchlan, C., Schonlau, M. Are Final Comments in Web Survey Panels Associated with Next-Wave Attrition? Survey Research Methods, Dec 2016, 10(3), 211-224.
  • Guenther, N., Schonlau, M. Support vector machines. The Stata Journal, Dec 2016, 16(4), 917-937.
  • Gweon, H., Schonlau, M., Kaczmirek, L., Blohm, M., Steiner, S. Three Methods for Occupation Coding Based on Statistical Learning. Journal of Official Statistics, 2017, 33(1), 101-122.
  • Schonlau, M., Couper, M. Options for Conducting Web surveys. Statistical Science, May 2017, 32(2), 279-292.
  • Schonlau, M., *Guenther, N., *Sucholutsky, I. Text mining using ngram variables. The Stata Journal, Dec 2017, 17(4), 866-881.
  • Gweon, H., Schonlau, M., Steiner, S. The conditional nearest neighbor algorithm for classification. PeerJ Computer Science (to appear 2019).
  • He, Z., Schonlau, M. Automatic Coding of Text Answers to Open-ended Questions: Should you Double Code the Training Data? Social Science Computer Review (to appear 2019).
University of Waterloo