Professor Schonlau's research interests include applied survey sampling and survey methodology, statistical machine learning from text data (such as answers to open-ended questions), and software implementation.
Text data from open-ended questions in surveys are frequently ignored in the practice of survey research. Yet open-ended questions are important because they do not constrain respondents’ answer choices. Where open-ended questions are necessary, sometimes multiple human coders hand-code answers into one of several categories. Automated algorithms do not achieve an overall accuracy high enough to entirely replace humans. We classify answers to open-ended questions semi-automatically, using text mining for easy-to-classify answers and human coders for the remainder. Expected accuracies guide the choice of a threshold separating “easy” from “hard” answers.
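The threshold idea can be illustrated with a minimal Python sketch. It is not the published method: it assumes that a classifier has already produced a predicted label and a confidence score for each answer, and it assumes those scores are well calibrated, so that their mean approximates the expected accuracy of the automatically coded answers. The function names `choose_threshold` and `route`, the target accuracy of 0.9, and the example answers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_threshold(confidences: Sequence[float], target_accuracy: float) -> float:
    """Lowest confidence cutoff whose auto-coded set meets the target.

    Assumption: confidences are calibrated, so their mean estimates the
    expected accuracy of the answers coded automatically.
    """
    ranked = sorted(confidences, reverse=True)
    threshold = float("inf")  # default: send every answer to human coders
    running_sum = 0.0
    for k, p in enumerate(ranked, start=1):
        running_sum += p
        # Only cut between distinct values so tied answers stay together.
        is_group_end = k == len(ranked) or ranked[k] < p
        if is_group_end and running_sum / k >= target_accuracy:
            threshold = p
    return threshold


def route(predictions: Sequence[Tuple[str, float]],
          threshold: float) -> Tuple[List[int], List[int]]:
    """Split answer indices into automatically coded and human coded."""
    auto: List[int] = []
    manual: List[int] = []
    for idx, (_label, confidence) in enumerate(predictions):
        (auto if confidence >= threshold else manual).append(idx)
    return auto, manual


# Hypothetical classifier output: (predicted category, confidence).
predictions = [("teacher", 0.95), ("nurse", 0.90), ("engineer", 0.60), ("clerk", 0.40)]
threshold = choose_threshold([p for _, p in predictions], target_accuracy=0.9)
auto, manual = route(predictions, threshold)
print(threshold, auto, manual)
```

In this example the two most confident answers are coded automatically and the other two go to human coders. Raising the target accuracy moves more answers to the human coders; lowering it reduces the manual workload at the cost of more automatic errors.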
This approach has spawned a variety of related projects, including: algorithms for automatic occupation coding (categorizing answers to the question “What is your job?” in official surveys); classification of open-ended questions that can take more than one label (equivalent to all-that-apply questions); an algorithm for semi-automatic classification of all-that-apply type open-ended questions; training a learning algorithm on double-coded data, when such codes are available; and whether or not to purposely double-code the training data when there is a fixed budget for human coders.
Earlier work includes selectivity in web surveys, cross-sectional weighting in household surveys, the effect of following rules in household panels, and respondent-driven sampling.
Professor Schonlau joined the faculty in 2011. From 1999 to 2011 he was a statistician at the RAND Corporation and head of the RAND Statistical Consulting Service. He was initially located at RAND's Santa Monica (Los Angeles) headquarters and then moved to RAND's Pittsburgh office. Professor Schonlau spent the academic year 2015/2016 on sabbatical at the University of Auckland (New Zealand) and the academic year 2009/2010 on sabbatical at the German Institute for Economic Research (DIW) in Berlin, Germany, in cooperation with the Max Planck Institute for Human Development (MPIB). From 1997 to 1999 Professor Schonlau held a joint appointment with the National Institute of Statistical Sciences and AT&T Labs - Research. He obtained his PhD from the University of Waterloo in 1997 and his master's from Queen's University in 1993. Professor Schonlau grew up in Germany and is an elected Fellow of the American Statistical Association.
Selected Recent Publications
Schonlau M, Couper M. Options for Conducting Web Surveys. Statistical Science, May 2017, 32(2), 279-292.
Gweon H, Schonlau M, Wenemark M. Semi-automated classification for multi-label open-ended questions. Survey Methodology, Dec 2020, Vol. 46, No. 2, pp. 265-282.
He Z, Schonlau M. Coding text answers to open-ended questions: human coders and statistical learning algorithms make similar mistakes. Methods, Data, Analyses. 15(1), 2021, pp. 103-120. https://mda.gesis.org/index.php/mda/article/view/2020.10
Sucholutsky I, Schonlau M. Optimal 1-NN prototypes for pathological geometries. PeerJ Computer Science, April 2021, 7, e464, 1-17, https://peerj.com/articles/cs-464
Schierholz M, Schonlau M. Machine Learning for Occupation Coding – A Comparison Study. Journal of Survey Statistics and Methodology. (to appear in print) Published Online first: November 2, 2020, https://doi.org/10.1093/jssam/smaa023
Sucholutsky I, Schonlau M. ‘Less than one’-shot learning: Learning N classes from M<N samples. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI’21). Feb 2021. (to appear) http://arxiv.org/abs/2009.08449
Sucholutsky I, Schonlau M. Soft-Label Dataset Distillation and Text Dataset Distillation. The International Joint Conference on Neural Networks (IJCNN21). 18-22 July 2021. https://arxiv.org/abs/1910.02551 (to appear)