Publications

2023

Fernando, L., H. Bindra, and K. Daudjee, "An Experimental Analysis of Quantile Sketches Over Data Streams", International Conference on Extending Database Technology (EDBT), 2023.
Hebert, L., L. Golab, P. Poupart, and R. Cohen, "FedFormer: Contextual Federation With Attention in Reinforcement Learning", International Joint Conference on Autonomous Agents & Multiagent Systems (AAMAS), 2023.
Liu, C., A. Usta, J. Zhao, and S. Salihoglu, "Governor: Turning Open Government Data Portals Into Interactive Databases", ACM Conference on Human Factors in Computing Systems (CHI), 2023.
Buchanan, G. Robert, D. McKay, and C. Clarke, "Made to Measure: A Workshop on Human-Centred Metrics for Information Seeking", Conference on Human Information Interaction and Retrieval (CHIIR), 2023.
Tamber, M. Singh, R. Pradeep, and J. Lin, "Pre-Processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering", European Conference on Information Retrieval (ECIR), 2023.
Clarke, C., F. Diaz, and N. Arabzadeh, "Preference-Based Offline Evaluation", Web Search and Data Mining (WSDM), 2023.
Pradeep, R., H. Chen, L. Gu, M. Singh Tamber, and J. Lin, "PyGaggle: A Gaggle of Resources for Open-Domain Question Answering", European Conference on Information Retrieval (ECIR), 2023.
Ma, X., T. Teofili, and J. Lin, "Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes", ArXiv, vol. abs/2304.12139, 2023.
Yang, J-H., C. Lassance, R. Sampaio de Rezende, K. Srinivasan, M. Redi, S. Clinchant, and J. Lin, "AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation", ArXiv, vol. abs/2304.01961, 2023.
Rorseth, J., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "CREDENCE: Counterfactual Explanations for Document Ranking", ArXiv, vol. abs/2302.04983, 2023.
Ozsu, T., "Data Science: A Systematic Treatment", ArXiv, vol. abs/2301.13761, 2023.
Zou, L., Y. Pang, T. Ozsu, and J. Chen, "Efficient Execution of SPARQL Queries With OPTIONAL and UNION Expressions", ArXiv, vol. abs/2303.13844, 2023.
Kamalloo, E., X. Zhang, O. Ogundepo, N. Thakur, D. Alfonso-Hermelo, M. Rezagholizadeh, and J. Lin, "Evaluating Embedding APIs for Information Retrieval", ArXiv, vol. abs/2305.06300, 2023.
Kamalloo, E., N. Dziri, C. Clarke, and D. Rafiei, "Evaluating Open-Domain Question Answering in the Era of Large Language Models", ArXiv, vol. abs/2305.06984, 2023.
Ilyas, I., J. P. Lacerda, Y. Li, U. Farooq Minhas, A. Mousavi, J. Pound, T. Rekatsinas, and C. Sumanth, "Growing and Serving Large Open-Domain Knowledge Graphs", ArXiv, vol. abs/2305.09464, 2023.
Mohoney, J., A. Pacaci, S. Rahman Chowdhury, A. Mousavi, I. Ilyas, U. Farooq Minhas, J. Pound, and T. Rekatsinas, "High-Throughput Vector Similarity Search in Knowledge Graphs", ArXiv, vol. abs/2304.01926, 2023.
Pradeep, R., K. Hui, J. Gupta, Á. Dániel Lelkes, H. Zhuang, J. Lin, D. Metzler, and V. Q. Tran, "How Does Generative Retrieval Scale to Millions of Passages?", ArXiv, vol. abs/2305.11841, 2023.
Lin, S-C., A. Asai, M. Li, B. Oguz, J. Lin, Y. Mehdad, W-tau. Yih, and X. Chen, "How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval", ArXiv, vol. abs/2302.07452, 2023.
Faggioli, G., L. Dietz, C. Clarke, G. Demartini, M. Hagen, C. Hauff, N. Kando, E. Kanoulas, M. Potthast, B. Stein, et al., "Perspectives on Large Language Models for Relevance Judgment", ArXiv, vol. abs/2304.09161, 2023.
Hebert, L., L. Golab, and R. Cohen, "Predicting Hateful Discussions on Reddit Using Graph Transformer Networks and Communal Context", ArXiv, vol. abs/2301.04248, 2023.
Hebert, L., H. Yi Chen, R. Cohen, and L. Golab, "Qualitative Analysis of a Graph Transformer Approach to Addressing Hate Speech: Adapting to Dynamically Changing Content", ArXiv, vol. abs/2301.10871, 2023.
Lin, J., D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, R. Frassetto Nogueira, O. Ogundepo, M. Rezagholizadeh, N. Thakur, J-H. Yang, et al., "Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval", ArXiv, vol. abs/2304.01019, 2023.
Li, M., S-C. Lin, X. Ma, and J. Lin, "SLIM: Sparsified Late Interaction for Multi-Vector Retrieval With Inverted Indexes", ArXiv, vol. abs/2302.06587, 2023.
Seltzer, J., J. Pan, K. Cheng, Y. Sun, S. Kolagati, J. Lin, and S. Zong, "SmartProbe: A Virtual Moderator for Market Research Surveys", ArXiv, vol. abs/2305.08271, 2023.
Akiki, C., O. Ogundepo, A. Piktus, X. Zhang, A. Oladipo, J. Lin, and M. Potthast, "Spacerini: Plug-and-Play Search Engines With Pyserini and Hugging Face", ArXiv, vol. abs/2302.14534, 2023.
Zong, S., J. Seltzer, J. Pan, K. Cheng, and J. Lin, "Which Model Shall I Choose? Cost/Quality Trade-Offs for Text Classification Tasks", ArXiv, vol. abs/2301.07006, 2023.
Ma, X., X. Zhang, R. Pradeep, and J. Lin, "Zero-Shot Listwise Document Reranking With a Large Language Model", ArXiv, vol. abs/2305.02156, 2023.

2022

Trotman, A., J. Mackenzie, P. Parameswaran, and J. Lin, "A Common Framework for Exploring Document-at-a-Time and Score-at-a-Time Retrieval Methods", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Borgida, A., E. Franconi, D. Toman, and G. Weddell, "Accessing Document Data Sources Using Referring Expression Types", International Workshop on Description Logics (DL), 2022.
Ogundepo, O., X. Zhang, S. Sun, K. Duh, and J. Lin, "AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Devins, J., J. Tibshirani, and J. Lin, "Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini", Web Search and Data Mining (WSDM), 2022.
Ma, X., K. Sun, R. Pradeep, M. Li, and J. Lin, "Another Look at DPR: Reproduction of Training and Replication of Retrieval", European Conference on Information Retrieval (ECIR), 2022.
Liu, Y., C. Hu, and J. Lin, "Another Look at Information Retrieval as Statistical Translation", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Zhong, W., Y. Xie, and J. Lin, "Applying Structural and Dense Semantic Matching for the ARQMath Lab 2022, CLEF", Conference and Labs of the Evaluation Forum (CLEF), 2022.
Li, M., X. Zhang, J. Xin, H. Zhang, and J. Lin, "Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Chambers, O., R. Cohen, M. Grossman, and Q. Chen, "Creating a User Model to Support User-Specific Explanations of AI Systems", User Modeling, Adaptation, and Personalization (UMAP), 2022.
Shi, P., L. Song, L. Jin, H. Mi, H. Bai, J. Lin, and D. Yu, "Cross-Lingual Text-to-SQL Semantic Parsing With Representation Mixup", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Karegar, R., M. Mirsafian, P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Discovering Domain Orders via Order Dependencies", IEEE International Conference on Data Engineering (ICDE), 2022.
Ma, X., R. Pradeep, R. Nogueira, and J. Lin, "Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO V1 and V2", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Kane, A., Y. Ki Ng, and F. Tompa, "Dowsing for Answers to Math Questions: Doing Better With Less", Conference and Labs of the Evaluation Forum (CLEF), 2022.
Shehata, D., N. Arabzadeh, and C. Clarke, "Early Stage Sparse Retrieval With Entity Linking", International Conference on Information and Knowledge Management (CIKM), 2022.
Pacaci, A., A. Bonifati, and T. Ozsu, "Evaluating Complex Queries on Streaming Graphs", IEEE International Conference on Data Engineering (ICDE), 2022.
Zhong, W., J-H. Yang, Y. Xie, and J. Lin, "Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Chen, Y., G. Xiao, T. Ozsu, Z. Tang, A. Y. Zomaya, and K. Li, "Exploiting Hierarchical Parallelism and Reusability in Tensor Kernel Processing on Heterogeneous HPC Systems", IEEE International Conference on Data Engineering (ICDE), 2022.
Jiang, Z., Y. Dai, J. Xin, M. Li, and J. Lin, "Few-Shot Non-Parametric Learning With Deep Latent Variable Model", Conference on Neural Information Processing Systems (NeurIPS), 2022.
Vezvaei, A., L. Golab, M. Kargar, D. Srivastava, J. Szlichta, and M. Zihayat, "Fine-Tuning Dependencies With Parameters", International Conference on Extending Database Technology (EDBT), 2022.
Toman, D., and G. Weddell, "First Order Rewritability in Ontology-Mediated Querying in Horn Description Logics", AAAI Conference on Artificial Intelligence (AAAI), 2022.
Seltzer, J., K. Cheng, S. Zong, and J. Lin, "Flipping the Script: Inverse Information Seeking Dialogues for Market Research", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Chopra, S., and L. Golab, "Gender Differences in Early Career Performance Reviews: A Text Mining Study", International Conference on Extending Database Technology (EDBT), 2022.
Kalavri, V., and S. Salihoglu, "GRADES-NDA'22: 5th International Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA)", ACM International Conference on Management of Data (SIGMOD), 2022.
Jin, G., N. Anzum, and S. Salihoglu, "GRainDB: A Relational-Core Graph-Relational DBMS", Conference on Innovative Data Systems Research (CIDR), 2022.
Dehghan, M., D. Kumar, and L. Golab, "GRS: Combining Generation and Revision in Unsupervised Sentence Simplification", Association for Computational Linguistics (ACL), 2022.
Yan, X., C. Luo, C. Clarke, N. Craswell, E. M. Voorhees, and P. Castells, "Human Preferences as Dueling Bandits", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Guo, R., V. Guo, A. Kim, J. Hildred, and K. Daudjee, "Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers", Conference on Machine Learning and Systems (MLSys), 2022.
Zhong, Y., J. Xiao, T. Vetterli, M. Matin, E. Loo, J. Lin, R. Bourgon, and O. Shapira, "Improving Precancerous Case Characterization via Transformer-Based Ensemble Learning", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Li, H., S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "Improving Query Representations for Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study", European Conference on Information Retrieval (ECIR), 2022.
Yang, M. Y. R., S. Yang, and J. Lin, "Integration of Text and Geospatial Search for Hydrographic Datasets Using the Lucene Search Library", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2022.
Zhang, D., A. Vakili Tahami, M. Abualsaud, and M. Smucker, "Learning Trustworthy Web Sources to Derive Correct Answers and Reduce Health Misinformation in Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Feng, E., D. Toman, and G. Weddell, "Magic Sets in Interpolation-Based Rule Driven Query Optimization", International Web Rule Symposium (RuleML), 2022.
Peng, P., T. Ozsu, L. Zou, C. Yan, and C. Liu, "MPC: Minimum Property-Cut RDF Graph Partitioning", IEEE International Conference on Data Engineering (ICDE), 2022.
Pradeep, R., Y. Li, Y. Wang, and J. Lin, "Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Hebert, L., L. Golab, and R. Cohen, "Predicting Hateful Discussions on Reddit Using Graph Transformer Networks and Communal Context", IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2022.
Abebe, M., H. Lazu, and K. Daudjee, "Proteus: Autonomous Adaptive Storage for Mixed Workloads", ACM International Conference on Management of Data (SIGMOD), 2022.
Li, H., S. Zhuang, X. Ma, J. Lin, and G. Zuccon, "Pseudo-Relevance Feedback With Dense Retrievers in Pyserini", Australasian Document Computing Symposium (ADCS), 2022.
Ilyas, I., T. Rekatsinas, V. Konda, J. Pound, X. Qi, and M. A. Soliman, "Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale", ACM International Conference on Management of Data (SIGMOD), 2022.
Tang, R., K. Kumar, G. Yang, A. Pandey, Y. Mao, V. Belyaev, M. Emmadi, C. G. Murray, F. Türe, and J. Lin, "SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Pradeep, R., Y. Liu, X. Zhang, Y. Li, A. Yates, and J. Lin, "Squeezing Water From a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking", European Conference on Information Retrieval (ECIR), 2022.
Tang, R., K. Kumar, J. Xin, P. Vyas, W. Li, G. Yang, Y. Mao, C. G. Murray, and J. Lin, "Temporal Early Exiting for Streaming Speech Commands Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.
Abualsaud, M., and M. Smucker, "The Dark Side of Relevance: The Effect of Non-Relevant Results on Search Behavior", Conference on Human Information Interaction and Retrieval (CHIIR), 2022.
Mohapatra, S., S. Sasy, X. He, G. Kamath, and O. Thakkar, "The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection", AAAI Conference on Artificial Intelligence (AAAI), 2022.
Li, H., S. Wang, S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "To Interpolate or Not to Interpolate: PRF, Dense and Sparse Retrievers", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Voorhees, E. M., N. Craswell, and J. Lin, "Too Many Relevants: Whither Cranfield Test Collections?", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Xue, H., F. D. Salim, Y. Ren, and C. Clarke, "Translating Human Mobility Forecasting Through Natural Language Generation", Web Search and Data Mining (WSDM), 2022.
Borgida, A., E. Franconi, D. Toman, and G. Weddell, "Understanding Document Data Sources Using Ontologies With Referring Expressions", Australian Joint Conference on Artificial Intelligence (AUS-AI), 2022.
Arabzadeh, N., M. Seifikar, and C. Clarke, "Unsupervised Question Clarity Prediction Through Retrieved Item Coherency", International Conference on Information and Knowledge Management (CIKM), 2022.
Durvasula, S., R. Kiguru, S. Mathur, J. Xu, J. Lin, and N. Vijaykumar, "VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks", International Conference on Parallel Architectures and Compilation Techniques (PACT), 2022.
Shi, P., R. Zhang, H. Bai, and J. Lin, "XRICL: Cross-Lingual Retrieval-Augmented In-Context Learning for Cross-Lingual Text-to-SQL Semantic Parsing", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Lin, S-C., and J. Lin, "A Dense Representation Framework for Lexical and Semantic Matching", ArXiv, vol. abs/2206.09912, 2022.
Chen, J., Y. Huang, M. Wang, S. Salihoglu, and K. Salem, "Accurate Summary-Based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 8, pp. 1533--1545, 2022.
Ogundepo, O., X. Zhang, and J. Lin, "Better Than Whitespace: Information Retrieval for Languages Without Custom Tokenizers", ArXiv, vol. abs/2210.05481, 2022.
Lin, J., "Building a Culture of Reproducibility in Academic Research", ArXiv, vol. abs/2212.13534, 2022.
Xin, J., R. Tang, Z. Jiang, Y. Yu, and J. Lin, "Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers", ArXiv, vol. abs/2208.00483, 2022.
Mazmudar, M., T. Humphries, J. Liu, M. Rafuse, and X. He, "Cache Me if You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", ArXiv, vol. abs/2211.15732, 2022.
Mazmudar, M., T. Humphries, J. Liu, M. Rafuse, and X. He, "Cache Me if You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 4, pp. 574--586, 2022.
Voorhees, E. M., I. Soboroff, and J. Lin, "Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?", ArXiv, vol. abs/2201.11086, 2022.
Li, M., X. Zhang, J. Xin, H. Zhang, and J. Lin, "Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking", ArXiv, vol. abs/2205.09638, 2022.
Li, M., S-C. Lin, B. Oguz, A. Ghoshal, J. Lin, Y. Mehdad, W-tau. Yih, and X. Chen, "CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval", ArXiv, vol. abs/2211.10411, 2022.
Kassaie, B., E. L. Irving, and F. Tompa, "Computer-Assisted Cohort Identification in Practice", ACM Transactions on Computing for Healthcare, vol. 3, issue 2, pp. 17:1--17:28, 2022.
Zheng, Z., L. Zheng, M. Alipour Langouri, F. Chiang, L. Golab, J. Szlichta, and S. Baskaran, "Contextual Data Cleaning With Ontology Functional Dependencies", Journal of Data and Information Quality, vol. 14, issue 3, pp. 20:1--20:26, 2022.
Sadri, N., and G. Cormack, "Continuous Active Learning Using Pretrained Transformers", ArXiv, vol. abs/2208.06955, 2022.
Ilyas, I., and F. Naumann, "Data Errors: Symptoms, Causes and Origins", IEEE Data Engineering Bulletin, vol. 45, issue 1, pp. 4--9, 2022.
Thakur, N., N. Reimers, and J. Lin, "Domain Adaptation for Memory-Efficient Dense Retrieval", ArXiv, vol. abs/2205.11498, 2022.
Pappachan, P., S. Zhang, X. He, and S. Mehrotra, "Don't Be a Tattle-Tale: Preventing Leakages Through Data Dependencies on Access Control Protected Data", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 2437--2449, 2022.
Pappachan, P., S. Zhang, X. He, and S. Mehrotra, "Don't Be a Tattle-Tale: Preventing Leakages Through Data Dependencies on Access Control Protected Data", ArXiv, vol. abs/2207.08757, 2022.
Shehata, D., N. Arabzadeh, and C. Clarke, "Early Stage Sparse Retrieval With Entity Linking", ArXiv, vol. abs/2208.04887, 2022.
Artikis, A., N. Tatbul, L. Golab, and M. Sadoghi, "Editorial", Information Systems, vol. 109, pp. 102088, 2022.
Kargar, M., L. Golab, D. Srivastava, J. Szlichta, and M. Zihayat, "Effective Keyword Search Over Weighted Graphs", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 34, issue 2, pp. 601--616, 2022.
Zhong, W., J-H. Yang, and J. Lin, "Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval", ArXiv, vol. abs/2203.11163, 2022.
Dadvar, V., L. Golab, and D. Srivastava, "Exploring Data Using Patterns: A Survey", Information Systems, vol. 108, pp. 101985, 2022.
Hebert, L., L. Golab, P. Poupart, and R. Cohen, "FedFormer: Contextual Federation With Attention in Reinforcement Learning", ArXiv, vol. abs/2205.13697, 2022.
Jiang, Z., Y. Dai, J. Xin, M. Li, and J. Lin, "Few-Shot Non-Parametric Learning With Deep Latent Variable Model", ArXiv, vol. abs/2206.11573, 2022.
Yan, D., G. Guo, J. Khalil, T. Ozsu, W-S. Ku, and J. C. S. Lui, "G-Thinker: A General Distributed Framework for Finding Qualified Subgraphs in a Big Graph With Load Balancing", The VLDB Journal, vol. 31, issue 2, pp. 287--320, 2022.
Dehghan, M., D. Kumar, and L. Golab, "GRS: Combining Generation and Revision in Unsupervised Sentence Simplification", ArXiv, vol. abs/2203.09742, 2022.
Yan, X., C. Luo, C. Clarke, N. Craswell, E. M. Voorhees, and P. Castells, "Human Preferences as Dueling Bandits", ArXiv, vol. abs/2204.10362, 2022.
Zhong, Y., J. Xiao, T. Vetterli, M. Matin, E. Loo, J. Lin, R. Bourgon, and O. Shapira, "Improving Precancerous Case Characterization via Transformer-Based Ensemble Learning", ArXiv, vol. abs/2212.05150, 2022.
Herodotou, H., P. K. Chrysanthis, S. Chen, M. Hsu, K. Daudjee, Y. Wu, and C. Costa, "Introduction to the Special Issue on Self-Managing and Hardware-Optimized Database Systems 2020", Distributed and Parallel Databases, vol. 40, issue 1, pp. 1--3, 2022.
Xia, K., W. Zhao, A. Jolfaei, and T. Ozsu, "Introduction to the Special Section on Edge/Fog Computing for Infectious Disease Intelligence", ACM Transactions on Internet Technology (TOIT), vol. 22, issue 3, pp. 63e:1--63e:2, 2022.
Jiang, Z., M. Y. R. Yang, M. Tsirlin, R. Tang, and J. Lin, "Less Is More: Parameter-Free Text Classification With Gzip", ArXiv, vol. abs/2212.09410, 2022.
Ilyas, I., and T. Rekatsinas, "Machine Learning and Data Cleaning: Which Serves the Other?", Journal of Data and Information Quality, vol. 14, issue 3, pp. 13:1--13:11, 2022.
Zhang, X., N. Thakur, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, M. Rezagholizadeh, and J. Lin, "Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages", ArXiv, vol. abs/2210.09984, 2022.
Jin, G., and S. Salihoglu, "Making RDBMSs Efficient on Graph Workloads Through Predefined Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 5, pp. 1011--1023, 2022.
Ghayyur, S., D. Ghosh, X. He, and S. Mehrotra, "MIDE: Accuracy Aware Minimally Invasive Data Exploration for Decision Support", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 2653--2665, 2022.
Mhedhbi, A., and S. Salihoglu, "Modern Techniques for Querying Graph-Structured Relations: Foundations, System Implementations, and Open Challenges", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 12, pp. 3762--3765, 2022.
Ammar, K., S. Sahu, S. Salihoglu, and T. Ozsu, "Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs", ArXiv, vol. abs/2208.00273, 2022.
Ammar, K., S. Sahu, S. Salihoglu, and T. Ozsu, "Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 3186--3198, 2022.
Dadvar, V., L. Golab, and D. Srivastava, "POEM: Pattern-Oriented Explanations of CNN Models", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 12, pp. 3618--3621, 2022.
Gao, L., X. Ma, J. Lin, and J. Callan, "Precise Zero-Shot Dense Retrieval Without Relevance Labels", ArXiv, vol. abs/2212.10496, 2022.
Liu, L., M. Li, J. Lin, S. Riedel, and P. Stenetorp, "Query Expansion Using Contextual Clue Sampling With Language Models", ArXiv, vol. abs/2210.07093, 2022.
Ozsu, T., "Reminiscences on Influential Papers", SIGMOD Record, vol. 51, issue 2, pp. 44--46, 2022.
Yamamoto, T., Z. Dou, N. Kando, C. Clarke, M. P. Kato, and Y. Liu, "Report on the 16th Round of NII Testbeds and Community for Information Access Research (NTCIR-16)", SIGIR Forum, vol. 56, issue 2, pp. 7:1--7:8, 2022.
Ilyas, I., T. Rekatsinas, V. Konda, J. Pound, X. Qi, and M. A. Soliman, "Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale", ArXiv, vol. abs/2204.07309, 2022.
Sheshbolouki, A., and T. Ozsu, "sGrapp: Butterfly Approximation in Streaming Graphs", ACM Transactions on Knowledge Discovery from Data, vol. 16, issue 4, pp. 76:1--76:43, 2022.
Arabzadeh, N., A. Vtyurina, X. Yan, and C. Clarke, "Shallow Pooling for Sparse Labels", Information Retrieval Journal, vol. 25, issue 4, pp. 365--385, 2022.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Space-Efficient Subgraph Search Over Streaming Graph With Timing Order Constraint", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 34, issue 9, pp. 4453--4467, 2022.
Tang, R., K. Kumar, G. Yang, A. Pandey, Y. Mao, V. Belyaev, M. Emmadi, C. G. Murray, F. Türe, and J. Lin, "SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale", ArXiv, vol. abs/2211.11740, 2022.
Gao, L., X. Ma, J. Lin, and J. Callan, "Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval", ArXiv, vol. abs/2203.05765, 2022.
Wang, R., J. Wang, S. Idreos, T. Ozsu, and W. G. Aref, "The Case for Distributed Shared-Memory Databases With RDMA-Enabled Memory Disaggregation", ArXiv, vol. abs/2207.03027, 2022.
Wang, R., J. Wang, S. Idreos, T. Ozsu, and W. G. Aref, "The Case for Distributed Shared-Memory Databases With RDMA-Enabled Memory Disaggregation", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 1, pp. 15--22, 2022.
Abebe, M., H. Lazu, and K. Daudjee, "Tiresias: Enabling Predictive Autonomous Storage and Indexing", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 3126--3136, 2022.
Li, H., S. Wang, S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "To Interpolate or Not to Interpolate: PRF, Dense and Sparse Retrievers", ArXiv, vol. abs/2205.00235, 2022.
Zhang, X., K. Ogueji, X. Ma, and J. Lin, "Towards Best Practices for Training Multilingual Dense Retrieval Models", ArXiv, vol. abs/2204.02363, 2022.
Arabzadeh, N., M. Seifikar, and C. Clarke, "Unsupervised Question Clarity Prediction Through Retrieved Item Coherency", ArXiv, vol. abs/2208.04882, 2022.
Nanayakkara, P., J. Bater, X. He, J. Hullman, and J. Rogers, "Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases", ArXiv, vol. abs/2201.05964, 2022.
Nanayakkara, P., J. Bater, X. He, J. Hullman, and J. Rogers, "Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases", Proceedings on Privacy Enhancing Technologies (PoPETs), vol. 2022, issue 2, pp. 601--618, 2022.
Durvasula, S., R. Kiguru, S. Mathur, J. Xu, J. Lin, and N. Vijaykumar, "VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks", ArXiv, vol. abs/2210.08729, 2022.
Tang, R., A. Pandey, Z. Jiang, G. Yang, K. Kumar, J. Lin, and F. Türe, "What the DAAM: Interpreting Stable Diffusion Using Cross Attention", ArXiv, vol. abs/2210.04885, 2022.
Shi, P., R. Zhang, H. Bai, and J. Lin, "XRICL: Cross-Lingual Retrieval-Augmented In-Context Learning for Cross-Lingual Text-to-SQL Semantic Parsing", ArXiv, vol. abs/2210.13693, 2022.

2021

Lin, J., R. Nogueira, and A. Yates, Pretrained Transformers for Text Ranking: BERT and Beyond: Morgan & Claypool, 2021.
Mhedhbi, A., P. Gupta, S. Khaliq, and S. Salihoglu, "A+ Indexes: Tunable and Space-Efficient Adjacency Lists in Graph Database Management Systems", IEEE International Conference on Data Engineering (ICDE), 2021.
Parsa, M. S., and L. Golab, "Academic Integrity in Online Education During the COVID-19 Pandemic: A Social Media Mining Study", Educational Data Mining (EDM), 2021.
Chopra, S., and L. Golab, "Analyzing Ranking Strategies to Characterize Competition for Co-Operative Work Placements", Educational Data Mining (EDM), 2021.
Zhong, W., X. Zhang, J. Xin, R. Zanibbi, and J. Lin, "Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Brown, D. G., L. Byl, and M. Grossman, "Are Machine Learning Corpora "Fair Dealing" Under Canadian Law?", International Conference on Computational Creativity (ICCC), 2021.
Xin, J., R. Tang, Y. Yu, and J. Lin, "BERxiT: Early Exiting for BERT With Better Fine-Tuning and Extension to Regression", Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
Alway, K., E. Blais, and S. Salihoglu, "Box Covers and Domain Orderings for Beyond Worst-Case Join Processing", International Conference on Database Theory (ICDT), 2021.
Zhang, E., S-C. Lin, J-H. Yang, R. Pradeep, R. Nogueira, and J. Lin, "Chatty Goose: A Python Framework for Conversational Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Zhang, X., A. Yates, and J. Lin, "Comparing Score Aggregation Approaches for Document Retrieval With Pretrained Transformers", European Conference on Information Retrieval (ECIR), 2021.
Lin, S-C., J-H. Yang, and J. Lin, "Contextualized Query Embeddings for Conversational Search", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Glasbergen, B., F. Wu, and K. Daudjee, "Dendrite: Bolt-on Adaptivity for Data Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Zhang, M., L. Tan, Z. Fu, K. Xiong, J. Lin, M. Li, and Z. Tu, "Don't Change Me! User-Controllable Selective Paraphrase Generation", Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, and F. Tompa, "Dowsing for Answers to Math Questions: Ongoing Viability of Traditional MathIR", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, and F. Tompa, "Dowsing for Math Answers", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Xia, S., B. Chang, K. Knopf, Y. He, Y. Tao, and X. He, "DPGraph: A Benchmark Platform for Differentially Private Graph Analysis", ACM International Conference on Management of Data (SIGMOD), 2021.
Kargar, M., L. Golab, D. Srivastava, J. Szlichta, and M. Zihayat, "Effective Keyword Search in Weighted Graphs (Extended Abstract)", IEEE International Conference on Data Engineering (ICDE), 2021.
Karegar, R., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Efficient Discovery of Approximate Order Dependencies", International Conference on Extending Database Technology (EDBT), 2021.
Hofstätter, S., S-C. Lin, J-H. Yang, J. Lin, and A. Hanbury, "Efficiently Teaching an Effective Dense Retriever With Balanced Topic Aware Sampling", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Clarke, C., C. Luo, and M. Smucker, "Evaluation Measures Based on Preference Graphs", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Golab, L., and D. Srivastava, "Exploring Data Using Patterns: A Survey and Open Problems", International Workshop on Data Warehousing and OLAP (DOLAP), 2021.
Jiang, K., R. Pradeep, and J. Lin, "Exploring Listwise Evidence Reasoning With T5 for Fact Verification", Association for Computational Linguistics (ACL), 2021.
Chen, H. H., S. Mohapatra, G. Michalopoulos, X. He, and I. McKillop, "Federated Deep Learning Architecture for Personalized Healthcare", Medical Informatics Europe (MIE), 2021.
Toman, D., and G. Weddell, "FO Rewritability for OMQ Using Beth Definability and Interpolation", International Workshop on Description Logics (DL), 2021.
Sahu, S., and S. Salihoglu, "Graphsurge: Graph Analytics on View Collections Using Differential Computation", ACM International Conference on Management of Data (SIGMOD), 2021.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "How Does BERT Rerank Passages? An Attribution Analysis With Information Bottlenecks", Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2021.
Lin, S-C., J-H. Yang, and J. Lin, "In-Batch Negatives for Knowledge Distillation With Tightly-Coupled Teachers for Dense Retrieval", Workshop on Representation Learning for NLP (RepL4NLP), 2021.
Farhat, O., K. Daudjee, and L. Querzoni, "Klink: Progress-Aware Scheduling for Streaming Data Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Xia, S., N. Anzum, S. Salihoglu, and J. Zhao, "KTabulator: Interactive Ad Hoc Table Creation Using Knowledge Graphs", ACM Conference on Human Factors in Computing Systems (CHI), 2021.
Zhang, Y., C. Hu, Y. Liu, H. Fang, and J. Lin, "Learning to Rank in the Age of Muppets: Effectiveness-Efficiency Tradeoffs in Multi-Stage Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, and J. Lin, "MS MARCO: Benchmarking Ranking Models in the Large-Data Regime", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Li, M., M. Li, K. Xiong, and J. Lin, "Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Langendoen, K., B. Glasbergen, and K. Daudjee, "NIR-Tree: A Non-Intersecting R-Tree", International Conference on Statistical and Scientific Database Management (SSDBM), 2021.
Lin, J., X. Ma, J. Mackenzie, and A. Mallia, "On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2021.
Shafieinejad, M., F. Kerschbaum, and I. Ilyas, "PCOR: Private Contextual Outlier Release via Differentially Private Search", ACM International Conference on Management of Data (SIGMOD), 2021.
He, X., J. Rogers, J. Bater, A. Machanavajjhala, C. Wang, and X. Wang, "Practical Security and Privacy for Database Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Arabzadeh, N., X. Yan, and C. Clarke, "Predicting Efficiency/Effectiveness Trade-Offs for Dense vs. Sparse Retrieval Strategy Selection", International Conference on Information and Knowledge Management (CIKM), 2021.
Yates, A., R. Nogueira, and J. Lin, "Pretrained Transformers for Text Ranking: BERT and Beyond", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Yates, A., R. Nogueira, and J. Lin, "Pretrained Transformers for Text Ranking: BERT and Beyond", Web Search and Data Mining (WSDM), 2021.
Toman, D., and G. Weddell, "Projective Beth Definability and Craig Interpolation for Relational Query Optimization (Material to Accompany Invited Talk)", International Conference on Principles of Knowledge Representation and Reasoning (KR), 2021.
Livshits, E., R. Kochirgan, S. Tsur, I. Ilyas, B. Kimelfeld, and S. Roy, "Properties of Inconsistency Measures for Databases", ACM International Conference on Management of Data (SIGMOD), 2021.
Zhong, W., and J. Lin, "PYA0: A Python Toolkit for Accessible Math-Aware Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Lin, J., X. Ma, S-C. Lin, J-H. Yang, R. Pradeep, and R. Nogueira, "Pyserini: A Python Toolkit for Reproducible Information Retrieval Research With Sparse and Dense Representations", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Anzum, N., and S. Salihoglu, "R2GSync and Edge Views: Practical RDBMS to GDBMS Synchronization", ACM International Conference on Management of Data (SIGMOD), 2021.
Odunayo, O., N. N. Sookoo, G. Bathla, A. Cavallin, B. D. Persaud, K. Szigeti, P. Van Cappellen, and J. Lin, "Rescuing Historical Climate Observations to Support Hydrological Research: A Case Study of Solar Radiation Data", ACM Symposium on Document Engineering (DocEng), 2021.
Nemec, J., H. Davoudi, L. Golab, M. Kargar, Y. Lytvyn, P. Mierzejewski, J. Szlichta, and M. Zihayat, "RW-Team: Robust Team Formation Using Random Walk", International Conference on Information and Knowledge Management (CIKM), 2021.
Pradeep, R., X. Ma, R. Frassetto Nogueira, and J. Lin, "Scientific Claim Verification With VerT5erini", International Workshop on Health Text Mining and Information Analysis (Louhi), 2021.
Bai, H., P. Shi, J. Lin, Y. Xie, L. Tan, K. Xiong, W. Gao, and M. Li, "Segatron: Segment-Aware Transformer for Language Modeling and Understanding", AAAI Conference on Artificial Intelligence (AAAI), 2021.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, J. Liu, and M. Li, "Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation With GPT2", Association for Computational Linguistics (ACL), 2021.
Anand, M., J. Zhang, S. Ding, J. Xin, and J. Lin, "Serverless BM25 Search and BERT Reranking", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2021.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Significant Improvements Over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Ma, X., M. Li, K. Sun, J. Xin, and J. Lin, "Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Xin, J., R. Tang, Y. Yu, and J. Lin, "The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing", Association for Computational Linguistics (ACL), 2021.
Han, X., Y. Liu, and J. Lin, "The Simplest Thing That Can Possibly Work: (Pseudo-)Relevance Feedback via Text Classification", International Conference on the Theory of Information Retrieval (ICTIR), 2021.
Mitra, A., C. Gorenflo, L. Golab, and S. Keshav, "TimeFabric: Trusted Time for Permissioned Blockchains", International Symposium on Foundations and Applications of Blockchain (FAB), 2021.
Deshmukh, A. Anand, Q. Zhang, M. Li, J. Lin, and L. Mou, "Unsupervised Chunking as Syntactic Structure Induction With a Knowledge-Transfer Approach", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Pradeep, R., X. Ma, R. Nogueira, and J. Lin, "Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Abualsaud, M., M. Smucker, and C. Clarke, "Visualizing Searcher Gaze Patterns", Conference on Human Information Interaction and Retrieval (CHIIR), 2021.
Tang, R., K. Kumar, K. Chalkley, J. Xin, L. Zhang, W. Li, G. Yang, Y. Mao, J. Shin, G. Craig Murray, et al., "Voice Query Auto Completion", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Lin, J., "A Proposed Conceptual Framework for a Representational Approach to Information Retrieval", SIGIR Forum, vol. 55, issue 2, pp. 4:1--4:29, 2021.
Ma, X., K. Sun, R. Pradeep, and J. Lin, "A Replication Study of Dense Passage Retriever", ArXiv, vol. abs/2104.05740, 2021.
Chen, J., Y. Huang, M. Wang, S. Salihoglu, and K. Salem, "Accurate Summary-Based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs", ArXiv, vol. abs/2105.08878, 2021.
Clarke, C., A. Vtyurina, and M. Smucker, "Assessing Top-K Preferences", ACM Transactions on Information Systems (TOIS), vol. 39, issue 3, pp. 33:1--33:21, 2021.
Liu, J., K. Knopf, Y. Tan, B. Ding, and X. He, "Catch a Blowfish Alive: A Demonstration of Policy-Aware Differential Privacy for Interactive Data Exploration", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 12, pp. 2859--2862, 2021.
Parsa, M. S., L. Golab, and S. Keshav, "Climate Action During COVID-19 Recovery and Beyond: A Twitter Text Mining Study", ArXiv, vol. abs/2105.12190, 2021.
Gupta, P., A. Mhedhbi, and S. Salihoglu, "Columnar Storage and List-Based Processing for Graph Database Management Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 11, pp. 2491--2504, 2021.
Lin, S-C., J-H. Yang, and J. Lin, "Contextualized Query Embeddings for Conversational Search", ArXiv, vol. abs/2104.08707, 2021.
Shi, P., R. Zhang, H. Bai, and J. Lin, "Cross-Lingual Training With Dense Retrieval for Document Retrieval", ArXiv, vol. abs/2109.01628, 2021.
Lin, S-C., and J. Lin, "Densifying Sparse Representations for Passage Retrieval by Representational Slicing", ArXiv, vol. abs/2112.04666, 2021.
Near, J. P., and X. He, "Differential Privacy for Databases", Foundations and Trends in Databases, vol. 11, issue 2, pp. 109--225, 2021.
Zheng, Z., L. Zheng, M. Alipour Langouri, F. Chiang, L. Golab, and J. Szlichta, "Discovery and Contextual Data Cleaning With Ontology Functional Dependencies", ArXiv, vol. abs/2105.08105, 2021.
Valduriez, P., R. Jiménez-Peris, and T. Ozsu, "Distributed Database Systems: The Case for NewSQL", Transactions on Large-Scale Data- and Knowledge-Centered Systems, vol. 48, pp. 1--15, 2021.
Wagh, S., X. He, A. Machanavajjhala, and P. Mittal, "DP-Cryptography: Marrying Differential Privacy and Cryptography in Emerging Applications", Communications of the ACM, vol. 64, issue 2, pp. 84--93, 2021.
Karegar, R., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Efficient Discovery of Approximate Order Dependencies", ArXiv, vol. abs/2101.02174, 2021.
Hofstätter, S., S-C. Lin, J-H. Yang, J. Lin, and A. Hanbury, "Efficiently Teaching an Effective Dense Retriever With Balanced Topic Aware Sampling", ArXiv, vol. abs/2104.06967, 2021.
Suri, S., I. Ilyas, C. Ré, and T. Rekatsinas, "Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins", ArXiv, vol. abs/2106.01501, 2021.
Suri, S., I. Ilyas, C. Ré, and T. Rekatsinas, "Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 3, pp. 699--712, 2021.
Li, M., and J. Lin, "Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering", ArXiv, vol. abs/2110.01599, 2021.
Pacaci, A., A. Bonifati, and T. Ozsu, "Evaluating Complex Queries on Streaming Graphs", ArXiv, vol. abs/2101.12305, 2021.
Fritz, S., I. Milligan, N. Ruest, and J. Lin, "Fostering Community Engagement Through Datathon Events: The Archives Unleashed Experience", Digital Humanities Quarterly, vol. 15, issue 1, 2021.
Chen, Y., T. Ozsu, G. Xiao, Z. Tang, and K. Li, "GSmart: An Efficient SPARQL Query Engine Using Sparse Matrix Algebra - Full Version", ArXiv, vol. abs/2106.14038, 2021.
Li, H., S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "Improving Query Representations for Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study", ArXiv, vol. abs/2112.06400, 2021.
Gupta, P., A. Mhedhbi, and S. Salihoglu, "Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems", ArXiv, vol. abs/2103.02284, 2021.
Nogueira, R., Z. Jiang, and J. Lin, "Investigating the Limitations of the Transformers With Simple Arithmetic Tasks", ArXiv, vol. abs/2102.13019, 2021.
Ge, C., S. Mohapatra, X. He, and I. Ilyas, "Kamino: Constraint-Aware Differentially Private Data Synthesis", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 10, pp. 1886--1899, 2021.
Jin, G., and S. Salihoglu, "Making RDBMSs Efficient on Graph Workloads Through Predefined Joins", ArXiv, vol. abs/2108.10540, 2021.
Zhang, X., X. Ma, P. Shi, and J. Lin, "Mr. TyDi: A Multi-Lingual Benchmark for Dense Retrieval", ArXiv, vol. abs/2108.08787, 2021.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, and J. Lin, "MS MARCO: Benchmarking Ranking Models in the Large-Data Regime", ArXiv, vol. abs/2105.04021, 2021.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting", ACM Transactions on Information Systems (TOIS), vol. 39, issue 4, pp. 48:1--48:29, 2021.
Peng, P., Q. Ge, L. Zou, T. Ozsu, Z. Xu, and D. Zhao, "Optimizing Multi-Query Evaluation in Federated RDF Systems", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 33, issue 4, pp. 1692--1707, 2021.
Mhedhbi, A., C. Kankanamge, and S. Salihoglu, "Optimizing One-Time and Continuous Subgraph Queries Using Worst-Case Optimal Joins", ACM Transactions on Database Systems (TODS), vol. 46, issue 2, pp. 6:1--6:45, 2021.
Shafieinejad, M., F. Kerschbaum, and I. Ilyas, "PCOR: Private Contextual Outlier Release via Differentially Private Search", ArXiv, vol. abs/2103.05173, 2021.
Arabzadeh, N., X. Yan, and C. Clarke, "Predicting Efficiency/Effectiveness Trade-Offs for Dense vs. Sparse Retrieval Strategy Selection", ArXiv, vol. abs/2109.10739, 2021.
Lin, J., X. Ma, S-C. Lin, J-H. Yang, R. Pradeep, and R. Nogueira, "Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research With Sparse and Dense Representations", ArXiv, vol. abs/2102.10073, 2021.
Saxena, H., L. Golab, S. Idreos, and I. Ilyas, "Real-Time LSM-Trees for HTAP Workloads", ArXiv, vol. abs/2101.06801, 2021.
Kato, M. P., Y. Liu, N. Kando, and C. Clarke, "Report on the 15th Round of NII Testbeds and Community for Information Access Research (NTCIR-15)", SIGIR Forum, vol. 55, issue 2, pp. 21:1--21:6, 2021.
Sheshbolouki, A., and T. Ozsu, "Scale-Invariant Strength Assortativity of Streaming Butterflies", ArXiv, vol. abs/2111.12217, 2021.
Sheshbolouki, A., and T. Ozsu, "sGrapp: Butterfly Approximation in Streaming Graphs", ArXiv, vol. abs/2101.12334, 2021.
Arabzadeh, N., A. Vtyurina, X. Yan, and C. Clarke, "Shallow Pooling for Sparse Labels", ArXiv, vol. abs/2109.00062, 2021.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Significant Improvements Over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard", ArXiv, vol. abs/2102.12887, 2021.
Yang, J-H., X. Ma, and J. Lin, "Sparsifying Sparse Representations for Passage Retrieval by Top-K Masking", ArXiv, vol. abs/2112.09628, 2021.
Grossman, M., and G. Cormack, "The eDiscovery Medicine Show", ArXiv, vol. abs/2109.13908, 2021.
Pradeep, R., R. Nogueira, and J. Lin, "The Expando-Mono-Duo Design Pattern for Text Ranking With Pretrained Sequence-to-Sequence Models", ArXiv, vol. abs/2101.05667, 2021.
Sakr, S., A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. G. Aref, M. Arenas, M. Besta, P. A. Boncz, et al., "The Future Is Big Graphs: A Community View on Graph Processing Systems", Communications of the ACM, vol. 64, issue 9, pp. 62--71, 2021.
Gauch, M., J. Mai, and J. Lin, "The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction", Environmental Modelling and Software, vol. 135, pp. 104926, 2021.
Mohapatra, S., S. Sasy, X. He, G. Kamath, and O. Thakkar, "The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection", ArXiv, vol. abs/2111.04906, 2021.
Xue, H., F. D. Salim, Y. Ren, and C. Clarke, "Translating Human Mobility Forecasting Through Natural Language Generation", ArXiv, vol. abs/2112.11481, 2021.
Covington, C., X. He, J. Honaker, and G. Kamath, "Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy", ArXiv, vol. abs/2110.14465, 2021.
Mackenzie, J., A. Trotman, and J. Lin, "Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation", ArXiv, vol. abs/2110.11540, 2021.

2020

Ozsu, T., and P. Valduriez, Principles of Distributed Database Systems, 4th Edition: Springer, 2020.
Kassaie, B., and F. Tompa, "A Framework for Extracted View Maintenance", ACM Symposium on Document Engineering (DocEng), 2020.
Yilmaz, Z. Akkalyoncu, C. Clarke, and J. Lin, "A Lightweight Environment for Learning Experimental IR Research Practices", International Conference on Research and Development in Information Retrieval (SIGIR), 2020.
Zhang, X., A. Yates, and J. Lin, "A Little Bit Is Worse Than None: Ranking With Limited Training Data", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Vtyurina, A., C. Clarke, E. Law, J. R. Trippas, and H. Bota, "A Mixed-Method Analysis of Text and Audio Search Interfaces With Varying Task Complexity", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Ghenai, A., M. Smucker, and C. Clarke, "A Think-Aloud Study to Understand Factors Affecting Online Health Search", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Gauch, M., J. Bai, J. Mai, and J. Lin, "An Open-Source Interface to the Canadian Surface Prediction Archive", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Tu, Z., W. Yang, Z. Fu, Y. Xie, L. Tan, K. Xiong, M. Li, and J. Lin, "Approximate Nearest Neighbor Search and Lightweight Dense Vector Reranking in Multi-Stage Retrieval Architectures", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Wu, R., A. Zhang, I. Ilyas, and T. Rekatsinas, "Attention-Based Learning for Missing Data Imputation in HoloClean", Conference on Machine Learning and Systems (MLSys), 2020.
Yates, A., S. Arora, X. Zhang, W. Yang, K. Martin Jose, and J. Lin, "Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval", Web Search and Data Mining (WSDM), 2020.
Glasbergen, B., K. Langendoen, M. Abebe, and K. Daudjee, "ChronoCache: Predictive and Adaptive Mid-Tier Query Result Caching", ACM International Conference on Management of Data (SIGMOD), 2020.
Tao, Y., X. He, A. Machanavajjhala, and S. Roy, "Computing Local Sensitivities of Counting Queries With Joins", ACM International Conference on Management of Data (SIGMOD), 2020.
Agarwal, R. Raj, D. Kumar, L. Golab, and S. Keshav, "Consentio: Managing Consent to Data Access Using Permissioned Blockchains", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020.
Adewoye, T., X. Han, N. Ruest, I. Milligan, S. Fritz, and J. Lin, "Content-Based Exploration of Archival Images Using Neural Networks", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Zhang, E., N. Gupta, R. Tang, X. Han, R. Pradeep, K. Lu, Y. Zhang, R. Nogueira, K. Cho, H. Fang, et al., "Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Shi, P., H. Bai, and J. Lin, "Cross-Lingual Training of Neural Models for Document Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Chowdhury, A. Roy, C. Wang, X. He, A. Machanavajjhala, and S. Jha, "Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers", ACM International Conference on Management of Data (SIGMOD), 2020.
Ding, S., E. Zhang, and J. Lin, "Cydex: Neural Search Infrastructure for the Scholarly Literature", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Xin, J., R. Tang, J. Lee, Y. Yu, and J. Lin, "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference", Association for Computational Linguistics (ACL), 2020.
Yang, J-H., S-C. Lin, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Designing Templates for Eliciting Commonsense Knowledge From Pretrained Sequence-to-Sequence Models", International Conference on Computational Linguistics (COLING), 2020.
Xie, Y., W. Yang, L. Tan, K. Xiong, N. Jing Yuan, B. Huai, M. Li, and J. Lin, "Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering", The Web Conference (WWW), 2020.
Nogueira, R., Z. Jiang, R. Pradeep, and J. Lin, "Document Ranking With a Pretrained Sequence-to-Sequence Model", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, G. Labahn, M. S. Marzouk, F. Tompa, and K. Wang, "Dowsing for Math Answers With Tangent-L", Conference and Labs of the Evaluation Forum (CLEF), 2020.
Abebe, M., B. Glasbergen, and K. Daudjee, "DynaMast: Adaptive Dynamic Mastering for Replicated Systems", IEEE International Conference on Data Engineering (ICDE), 2020.
Xin, J., R. Nogueira, Y. Yu, and J. Lin, "Early Exiting BERT for Efficient Document Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Zhang, X., T. Ozsu, and L. Chen, "ELite: Cost-Effective Approximation of Exploration-Based Graph Analysis", ACM International Conference on Management of Data (SIGMOD), 2020.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Erratum for Discovering Order Dependencies Through Order Compatibility (EDBT 2019)", International Conference on Extending Database Technology (EDBT), 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Evaluating Pretrained Transformer Models for Citation Recommendation", International Workshop on Bibliometric-enhanced Information Retrieval (BIR), 2020.
Adhikari, A., A. Ram, R. Tang, W. L. Hamilton, and J. Lin, "Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification With DocBERT", Workshop on Representation Learning for NLP (RepL4NLP), 2020.
Toman, D., and G. Weddell, "First Order Rewritability for Ontology Mediated Querying in Horn-DLFD", International Workshop on Description Logics (DL), 2020.
Yates, A., K. Martin Jose, X. Zhang, and J. Lin, "Flexible IR Pipelines With Capreolus", International Conference on Information and Knowledge Management (CIKM), 2020.
Grand, A., R. Muir, J. Ferenczi, and J. Lin, "From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance", European Conference on Information Retrieval (ECIR), 2020.
Yan, D., G. Guo, M. Mashiur Ra Chowdhury, T. Ozsu, W-S. Ku, and J. C. S. Lui, "G-Thinker: A Distributed Framework for Mining Subgraphs in a Big Graph", IEEE International Conference on Data Engineering (ICDE), 2020.
Lin, J., C. Zhong, D. Hu, C. Rudin, and M. I. Seltzer, "Generalized and Scalable Optimal Sparse Decision Trees", International Conference on Machine Learning (ICML), 2020.
Zeng, L., L. Zou, T. Ozsu, L. Hu, and F. Zhang, "GSI: GPU-friendly Subgraph Isomorphism", IEEE International Conference on Data Engineering (ICDE), 2020.
Pradeep, R., X. Ma, X. Zhang, H. Cui, R. Xu, R. Nogueira, and J. Lin, "H2oloo at TREC 2020: When All You Got Is a Hammer... Deep Learning, Health Misinformation, and Precision Medicine", Text Retrieval Conference (TREC), 2020.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "Inserting Information Bottleneck for Attribution in Transformers", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Kumar, D., L. Mou, L. Golab, and O. Vechtomova, "Iterative Edit-Based Unsupervised Sentence Simplification", Association for Computational Linguistics (ACL), 2020.
Farhat, O., H. Bindra, and K. Daudjee, "Leaving Stragglers at the Window: Low-Latency Stream Sampling With Accuracy Guarantees", Distributed Event-Based Systems (DEBS), 2020.
Xiang, Z., B. Ding, X. He, and J. Zhou, "Linear and Range Counting Under Metric-Based Local Differential Privacy", International Symposium on Information Theory (ISIT), 2020.
Agarwal, R. Raj, R. Cohen, L. Golab, and A. Tsang, "Locating Influential Agents in Social Networks: Budget-Constrained Seed Set Selection", Canadian Conference on Artificial Intelligence (AI), 2020.
Buchanan, G., D. McKay, C. Clarke, L. Azzopardi, and J. R. Trippas, "Made to Measure: A Workshop on Human-Centred Metrics for Information Seeking", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Li, Q., T. Ozsu, and H. Xiong, "Message From the General Chairs of DSC 2020", International Conference on Data Science in Cyberspace (DSC), 2020.
Grossman, M., G. Cormack, and B'. Pham, "MRG_UWaterloo Participation in the TREC 2020 Precision Medicine Track", Text Retrieval Conference (TREC), 2020.
Clarke, C., M. Smucker, and A. Vtyurina, "Offline Evaluation by Maximum Similarity to an Ideal Ranking", International Conference on Information and Knowledge Management (CIKM), 2020.
Clarke, C., A. Vtyurina, and M. Smucker, "Offline Evaluation Without Gain", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Clarke, C., S. Rizvi, M. Smucker, M. Maistro, and G. Zuccon, "Overview of the TREC 2020 Health Misinformation Track", Text Retrieval Conference (TREC), 2020.
Meng, X., and L. Golab, "Parallel Scheduling of Data-Intensive Tasks", European Conference on Parallel Processing (Euro-Par), 2020.
Khan, A., and L. Golab, "Reddit Mining to Understand Gendered Movements", International Conference on Extending Database Technology (EDBT), 2020.
Jacobs, A., S. Chopra, and L. Golab, "Reddit Mining to Understand Women's Issues in STEM", International Conference on Extending Database Technology (EDBT), 2020.
Pacaci, A., A. Bonifati, and T. Ozsu, "Regular Path Query Evaluation on Streaming Graphs", ACM International Conference on Management of Data (SIGMOD), 2020.
Lin, J., and Q. Zhang, "Reproducibility Is a Process, Not an Achievement: The Replicability of IR Reproducibility Experiments", European Conference on Information Retrieval (ECIR), 2020.
Guo, R. Benson, and K. Daudjee, "Research Challenges in Deep Reinforcement Learning-Based Join Query Optimization", ACM International Conference on Management of Data (SIGMOD), 2020.
Mior, M. J., and K. Salem, "ReSpark: Automatic Caching for Iterative Applications in Apache Spark", IEEE International Conference on Big Data (IEEE BigData), 2020.
Glasbergen, B., M. Abebe, K. Daudjee, D. Vogel, and J. Zhao, "Sentinel: Understanding Data Systems", ACM International Conference on Management of Data (SIGMOD), 2020.
Tang, R., J. Lee, J. Xin, X. Liu, Y. Yu, and J. Lin, "Showing Your Work Doesn't Always Work", Association for Computational Linguistics (ACL), 2020.
Satuluri, V., Y. Wu, X. Zheng, Y. Qian, B. Wichers, Q. Dai, G. Ming Tang, J. Jiang, and J. Lin, "SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020.
Parsa, M. S., and L. Golab, "Social Media Mining to Understand the Impact of Co-Operative Education on Mental Health", Educational Data Mining (EDM), 2020.
Ozsu, T., "Streaming Graph Processing and Analytics", Distributed Event-Based Systems (DEBS), 2020.
Lin, J., J. M. Mackenzie, C. Kamphuis, C. Macdonald, A. Mallia, M. Siedlaczek, A. Trotman, and A. P. de Vries, "Supporting Interoperability Between Open-Source Search Engines With the Common Index File Format", International Conference on Research and Development in Information Retrieval (SIGIR), 2020.
Naseem, S. Saad, D. Kumar, M. S. Parsa, and L. Golab, "Text Mining of COVID-19 Discussions on Reddit", IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2020.
Ruest, N., J. Lin, I. Milligan, and S. Fritz, "The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Lin, S-C., J-H. Yang, and J. Lin, "TREC 2020 Notebook: CAsT Track", Text Retrieval Conference (TREC), 2020.
Shahidi, H., M. Li, and J. Lin, "Two Birds, One Stone: A Simple, Unified Model for Text Generation From Structured and Unstructured Data", Association for Computational Linguistics (ACL), 2020.
Sequiera, R., L. Tan, Y. Zhang, and J. Lin, "Update Delivery Mechanisms for Prospective Information Needs: A Reproducibility Study", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Arabzadeh, N., and C. Clarke, "WaterlooClarke at the TREC 2020 Conversational Assistant Track", Text Retrieval Conference (TREC), 2020.
Lin, J., I. Milligan, D. W. Oard, N. Ruest, and K. Shilton, "We Could, but Should We?: Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Kamphuis, C., A. P. de Vries, L. Boytsov, and J. Lin, "Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants", European Conference on Information Retrieval (ECIR), 2020.
Gorenflo, C., L. Golab, and S. Keshav, "XOX Fabric: A Hybrid Approach to Blockchain Transaction Execution", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020.
Gauch, M., and J. Lin, "A Data Scientist's Guide to Streamflow Prediction", ArXiv, vol. abs/2006.12975, 2020.
Lin, J., "A Prototype of Serverless Lucene", ArXiv, vol. abs/2002.01447, 2020.
Ozsu, T., "A Systematic View of Data Science", IEEE Data Engineering Bulletin, vol. 43, issue 3, pp. 3--11, 2020.
Mhedhbi, A., P. Gupta, S. Khaliq, and S. Salihoglu, "A+ Indexes: Lightweight and Highly Flexible Adjacency Lists for Graph Database Management Systems", ArXiv, vol. abs/2004.00130, 2020.
Chen, Y., G. Xiao, T. Ozsu, C. Liu, A. Y. Zomaya, and T. Li, "aeSpTV: An Adaptive and Efficient Framework for Sparse Tensor-Vector Product Kernel on a High-Performance Computing Platform", IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 31, issue 10, pp. 2329--2345, 2020.
Livshits, E., A. Heidari, I. Ilyas, and B. Kimelfeld, "Approximate Denial Constraints", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 10, pp. 1682--1695, 2020.
Livshits, E., A. Heidari, I. Ilyas, and B. Kimelfeld, "Approximate Denial Constraints", ArXiv, vol. abs/2005.08540, 2020.
Clarke, C., A. Vtyurina, and M. Smucker, "Assessing Top-K Preferences", ArXiv, vol. abs/2007.11682, 2020.
Oliveira, P. H., D. S. Kaster, C. Traina, Jr., and I. Ilyas, "Batchwise Probabilistic Incremental Data Cleaning", ArXiv, vol. abs/2011.04730, 2020.
Fritz, S., I. Milligan, N. Ruest, and J. Lin, "Building Community at Distance: A Datathon During COVID-19", Digital Library Perspectives, vol. 36, issue 4, pp. 415--428, 2020.
Khan, A., L. Golab, M. Kargar, J. Szlichta, and M. Zihayat, "Compact Group Discovery in Attributed Graphs and Social Networks", Information Processing and Management, vol. 57, issue 2, pp. 102054, 2020.
Tao, Y., X. He, A. Machanavajjhala, and S. Roy, "Computing Local Sensitivities of Counting Queries With Joins", ArXiv, vol. abs/2004.04656, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models", ArXiv, vol. abs/2004.01909, 2020.
Zhang, E., N. Gupta, R. Tang, X. Han, R. Pradeep, K. Lu, Y. Zhang, R. Nogueira, K. Cho, H. Fang, et al., "Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset", ArXiv, vol. abs/2007.07846, 2020.
Xin, J., R. Tang, J. Lee, Y. Yu, and J. Lin, "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference", ArXiv, vol. abs/2004.12993, 2020.
Kassaie, B., and F. Tompa, "Detecting Opportunities for Differential Maintenance of Extracted Views", ArXiv, vol. abs/2007.01973, 2020.
Karegar, R., M. Mirsafian, P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Discovering Domain Orders Through Order Dependencies", ArXiv, vol. abs/2005.14068, 2020.
Lin, S-C., J-H. Yang, and J. Lin, "Distilling Dense Representations for Ranking Using Tightly-Coupled Teachers", ArXiv, vol. abs/2010.11386, 2020.
Nogueira, R., Z. Jiang, and J. Lin, "Document Ranking With a Pretrained Sequence-to-Sequence Model", ArXiv, vol. abs/2003.06713, 2020.
Wagh, S., X. He, A. Machanavajjhala, and P. Mittal, "DP-Cryptography: Marrying Differential Privacy and Cryptography in Emerging Applications", ArXiv, vol. abs/2004.08887, 2020.
Zhang, H., G. Cormack, M. Grossman, and M. Smucker, "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", Information Retrieval Journal, vol. 23, issue 1, pp. 1--26, 2020.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20 000 Transactions Per Second", International Journal of Network Management, vol. 30, issue 5, 2020.
Lin, J., C. Zhong, D. Hu, C. Rudin, and M. I. Seltzer, "Generalized Optimal Sparse Decision Trees", ArXiv, vol. abs/2006.08690, 2020.
Sahu, S., and S. Salihoglu, "Graphsurge: Graph Analytics on View Collections Using Differential Computation", ArXiv, vol. abs/2004.05297, 2020.
Tang, R., J. Lee, A. Razi, J. Cambre, I. Bicking, J. Kaye, and J. Lin, "Howl: A Deployed, Open-Source Wake Word Detection System", ArXiv, vol. abs/2008.09606, 2020.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "Inserting Information Bottlenecks for Attribution in Transformers", ArXiv, vol. abs/2012.13838, 2020.
Chen, S., P. K. Chrysanthis, K. Daudjee, M. Hsu, and M. Sadoghi, "Introduction to the Special Issue on Self-Managing and Hardware-Optimized Database Systems 2019", Distributed and Parallel Databases, vol. 38, issue 4, pp. 767--769, 2020.
Kumar, D., L. Mou, L. Golab, and O. Vechtomova, "Iterative Edit-Based Unsupervised Sentence Simplification", ArXiv, vol. abs/2006.09639, 2020.
Ge, C., S. Mohapatra, X. He, and I. Ilyas, "Kamino: Constraint-Aware Differentially Private Data Synthesis", ArXiv, vol. abs/2012.15713, 2020.
Li, M., H. Bai, L. Tan, K. Xiong, M. Li, and J. Lin, "Latte-Mix: Measuring Sentence Semantic Similarity With Latent Categorical Mixtures", ArXiv, vol. abs/2010.11351, 2020.
Chen, L., and L. Golab, "Micro-Journal Mining to Understand Mood Triggers", Computing, vol. 102, issue 5, pp. 1227--1244, 2020.
Abebe, M., B. Glasbergen, and K. Daudjee, "MorphoSys: Automatic Physical Design Metamorphosis for Distributed Database Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 13, pp. 3573--3587, 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Navigation-Based Candidate Expansion and Pretrained Language Models For Citation Recommendation", Scientometrics, vol. 125, issue 3, pp. 3001--3016, 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Navigation-Based Candidate Expansion and Pretrained Language Models For Citation Recommendation", ArXiv, vol. abs/2001.08687, 2020.
Heidari, A., S. Kushagra, and I. Ilyas, "On Sampling From Data With Duplicate Records", ArXiv, vol. abs/2008.10549, 2020.
Wang, X-J., M. Grossman, and S. Gyu Hyun, "Participation in TREC 2020 COVID Track Using Continuous Active Learning", ArXiv, vol. abs/2011.01453, 2020.
Lin, J., R. Nogueira, and A. Yates, "Pretrained Transformers for Text Ranking: BERT and Beyond", ArXiv, vol. abs/2010.06467, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Query Reformulation Using Query History for Passage Retrieval in Conversational Search", ArXiv, vol. abs/2005.02230, 2020.
Gauch, M., F. Kratzert, D. Klotz, G. Nearing, J. Lin, and S. Hochreiter, "Rainfall-Runoff Prediction at Multiple Timescales With a Single Long Short-Term Memory Network", ArXiv, vol. abs/2010.07921, 2020.
Zhang, R., W. Yang, L. Lin, Z. Tu, Y. Xie, Z. Fu, Y. Xie, L. Tan, K. Xiong, and J. Lin, "Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents", ArXiv, vol. abs/2002.01861, 2020.
Tang, R., R. Nogueira, E. Zhang, N. Gupta, P. Cam, K. Cho, and J. Lin, "Rapidly Bootstrapping a Question Answering Dataset for COVID-19", ArXiv, vol. abs/2004.11339, 2020.
Zhang, E., N. Gupta, R. Nogueira, K. Cho, and J. Lin, "Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned", ArXiv, vol. abs/2004.05125, 2020.
Heidari, A., G. Michalopoulos, S. Kushagra, I. Ilyas, and T. Rekatsinas, "Record Fusion: A Learning Approach", ArXiv, vol. abs/2006.10208, 2020.
Pacaci, A., A. Bonifati, and T. Ozsu, "Regular Path Query Evaluation on Streaming Graphs", ArXiv, vol. abs/2004.02012, 2020.
Bryson, S., H. Davoudi, L. Golab, M. Kargar, Y. Lytvyn, P. Mierzejewski, J. Szlichta, and M. Zihayat, "Robust Keyword Search in Large Attributed Graphs", Information Retrieval Journal, vol. 23, issue 5, pp. 502--524, 2020.
Bater, J., Y. Park, X. He, X. Wang, and J. Rogers, "SAQE: Practical Privacy-Preserving Approximate Query Processing For Data Federations", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 11, pp. 2691--2705, 2020.
Guo, G., D. Yan, T. Ozsu, Z. Jiang, and J. Khalil, "Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 4, pp. 573--585, 2020.
Guo, G., D. Yan, T. Ozsu, and Z. Jiang, "Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach", ArXiv, vol. abs/2005.00081, 2020.
Pradeep, R., X. Ma, R. Nogueira, and J. Lin, "Scientific Claim Verification With VERT5ERINI", ArXiv, vol. abs/2010.11930, 2020.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, and M. Li, "SegaBERT: Pre-Training of Segment-Aware BERT for Language Understanding", ArXiv, vol. abs/2004.14996, 2020.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, J. Liu, and M. Li, "Semantics of the Unwritten", ArXiv, vol. abs/2004.02251, 2020.
Glasbergen, B., M. Abebe, K. Daudjee, and A. Levi, "Sentinel: Universal Analysis and Insight for Data Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 11, pp. 2720--2733, 2020.
Tang, R., J. Lee, J. Xin, X. Liu, Y. Yu, and J. Lin, "Showing Your Work Doesn't Always Work", ArXiv, vol. abs/2004.13705, 2020.
Salem, K., "Special Issue on Best Papers of DaMoN 2018", The VLDB Journal, vol. 29, issue 2-3, p. 755, 2020.
Boncz, P. A., and K. Salem, "Special Issue on Best Papers of VLDB 2017", The VLDB Journal, vol. 29, issue 1, pp. 483--484, 2020.
Lin, J., J. M. Mackenzie, C. Kamphuis, C. Macdonald, A. Mallia, M. Siedlaczek, A. Trotman, and A. P. de Vries, "Supporting Interoperability Between Open-Source Search Engines With The Common Index File Format", ArXiv, vol. abs/2003.08276, 2020.
Ruest, N., J. Lin, I. Milligan, and S. Fritz, "The Archives Unleashed Project: Technology, Process, and Community To Improve Scholarly Access to Web Archives", ArXiv, vol. abs/2001.05399, 2020.
Sakr, S., A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. G. Aref, M. Arenas, M. Besta, P. A. Boncz, et al., "The Future Is Big Graphs! A Community View on Graph Processing Systems", ArXiv, vol. abs/2012.06171, 2020.
Sahu, S., A. Mhedhbi, S. Salihoglu, J. Lin, and T. Ozsu, "The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey", The VLDB Journal, vol. 29, issue 2-3, pp. 595--618, 2020.
Zhang, M., L. Tan, Z. Tu, Z. Fu, K. Xiong, M. Li, and J. Lin, "To Paraphrase or Not to Paraphrase: User-Controllable Selective Paraphrase Generation", ArXiv, vol. abs/2008.09290, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "TTTTTackling WinoGrande Schemas", ArXiv, vol. abs/2003.08380, 2020.
Toman, D., and G. Weddell, "Using Feature-Based Description Logics to Avoid Duplicate Elimination In Object-Relational Query Languages", German Journal of Artificial Intelligence (KI), vol. 34, issue 3, pp. 355--363, 2020.

2019

Ilyas, I., and X. Chu, Data Cleaning: ACM, 2019.
Ilyas, I., "Data Unification at Scale: Data Tamer", Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker: ACM / Morgan & Claypool, 2019.
Salihoglu, S., and N. Yakovets, "Graph Query Processing", Encyclopedia of Big Data Technologies: Springer, 2019.
Golab, L., "Types of Stream Processing Algorithms", Encyclopedia of Big Data Technologies: Springer, 2019.
De Sa, C., I. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas, "A Formal Framework for Probabilistic Unclean Databases", International Conference on Database Theory (ICDT), 2019.
Kushagra, S., H. Saxena, I. Ilyas, and S. Ben-David, "A Semi-Supervised Framework of Clustering Selection for De-Duplication", IEEE International Conference on Data Engineering (ICDE), 2019.
Yang, H-W., Y. Zou, P. Shi, W. Lu, J. Lin, and X. Sun, "Aligning Cross-Lingual Entities With Multi-Aspect Information", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Ge, C., X. He, I. Ilyas, and A. Machanavajjhala, "APEx: Accuracy-Aware Differentially Private Data Exploration", ACM International Conference on Management of Data (SIGMOD), 2019.
Akkalyoncu Yilmaz, Z., S. Wang, W. Yang, H. Zhang, and J. Lin, "Applying BERT to Document Retrieval With Birch", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Heidari, A., I. Ilyas, and T. Rekatsinas, "Approximate Inference in Structured Instances With Noisy Categorical Observations", Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
Rao, J., L. Liu, Y. Tay, H-W. Yang, P. Shi, and J. Lin, "Bridging the Gap Between Relevance Matching and Semantic Matching For Short Text Similarity Modeling", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Davoudi, H., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Bring Order to Data", Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2019.
Milligan, I., N. Casemajor, S. Fritz, J. Lin, N. Ruest, M. S. Weber, and N. Worby, "Building Community and Tools for Analyzing Web Archives Through Datathons", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Ilyas, I., "Building Scalable Machine Learning Solutions for Data Cleaning", Datenbanksysteme für Business, Technologie und Web (BTW), 2019.
Türe, F., J. Rao, R. Tang, and J. Lin, "Challenges and Opportunities in Understanding Spoken Queries Directed At Modern Entertainment Platforms", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, W., K. Lu, P. Yang, and J. Lin, "Critically Examining the 'Neural Hype': Weak Baselines and the Additivity Of Effectiveness Gains From Neural Ranking Models", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Akkalyoncu Yilmaz, Z., W. Yang, H. Zhang, and J. Lin, "Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Neumann, T., and K. Salem, "DaMoN 19: The 15th International Workshop on Data Management on New Hardware", ACM International Conference on Management of Data (SIGMOD), 2019.
Yang, W., L. Tan, C. Lu, A. Cui, H. Li, X. Chen, K. Xiong, M. Wang, M. Li, J. Pei, et al., "Detecting Customer Complaint Escalation With Recurrent Neural Networks And Manually-Engineered Features", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Discovery of Functional Dependencies", IEEE International Conference on Data Engineering (ICDE), 2019.
Alonso, G., C. Binnig, I. Pandis, K. Salem, J. Skrzypczak, R. Stutsman, L. Thostrup, T. Wang, Z. Wang, and T. Ziegler, "DPI: The Data Processing Interface for Modern Networks", Conference on Innovative Data Systems Research (CIDR), 2019.
Cormack, G., H. Zhang, N. Ghelani, M. Abualsaud, M. Smucker, M. Grossman, S. Rahbariasl, and A. Ghenai, "Dynamic Sampling Meets Pooling", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, W., Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin, "End-to-End Open-Domain Question Answering With BERTserini", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Toman, D., and G. Weddell, "Exhaustive Query Answering via Referring Expressions", International Workshop on Description Logics (DL), 2019.
Pacaci, A., and T. Ozsu, "Experimental Analysis of Streaming Algorithms for Graph Partitioning", ACM International Conference on Management of Data (SIGMOD), 2019.
Le Guilly, M., J-M. Petit, V-M. Scuturici, and I. Ilyas, "ExplIQuE: Interactive Databases Exploration With SQL", International Conference on Information and Knowledge Management (CIKM), 2019.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions Per Second", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2019.
Toman, D., and G. Weddell, "Finding ALL Answers to OBDA Queries Using Referring Expressions", Australian Joint Conference on Artificial Intelligence (AUS-AI), 2019.
McIntyre, S., D. Toman, and G. Weddell, "FunDL - A Family of Feature-Based Description Logics, With Applications In Querying Structured Data Sources", Description Logic, Theory Combination, and All That - Essays Dedicated to Franz Baader, 2019.
Chopra, S., A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Science and Engineering: A Data Mining Approach", International Conference on Extending Database Technology (EDBT), 2019.
Chopra, S., A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Work-Integrated Learning Assessments", Educational Data Mining (EDM), 2019.
Anzum, N., S. Salihoglu, and D. Vogel, "GraphWrangler: An Interactive Graph View on Relational Data", ACM International Conference on Management of Data (SIGMOD), 2019.
Heidari, A., J. McGrath, I. Ilyas, and T. Rekatsinas, "HoloDetect: Few-Shot Learning for Error Detection", ACM International Conference on Management of Data (SIGMOD), 2019.
Lee, J., R. Tang, and J. Lin, "Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
McCoy, A. B., D. F. Sittig, J. Lin, and A. Wright, "Identification and Ranking of Biomedical Informatics Researcher Citation Statistics Through a Google Scholar Scraper", American Medical Informatics Association Annual Symposium (AMIA), 2019.
Toman, D., and G. Weddell, "Identity Resolution in Ontology Based Data Access to Structured Data Sources", Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2019.
Liu, L., W. Yang, J. Rao, R. Tang, and J. Lin, "Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Clancy, R., J. Lee, Z. Akkalyoncu Yilmaz, and J. Lin, "Information Retrieval Meets Scalable Text Analytics: Solr Integration With Spark", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Vollmer, M., L. Golab, K. Böhm, and D. Srivastava, "Informative Summarization of Numeric Data", International Conference on Statistical and Scientific Database Management (SSDBM), 2019.
Clarke, C., "Length Normalization in the Era of Neural Rankers", International Workshop on Evaluating Information Access (EVIA), 2019.
Gorenflo, C., L. Golab, and S. Keshav, "Mitigating Trust Issues in Electric Vehicle Charging Using a Blockchain", Energy-Efficient Computing and Networking (e-Energy), 2019.
Rao, J., W. Yang, Y. Zhang, F. Türe, and J. Lin, "Multi-Perspective Relevance Matching With Hierarchical ConvNets For Social Media Search", AAAI Conference on Artificial Intelligence (AAAI), 2019.
Tang, R., Y. Lu, and J. Lin, "Natural Language Generation for Effective Knowledge Distillation", Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing (DeepLo), 2019.
McIntyre, S., A. Borgida, D. Toman, and G. Weddell, "On Limited Conjunctions and Partial Features in Parameter-Tractable Feature Logics", AAAI Conference on Artificial Intelligence (AAAI), 2019.
Borgida, A., D. Toman, and G. Weddell, "On Special Description Logics for Processes and Plans", International Workshop on Description Logics (DL), 2019.
Kumar, D., R. Cohen, and L. Golab, "Online Abuse Detection: The Value of Preprocessing and Neural Attention Models", Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), 2019.
Clancy, R., N. Ferro, C. Hauff, J. Lin, T. Sakai, and Z. Zhong Wu, "Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Abualsaud, M., and M. Smucker, "Patterns of Search Result Examination: Query to First Action", International Conference on Information and Knowledge Management (CIKM), 2019.
Kassaie, B., and F. Tompa, "Predictable and Consistent Information Extraction", ACM Symposium on Document Engineering (DocEng), 2019.
Rogers, J., J. Bater, X. He, A. Machanavajjhala, M. Suresh, and X. Wang, "Privacy Changes Everything", Very Large Data Bases Conference (VLDB), 2019.
Cormack, G., and M. Grossman, "Quantifying Bias and Variance of System Rankings", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, J-H., S-C. Lin, C-J. Wang, J. Lin, and M-F. Tsai, "Query and Answer Expansion From Conversation History", Text Retrieval Conference (TREC), 2019.
Yang, P., and J. Lin, "Reproducing and Generalizing Semantic Term Matching in Axiomatic Information Retrieval", European Conference on Information Retrieval (ECIR), 2019.
Adhikari, A., A. Ram, R. Tang, and J. Lin, "Rethinking Complex Neural Network Architectures for Document Classification", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Yang, H-W., L. Liu, I. Milligan, N. Ruest, and J. Lin, "Scalable Content-Based Analysis of Images in Web Archives With TensorFlow And the Archives Unleashed Toolkit", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Kushagra, S., S. Ben-David, and I. Ilyas, "Semi-Supervised Clustering for De-Duplication", International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
Kazhamiaka, M., B. Naveed Memon, C. Kankanamge, S. Sahu, S. Rizvi, B. Wong, and K. Daudjee, "Sift: Resource-Efficient Consensus With RDMA", Conference on Emerging Network Experiment and Technology (CoNEXT), 2019.
Shi, P., J. Rao, and J. Lin, "Simple Attention-Based Representation Learning for Ranking Short Social Media Posts", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Yu, R., Y. Xie, and J. Lin, "Simple Techniques for Cross-Collection Relevance Feedback", European Conference on Information Retrieval (ECIR), 2019.
Clancy, R., T. Eskildsen, N. Ruest, and J. Lin, "Solr Integration in the Anserini Information Retrieval Toolkit", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yan, D., G. Guo, M. Mashiur Ra Chowdhury, T. Ozsu, J. C. S. Lui, and W. Tan, "T-Thinker: A Task-Centric Distributed Framework for Compute-Intensive Divide-and-Conquer Algorithms", ACM Symposium on Principles & Practice of Parallel Programming (PPoPP), 2019.
Deschamps, R., N. Ruest, J. Lin, S. Fritz, and I. Milligan, "The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration of Web Archives", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Deschamps, R., S. Fritz, J. Lin, I. Milligan, and N. Ruest, "The Cost of a WARC: Analyzing Web Archives in the Cloud", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Lin, J., and P. Yang, "The Impact of Score Ties on Repeatability in Document Ranking", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Clancy, R., N. Ferro, C. Hauff, J. Lin, T. Sakai, and Z. Zhong Wu, "The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Time Constrained Continuous Subgraph Search Over Streaming Graphs", IEEE International Conference on Data Engineering (ICDE), 2019.
Rahbariasl, S., and M. Smucker, "Time-Limits and Summaries for Faster Relevance Assessing", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Cormack, G., and M. Grossman, "Unbiased Low-Variance Estimators for Precision and Related Information Retrieval Effectiveness Measures", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Lee, J., R. Tang, and J. Lin, "Universal Voice-Enabled User Interfaces Using JavaScript", International Conference on Intelligent User Interfaces (IUI), 2019.
Clancy, R., Z. Akkalyoncu Yilmaz, Z. Zhong Wu, and J. Lin, "University of Waterloo Docker Images for OSIRRC at SIGIR 2019", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Deng, D., W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, G. Li, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Unsupervised String Transformation Learning for Entity Consolidation", IEEE International Conference on Data Engineering (ICDE), 2019.
Abualsaud, M., F. C. Beylunioglu, M. Smucker, and R. P. Duimering, "UWaterlooMDS at the TREC 2019 Decision Track", Text Retrieval Conference (TREC), 2019.
Ruest, N., I. Milligan, and J. Lin, "Warclight: A Rails Engine for Web Archive Discovery", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Abebe, M., B. Glasbergen, and K. Daudjee, "WatDFS: A Project for Understanding Distributed Systems in the Undergraduate Curriculum", Technical Symposium on Computer Science Education (SIGCSE), 2019.
Clarke, C., "WaterlooClarke at the TREC 2019 Conversational Assistant Track", Text Retrieval Conference (TREC), 2019.
Xin, J., J. Lin, and Y. Yu, "What Part of the Neural Network Does This? Understanding LSTMs By Measuring and Dissecting Neurons", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Tang, R., F. Türe, and J. Lin, "Yelling at Your TV: An Analysis of Speech Recognition Errors And Subsequent User Behavior on Entertainment Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, H-W., Y. Zou, P. Shi, W. Lu, J. Lin, and X. Sun, "Aligning Cross-Lingual Entities With Multi-Aspect Information", ArXiv, vol. abs/1910.06575, 2019.
Heidari, A., I. Ilyas, and T. Rekatsinas, "Approximate Inference in Structured Instances With Noisy Categorical Observations", ArXiv, vol. abs/1907.00141, 2019.
Liu, L., H. Wang, J. Lin, R. Socher, and C. Xiong, "Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation For Pretrained Models", ArXiv, vol. abs/1911.03588, 2019.
Alway, K., E. Blais, and S. Salihoglu, "Box Covers and Domain Orderings for Beyond Worst-Case Join Processing", ArXiv, vol. abs/1909.12102, 2019.
Aluç, G., T. Ozsu, and K. Daudjee, "Building Self-Clustering RDF Databases Using Tunable-LSH", The VLDB Journal, vol. 28, issue 2, pp. 173--195, 2019.
Agarwal, R. Raj, D. Kumar, L. Golab, and S. Keshav, "Consentio: Managing Consent to Data Access Using Permissioned Blockchains", ArXiv, vol. abs/1910.07110, 2019.
Zhang, X., and T. Ozsu, "Correlation Constraint Shortest Path Over Large Multi-Relation Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 5, pp. 488--501, 2019.
Shi, P., and J. Lin, "Cross-Lingual Relevance Transfer for Document Retrieval", ArXiv, vol. abs/1911.02989, 2019.
Ehsan, N., A. Shakery, and F. Tompa, "Cross-Lingual Text Alignment for Fine-Grained Plagiarism Detection", Journal of Information Science, vol. 45, issue 4, 2019.
Yang, W., Y. Xie, L. Tan, K. Xiong, M. Li, and J. Lin, "Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering", ArXiv, vol. abs/1904.06652, 2019.
Xiang, Z., B. Ding, X. He, and J. Zhou, "Design of Algorithms Under Policy-Aware Local Differential Privacy: Utility-Privacy Trade-Offs", ArXiv, vol. abs/1909.11778, 2019.
Karyakin, A., and K. Salem, "DimmStore: Memory Power Optimization for Database Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1499--1512, 2019.
Tang, R., Y. Lu, L. Liu, L. Mou, O. Vechtomova, and J. Lin, "Distilling Task-Specific Knowledge From BERT Into Simple Neural Networks", ArXiv, vol. abs/1903.12136, 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Dependency Discovery", ArXiv, vol. abs/1903.05228, 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Implementations of Dependency Discovery Algorithms", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1624--1636, 2019.
Adhikari, A., A. Ram, R. Tang, and J. Lin, "DocBERT: BERT for Document Classification", ArXiv, vol. abs/1904.08398, 2019.
Nogueira, R., W. Yang, J. Lin, and K. Cho, "Document Expansion by Query Prediction", ArXiv, vol. abs/1904.08375, 2019.
Yang, W., Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin, "End-to-End Open-Domain Question Answering With BERTserini", ArXiv, vol. abs/1902.01718, 2019.
Godfrey, P., L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Errata Note: Discovering Order Dependencies Through Order Compatibility", ArXiv, vol. abs/1905.02010, 2019.
Ram, A., J. Xin, M. Nagappan, Y. Yu, R. Cabrera Lozoya, A. Sabetta, and J. Lin, "Exploiting Token and Path-Based Representations of Code for Identifying Security-Relevant Commits", ArXiv, vol. abs/1911.07620, 2019.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions Per Second", ArXiv, vol. abs/1901.00910, 2019.
Zeng, L., L. Zou, T. Ozsu, L. Hu, and F. Zhang, "GSI: GPU-friendly Subgraph Isomorphism", ArXiv, vol. abs/1906.03420, 2019.
Heidari, A., J. McGrath, I. Ilyas, and T. Rekatsinas, "HoloDetect: Few-Shot Learning for Error Detection", ArXiv, vol. abs/1904.02285, 2019.
Liu, C., X. He, T. Chanyaswad, S. Wang, and P. Mittal, "Investigating Statistical Privacy Frameworks From the Perspective Of Hypothesis Testing", Proceedings on Privacy Enhancing Technologies (PoPETs), vol. 2019, issue 3, pp. 233--254, 2019.
Teofili, T., and J. Lin, "Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors", ArXiv, vol. abs/1910.10208, 2019.
Azmy, M., P. Shi, J. Lin, and I. Ilyas, "Matching Entities Across Different Knowledge Graphs With Graph Embeddings", ArXiv, vol. abs/1903.06607, 2019.
Nogueira, R., W. Yang, K. Cho, and J. Lin, "Multi-Stage Document Ranking With BERT", ArXiv, vol. abs/1910.14424, 2019.
Mhedhbi, A., and S. Salihoglu, "Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1692--1704, 2019.
Mhedhbi, A., and S. Salihoglu, "Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins", ArXiv, vol. abs/1903.02076, 2019.
Chowdhury, A. Roy, C. Wang, X. He, A. Machanavajjhala, and S. Jha, "Outis: Crypto-Assisted Differential Privacy on Untrusted Servers", ArXiv, vol. abs/1902.07756, 2019.
Livshits, E., I. Ilyas, B. Kimelfeld, and S. Roy, "Principles of Progress Indicators for Database Repairing", ArXiv, vol. abs/1904.06492, 2019.
Kotsogiannis, I., Y. Tao, X. He, M. Fanaeepour, A. Machanavajjhala, M. Hay, and G. Miklau, "PrivateSQL: A Differentially Private SQL Query Engine", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1371--1384, 2019.
Ge, C., I. Ilyas, and F. Kerschbaum, "Secure Multi-Party Functional Dependency Discovery", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 2, pp. 184--196, 2019.
Yang, W., H. Zhang, and J. Lin, "Simple Applications of BERT for Ad Hoc Document Retrieval", ArXiv, vol. abs/1903.10972, 2019.
Shi, P., and J. Lin, "Simple BERT Models for Relation Extraction and Semantic Role Labeling", ArXiv, vol. abs/1904.05255, 2019.
Sun, J., D. Deng, I. Ilyas, G. Li, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Technical Report: Optimizing Human Involvement for Entity Matching And Consolidation", ArXiv, vol. abs/1906.06574, 2019.
Lin, J., "The Neural Hype, Justified!: A Recantation", SIGIR Forum, vol. 53, issue 2, pp. 88--93, 2019.
Lin, J., L. Paniak, and G. Boerke, "The Performance Envelope of Inverted Indexing on Modern Hardware", ArXiv, vol. abs/1910.11028, 2019.
Gauch, M., J. Mai, and J. Lin, "The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction", ArXiv, vol. abs/1911.07249, 2019.
Lee, J., R. Tang, and J. Lin, "What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning", ArXiv, vol. abs/1911.03090, 2019.
Gorenflo, C., L. Golab, and S. Keshav, "XOX Fabric: A Hybrid Approach to Transaction Execution", ArXiv, vol. abs/1906.11229, 2019.

2018

Abedjan, Z., L. Golab, F. Naumann, and T. Papenbrock, Data Profiling: Morgan & Claypool, 2018.
Liu, L., and T. Ozsu, Encyclopedia of Database Systems, Second Edition: Springer, 2018.
Chomicki, J., and D. Toman, "Abstract Versus Concrete Temporal Query Languages", Encyclopedia of Database Systems: Springer, 2018.
Machanavajjhala, A., and X. He, "Analyzing Your Location Data With Provable Privacy Guarantees", Springer Handbooks: Springer, 2018.
Ozsu, T., "Client-Server Architecture", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Data Manipulation Language (DML)", Encyclopedia of Database Systems: Springer, 2018.
Golab, L., "Data Stream", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Database", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Database Administrator (DBA)", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Document Databases", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Enterprise Content Management", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Hypertexts", Encyclopedia of Database Systems: Springer, 2018.
Toman, D., "Point-Stamped Temporal Models", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Rank-Aware Query Processing", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Rank-Join", Encyclopedia of Database Systems: Springer, 2018.
Salem, K., "Sagas", Encyclopedia of Database Systems: Springer, 2018.
Golab, L., "Stream Models", Encyclopedia of Database Systems: Springer, 2018.
Lin, J., "Summarization", Encyclopedia of Database Systems: Springer, 2018.
Chomicki, J., and D. Toman, "Temporal Logic in Database Query Languages", Encyclopedia of Database Systems: Springer, 2018.
Chomicki, J., and D. Toman, "Temporal Relational Calculus", Encyclopedia of Database Systems: Springer, 2018.
Roddick, J. F., and D. Toman, "Temporal Vacuuming", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Top-K Queries", Encyclopedia of Database Systems: Springer, 2018.
Clarke, C., "Web Question Answering", Encyclopedia of Database Systems: Springer, 2018.
Zhang, H., M. Abualsaud, and M. Smucker, "A Study of Immediate Requery Behavior in Search", Conference on Human Information Interaction and Retrieval (CHIIR), 2018.
Abualsaud, M., N. Ghelani, H. Zhang, M. Smucker, G. Cormack, and M. Grossman, "A System for Efficient High-Recall Retrieval", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Koutris, P., S. Salihoglu, and D. Suciu, "Algorithmic Aspects of Parallel Query Processing", ACM International Conference on Management of Data (SIGMOD), 2018.
Tang, R., W. Wang, Z. Tu, and J. Lin, "An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Glasbergen, B., M. Abebe, K. Daudjee, S. Foggo, and A. Pacaci, "Apollo: Learning Query Correlations for Predictive Caching in Geo-Distributed Systems", International Conference on Extending Database Technology (EDBT), 2018.
Cormack, G., and M. Grossman, "Beyond Pooling", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Mansour, E., D. Deng, R. Castro Fernandez, A. Ali Qahtan, W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, et al., "Building Data Civilizer Pipelines With an Advanced Workflow Engine", IEEE International Conference on Data Engineering (ICDE), 2018.
Yan, X., L. Yang, H. Zhang, X. Charles Lin, B. Wong, K. Salem, and T. Brecht, "Carousel: Low-Latency Transaction Processing for Globally-Distributed Data", ACM International Conference on Management of Data (SIGMOD), 2018.
Fraser, D. J., A. Kane, and F. Tompa, "Choosing Math Features for BM25 Ranking With Tangent-L", ACM Symposium on Document Engineering (DocEng), 2018.
Liang, Y., Z. Tu, L. Huang, and J. Lin, "CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Lin, J., "Computing Without Servers, V8, Rocket Ships, and Other Batsh*t Crazy Ideas in Data Systems", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2018.
Langouri, M. Alipour, Z. Zheng, F. Chiang, L. Golab, and J. Szlichta, "Contextual Data Cleaning", IEEE International Conference on Data Engineering (ICDE), 2018.
Chopra, S., Y. Helen Jiang, A. Toulis, and L. Golab, "Data Analytics to Improve Co-Operative Education", International Conference on Extending Database Technology (EDBT), 2018.
Tang, R., and J. Lin, "Deep Residual Learning for Small-Footprint Keyword Spotting", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Pacaci, A., and T. Ozsu, "Distribution-Aware Stream Partitioning for Distributed Stream Processing Systems", ACM International Conference on Management of Data (SIGMOD), 2018.
Abebe, M., K. Daudjee, B. Glasbergen, and Y. Tian, "EC-Store: Bridging the Gap Between Storage and Latency in Distributed Erasure Coded Systems", IEEE International Conference on Distributed Computing Systems (ICDCS), 2018.
Zihayat, M., A. An, L. Golab, M. Kargar, and J. Szlichta, "Effective Team Formation in Expert Networks", Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2018.
Zhang, H., M. Abualsaud, N. Ghelani, M. Smucker, G. Cormack, and M. Grossman, "Effective User Interaction for High-Recall Retrieval: Less Is More", International Conference on Information and Knowledge Management (CIKM), 2018.
Azmy, M., P. Shi, J. Lin, and I. Ilyas, "Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia", International Conference on Computational Linguistics (COLING), 2018.
Tompa, F., "Fashioning a Search Engine to Support Humanities Research", ACM Symposium on Document Engineering (DocEng), 2018.
Mihaylov, A., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "FASTOD: Bringing Order to Data", IEEE International Conference on Data Engineering (ICDE), 2018.
Zheng, Z., M. Alipour, Z. Qu, I. Currie, F. Chiang, L. Golab, and J. Szlichta, "FastOFD: Contextual Data Cleaning With Ontology Functional Dependencies", International Conference on Extending Database Technology (EDBT), 2018.
Chopra, S., H. Gautreau, A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Undergraduate Engineering Applicants: A Text Mining Approach", Educational Data Mining (EDM), 2018.
Yu, R., Y. Xie, and J. Lin, "H2oloo at TREC 2018: Cross-Collection Relevance Transfer for the Common Core Track", Text Retrieval Conference (TREC), 2018.
Toman, D., and G. Weddell, "Identity Resolution in Conjunctive Querying Over DL-Based Knowledge Bases", International Workshop on Description Logics (DL), 2018.
Chopra, S., and L. Golab, "Job Description Mining to Understand Work-Integrated Learning", Educational Data Mining (EDM), 2018.
Grossman, M., and G. Cormack, "MRG_UWaterloo Participation in the TREC 2018 Common Core Track", Text Retrieval Conference (TREC), 2018.
Peng, P., L. Zou, T. Ozsu, and D. Zhao, "Multi-Query Optimization in Federated RDF Systems", International Conference on Database Systems for Advanced Applications (DASFAA), 2018.
Rao, J., F. Türe, and J. Lin, "Multi-Task Learning With Neural Networks for Voice Query Understanding on an Entertainment Platform", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2018.
McIntyre, S., A. Borgida, D. Toman, and G. Weddell, "On Limited Conjunctions in Polynomial Feature Logics, With Applications in OBDA", International Conference on Principles of Knowledge Representation and Reasoning (KR), 2018.
Sequiera, R., L. Tan, and J. Lin, "Overview of the TREC 2018 Real-Time Summarization Track", Text Retrieval Conference (TREC), 2018.
Tu, Z., M. Li, and J. Lin, "Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Mackenzie, J. M., S. J. Culpepper, R. Blanco, M. Crane, C. Clarke, and J. Lin, "Query Driven Algorithm Selection in Early Stage Retrieval", Web Search and Data Mining (WSDM), 2018.
Memon, B. Naveed, X. Charles Lin, A. Mufti, A. Scott Wesley, T. Brecht, K. Salem, B. Wong, and B. Cassell, "RaMP: A Lightweight RDMA Abstraction for Loosely Coupled Applications", USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2018.
Grewal, A., J. Jiang, G. Lam, T. Jung, L. Vuddemarri, Q. Li, A. Landge, and J. Lin, "RecService: Distributed Real-Time Graph Processing at Twitter", USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2018.
Ghelani, N., G. Cormack, and M. Smucker, "Refresh Strategies in Continuous Active Learning", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Mior, M. J., and K. Salem, "Renormalization of NoSQL Database Schemas", International Conference on Conceptual Modeling (ER), 2018.
Yang, P., S. Thiagarajan, and J. Lin, "Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter", ACM International Conference on Management of Data (SIGMOD), 2018.
Fernandez, R. Castro, E. Mansour, A. Ali Qahtan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery", IEEE International Conference on Data Engineering (ICDE), 2018.
Kim, Y., and J. Lin, "Serverless Data Analytics With Flint", IEEE International Conference on Cloud Computing (CLOUD), 2018.
Aleardi, L. Castelli, S. Salihoglu, G. Singh, and M. Ovsjanikov, "Spectral Measures of Distortion for Change Detection in Dynamic Graphs", International Workshop on Complex Networks & Their Applications, 2018.
Kane, A., and F. Tompa, "Split-Lists and Initial Thresholds for WAND-based Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Gao, L., L. Golab, T. Ozsu, and G. Aluç, "Stream WatDiv: A Streaming RDF Benchmark", ACM International Conference on Management of Data (SIGMOD), 2018.
Mohammed, S., P. Shi, and J. Lin, "Strong Baselines for Simple Question Answering Over Knowledge Graphs With and Without Neural Networks", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Cormack, G., and M. Grossman, "Technology-Assisted Review in Empirical Medicine: Waterloo Participation in CLEF eHealth 2018", Conference and Labs of the Evaluation Forum (CLEF), 2018.
Grewal, A., and J. Lin, "The Evolution of Content Analysis for Personalized Recommendations at Twitter", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Cormack, G., and M. Grossman, "The Quest for Total Recall", ACM Symposium on Document Engineering (DocEng), 2018.
Ma, W., M. C. Keet, W. Oldford, D. Toman, and G. Weddell, "The Utility of the Abstract Relational Model and Attribute Paths in SQL", International Conference on Knowledge Engineering and Knowledge Management (EKAW), 2018.
Glasbergen, B., M. Abebe, and K. Daudjee, "Tutorial: Adaptive Replication and Partitioning in Data Systems", International Middleware Conference (Middleware), 2018.
Lin, J., S. Mohammed, R. Sequiera, and L. Tan, "Update Delivery Mechanisms for Prospective Information Needs: An Analysis of Attention in Mobile Users", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Abualsaud, M., G. Cormack, N. Ghelani, A. Ghenai, M. Grossman, S. Rahbariasl, H. Zhang, and M. Smucker, "UWaterlooMDS at the TREC 2018 Common Core Track", Text Retrieval Conference (TREC), 2018.
Rao, J., F. Türe, and J. Lin, "What Do Viewers Say to Their TVs?: An Analysis of Voice Queries to Entertainment Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Korkmaz, M., M. Karsten, K. Salem, and S. Salihoglu, "Workload-Aware CPU Performance Scaling for Transactional Database Systems", ACM International Conference on Management of Data (SIGMOD), 2018.
De Sa, C., I. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas, "A Formal Framework for Probabilistic Unclean Databases", ArXiv, vol. abs/1801.06750, 2018.
Ren, Y., M. Tomko, F. Dilys Salim, J. Chan, C. Clarke, and M. Sanderson, "A Location-Query-Browse Graph for Contextual Recommendation", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 30, issue 2, pp. 204--218, 2018.
Tang, R., and J. Lin, "Adaptive Pruning of Neural Language Models for Mobile Devices", ArXiv, vol. abs/1809.10282, 2018.
Koutris, P., S. Salihoglu, and D. Suciu, "Algorithmic Aspects of Parallel Data Processing", Foundations and Trends in Databases, vol. 8, issue 4, pp. 239--370, 2018.
Yang, P., H. Fang, and J. Lin, "Anserini: Reproducible Ranking Baselines Using Lucene", Journal of Data and Information Quality, vol. 10, issue 4, pp. 16:1--16:20, 2018.
Tang, G., S. Keshav, L. Golab, and K. Wu, "Bikeshare Pool Sizing for Bike-and-Ride Multimodal Transit", IEEE Transactions on Intelligent Transportation Systems, vol. 19, issue 7, pp. 2279--2289, 2018.
Stonebraker, M., and I. Ilyas, "Data Integration: The Current Status and the Way Forward", IEEE Data Engineering Bulletin, vol. 41, issue 2, pp. 3--9, 2018.
Ammar, K., F. McSherry, S. Salihoglu, and M. Joglekar, "Distributed Evaluation of Subgraph Queries Using Worst-Case Optimal and Low-Memory Dataflows", Proceedings of the VLDB Endowment (PVLDB), vol. 11, issue 6, pp. 691--704, 2018.
Ammar, K., F. McSherry, S. Salihoglu, and M. Joglekar, "Distributed Evaluation of Subgraph Queries Using Worstcase Optimal LowMemory Dataflows", ArXiv, vol. abs/1802.03760, 2018.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Effective and Complete Discovery of Bidirectional Order Dependencies Via Set-Based Axioms", The VLDB Journal, vol. 27, issue 4, pp. 573--591, 2018.
Lamb, C., D. G. Brown, and C. Clarke, "Evaluating Computational Creativity: An Interdisciplinary Tutorial", ACM Computing Surveys, vol. 51, issue 2, pp. 28:1--28:34, 2018.
Zhang, H., G. Cormack, M. Grossman, and M. Smucker, "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", ArXiv, vol. abs/1803.08988, 2018.
Hopfgartner, F., A. Hanbury, H. Müller, I. Eggel, K. Balog, T. Brodt, G. Cormack, J. Lin, J. Kalpathy-Cramer, N. Kando, et al., "Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook", Journal of Data and Information Quality, vol. 10, issue 4, pp. 15:1--15:32, 2018.
Ammar, K., and T. Ozsu, "Experimental Analysis of Distributed Graph Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 11, issue 10, pp. 1151--1164, 2018.
Ammar, K., and T. Ozsu, "Experimental Analysis of Distributed Graph Systems", ArXiv, vol. abs/1806.08082, 2018.
Gebaly, K. El, G. Feng, L. Golab, F. Korn, and D. Srivastava, "Explanation Tables", IEEE Data Engineering Bulletin, vol. 41, issue 3, pp. 43--51, 2018.
Tang, R., A. Adhikari, and J. Lin, "FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks", ArXiv, vol. abs/1811.03060, 2018.
Gebaly, K. El, and J. Lin, "In-Browser Split-Execution Support for Interactive Analytics in the Cloud", ArXiv, vol. abs/1804.08822, 2018.
Rao, J., W. Yang, Y. Zhang, F. Türe, and J. Lin, "Multi-Perspective Relevance Matching With Hierarchical ConvNets for Social Media Search", ArXiv, vol. abs/1805.08159, 2018.
Tang, R., and J. Lin, "Progress and Tradeoffs in Neural Language Models", ArXiv, vol. abs/1811.00942, 2018.
Lin, J., and P. Yang, "Repeatability Corner Cases in Document Ranking: The Impact of Score Ties", ArXiv, vol. abs/1807.05798, 2018.
Liu, Y., M. P. Kato, C. Clarke, N. Kando, and T. Sakai, "Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community for Information Access Research", SIGIR Forum, vol. 52, issue 1, pp. 102--110, 2018.
Culpepper, J. S., F. Diaz, and M. Smucker, "Research Frontiers in Information Retrieval: Report From the Third Strategic Workshop on Information Retrieval in Lorne (SWIRL 2018)", SIGIR Forum, vol. 52, issue 1, pp. 34--90, 2018.
Salihoglu, S., and T. Ozsu, "Response to 'Scale Up or Scale Out for Graph Processing'", IEEE Internet Computing, vol. 22, issue 5, pp. 18--24, 2018.
El-Roby, A., K. Ammar, A. Aboulnaga, and J. Lin, "Sapphire: Querying RDF Data Made Simple", ArXiv, vol. abs/1805.11728, 2018.
Lin, J., "Scale Up or Scale Out for Graph Processing?", IEEE Internet Computing, vol. 22, issue 3, pp. 72--78, 2018.
Kushagra, S., S. Ben-David, and I. Ilyas, "Semi-Supervised Clustering for De-Duplication", ArXiv, vol. abs/1810.04361, 2018.
Kim, Y., and J. Lin, "Serverless Data Analytics With Flint", ArXiv, vol. abs/1803.06354, 2018.
Bater, J., X. He, W. Ehrich, A. Machanavajjhala, and J. Rogers, "Shrinkwrap: Differentially-Private Query Processing in Private Data Federations", ArXiv, vol. abs/1810.01816, 2018.
Bater, J., X. He, W. Ehrich, A. Machanavajjhala, and J. Rogers, "ShrinkWrap: Efficient SQL Query Processing in Differentially Private Data Federations", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 3, pp. 307--320, 2018.
Shi, P., J. Rao, and J. Lin, "Simple Attention-Based Representation Learning for Ranking Short Social Media Posts", ArXiv, vol. abs/1811.01013, 2018.
Tang, R., G. Yang, H. Wei, Y. Mao, F. Türe, and J. Lin, "Streaming Voice Query Recognition Using Causal Convolutional Recurrent Neural Networks", ArXiv, vol. abs/1812.07754, 2018.
Lin, J., "The Neural Hype and Comparisons Against Weak Baselines", SIGIR Forum, vol. 52, issue 2, pp. 40--51, 2018.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Time Constrained Continuous Subgraph Search Over Streaming Graphs", ArXiv, vol. abs/1801.09240, 2018.
He, X., Policy Driven Data Sharing With Provable Privacy Guarantees: Duke University, Durham, NC, USA, 2018.

2017

Shen, C., T. Shen, and J. Lin, "Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future", Algorithms for Next-Generation Sequencing Data, Techniques, Approaches, and Applications: Springer, 2017.
Crane, M., S. J. Culpepper, J. Lin, J. M. Mackenzie, and A. Trotman, "A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation", Web Search and Data Mining (WSDM), 2017.
Baruah, G., R. McCreadie, and J. Lin, "A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries", International Conference on Information and Knowledge Management (CIKM), 2017.
Fernandez, R. Castro, D. Deng, E. Mansour, A. Ali Qahtan, W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, et al., "A Demo of the Data Civilizer System", ACM International Conference on Management of Data (SIGMOD), 2017.
Karyakin, A., and K. Salem, "An Analysis of Memory Power Consumption in Database Systems", International Workshop on Data Management on New Hardware (DaMoN), 2017.
Crane, M., and J. Lin, "An Exploration of Serverless Architectures for Information Retrieval", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
He, H., K. Ganjam, N. Jain, J. Lundin, R. White, and J. Lin, "An Insight Extraction System on BioMedical Literature With Deep Neural Networks", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Toman, D., and G. Weddell, "An Interpolation-Based Compiler and Optimizer for Relational Queries (System Design Report)", International Conference on Logic Programming and Automated Reasoning (LPAR), 2017.
Yang, P., H. Fang, and J. Lin, "Anserini: Enabling the Use of Lucene for Information Retrieval Research", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Zihayat, M., A. An, L. Golab, M. Kargar, and J. Szlichta, "Authority-Based Team Discovery in Social Networks", International Conference on Extending Database Technology (EDBT), 2017.
Grossman, M., G. Cormack, and A. Roegiest, "Automatic and Semi-Automatic Document Selection for Technology-Assisted Review", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Zhang, H., J. Rao, J. Lin, and M. Smucker, "Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
He, X., A. Machanavajjhala, C. J. Flynn, and D. Srivastava, "Composing Differential Privacy and Secure Computation: A Case Study on Scaling Private Record Linkage", Conference on Computer and Communications Security (CCS), 2017.
Borgida, A., D. Toman, and G. Weddell, "Concerning Referring Expressions in Query Answers", International Joint Conference on Artificial Intelligence (IJCAI), 2017.
Abedjan, Z., L. Golab, and F. Naumann, "Data Profiling: A Tutorial", ACM International Conference on Management of Data (SIGMOD), 2017.
Bejnordi, B. Ehteshami, J. Lin, B. Glass, M. Mullooly, G. L. Gierach, M. E. Sherman, N. Karssemeijer, J. van der Laak, and A. H. Beck, "Deep Learning-Based Assessment of Tumor-Associated Stroma for Diagnosing Breast Cancer in Histopathology Images", IEEE International Symposium on Biomedical Imaging (ISBI), 2017.
Machanavajjhala, A., X. He, and M. Hay, "Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges", ACM International Conference on Management of Data (SIGMOD), 2017.
Pacaci, A., A. Zhou, J. Lin, and T. Ozsu, "Do We Need Specialized Graph Databases?: Benchmarking Real-Time Social Networking Applications", International Workshop on Graph Data Management Experiences and Systems (GRADES), 2017.
Baskaran, S., A. Keller, F. Chiang, L. Golab, and J. Szlichta, "Efficient Discovery of Ontology Functional Dependencies", International Conference on Information and Knowledge Management (CIKM), 2017.
Ghelani, N., S. Mohammed, S. Wang, and J. Lin, "Event Detection on Curated Tweet Streams", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Rao, J., H. He, and J. Lin, "Experiments With Convolutional Neural Network Models for Answer Selection", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Vtyurina, A., D. Savenkov, E. Agichtein, and C. Clarke, "Exploring Conversational Search With Humans, Assistants, and Wizards", ACM Conference on Human Factors in Computing Systems (CHI), 2017.
Sequiera, R., and J. Lin, "Finally, a Downloadable Test Collection of Tweets", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Toulis, A., and L. Golab, "Graph Mining to Characterize Competition for Employment", ACM International Conference on Management of Data (SIGMOD), 2017.
Kankanamge, C., S. Sahu, A. Mhedhbi, J. Chen, and S. Salihoglu, "Graphflow: An Active Graph Database", ACM International Conference on Management of Data (SIGMOD), 2017.
Afrati, F. N., M. R. Joglekar, C. Ré, S. Salihoglu, and J. D. Ullman, "GYM: A Multiround Distributed Join Algorithm", International Conference on Database Theory (ICDT), 2017.
Fink, S. Dominik, L. Golab, S. Keshav, and H. de Meer, "How Similar Is the Usage of Electric Cars and Electric Bicycles?", Energy-Efficient Computing and Networking (e-Energy), 2017.
Gebaly, K. El, and J. Lin, "In-Browser Interactive SQL Analytics With Afterburner", ACM International Conference on Management of Data (SIGMOD), 2017.
Lamb, C., D. G. Brown, and C. Clarke, "Incorporating Novelty, Meaning, Reaction and Craft Into Computational Poetry: A Negative Experimental Result", International Conference on Computational Creativity (ICCC), 2017.
Gorenflo, C., L. Golab, and S. Keshav, "Managing Sensor Data Streams: Lessons Learned From the WeBike Project", International Conference on Statistical and Scientific Database Management (SSDBM), 2017.
Rao, J., F. Türe, X. Niu, and J. Lin, "Mining the Temporal Statistics of Query Terms for Searching Social Media Posts", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
Grossman, M., and G. Cormack, "MRG_UWaterloo and WaterlooCormack Participation in the TREC 2017 Common Core Track", Text Retrieval Conference (TREC), 2017.
Cormack, G., and M. Grossman, "Navigating Imprecision in Relevance Assessments on the Road to Total Recall: Roger and Me", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Cui, X., M. Mior, B. Wong, K. Daudjee, and S. Rizvi, "Netstore: Leveraging Network Optimizations to Improve Distributed Transaction Processing Performance", International Middleware Conference (Middleware), 2017.
Toman, D., and G. Weddell, "On Partial Features in the DLF Dialects of Description Logic With Inverse Features", International Workshop on Description Logics (DL), 2017.
Tan, L., G. Baruah, and J. Lin, "On the Reusability of 'Living Labs' Test Collections: A Case Study of Real-Time Summarization", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Roegiest, A., L. Tan, and J. Lin, "Online in-Situ Interleaved Evaluation of Real-Time Push Notification Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Meng, X., and L. Golab, "Optimal Reducer Placement to Minimize Data Transfer in MapReduce-style Processing", IEEE International Conference on Big Data (IEEE BigData), 2017.
Lin, J., S. Mohammed, R. Sequiera, L. Tan, N. Ghelani, M. Abualsaud, R. McCreadie, D. Milajevs, and E. M. Voorhees, "Overview of the TREC 2017 Real-Time Summarization Track", Text Retrieval Conference (TREC), 2017.
Clarke, C., N. Kando, and T. Sakai, "Preface From NTCIR-13 General Chairs", Conference on Evaluation of Information Access Technologies (NTCIR), 2017.
Mohammed, S., M. Crane, and J. Lin, "Quantization in Append-Only Collections", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
Mate, J., K. Daudjee, and S. Kamali, "Robust Multi-Tenant Server Consolidation in the Cloud for Data Analytics Workloads", IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.
Feng, G., L. Golab, and D. Srivastava, "Scalable Informative Rule Mining", IEEE International Conference on Data Engineering (ICDE), 2017.
Kane, A., and F. Tompa, "Small-Term Distribution for Disk-Based Search", ACM Symposium on Document Engineering (DocEng), 2017.
Toulis, A., and L. Golab, "Social Media Mining to Understand Public Mental Health", Very Large Data Bases Conference (VLDB), 2017.
Rao, J., F. Türe, H. He, O. Jojic, and J. Lin, "Talking to Your TV: Context-Aware Voice Search With Hierarchical Recurrent Neural Networks", International Conference on Information and Knowledge Management (CIKM), 2017.
Cormack, G., and M. Grossman, "Technology-Assisted Review in Empirical Medicine: Waterloo Participation in CLEF eHealth 2017", Conference and Labs of the Evaluation Forum (CLEF), 2017.
Clarke, C., G. Cormack, J. Lin, and A. Roegiest, "Ten Blue Links on Mars", The Web Conference (WWW), 2017.
Deng, D., R. Castro Fernandez, Z. Abedjan, S. Wang, M. Stonebraker, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, and N. Tang, "The Data Civilizer System", Conference on Innovative Data Systems Research (CIDR), 2017.
Azzopardi, L., M. Crane, H. Fang, G. Ingersoll, J. Lin, Y. Moshfeghi, H. Scells, P. Yang, and G. Zuccon, "The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Baruah, G., and J. Lin, "The Pareto Frontier of Utility Models as a Framework for Evaluating Push Notification Systems", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
Pogacar, F. A., A. Ghenai, M. Smucker, and C. Clarke, "The Positive and Negative Influence of Search Results on People's Decisions About the Efficacy of Medical Treatments", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
Wang, Z., B. Lin, I. Milligan, and J. Lin, "Topic Shifts Between Two US Presidential Administrations", Web Archiving and Digital Libraries Workshop (WADL), 2017.
Zhang, H., M. Abualsaud, N. Ghelani, A. Ghosh, M. Smucker, G. Cormack, and M. Grossman, "UWaterlooMDS at the TREC 2017 Common Core Track", Text Retrieval Conference (TREC), 2017.
Tang, R., W. Wang, Z. Tu, and J. Lin, "An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting", ArXiv, vol. abs/1711.00333, 2017.
Tu, Z., M. Crane, R. Sequiera, J. Zhang, and J. Lin, "An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures", ArXiv, vol. abs/1707.08275, 2017.
Abdelaziz, I., R. Harbi, S. Salihoglu, and P. Kalnis, "Combining Vertex-Centric Graph Processing With SPARQL for Large-Scale RDF Data Analytics", IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 28, issue 12, pp. 3374--3388, 2017.
Sadiq, S. Wasim, T. Dasu, X. Luna Dong, J. Freire, I. Ilyas, S. Link, R. J. Miller, F. Naumann, X. Zhou, and D. Srivastava, "Data Quality: The Role of Empiricism", SIGMOD Record, vol. 46, issue 4, pp. 35--43, 2017.
Bejnordi, B. Ehteshami, J. Lin, B. Glass, M. Mullooly, G. L. Gierach, M. E. Sherman, N. Karssemeijer, J. van der Laak, and A. H. Beck, "Deep Learning-Based Assessment of Tumor-Associated Stroma for Diagnosing Breast Cancer in Histopathology Images", ArXiv, vol. abs/1702.05803, 2017.
Tang, R., and J. Lin, "Deep Residual Learning for Small-Footprint Keyword Spotting", ArXiv, vol. abs/1710.10361, 2017.
Mohammed, S., N. Ghelani, and J. Lin, "Distant Supervision for Topic Classification of Tweets in Curated Streams", ArXiv, vol. abs/1704.06726, 2017.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Effective and Complete Discovery of Order Dependencies via Set-Based Axiomatization", Proceedings of the VLDB Endowment (PVLDB), vol. 10, issue 7, pp. 721--732, 2017.
Mackenzie, J. M., S. J. Culpepper, R. Blanco, M. Crane, C. Clarke, and J. Lin, "Efficient and Effective Tail Latency Minimization in Multi-Stage Retrieval Systems", ArXiv, vol. abs/1704.03970, 2017.
Deng, D., W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Entity Consolidation: The Golden Record Problem", ArXiv, vol. abs/1709.10436, 2017.
Sequiera, R., G. Baruah, Z. Tu, S. Mohammed, J. Rao, H. Zhang, and J. Lin, "Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering", ArXiv, vol. abs/1707.07804, 2017.
Yan, D., H. Chen, J. Cheng, T. Ozsu, Q. Zhang, and J. C. S. Lui, "G-Thinker: Big Graph Mining Made Easier and Faster", ArXiv, vol. abs/1709.03110, 2017.
Zou, L., and T. Ozsu, "Graph-Based RDF Data Management", Data Science and Engineering, vol. 2, issue 1, pp. 56--70, 2017.
Rekatsinas, T., X. Chu, I. Ilyas, and C. Ré, "HoloClean: Holistic Data Repairs With Probabilistic Inference", Proceedings of the VLDB Endowment (PVLDB), vol. 10, issue 11, pp. 1190--1201, 2017.
Rekatsinas, T., X. Chu, I. Ilyas, and C. Ré, "HoloClean: Holistic Data Repairs With Probabilistic Inference", ArXiv, vol. abs/1702.00820, 2017.
Tang, R., and J. Lin, "Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting", ArXiv, vol. abs/1710.06554, 2017.
Vadehra, A., M. Grossman, and G. Cormack, "Impact of Feature Selection on Micro-Text Classification", ArXiv, vol. abs/1708.08123, 2017.
Lin, J., "In Defense of MapReduce", IEEE Internet Computing, vol. 21, issue 3, pp. 94--98, 2017.
Rao, J., H. He, H. Zhang, F. Türe, R. Sequiera, S. Mohammed, and J. Lin, "Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams", ArXiv, vol. abs/1707.07792, 2017.
Konow, R., G. Navarro, C. Clarke, and A. López-Ortiz, "Inverted Treaps", ACM Transactions on Information Systems (TOIS), vol. 35, issue 3, pp. 22:1--22:45, 2017.
Ünel, G., and D. Toman, "Logic Programming Approach to Automata-Based Decision Procedures", Journal of Logic Programming, vol. 86, issue 1, pp. 391--407, 2017.
Mior, M. J., K. Salem, A. Aboulnaga, and R. Liu, "NoSE: Schema Design for NoSQL Applications", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 29, issue 10, pp. 2275--2289, 2017.
Allan, J., N. J. Belkin, P. N. Bennett, J. Callan, C. Clarke, F. Diaz, S. T. Dumais, N. Ferro, D. Harman, D. Hiemstra, et al., "Overview of Special Issue", SIGIR Forum, vol. 51, issue 2, pp. 1--25, 2017.
Ge, C., I. Ilyas, X. He, and A. Machanavajjhala, "Private Exploration Primitives for Data Cleaning", ArXiv, vol. abs/1712.10266, 2017.
He, X., A. Machanavajjhala, C. J. Flynn, and D. Srivastava, "Scaling Private Record Linkage Using Output Constrained Differential Privacy", ArXiv, vol. abs/1702.00535, 2017.
Liu, X., L. Golab, W. M. Golab, I. Ilyas, and S. Jin, "Smart Meter Data Analytics: Systems, Algorithms, and Benchmarking", ACM Transactions on Database Systems (TODS), vol. 42, issue 1, pp. 2:1--2:39, 2017.
Mohammed, S., P. Shi, and J. Lin, "Strong Baselines for Simple Question Answering Over Knowledge Graphs With and Without Neural Networks", ArXiv, vol. abs/1712.01969, 2017.
Rao, J., F. Türe, H. He, O. Jojic, and J. Lin, "Talking to Your TV: Context-Aware Voice Search With Hierarchical Recurrent Neural Networks", ArXiv, vol. abs/1705.04892, 2017.
Lin, J., "The Lambda and the Kappa", IEEE Internet Computing, vol. 21, issue 5, pp. 60--66, 2017.
Lin, J., and A. Trotman, "The Role of Index Compression in Score-at-a-Time Query Evaluation", Information Retrieval Journal, vol. 20, issue 3, pp. 199--220, 2017.
Sahu, S., A. Mhedhbi, S. Salihoglu, J. Lin, and T. Ozsu, "The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Proceedings of the VLDB Endowment (PVLDB), vol. 11, issue 4, pp. 420--431, 2017.
Sahu, S., A. Mhedhbi, S. Salihoglu, J. Lin, and T. Ozsu, "The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: A User Survey", ArXiv, vol. abs/1709.03188, 2017.
Yang, Y., L. Golab, and T. Ozsu, "ViewDF: Declarative Incremental View Maintenance for Streaming Data", Information Systems, vol. 71, pp. 55--67, 2017.
Lin, J., I. Milligan, J. Wiebe, and A. Zhou, "Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives", ACM Journal on Computing and Cultural Heritage, vol. 10, issue 4, pp. 22:1--22:30, 2017.

2016

Cormack, G., and M. Grossman, "'When to Stop' Waterloo (Cormack) Participation in the TREC 2016 Total Recall Track", Text Retrieval Conference (TREC), 2016.
Agrawal, S., and K. Daudjee, "A Performance Comparison of Algorithms for Byzantine Agreement in Distributed Systems", European Dependable Computing Conference (EDCC), 2016.
Roegiest, A., L. Tan, J. Lin, and C. Clarke, "A Platform for Streaming Push Notifications to Mobile Assessors", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Wu, G. Zhiping, and F. Tompa, "A Space-Efficient Data Structure for Fast Access Control in ECM Systems", ACM Symposium on Access Control Models and Technologies (SACMAT), 2016.
Roegiest, A., and G. Cormack, "An Architecture for Privacy-Preserving and Replicable High-Recall Retrieval Experiments", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Hashemi, S. Hadi, C. Clarke, A. Dean-Hall, J. Kamps, and J. Kiseleva, "An Easter Egg Hunting Approach to Test Collection Building in Dynamic Domains", Conference on Evaluation of Information Access Technologies (NTCIR), 2016.
Tan, L., A. Roegiest, J. Lin, and C. Clarke, "An Exploration of Evaluation Metrics for Mobile Push Notifications", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Al-Harbi, A. Lafi, and M. Smucker, "Are Secondary Assessors Uncertain When They Disagree About Relevance Judgements?", Conference on Human Information Interaction and Retrieval (CHIIR), 2016.
Buntain, C., and J. Lin, "Burst Detection in Social Media Streams for Tracking Interest Profiles in Real Time", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Farid, M. H., A. Roatis, I. Ilyas, H-F. Hoffmann, and X. Chu, "CLAMS: Bringing Quality to Data Lakes", ACM International Conference on Management of Data (SIGMOD), 2016.
Rao, J., X. Niu, and J. Lin, "Compressing and Decoding Term Statistics Time Series", European Conference on Information Retrieval (ECIR), 2016.
Milligan, I., N. Ruest, and J. Lin, "Content Selection and Curation for Web Archiving: The Gatekeepers vs. the Masses", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2016.
Cafarella, M. J., I. Ilyas, M. Kornacker, T. Kraska, and C. Ré, "Dark Data: Are We Solving the Right Problems?", IEEE International Conference on Data Engineering (ICDE), 2016.
Chu, X., I. Ilyas, S. Krishnan, and J. Wang, "Data Cleaning: Overview and Emerging Challenges", ACM International Conference on Management of Data (SIGMOD), 2016.
Abedjan, Z., L. Golab, and F. Naumann, "Data Profiling", IEEE International Conference on Data Engineering (ICDE), 2016.
Abedjan, Z., J. Morcos, I. Ilyas, M. Ouzzani, P. Papotti, and M. Stonebraker, "DataXFormer: A Robust Transformation Discovery System", IEEE International Conference on Data Engineering (ICDE), 2016.
Jackson, A., J. Lin, I. Milligan, and N. Ruest, "Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2016.
Buntain, C., J. Lin, and J. Golbeck, "Discovering Key Moments in Social Media Streams", Consumer Communications and Networking Conference (CCNC), 2016.
Culpepper, J. S., C. Clarke, and J. Lin, "Dynamic Cutoff Prediction in Multi-Stage Retrieval Systems", Australasian Document Computing Symposium (ADCS), 2016.
Kargar, M., L. Golab, and J. Szlichta, "eGraphSearch: Effective Keyword Search in Graphs", International Conference on Information and Knowledge Management (CIKM), 2016.
Cormack, G., and M. Grossman, "Engineering Quality and Reliability in Technology-Assisted Review", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Bommannavar, P., J. Lin, and A. Rajaraman, "Estimating Topical Volume in Social Media Streams", ACM Symposium on Applied Computing (SAC), 2016.
Lamb, C., D. G. Brown, and C. Clarke, "Evaluating Digital Poetry: Insights From the CAT", International Conference on Computational Creativity (ICCC), 2016.
Oard, D. W., K. Shilton, and J. Lin, "Evaluating Search Among Secrets", Conference on Evaluation of Information Access Technologies (NTCIR), 2016.
Milligan, I., J. Lin, J. Wiebe, and A. Zhou, "Exploring and Discovering Archive-It Collections With Warcbase", Digital Humanities Conference (DH), 2016.
Roegiest, A., and G. Cormack, "Impact of Review-Set Selection on Human Assessment for Text Classification", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Trotman, A., and J. Lin, "In Vacuo and In Situ Evaluation of SIMD Codecs", Australasian Document Computing Symposium (ADCS), 2016.
Qian, X., J. Lin, and A. Roegiest, "Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Farid, M. H., I. Ilyas, S. Euijong Whang, and C. Yu, "LONLIES: Estimating Property Values for Long Tail Entities", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Smucker, M., and C. Clarke, "Modeling Optimal Switching Behavior", Conference on Human Information Interaction and Retrieval (CHIIR), 2016.
Zanibbi, R., K. Davila, A. Kane, and F. Tompa, "Multi-Stage Math Formula Search: Using Appearance-Based Similarity Metrics at Scale", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Rao, J., H. He, and J. Lin, "Noise-Contrastive Estimation for Answer Selection With Deep Neural Networks", International Conference on Information and Knowledge Management (CIKM), 2016.
Mior, M. J., K. Salem, A. Aboulnaga, and R. Liu, "NoSE: Schema Design for NoSQL Applications", IEEE International Conference on Data Engineering (ICDE), 2016.
St. Jacques, J., D. Toman, and G. Weddell, "Object-Relational Queries Over CFDI^∀−_nc Knowledge Bases: OBDA for the SQL-Literate", International Joint Conference on Artificial Intelligence (IJCAI), 2016.
St. Jacques, J., D. Toman, and G. Weddell, "Object-Relational Queries Over CFDI_nc Knowledge Bases: OBDA for the SQL-Literate (Extended Abstract)", International Workshop on Description Logics (DL), 2016.
Jiang, Y. Helen, and L. Golab, "On Competition for Undergraduate Co-Op Placements: A Graph Mining Approach", Educational Data Mining (EDM), 2016.
Toman, D., and G. Weddell, "On Partial Features in the DLF Family of Description Logics", Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2016.
Borgida, A., D. Toman, and G. Weddell, "On Referring Expressions in Information Systems Derived From Conceptual Modelling", International Conference on Conceptual Modeling (ER), 2016.
Borgida, A., D. Toman, and G. Weddell, "On Referring Expressions in Query Answering Over First Order Knowledge Bases", International Conference on Principles of Knowledge Representation and Reasoning (KR), 2016.
Toman, D., and G. Weddell, "Ontology Based Data Access With Referring Expressions for Logics With the Tree Model Property - (Extended Abstract)", Australian Joint Conference on Artificial Intelligence (AUS-AI), 2016.
Baruah, G., H. Zhang, R. Guttikonda, J. Lin, M. Smucker, and O. Vechtomova, "Optimizing Nugget Annotations With Active Learning", International Conference on Information and Knowledge Management (CIKM), 2016.
Hashemi, S. Hadi, J. Kamps, J. Kiseleva, C. Clarke, and E. M. Voorhees, "Overview of the TREC 2016 Contextual Suggestion Track", Text Retrieval Conference (TREC), 2016.
Lin, J., A. Roegiest, L. Tan, R. McCreadie, E. M. Voorhees, and F. Diaz, "Overview of the TREC 2016 Real-Time Summarization Track", Text Retrieval Conference (TREC), 2016.
He, H., and J. Lin, "Pairwise Word Interaction Modeling With Deep Neural Networks for Semantic Similarity Measurement", North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
Bonenfant, M., B. C. Desai, D. Desai, B. C. M. Fung, T. Ozsu, and J. D. Ullman, "Panel: The State of Data: Invited Paper From Panelists", International Database Engineering and Applications Symposium (IDEAS), 2016.
Yang, G. Hui, I. Soboroff, L. Xiong, C. Clarke, and S. L. Garfinkel, "Privacy-Preserving IR 2016: Differential Privacy, Search, and Social Media", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Lin, J., Z. Tu, M. Rose, and P. White, "Prizm: A Wireless Access Point for Proxy-Based Web Lifelogging", ACM International Conference on Multimedia (MM), 2016.
Han, M., and K. Daudjee, "Providing Serializability for Pregel-Like Graph Processing Systems", International Conference on Extending Database Technology (EDBT), 2016.
Gebhard, L., L. Golab, S. Keshav, and H. de Meer, "Range Prediction for Electric Bicycles", Energy-Efficient Computing and Networking (e-Energy), 2016.
Elbagoury, A., M. Crane, and J. Lin, "Rank-at-a-Time Query Processing", International Conference on the Theory of Information Retrieval (ICTIR), 2016.
Paik, J. H., and J. Lin, "Retrievability in API-Based 'Evaluation as a Service'", International Conference on the Theory of Information Retrieval (ICTIR), 2016.
Zhang, H., J. Lin, G. Cormack, and M. Smucker, "Sampling Strategies and Active Learning for Volume Estimation", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Cormack, G., and M. Grossman, "Scalability of Continuous Active Learning for Reliable High-Recall Text Classification", International Conference on Information and Knowledge Management (CIKM), 2016.
Murdock, V., C. Clarke, J. Kamps, and J. Karlgren, "Second Workshop on Search and Exploration of X-Rated Information (SEXI'16): WSDM Workshop Summary", Web Search and Data Mining (WSDM), 2016.
Moschitti, A., L. Màrquez, P. Nakov, E. Agichtein, C. Clarke, and I. Szpektor, "SIGIR 2016 Workshop WebQA II: Web Question Answering Beyond Factoids", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Tan, L., A. Roegiest, C. Clarke, and J. Lin, "Simple Dynamic Emission Strategies for Microblog Filtering", International Conference on Research and Development in Information Retrieval (SIGIR), 2016.
Davila, K., R. Zanibbi, A. Kane, and F. Tompa, "Tangent-3 at the NTCIR-12 MathIR Task", Conference on Evaluation of Information Access Technologies (NTCIR), 2016.
Rao, J., and J. Lin, "Temporal Query Expansion Using a Continuous Hidden Markov Model", International Conference on the Theory of Information Retrieval (ICTIR), 2016.
Clarke, C., G. Cormack, J. Lin, and A. Roegiest, "Total Recall: Blue Sky on Mars", International Conference on the Theory of Information Retrieval (ICTIR), 2016.
Lin, J., M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya, J. Foley, G. Ingersoll, C. Macdonald, and S. Vigna, "Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge", European Conference on Information Retrieval (ECIR), 2016.
Grossman, M., G. Cormack, and A. Roegiest, "TREC 2016 Total Recall Track Overview", Text Retrieval Conference (TREC), 2016.
He, H., J. Wieting, K. Gimpel, J. Rao, and J. Lin, "UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement", International Workshop on Semantic Evaluation (SemEval), 2016.
Ehsan, N., F. Tompa, and A. Shakery, "Using a Dictionary and N-Gram Alignment to Improve Fine-Grained Cross-Language Plagiarism Detection", ACM Symposium on Document Engineering (DocEng), 2016.
Radhakrishnan, S., B. J. Muscedere, and K. Daudjee, "V-Hadoop: Virtualized Hadoop Using Containers", IEEE International Symposium on Network Computing and Applications (NCA), 2016.
Hartig, O., and T. Ozsu, "Walking Without a Map: Ranking-Based Traversal for Querying Linked Data", International Semantic Web Conference (ISWC), 2016.
Ozsu, T., "Web Data Management in the RDF Age: Keynote Talk Abstract", International Database Engineering and Applications Symposium (IDEAS), 2016.
He, X., N. Raval, and A. Machanavajjhala, "A Demonstration of VisDPT: Visual Exploration of Differentially Private Trajectories", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 13, pp. 1489--1492, 2016.
Yan, D., J. Cheng, T. Ozsu, F. Yang, Y. Lu, J. C. S. Lui, Q. Zhang, and W. Ng, "A General-Purpose Query-Centric Framework for Querying Big Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 7, pp. 564--575, 2016.
Ozsu, T., "A Survey of RDF Data Management Systems", Frontiers of Computer Science, vol. 10, issue 3, pp. 418--432, 2016.
Ozsu, T., "A Survey of RDF Data Management Systems", ArXiv, vol. abs/1601.00707, 2016.
El Gebaly, K., and J. Lin, "Afterburner: The Case for In-Browser Analytics", ArXiv, vol. abs/1605.04035, 2016.
Clarke, C., J. S. Culpepper, and A. Moffat, "Assessing Efficiency-Effectiveness Tradeoffs in Multi-Stage Retrieval Systems Without Using Relevance Judgments", Information Retrieval Journal, vol. 19, issue 4, pp. 351--377, 2016.
Zihayat, M., A. An, L. Golab, M. Kargar, and J. Szlichta, "Authority-Based Team Discovery in Social Networks", ArXiv, vol. abs/1611.02992, 2016.
Jiang, Y. Helen, S. Javaad Syed, and L. Golab, "Data Mining of Undergraduate Course Evaluations", Informatics in Education, vol. 15, issue 1, pp. 85--102, 2016.
Bär, A., P. Casas, A. D'Alconzo, P. Fiadino, L. Golab, M. Mellia, and E. Schikuta, "DBStream: A Holistic Approach to Large-Scale Network Traffic Monitoring and Analysis", Computer Networks, vol. 107, pp. 5--19, 2016.
Abedjan, Z., X. Chu, D. Deng, R. Castro Fernandez, I. Ilyas, M. Ouzzani, P. Papotti, M. Stonebraker, and N. Tang, "Detecting Data Errors: Where Are We and What Needs to Be Done?", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 12, pp. 993--1004, 2016.
Machanavajjhala, A., X. He, and M. Hay, "Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 13, pp. 1611--1614, 2016.
Chu, X., I. Ilyas, and P. Koutris, "Distributed Data Deduplication", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 11, pp. 864--875, 2016.
Culpepper, J. S., C. Clarke, and J. Lin, "Dynamic Trade-Off Prediction in Multi-Stage Retrieval Systems", ArXiv, vol. abs/1610.02502, 2016.
Bizer, C., L. Dong, I. Ilyas, and M-E. Vidal, "Editorial: Special Issue on Web Data Quality", Journal of Data and Information Quality, vol. 8, issue 1, pp. 1:1--1:3, 2016.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Effective and Complete Discovery of Order Dependencies via Set-Based Axiomatization", ArXiv, vol. abs/1608.06169, 2016.
Ilyas, I., "Effective Data Cleaning With Continuous Evaluation", IEEE Data Engineering Bulletin, vol. 39, issue 2, pp. 38--46, 2016.
Clarke, C., and E. Yilmaz, "EVIA 2016: The Seventh International Workshop on Evaluating Information Access", SIGIR Forum, vol. 50, issue 2, pp. 44--46, 2016.
Sharma, A., J. Jiang, P. Bommannavar, B. Larson, and J. Lin, "GraphJet: Real-Time Content Recommendations at Twitter", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 13, pp. 1281--1292, 2016.
Khabsa, M., A. K. Elmagarmid, I. Ilyas, H. Hammady, and M. Ouzzani, "Learning to Identify Relevant Studies for Systematic Reviews Using Random Forest and External Information", Machine Learning, vol. 102, issue 3, pp. 465--482, 2016.
Quamar, A., A. Deshpande, and J. Lin, "NScale: Neighborhood-Centric Large-Scale Graph Analytics in the Cloud", The VLDB Journal, vol. 25, issue 2, pp. 125--150, 2016.
Drzadzewski, G., and F. Tompa, "Partial Materialization for Online Analytical Processing Over Multi-Tagged Document Collections", Knowledge and Information Systems (KAIS), vol. 47, issue 3, pp. 697--732, 2016.
Peng, P., L. Zou, T. Ozsu, L. Chen, and D. Zhao, "Processing SPARQL Queries Over Distributed RDF Graphs", The VLDB Journal, vol. 25, issue 2, pp. 243--268, 2016.
Chu, X., and I. Ilyas, "Qualitative Data Cleaning", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 13, pp. 1605--1608, 2016.
Yan, D., J. Cheng, T. Ozsu, F. Yang, Y. Lu, J. C. S. Lui, Q. Zhang, and W. Ng, "Quegel: A General-Purpose Query-Centric Framework for Querying Big Graphs", ArXiv, vol. abs/1601.06497, 2016.
El-Roby, A., K. Ammar, A. Aboulnaga, and J. Lin, "Sapphire: Querying RDF Data Made Simple", Proceedings of the VLDB Endowment (PVLDB), vol. 9, issue 13, pp. 1481--1484, 2016.
Lin, J., C. Clarke, and G. Baruah, "Searching From Mars", IEEE Internet Computing, vol. 20, issue 1, pp. 78--82, 2016.
Clarke, C., G. Cormack, J. Lin, and A. Roegiest, "Ten Blue Links on Mars", ArXiv, vol. abs/1610.06468, 2016.
Tan, L., J. Lin, A. Roegiest, and C. Clarke, "The Effects of Latency Penalties in Evaluating Push Notification Systems", ArXiv, vol. abs/1606.03066, 2016.
Lin, J., and K. El Gebaly, "The Future of Big Data Is ... JavaScript?", IEEE Internet Computing, vol. 20, issue 5, pp. 82--88, 2016.