Publications

2024

Arabzadeh, N., A. Bigdeli, and C. Clarke, "Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers", European Conference on Information Retrieval (ECIR), 2024.
Usta, A., C. Liu, and S. Salihoglu, "Analysis of Open Government Datasets From a Data Design and Integration Perspective", International Conference on Extending Database Technology (EDBT), 2024.
Mousavi, A., X. Zhan, H. Bai, P. Shi, T. Rekatsinas, B. Han, Y. Li, J. Pound, J. M. Susskind, N. Schluter, et al., "Construction of Paired Knowledge Graph - Text Datasets Informed By Cyclic Evaluation", International Conference on Computational Linguistics (COLING), 2024.
Arabzadeh, N., and C. Clarke, "Fréchet Distance for Offline Evaluation of Information Retrieval Systems With Sparse Labels", Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024.
Lin, J., J. Li, J. Gao, W. Ma, and Y. Liu, "Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification", AAAI Conference on Artificial Intelligence (AAAI), 2024.
Arabzadeh, N., K. Golzadeh, C. Risi, C. Clarke, and J. Zhao, "KnowFIRES: A Knowledge-Graph Framework for Interpreting Retrieved Entities From Search", European Conference on Information Retrieval (ECIR), 2024.
Hebert, L., G. Sahu, Y. Guo, N. Kishore Sreenivas, L. Golab, and R. Cohen, "Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media", AAAI Conference on Artificial Intelligence (AAAI), 2024.
Esmaeilzadeh, A., J. Rorseth, A. Yu, P. Godfrey, L. Golab, D. Srivastava, J. Szlichta, and K. Taghva, "On Integrating the Data-Science and Machine-Learning Pipelines For Responsible AI", Workshop in Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI), 2024.
Sahu, S., and S. Salihoglu, "Optimizing Differential Computation for Large-Scale Graph Processing", International Workshop on Graph Data Management Experiences and Systems (GRADES), 2024.
Khalaji, M., T. Brown, K. Daudjee, and V. Aksenov, "Practical Hardware Transactional vEB Trees", ACM Symposium on Principles & Practice of Parallel Programming (PPoPP), 2024.
Bonifati, A., T. Ozsu, Y. Tian, H. Voigt, W. Yu, and W. Zhang, "The Future of Graph Analytics", ACM International Conference on Management of Data (SIGMOD), 2024.
Azzopardi, L., C. Clarke, P. B. Kantor, B. Mitra, J. R. Trippas, and Z. Ren, "The Search Futures Workshop", European Conference on Information Retrieval (ECIR), 2024.
Pradeep, R., and J. Lin, "Towards Automated End-to-End Health Misinformation Free Search With A Large Language Model", European Conference on Information Retrieval (ECIR), 2024.
Xian, J., T. Teofili, R. Pradeep, and J. Lin, "Vector Search With OpenAI Embeddings: Lucene Is All You Need", Web Search and Data Mining (WSDM), 2024.
Arabzadeh, N., and C. Clarke, "A Comparison of Methods for Evaluating Generative IR", ArXiv, vol. abs/2404.04044, 2024.
Arabzadeh, N., A. Bigdeli, and C. Clarke, "Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers", ArXiv, vol. abs/2401.04842, 2024.
Arabzadeh, N., S. Huo, N. Mehta, Q. Wu, C. Wang, A. Awadallah, C. Clarke, and J. Kiseleva, "Assessing and Verifying Task Utility in LLM-Powered Applications", ArXiv, vol. abs/2405.02178, 2024.
Golzadeh, K., L. Golab, and J. Szlichta, "Explaining Expert Search and Team Formation Systems With ExES", ArXiv, vol. abs/2405.12881, 2024.
Lin, S-C., L. Gao, B. Oguz, W. Xiong, J. Lin, W-t. Yih, and X. Chen, "FLAME: Factuality-Aware Alignment for Large Language Models", ArXiv, vol. abs/2405.01525, 2024.
Arabzadeh, N., and C. Clarke, "Fréchet Distance for Offline Evaluation of Information Retrieval Systems With Sparse Labels", ArXiv, vol. abs/2401.17543, 2024.
Alaofi, M., N. Arabzadeh, C. Clarke, and M. Sanderson, "Generative Information Retrieval Evaluation", ArXiv, vol. abs/2404.08137, 2024.
Lin, J., J. Li, J. Gao, W. Ma, and Y. Liu, "Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification", ArXiv, vol. abs/2404.15279, 2024.
Upadhyay, S., E. Kamalloo, and J. Lin, "LLMs Can Patch Up Missing Relevance Judgments in Evaluation", ArXiv, vol. abs/2405.04727, 2024.
Li, M., X. Chen, A. Holtzman, B. Chen, J. Lin, W-t. Yih, and X. Victoria Lin, "Nearest Neighbor Speculative Decoding for LLM Generation and Attribution", ArXiv, vol. abs/2405.19325, 2024.
Zhuang, S., X. Ma, B. Koopman, J. Lin, and G. Zuccon, "PromptReps: Prompting Large Language Models to Generate Dense And Sparse Representations for Zero-Shot Document Retrieval", ArXiv, vol. abs/2404.18424, 2024.
Rorseth, J., P. Godfrey, L. Golab, D. Srivastava, and J. Szlichta, "RAGE Against the Machine: Retrieval-Augmented LLM Explanations", ArXiv, vol. abs/2405.13000, 2024.
Shehata, D., R. Cohen, and C. Clarke, "Rumour Evaluation With Very Large Language Models", ArXiv, vol. abs/2404.16859, 2024.
He, X., "Technical Perspective: Synthetic Data Needs a Reproducibility Benchmark", SIGMOD Record, vol. 53, issue 1, pp. 64, 2024.
Zhang, X., K. Ogueji, X. Ma, and J. Lin, "Toward Best Practices for Training Multilingual Dense Retrieval Models", ACM Transactions on Information Systems (TOIS), vol. 42, issue 2, pp. 39:1--39:33, 2024.
Arabzadeh, N., J. Kiseleva, Q. Wu, C. Wang, A. Awadallah, V. Dibia, A. Fourney, and C. Clarke, "Towards Better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications", ArXiv, vol. abs/2402.09015, 2024.
Sharifymoghaddam, S., S. Upadhyay, W. Chen, and J. Lin, "UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models", ArXiv, vol. abs/2405.10311, 2024.
Faggioli, G., L. Dietz, C. Clarke, G. Demartini, M. Hagen, C. Hauff, N. Kando, E. Kanoulas, M. Potthast, B. Stein, et al., "Who Determines What Is Relevant? Humans or AI? Why Not Both?", Communications of the ACM, vol. 67, issue 4, pp. 31--34, 2024.

2023

Jiang, Z., M. Y. R. Yang, M. Tsirlin, R. Tang, Y. Dai, and J. Lin, "'Low-Resource' Text Classification: A Parameter-Free Classification Method With Compressors", Association for Computational Linguistics (ACL), 2023.
Arabzadeh, N., O. Kmet, B. Carterette, C. Clarke, C. Hauff, and P. Chandar, "A Is for Adele: An Offline Evaluation Metric for Instant Search", International Conference on the Theory of Information Retrieval (ICTIR), 2023.
Seifikar, M., L. Nhi Phan Minh, N. Arabzadeh, C. Clarke, and M. Smucker, "A Preference Judgment Tool for Authoritative Assessment", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Fernando, L., H. Bindra, and K. Daudjee, "An Experimental Analysis of Quantile Sketches Over Data Streams", International Conference on Extending Database Technology (EDBT), 2023.
Zhang, C., A. Bonifati, and T. Ozsu, "An Overview of Reachability Indexes on Graphs", ACM International Conference on Management of Data (SIGMOD), 2023.
Ma, X., T. Teofili, and J. Lin, "Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes", International Conference on Information and Knowledge Management (CIKM), 2023.
Zhong, W., Y. Xie, and J. Lin, "Answer Retrieval for Math Questions Using Structural and Dense Retrieval", Conference and Labs of the Evaluation Forum (CLEF), 2023.
Yang, J-H., C. Lassance, R. Sampaio de Rezende, K. Srinivasan, M. Redi, S. Clinchant, and J. Lin, "AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Oladipo, A., M. Adeyemi, O. Ahia, A. Toluwase Owodunni, O. Ogundepo, D. Ifeoluwa Adelani, and J. Lin, "Better Quality Pre-Training Data and T5 Models for African Languages", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Adeyemi, M., A. Oladipo, X. Zhang, D. Alfonso-Hermelo, M. Rezagholizadeh, B. Chen, and J. Lin, "CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages", Forum for Information Retrieval Evaluation (FIRE), 2023.
Li, M., S-C. Lin, B. Oguz, A. Ghoshal, J. Lin, Y. Mehdad, W-t. Yih, and X. Chen, "CITADEL: Conditional Token Interaction via Dynamic Lexical Routing For Efficient and Effective Multi-Vector Retrieval", Association for Computational Linguistics (ACL), 2023.
Rorseth, J., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "CREDENCE: Counterfactual Explanations for Document Ranking", IEEE International Conference on Data Engineering (ICDE), 2023.
Wang, R., J. Wang, P. Kadam, T. Ozsu, and W. G. Aref, "dLSM: An LSM-Based Index for Memory Disaggregation", IEEE International Conference on Data Engineering (ICDE), 2023.
Chai, A., A. Vezvaei, L. Golab, M. Kargar, D. Srivastava, J. Szlichta, and M. Zihayat, "EAGER: Explainable Question Answering Using Knowledge Graphs", International Workshop on Graph Data Management Experiences and Systems (GRADES), 2023.
Ma, X., H. Fun, X. Yin, A. Mallia, and J. Lin, "Enhancing Sparse Retrieval via Unsupervised Learning", ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP), 2023.
Kamalloo, E., X. Zhang, O. Ogundepo, N. Thakur, D. Alfonso-Hermelo, M. Rezagholizadeh, and J. Lin, "Evaluating Embedding APIs for Information Retrieval", Association for Computational Linguistics (ACL), 2023.
Kamalloo, E., N. Dziri, C. Clarke, and D. Rafiei, "Evaluating Open-Domain Question Answering in the Era of Large Language Models", Association for Computational Linguistics (ACL), 2023.
Hebert, L., L. Golab, P. Poupart, and R. Cohen, "FedFormer: Contextual Federation With Attention in Reinforcement Learning", International Joint Conference on Autonomous Agents & Multiagent Systems (AAMAS), 2023.
Bayat, F. Fatahi, K. Qian, B. Han, Y. Sang, A. Belyi, S. Khorshidi, F. Wu, I. Ilyas, and Y. Li, "FLEEK: Factual Error Detection and Correction With Evidence Retrieved From External Knowledge", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Piktus, A., O. Ogundepo, C. Akiki, A. Oladipo, X. Zhang, H. Schoelkopf, S. Biderman, M. Potthast, and J. Lin, "GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration", Association for Computational Linguistics (ACL), 2023.
Hu, L., L. Zou, and T. Ozsu, "GAMMA: A Graph Pattern Mining Framework for Large Graphs on GPU", IEEE International Conference on Data Engineering (ICDE), 2023.
Pang, Y., L. Yang, L. Zou, and T. Ozsu, "gFOV: A Full-Stack SPARQL Query Optimizer & Plan Visualizer", International Conference on Information and Knowledge Management (CIKM), 2023.
Liu, C., A. Usta, J. Zhao, and S. Salihoglu, "Governor: Turning Open Government Data Portals Into Interactive Databases", ACM Conference on Human Factors in Computing Systems (CHI), 2023.
Ilyas, I., J. P. Lacerda, Y. Li, U. Farooq Minhas, A. Mousavi, J. Pound, T. Rekatsinas, and C. Sumanth, "Growing and Serving Large Open-Domain Knowledge Graphs", ACM International Conference on Management of Data (SIGMOD), 2023.
Pradeep, R., K. Hui, J. Gupta, Á. D. Lelkes, H. Zhuang, J. Lin, D. Metzler, and V. Q. Tran, "How Does Generative Retrieval Scale to Millions of Passages?", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Lin, S-C., A. Asai, M. Li, B. Oguz, J. Lin, Y. Mehdad, W-t. Yih, and X. Chen, "How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Conia, S., M. Li, D. Lee, U. Farooq Minhas, I. Ilyas, and Y. Li, "Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Esmaeilzadeh, A., L. Golab, and K. Taghva, "InfoMoD: Information-Theoretic Model Diagnostics", International Conference on Statistical and Scientific Database Management (SSDBM), 2023.
Bianchi, A., R. Karegar, P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "iORDER: Mining Implicit Domain Orders", IEEE International Conference on Data Engineering (ICDE), 2023.
Jin, G., X. Feng, Z. Chen, C. Liu, and S. Salihoglu, "KÙZU Graph Database Management System", Conference on Innovative Data Systems Research (CIDR), 2023.
Kamalloo, E., C. Clarke, and D. Rafiei, "Limitations of Open-Domain Question Answering Benchmarks for Document-Level Reasoning", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Buchanan, G. Robert, D. McKay, and C. Clarke, "Made to Measure: A Workshop on Human-Centred Metrics for Information Seeking", Conference on Human Information Interaction and Retrieval (CHIIR), 2023.
Lin, S-C., A. Ahmad, and J. Lin, "mAggretriever: A Simple Yet Effective Approach to Zero-Shot Multilingual Dense Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Kamphuis, C., A. Lin, S. Yang, J. Lin, A. P. de Vries, and F. Hasibi, "MMEAD: MS MARCO Entity Annotations and Disambiguations", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Ghasemitaheri, S., A. Holcomb, L. Golab, and S. Keshav, "On the Data Quality of Remotely Sensed Forest Maps", Very Large Data Bases Conference (VLDB), 2023.
Zhong, W., S-C. Lin, J-H. Yang, and J. Lin, "One Blade for One Purpose: Advancing Math Information Retrieval Using Hybrid Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Xin, J., R. Tang, Z. Jiang, Y. Yu, and J. Lin, "Operator Selection and Ordering in a Pipeline Approach to Efficiency Optimizations for Transformers", Association for Computational Linguistics (ACL), 2023.
Adeyemi, M., A. Oladipo, X. Crystina Zhang, D. Alfonso-Hermelo, M. Rezagholizadeh, B. Chen, and J. Lin, "Overview of the CIRAL Track at FIRE 2023: Cross-Lingual Information Retrieval for African Languages", Forum for Information Retrieval Evaluation (FIRE), 2023.
Feng, E., A. Borgida, E. Franconi, P. F. Patel-Schneider, D. Toman, and G. Weddell, "Path Description Dependencies in Feature-Based DLs", International Workshop on Description Logics (DL), 2023.
Faggioli, G., L. Dietz, C. Clarke, G. Demartini, M. Hagen, C. Hauff, N. Kando, E. Kanoulas, M. Potthast, B. Stein, et al., "Perspectives on Large Language Models for Relevance Judgment", International Conference on the Theory of Information Retrieval (ICTIR), 2023.
Tamber, M. Singh, R. Pradeep, and J. Lin, "Pre-Processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering", European Conference on Information Retrieval (ECIR), 2023.
Gao, L., X. Ma, J. Lin, and J. Callan, "Precise Zero-Shot Dense Retrieval Without Relevance Labels", Association for Computational Linguistics (ACL), 2023.
Ehrlinger, L., H. Harmouch, I. Ilyas, and F. Naumann, "Preface QDB", Very Large Data Bases Conference (VLDB), 2023.
Ozsu, T., and X. Xue, "Preface SDA", Very Large Data Bases Conference (VLDB), 2023.
Clarke, C., F. Diaz, and N. Arabzadeh, "Preference-Based Offline Evaluation", Web Search and Data Mining (WSDM), 2023.
Pradeep, R., H. Chen, L. Gu, M. Singh Tamber, and J. Lin, "PyGaggle: A Gaggle of Resources for Open-Domain Question Answering", European Conference on Information Retrieval (ECIR), 2023.
Saxena, H., L. Golab, S. Idreos, and I. Ilyas, "Real-Time LSM-Trees for HTAP Workloads", IEEE International Conference on Data Engineering (ICDE), 2023.
Huo, S., N. Arabzadeh, and C. Clarke, "Retrieving Supporting Evidence for Generative Question Answering", ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP), 2023.
Li, M., S-C. Lin, X. Ma, and J. Lin, "SLIM: Sparsified Late Interaction for Multi-Vector Retrieval With Inverted Indexes", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Akiki, C., O. Ogundepo, A. Piktus, X. Zhang, A. Oladipo, J. Lin, and M. Potthast, "Spacerini: Plug-and-Play Search Engines With Pyserini and Hugging Face", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Thakur, N., K. Wang, I. Gurevych, and J. Lin, "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-Shot Neural Sparse Retrieval", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
O'Halloran, T., B. McManus, A. Harbison, M. Grossman, and G. Cormack, "Technology-Assisted Review for Spreadsheets and Noisy Text", ACM Symposium on Document Engineering (DocEng), 2023.
Gao, L., X. Ma, J. Lin, and J. Callan, "Tevatron: An Efficient and Flexible Toolkit for Neural Retrieval", International Conference on Research and Development in Information Retrieval (SIGIR), 2023.
Usta, A., and S. Salihoglu, "To Join or Not to Join: An Analysis on the Usefulness of Joining Tables In Open Government Data Portals", Very Large Data Bases Conference (VLDB), 2023.
Tang, R., L. Liu, A. Pandey, Z. Jiang, G. Yang, K. Kumar, P. Stenetorp, J. Lin, and F. Türe, "What the DAAM: Interpreting Stable Diffusion Using Cross Attention", Association for Computational Linguistics (ACL), 2023.
Lin, S-C., and J. Lin, "A Dense Representation Framework for Lexical and Semantic Matching", ACM Transactions on Information Systems (TOIS), vol. 41, issue 4, pp. 110:1--110:29, 2023.
Chen, J., Y. Huang, M. Wang, S. Salihoglu, and K. Salem, "Accurate Summary-Based Cardinality Estimation Through the Lens Of Cardinality Estimation Graphs", SIGMOD Record, vol. 52, issue 1, pp. 94--102, 2023.
Lin, S-C., M. Li, and J. Lin, "Aggretriever: A Simple Approach to Aggregate Textual Representations For Robust Dense Passage Retrieval", Transactions of the Association for Computational Linguistics, vol. 11, pp. 436--452, 2023.
Ma, X., T. Teofili, and J. Lin, "Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes", ArXiv, vol. abs/2304.12139, 2023.
Huang, C., Y. Xie, Z. Jiang, J. Lin, and M. Li, "Approximating Human-Like Few-Shot Learning With GPT-based Compression", ArXiv, vol. abs/2308.06942, 2023.
Yang, J-H., C. Lassance, R. Sampaio de Rezende, K. Srinivasan, M. Redi, S. Clinchant, and J. Lin, "AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation", ArXiv, vol. abs/2304.01961, 2023.
Kassaie, B., and F. Tompa, "Autonomously Computable Information Extraction", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 10, pp. 2431--2443, 2023.
Hildred, J., M. Abebe, and K. Daudjee, "Caerus: Low-Latency Distributed Transactions for Geo-Replicated Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 17, issue 3, pp. 469--482, 2023.
Mousavi, A., X. Zhan, H. Bai, P. Shi, T. Rekatsinas, B. Han, Y. Li, J. Pound, J. M. Susskind, N. Schluter, et al., "Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation", ArXiv, vol. abs/2309.11669, 2023.
Rorseth, J., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "CREDENCE: Counterfactual Explanations for Document Ranking", ArXiv, vol. abs/2302.04983, 2023.
Ozsu, T., "Data Science - A Systematic Treatment", Communications of the ACM, vol. 66, issue 7, pp. 106--116, 2023.
Ozsu, T., "Data Science: A Systematic Treatment", ArXiv, vol. abs/2301.13761, 2023.
Mohapatra, S., J. Zong, F. Kerschbaum, and X. He, "Differentially Private Data Generation With Missing Data", ArXiv, vol. abs/2310.11548, 2023.
Zhang, S., and X. He, "DProvDB: Differentially Private Query Processing With Multi-Analyst Provenance", ArXiv, vol. abs/2309.10240, 2023.
Mackenzie, J., A. Trotman, and J. Lin, "Efficient Document-at-a-Time and Score-at-a-Time Query Evaluation For Learned Sparse Representations", ACM Transactions on Information Systems (TOIS), vol. 41, issue 4, pp. 96:1--96:28, 2023.
Zou, L., Y. Pang, T. Ozsu, and J. Chen, "Efficient Execution of SPARQL Queries With OPTIONAL and UNION Expressions", ArXiv, vol. abs/2303.13844, 2023.
Chen, H., C. Lassance, and J. Lin, "End-to-End Retrieval With Learned Dense and Sparse Representations Using Lucene", ArXiv, vol. abs/2311.18503, 2023.
Kamalloo, E., X. Zhang, O. Ogundepo, N. Thakur, D. Alfonso-Hermelo, M. Rezagholizadeh, and J. Lin, "Evaluating Embedding APIs for Information Retrieval", ArXiv, vol. abs/2305.06300, 2023.
Kamalloo, E., N. Dziri, C. Clarke, and D. Rafiei, "Evaluating Open-Domain Question Answering in the Era of Large Language Models", ArXiv, vol. abs/2305.06984, 2023.
Ren, H., A. Mousavi, A. Pacaci, S. Rahman Chowdhury, J. Mohoney, I. Ilyas, Y. Li, and T. Rekatsinas, "Fact Ranking Over Large-Scale Knowledge Graphs With Reasoning Embedding Models", IEEE Data Engineering Bulletin, vol. 46, issue 2, pp. 126--139, 2023.
Ma, X., L. Wang, N. Yang, F. Wei, and J. Lin, "Fine-Tuning LLaMA for Multi-Stage Text Retrieval", ArXiv, vol. abs/2310.08319, 2023.
Bayat, F. Fatahi, K. Qian, B. Han, Y. Sang, A. Belyi, S. Khorshidi, F. Wu, I. Ilyas, and Y. Li, "FLEEK: Factual Error Detection and Correction With Evidence Retrieved From External Knowledge", ArXiv, vol. abs/2310.17119, 2023.
Tang, R., X. Zhang, X. Ma, J. Lin, and F. Türe, "Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models", ArXiv, vol. abs/2310.07712, 2023.
Piktus, A., O. Ogundepo, C. Akiki, A. Oladipo, X. Zhang, H. Schoelkopf, S. Biderman, M. Potthast, and J. Lin, "GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration", ArXiv, vol. abs/2306.01481, 2023.
Li, M., H. Zhuang, K. Hui, Z. Qin, J. Lin, R. Jagerman, X. Wang, and M. Bendersky, "Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers", ArXiv, vol. abs/2311.09175, 2023.
Ilyas, I., J. P. Lacerda, Y. Li, U. Farooq Minhas, A. Mousavi, J. Pound, T. Rekatsinas, and C. Sumanth, "Growing and Serving Large Open-Domain Knowledge Graphs", ArXiv, vol. abs/2305.09464, 2023.
Kamalloo, E., A. Jafari, X. Zhang, N. Thakur, and J. Lin, "HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking With Attribution", ArXiv, vol. abs/2307.16883, 2023.
Mohoney, J., A. Pacaci, S. Rahman Chowdhury, A. Mousavi, I. Ilyas, U. Farooq Minhas, J. Pound, and T. Rekatsinas, "High-Throughput Vector Similarity Search in Knowledge Graphs", ArXiv, vol. abs/2304.01926, 2023.
Mohoney, J., A. Pacaci, S. Rahman Chowdhury, A. Mousavi, I. Ilyas, U. Farooq Minhas, J. Pound, and T. Rekatsinas, "High-Throughput Vector Similarity Search in Knowledge Graphs", Proceedings of the ACM on Management of Data, vol. 1, issue 2, pp. 197:1--197:25, 2023.
Pradeep, R., K. Hui, J. Gupta, Á. Dániel Lelkes, H. Zhuang, J. Lin, D. Metzler, and V. Q. Tran, "How Does Generative Retrieval Scale to Millions of Passages?", ArXiv, vol. abs/2305.11841, 2023.
Lin, S-C., A. Asai, M. Li, B. Oguz, J. Lin, Y. Mehdad, W-t. Yih, and X. Chen, "How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval", ArXiv, vol. abs/2302.07452, 2023.
Conia, S., M. Li, D. Lee, U. Farooq Minhas, I. Ilyas, and Y. Li, "Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs", ArXiv, vol. abs/2311.15781, 2023.
Zhang, C., A. Bonifati, and T. Ozsu, "Indexing Techniques for Graph Reachability Queries", ArXiv, vol. abs/2311.03542, 2023.
Salihoglu, S., "Kùzu: A Database Management System for 'Beyond Relational' Workloads", SIGMOD Record, vol. 52, issue 3, pp. 39--40, 2023.
Thakur, N., J. Ni, G. Hernández Ábrego, J. Wieting, J. Lin, and D. Cer, "Leveraging LLMs for Synthesizing Training Data Across Many Languages In Multilingual Dense Retrieval", ArXiv, vol. abs/2311.05800, 2023.
Zhang, X., N. Thakur, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, M. Rezagholizadeh, and J. Lin, "MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages", Transactions of the Association for Computational Linguistics, vol. 11, pp. 1114--1131, 2023.
Kamphuis, C., A. Lin, S. Yang, J. Lin, A. P. de Vries, and F. Hasibi, "MMEAD: MS MARCO Entity Annotations and Disambiguations", ArXiv, vol. abs/2309.07574, 2023.
Hebert, L., G. Sahu, N. Kishore Sreenivas, L. Golab, and R. Cohen, "Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media", ArXiv, vol. abs/2307.09312, 2023.
Thakur, N., L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, B. Chen, M. Rezagholizadeh, et al., "NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation", ArXiv, vol. abs/2312.11361, 2023.
Qian, K., A. Belyi, F. Wu, S. Khorshidi, A. Nikfarjam, R. Khot, Y. Sang, K. Luna, X. Chu, E. Choi, et al., "Open Domain Knowledge Extraction for Knowledge Graphs", ArXiv, vol. abs/2312.09424, 2023.
Faggioli, G., L. Dietz, C. Clarke, G. Demartini, M. Hagen, C. Hauff, N. Kando, E. Kanoulas, M. Potthast, B. Stein, et al., "Perspectives on Large Language Models for Relevance Judgment", ArXiv, vol. abs/2304.09161, 2023.
Dadvar, V., L. Golab, and D. Srivastava, "POEM: Pattern-Oriented Explanations of Convolutional Neural Networks", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 11, pp. 3192--3200, 2023.
Hebert, L., L. Golab, and R. Cohen, "Predicting Hateful Discussions on Reddit Using Graph Transformer Networks And Communal Context", ArXiv, vol. abs/2301.04248, 2023.
Hebert, L., H. Yi Chen, R. Cohen, and L. Golab, "Qualitative Analysis of a Graph Transformer Approach to Addressing Hate Speech: Adapting to Dynamically Changing Content", ArXiv, vol. abs/2301.10871, 2023.
Zhang, X., S. Hofstätter, P. Lewis, R. Tang, and J. Lin, "Rank-Without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models", ArXiv, vol. abs/2312.02969, 2023.
Pradeep, R., S. Sharifymoghaddam, and J. Lin, "RankVicuna: Zero-Shot Listwise Document Reranking With Open-Source Large Language Models", ArXiv, vol. abs/2309.15088, 2023.
Pradeep, R., S. Sharifymoghaddam, and J. Lin, "RankZephyr: Effective and Robust Zero-Shot Listwise Reranking Is A Breeze!", ArXiv, vol. abs/2312.02724, 2023.
Liao, V., S. Shariyar Murtaza, Y. Nie, and J. Lin, "Regex-Augmented Domain Transfer Topic Classification Based on a Pre-Trained Language Model: An Application in Financial Domain", ArXiv, vol. abs/2305.18324, 2023.
Bauer, C., B. Carterette, N. Ferro, N. Fuhr, J. Beel, T. Breuer, C. Clarke, A. Crescenzi, G. Demartini, G. Maria Di Nunzio, et al., "Report on the Dagstuhl Seminar on Frontiers of Information Access Experimentation for Research and Education", SIGIR Forum, vol. 57, issue 1, pp. 7:1--7:28, 2023.
Kamalloo, E., N. Thakur, C. Lassance, X. Ma, J-H. Yang, and J. Lin, "Resources for Brewing BEIR: Reproducible Reference Models and An Official Leaderboard", ArXiv, vol. abs/2306.07471, 2023.
Huo, S., N. Arabzadeh, and C. Clarke, "Retrieving Supporting Evidence for Generative Question Answering", ArXiv, vol. abs/2309.11392, 2023.
Huo, S., N. Arabzadeh, and C. Clarke, "Retrieving Supporting Evidence for LLMs Generated Answers", ArXiv, vol. abs/2306.13781, 2023.
Tamber, M. Singh, R. Pradeep, and J. Lin, "Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking With Seq2seq Encoder-Decoder Models", ArXiv, vol. abs/2312.16098, 2023.
Lin, J., and T. Teofili, "Searching Dense Representations With Inverted Indexes", ArXiv, vol. abs/2312.01556, 2023.
Sheshbolouki, A., and T. Ozsu, "sGrow: Explaining the Scale-Invariant Strength Assortativity of Streaming Butterflies", ACM Transactions on the Web, vol. 17, issue 3, pp. 24:1--24:46, 2023.
Zeng, L., L. Zou, and T. Ozsu, "SGSI - A Scalable GPU-Friendly Subgraph Isomorphism Algorithm", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 35, issue 11, pp. 11899--11916, 2023.
Lin, J., D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, R. Frassetto Nogueira, O. Ogundepo, M. Rezagholizadeh, N. Thakur, J-H. Yang, et al., "Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval", ArXiv, vol. abs/2304.01019, 2023.
Li, M., S-C. Lin, X. Ma, and J. Lin, "SLIM: Sparsified Late Interaction for Multi-Vector Retrieval With Inverted Indexes", ArXiv, vol. abs/2302.06587, 2023.
Seltzer, J., J. Pan, K. Cheng, Y. Sun, S. Kolagati, J. Lin, and S. Zong, "SmartProbe: A Virtual Moderator for Market Research Surveys", ArXiv, vol. abs/2305.08271, 2023.
Akiki, C., O. Ogundepo, A. Piktus, X. Zhang, A. Oladipo, J. Lin, and M. Potthast, "Spacerini: Plug-and-Play Search Engines With Pyserini and Hugging Face", ArXiv, vol. abs/2302.14534, 2023.
Thakur, N., K. Wang, I. Gurevych, and J. Lin, "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-Shot Neural Sparse Retrieval", ArXiv, vol. abs/2307.10488, 2023.
Salem, K., "Technical Perspective: Ad Hoc Transactions: What They Are And Why We Should Care", SIGMOD Record, vol. 52, issue 1, pp. 6, 2023.
Wu, Z., A. Anand Deshmukh, Y. Wu, J. Lin, and L. Mou, "Unsupervised Chunking With Hierarchical RNN", ArXiv, vol. abs/2309.04919, 2023.
Lin, J., R. Pradeep, T. Teofili, and J. Xian, "Vector Search With OpenAI Embeddings: Lucene Is All You Need", ArXiv, vol. abs/2308.14963, 2023.
Tang, R., X. Zhang, J. Lin, and F. Türe, "What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations", ArXiv, vol. abs/2311.18812, 2023.
Zong, S., J. Seltzer, J. Pan, K. Cheng, and J. Lin, "Which Model Shall I Choose? Cost/Quality Trade-Offs for Text Classification Tasks", ArXiv, vol. abs/2301.07006, 2023.
Adeyemi, M., A. Oladipo, R. Pradeep, and J. Lin, "Zero-Shot Cross-Lingual Reranking With Large Language Models for Low-Resource Languages", ArXiv, vol. abs/2312.16159, 2023.
Ma, X., X. Zhang, R. Pradeep, and J. Lin, "Zero-Shot Listwise Document Reranking With a Large Language Model", ArXiv, vol. abs/2305.02156, 2023.

2022

Trotman, A., J. Mackenzie, P. Parameswaran, and J. Lin, "A Common Framework for Exploring Document-at-a-Time and Score-at-a-Time Retrieval Methods", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Borgida, A., E. Franconi, D. Toman, and G. Weddell, "Accessing Document Data Sources Using Referring Expression Types", International Workshop on Description Logics (DL), 2022.
Ogundepo, O., X. Zhang, S. Sun, K. Duh, and J. Lin, "AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Devins, J., J. Tibshirani, and J. Lin, "Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini", Web Search and Data Mining (WSDM), 2022.
Parsa, M. S., H. Shi, Y. Xu, A. Yim, Y. Yin, and L. Golab, "Analyzing Climate Change Discussions on Reddit", International Conference on Computational Science and Computational Intelligence (CSCI), 2022.
Ma, X., K. Sun, R. Pradeep, M. Li, and J. Lin, "Another Look at DPR: Reproduction of Training and Replication Of Retrieval", European Conference on Information Retrieval (ECIR), 2022.
Liu, Y., C. Hu, and J. Lin, "Another Look at Information Retrieval as Statistical Translation", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Zhong, W., Y. Xie, and J. Lin, "Applying Structural and Dense Semantic Matching for the ARQMath Lab 2022, CLEF", Conference and Labs of the Evaluation Forum (CLEF), 2022.
Li, M., X. Zhang, J. Xin, H. Zhang, and J. Lin, "Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Chambers, O., R. Cohen, M. Grossman, and Q. Chen, "Creating a User Model to Support User-Specific Explanations of AI Systems", User Modeling, Adaptation, and Personalization (UMAP), 2022.
Shi, P., L. Song, L. Jin, H. Mi, H. Bai, J. Lin, and D. Yu, "Cross-Lingual Text-to-SQL Semantic Parsing With Representation Mixup", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Karegar, R., M. Mirsafian, P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Discovering Domain Orders via Order Dependencies", IEEE International Conference on Data Engineering (ICDE), 2022.
Ma, X., R. Pradeep, R. Nogueira, and J. Lin, "Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO V1 and V2", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Kane, A., Y. Ki Ng, and F. Tompa, "Dowsing for Answers to Math Questions: Doing Better With Less", Conference and Labs of the Evaluation Forum (CLEF), 2022.
Shehata, D., N. Arabzadeh, and C. Clarke, "Early Stage Sparse Retrieval With Entity Linking", International Conference on Information and Knowledge Management (CIKM), 2022.
Pacaci, A., A. Bonifati, and T. Ozsu, "Evaluating Complex Queries on Streaming Graphs", IEEE International Conference on Data Engineering (ICDE), 2022.
Zhong, W., J-H. Yang, Y. Xie, and J. Lin, "Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Chen, Y., G. Xiao, T. Ozsu, Z. Tang, A. Y. Zomaya, and K. Li, "Exploiting Hierarchical Parallelism and Reusability in Tensor Kernel Processing on Heterogeneous HPC Systems", IEEE International Conference on Data Engineering (ICDE), 2022.
Jiang, Z., Y. Dai, J. Xin, M. Li, and J. Lin, "Few-Shot Non-Parametric Learning With Deep Latent Variable Model", Conference on Neural Information Processing Systems (NeurIPS), 2022.
Vezvaei, A., L. Golab, M. Kargar, D. Srivastava, J. Szlichta, and M. Zihayat, "Fine-Tuning Dependencies With Parameters", International Conference on Extending Database Technology (EDBT), 2022.
Toman, D., and G. Weddell, "First Order Rewritability in Ontology-Mediated Querying in Horn Description Logics", AAAI Conference on Artificial Intelligence (AAAI), 2022.
Seltzer, J., K. Cheng, S. Zong, and J. Lin, "Flipping the Script: Inverse Information Seeking Dialogues for Market Research", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Chopra, S., and L. Golab, "Gender Differences in Early Career Performance Reviews: A Text Mining Study", International Conference on Extending Database Technology (EDBT), 2022.
Kalavri, V., and S. Salihoglu, "GRADES-NDA'22: 5th International Workshop on Graph Data Management Experiences and Systems (GRADES) and Network Data Analytics (NDA)", ACM International Conference on Management of Data (SIGMOD), 2022.
Jin, G., N. Anzum, and S. Salihoglu, "GRainDB: A Relational-Core Graph-Relational DBMS", Conference on Innovative Data Systems Research (CIDR), 2022.
Dehghan, M., D. Kumar, and L. Golab, "GRS: Combining Generation and Revision in Unsupervised Sentence Simplification", Association for Computational Linguistics (ACL), 2022.
Yan, X., C. Luo, C. Clarke, N. Craswell, E. M. Voorhees, and P. Castells, "Human Preferences as Dueling Bandits", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Guo, R., V. Guo, A. Kim, J. Hildred, and K. Daudjee, "Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers", Conference on Machine Learning and Systems (MLSys), 2022.
Zhong, Y., J. Xiao, T. Vetterli, M. Matin, E. Loo, J. Lin, R. Bourgon, and O. Shapira, "Improving Precancerous Case Characterization via Transformer-Based Ensemble Learning", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Li, H., S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "Improving Query Representations for Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study", European Conference on Information Retrieval (ECIR), 2022.
Yang, M. Y. R., S. Yang, and J. Lin, "Integration of Text and Geospatial Search for Hydrographic Datasets Using the Lucene Search Library", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2022.
Zhang, D., A. Vakili Tahami, M. Abualsaud, and M. Smucker, "Learning Trustworthy Web Sources to Derive Correct Answers and Reduce Health Misinformation in Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Feng, E., D. Toman, and G. Weddell, "Magic Sets in Interpolation-Based Rule Driven Query Optimization", International Web Rule Symposium (RuleML), 2022.
Peng, P., T. Ozsu, L. Zou, C. Yan, and C. Liu, "MPC: Minimum Property-Cut RDF Graph Partitioning", IEEE International Conference on Data Engineering (ICDE), 2022.
Pradeep, R., Y. Li, Y. Wang, and J. Lin, "Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, J. Lin, E. M. Voorhees, and I. Soboroff, "Overview of the TREC 2022 Deep Learning Track", Text Retrieval Conference (TREC), 2022.
Hebert, L., L. Golab, and R. Cohen, "Predicting Hateful Discussions on Reddit Using Graph Transformer Networks and Communal Context", IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2022.
Abebe, M., H. Lazu, and K. Daudjee, "Proteus: Autonomous Adaptive Storage for Mixed Workloads", ACM International Conference on Management of Data (SIGMOD), 2022.
Li, H., S. Zhuang, X. Ma, J. Lin, and G. Zuccon, "Pseudo-Relevance Feedback With Dense Retrievers in Pyserini", Australasian Document Computing Symposium (ADCS), 2022.
Kamphuis, C., F. Hasibi, J. Lin, and A. P. de Vries, "REBL: Entity Linking at Scale (Prototype)", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2022.
Ilyas, I., T. Rekatsinas, V. Konda, J. Pound, X. Qi, and M. A. Soliman, "Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale", ACM International Conference on Management of Data (SIGMOD), 2022.
Lin, J., D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, R. Frassetto Nogueira, O. Ogundepo, M. Rezagholizadeh, N. Thakur, J-H. Yang, et al., "Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval", Text Retrieval Conference (TREC), 2022.
Tang, R., K. Kumar, G. Yang, A. Pandey, Y. Mao, V. Belyaev, M. Emmadi, C. G. Murray, F. Türe, and J. Lin, "SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Pradeep, R., Y. Liu, X. Zhang, Y. Li, A. Yates, and J. Lin, "Squeezing Water From a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking", European Conference on Information Retrieval (ECIR), 2022.
Tang, R., K. Kumar, J. Xin, P. Vyas, W. Li, G. Yang, Y. Mao, C. G. Murray, and J. Lin, "Temporal Early Exiting for Streaming Speech Commands Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.
Abualsaud, M., and M. Smucker, "The Dark Side of Relevance: The Effect of Non-Relevant Results on Search Behavior", Conference on Human Information Interaction and Retrieval (CHIIR), 2022.
Mohapatra, S., S. Sasy, X. He, G. Kamath, and O. Thakkar, "The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection", AAAI Conference on Artificial Intelligence (AAAI), 2022.
Li, H., S. Wang, S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "To Interpolate or Not to Interpolate: PRF, Dense and Sparse Retrievers", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Voorhees, E. M., N. Craswell, and J. Lin, "Too Many Relevants: Whither Cranfield Test Collections?", International Conference on Research and Development in Information Retrieval (SIGIR), 2022.
Xue, H., F. D. Salim, Y. Ren, and C. Clarke, "Translating Human Mobility Forecasting Through Natural Language Generation", Web Search and Data Mining (WSDM), 2022.
Borgida, A., E. Franconi, D. Toman, and G. Weddell, "Understanding Document Data Sources Using Ontologies With Referring Expressions", Australian Joint Conference on Artificial Intelligence (AUS-AI), 2022.
Arabzadeh, N., M. Seifikar, and C. Clarke, "Unsupervised Question Clarity Prediction Through Retrieved Item Coherency", International Conference on Information and Knowledge Management (CIKM), 2022.
Vakili Tahami, A., D. Zhang, and M. Smucker, "UWaterlooMDS at the TREC 2022 Health Misinformation Track", Text Retrieval Conference (TREC), 2022.
Durvasula, S., R. Kiguru, S. Mathur, J. Xu, J. Lin, and N. Vijaykumar, "VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks", International Conference on Parallel Architectures and Compilation Techniques (PACT), 2022.
Huo, S., X. Yan, and C. Clarke, "WaterlooClarke at the TREC 2022 Conversational Assistant Track", Text Retrieval Conference (TREC), 2022.
Shi, P., R. Zhang, H. Bai, and J. Lin, "XRICL: Cross-Lingual Retrieval-Augmented In-Context Learning for Cross-Lingual Text-to-SQL Semantic Parsing", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Lin, S-C., and J. Lin, "A Dense Representation Framework for Lexical and Semantic Matching", ArXiv, vol. abs/2206.09912, 2022.
Chen, J., Y. Huang, M. Wang, S. Salihoglu, and K. Salem, "Accurate Summary-Based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 8, pp. 1533--1545, 2022.
Ogundepo, O., X. Zhang, and J. Lin, "Better Than Whitespace: Information Retrieval for Languages Without Custom Tokenizers", ArXiv, vol. abs/2210.05481, 2022.
Lin, J., "Building a Culture of Reproducibility in Academic Research", ArXiv, vol. abs/2212.13534, 2022.
Xin, J., R. Tang, Z. Jiang, Y. Yu, and J. Lin, "Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers", ArXiv, vol. abs/2208.00483, 2022.
Mazmudar, M., T. Humphries, J. Liu, M. Rafuse, and X. He, "Cache Me if You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", ArXiv, vol. abs/2211.15732, 2022.
Mazmudar, M., T. Humphries, J. Liu, M. Rafuse, and X. He, "Cache Me if You Can: Accuracy-Aware Inference Engine for Differentially Private Data Exploration", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 4, pp. 574--586, 2022.
Voorhees, E. M., I. Soboroff, and J. Lin, "Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?", ArXiv, vol. abs/2201.11086, 2022.
Li, M., X. Zhang, J. Xin, H. Zhang, and J. Lin, "Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking", ArXiv, vol. abs/2205.09638, 2022.
Li, M., S-C. Lin, B. Oguz, A. Ghoshal, J. Lin, Y. Mehdad, W-t. Yih, and X. Chen, "CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval", ArXiv, vol. abs/2211.10411, 2022.
Kassaie, B., E. L. Irving, and F. Tompa, "Computer-Assisted Cohort Identification in Practice", ACM Transactions on Computing for Healthcare, vol. 3, issue 2, pp. 17:1--17:28, 2022.
Zheng, Z., L. Zheng, M. Alipour Langouri, F. Chiang, L. Golab, J. Szlichta, and S. Baskaran, "Contextual Data Cleaning With Ontology Functional Dependencies", Journal of Data and Information Quality, vol. 14, issue 3, pp. 20:1--20:26, 2022.
Sadri, N., and G. Cormack, "Continuous Active Learning Using Pretrained Transformers", ArXiv, vol. abs/2208.06955, 2022.
Ilyas, I., and F. Naumann, "Data Errors: Symptoms, Causes and Origins", IEEE Data Engineering Bulletin, vol. 45, issue 1, pp. 4--9, 2022.
Thakur, N., N. Reimers, and J. Lin, "Domain Adaptation for Memory-Efficient Dense Retrieval", ArXiv, vol. abs/2205.11498, 2022.
Pappachan, P., S. Zhang, X. He, and S. Mehrotra, "Don't Be a Tattle-Tale: Preventing Leakages Through Data Dependencies on Access Control Protected Data", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 2437--2449, 2022.
Pappachan, P., S. Zhang, X. He, and S. Mehrotra, "Don't Be a Tattle-Tale: Preventing Leakages Through Data Dependencies on Access Control Protected Data", ArXiv, vol. abs/2207.08757, 2022.
Shehata, D., N. Arabzadeh, and C. Clarke, "Early Stage Sparse Retrieval With Entity Linking", ArXiv, vol. abs/2208.04887, 2022.
Artikis, A., N. Tatbul, L. Golab, and M. Sadoghi, "Editorial", Information Systems, vol. 109, pp. 102088, 2022.
Kargar, M., L. Golab, D. Srivastava, J. Szlichta, and M. Zihayat, "Effective Keyword Search Over Weighted Graphs", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 34, issue 2, pp. 601--616, 2022.
Zhong, W., J-H. Yang, and J. Lin, "Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval", ArXiv, vol. abs/2203.11163, 2022.
Dadvar, V., L. Golab, and D. Srivastava, "Exploring Data Using Patterns: A Survey", Information Systems, vol. 108, pp. 101985, 2022.
Hebert, L., L. Golab, P. Poupart, and R. Cohen, "FedFormer: Contextual Federation With Attention in Reinforcement Learning", ArXiv, vol. abs/2205.13697, 2022.
Jiang, Z., Y. Dai, J. Xin, M. Li, and J. Lin, "Few-Shot Non-Parametric Learning With Deep Latent Variable Model", ArXiv, vol. abs/2206.11573, 2022.
Yan, D., G. Guo, J. Khalil, T. Ozsu, W-S. Ku, and J. C. S. Lui, "G-Thinker: A General Distributed Framework for Finding Qualified Subgraphs in a Big Graph With Load Balancing", The VLDB Journal, vol. 31, issue 2, pp. 287--320, 2022.
Dehghan, M., D. Kumar, and L. Golab, "GRS: Combining Generation and Revision in Unsupervised Sentence Simplification", ArXiv, vol. abs/2203.09742, 2022.
Yan, X., C. Luo, C. Clarke, N. Craswell, E. M. Voorhees, and P. Castells, "Human Preferences as Dueling Bandits", ArXiv, vol. abs/2204.10362, 2022.
Zhong, Y., J. Xiao, T. Vetterli, M. Matin, E. Loo, J. Lin, R. Bourgon, and O. Shapira, "Improving Precancerous Case Characterization via Transformer-Based Ensemble Learning", ArXiv, vol. abs/2212.05150, 2022.
Herodotou, H., P. K. Chrysanthis, S. Chen, M. Hsu, K. Daudjee, Y. Wu, and C. Costa, "Introduction to the Special Issue on Self-Managing and Hardware-Optimized Database Systems 2020", Distributed and Parallel Databases, vol. 40, issue 1, pp. 1--3, 2022.
Xia, K., W. Zhao, A. Jolfaei, and T. Ozsu, "Introduction to the Special Section on Edge/Fog Computing for Infectious Disease Intelligence", ACM Transactions on Internet Technology (TOIT), vol. 22, issue 3, pp. 63e:1--63e:2, 2022.
Jiang, Z., M. Y. R. Yang, M. Tsirlin, R. Tang, and J. Lin, "Less Is More: Parameter-Free Text Classification With Gzip", ArXiv, vol. abs/2212.09410, 2022.
Ilyas, I., and T. Rekatsinas, "Machine Learning and Data Cleaning: Which Serves the Other?", Journal of Data and Information Quality, vol. 14, issue 3, pp. 13:1--13:11, 2022.
Zhang, X., N. Thakur, O. Ogundepo, E. Kamalloo, D. Alfonso-Hermelo, X. Li, Q. Liu, M. Rezagholizadeh, and J. Lin, "Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages", ArXiv, vol. abs/2210.09984, 2022.
Jin, G., and S. Salihoglu, "Making RDBMSs Efficient on Graph Workloads Through Predefined Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 5, pp. 1011--1023, 2022.
Ghayyur, S., D. Ghosh, X. He, and S. Mehrotra, "MIDE: Accuracy Aware Minimally Invasive Data Exploration for Decision Support", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 2653--2665, 2022.
Mhedhbi, A., and S. Salihoglu, "Modern Techniques for Querying Graph-Structured Relations: Foundations, System Implementations, and Open Challenges", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 12, pp. 3762--3765, 2022.
Ammar, K., S. Sahu, S. Salihoglu, and T. Ozsu, "Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs", ArXiv, vol. abs/2208.00273, 2022.
Ammar, K., S. Sahu, S. Salihoglu, and T. Ozsu, "Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 3186--3198, 2022.
Dadvar, V., L. Golab, and D. Srivastava, "POEM: Pattern-Oriented Explanations of CNN Models", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 12, pp. 3618--3621, 2022.
Gao, L., X. Ma, J. Lin, and J. Callan, "Precise Zero-Shot Dense Retrieval Without Relevance Labels", ArXiv, vol. abs/2212.10496, 2022.
Liu, L., M. Li, J. Lin, S. Riedel, and P. Stenetorp, "Query Expansion Using Contextual Clue Sampling With Language Models", ArXiv, vol. abs/2210.07093, 2022.
Ozsu, T., "Reminiscences on Influential Papers", SIGMOD Record, vol. 51, issue 2, pp. 44--46, 2022.
Yamamoto, T., Z. Dou, N. Kando, C. Clarke, M. P. Kato, and Y. Liu, "Report on the 16th Round of NII Testbeds and Community for Information Access Research (NTCIR-16)", SIGIR Forum, vol. 56, issue 2, pp. 7:1--7:8, 2022.
Ilyas, I., T. Rekatsinas, V. Konda, J. Pound, X. Qi, and M. A. Soliman, "Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale", ArXiv, vol. abs/2204.07309, 2022.
Sheshbolouki, A., and T. Ozsu, "sGrapp: Butterfly Approximation in Streaming Graphs", ACM Transactions on Knowledge Discovery from Data, vol. 16, issue 4, pp. 76:1--76:43, 2022.
Arabzadeh, N., A. Vtyurina, X. Yan, and C. Clarke, "Shallow Pooling for Sparse Labels", Information Retrieval Journal, vol. 25, issue 4, pp. 365--385, 2022.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Space-Efficient Subgraph Search Over Streaming Graph With Timing Order Constraint", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 34, issue 9, pp. 4453--4467, 2022.
Tang, R., K. Kumar, G. Yang, A. Pandey, Y. Mao, V. Belyaev, M. Emmadi, C. G. Murray, F. Türe, and J. Lin, "SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale", ArXiv, vol. abs/2211.11740, 2022.
Gao, L., X. Ma, J. Lin, and J. Callan, "Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval", ArXiv, vol. abs/2203.05765, 2022.
Wang, R., J. Wang, S. Idreos, T. Ozsu, and W. G. Aref, "The Case for Distributed Shared-Memory Databases With RDMA-Enabled Memory Disaggregation", ArXiv, vol. abs/2207.03027, 2022.
Wang, R., J. Wang, S. Idreos, T. Ozsu, and W. G. Aref, "The Case for Distributed Shared-Memory Databases With RDMA-Enabled Memory Disaggregation", Proceedings of the VLDB Endowment (PVLDB), vol. 16, issue 1, pp. 15--22, 2022.
Abebe, M., H. Lazu, and K. Daudjee, "Tiresias: Enabling Predictive Autonomous Storage and Indexing", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 11, pp. 3126--3136, 2022.
Li, H., S. Wang, S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "To Interpolate or Not to Interpolate: PRF, Dense and Sparse Retrievers", ArXiv, vol. abs/2205.00235, 2022.
Zhang, X., K. Ogueji, X. Ma, and J. Lin, "Towards Best Practices for Training Multilingual Dense Retrieval Models", ArXiv, vol. abs/2204.02363, 2022.
Arabzadeh, N., M. Seifikar, and C. Clarke, "Unsupervised Question Clarity Prediction Through Retrieved Item Coherency", ArXiv, vol. abs/2208.04882, 2022.
Nanayakkara, P., J. Bater, X. He, J. Hullman, and J. Rogers, "Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases", ArXiv, vol. abs/2201.05964, 2022.
Nanayakkara, P., J. Bater, X. He, J. Hullman, and J. Rogers, "Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases", Proceedings on Privacy Enhancing Technologies (PoPETs), vol. 2022, issue 2, pp. 601--618, 2022.
Durvasula, S., R. Kiguru, S. Mathur, J. Xu, J. Lin, and N. Vijaykumar, "VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks", ArXiv, vol. abs/2210.08729, 2022.
Tang, R., A. Pandey, Z. Jiang, G. Yang, K. Kumar, J. Lin, and F. Türe, "What the DAAM: Interpreting Stable Diffusion Using Cross Attention", ArXiv, vol. abs/2210.04885, 2022.
Shi, P., R. Zhang, H. Bai, and J. Lin, "XRICL: Cross-Lingual Retrieval-Augmented In-Context Learning for Cross-Lingual Text-to-SQL Semantic Parsing", ArXiv, vol. abs/2210.13693, 2022.

2021

Lin, J., R. Nogueira, and A. Yates, Pretrained Transformers for Text Ranking: BERT and Beyond: Morgan & Claypool, 2021.
Mhedhbi, A., P. Gupta, S. Khaliq, and S. Salihoglu, "A+ Indexes: Tunable and Space-Efficient Adjacency Lists in Graph Database Management Systems", IEEE International Conference on Data Engineering (ICDE), 2021.
Parsa, M. S., and L. Golab, "Academic Integrity in Online Education During the COVID-19 Pandemic: A Social Media Mining Study", Educational Data Mining (EDM), 2021.
Chopra, S., and L. Golab, "Analyzing Ranking Strategies to Characterize Competition for Co-Operative Work Placements", Educational Data Mining (EDM), 2021.
Zhong, W., X. Zhang, J. Xin, R. Zanibbi, and J. Lin, "Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Brown, D. G., L. Byl, and M. Grossman, "Are Machine Learning Corpora "Fair Dealing" Under Canadian Law?", International Conference on Computational Creativity (ICCC), 2021.
Xin, J., R. Tang, Y. Yu, and J. Lin, "BERxiT: Early Exiting for BERT With Better Fine-Tuning and Extension to Regression", Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
Alway, K., E. Blais, and S. Salihoglu, "Box Covers and Domain Orderings for Beyond Worst-Case Join Processing", International Conference on Database Theory (ICDT), 2021.
Zhang, E., S-C. Lin, J-H. Yang, R. Pradeep, R. Nogueira, and J. Lin, "Chatty Goose: A Python Framework for Conversational Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Zhang, X., A. Yates, and J. Lin, "Comparing Score Aggregation Approaches for Document Retrieval With Pretrained Transformers", European Conference on Information Retrieval (ECIR), 2021.
Lin, S-C., J-H. Yang, and J. Lin, "Contextualized Query Embeddings for Conversational Search", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Glasbergen, B., F. Wu, and K. Daudjee, "Dendrite: Bolt-on Adaptivity for Data Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Zhang, M., L. Tan, Z. Fu, K. Xiong, J. Lin, M. Li, and Z. Tu, "Don't Change Me! User-Controllable Selective Paraphrase Generation", Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, and F. Tompa, "Dowsing for Answers to Math Questions: Ongoing Viability of Traditional MathIR", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, and F. Tompa, "Dowsing for Math Answers", Conference and Labs of the Evaluation Forum (CLEF), 2021.
Xia, S., B. Chang, K. Knopf, Y. He, Y. Tao, and X. He, "DPGraph: A Benchmark Platform for Differentially Private Graph Analysis", ACM International Conference on Management of Data (SIGMOD), 2021.
Kargar, M., L. Golab, D. Srivastava, J. Szlichta, and M. Zihayat, "Effective Keyword Search in Weighted Graphs (Extended Abstract)", IEEE International Conference on Data Engineering (ICDE), 2021.
Karegar, R., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Efficient Discovery of Approximate Order Dependencies", International Conference on Extending Database Technology (EDBT), 2021.
Hofstätter, S., S-C. Lin, J-H. Yang, J. Lin, and A. Hanbury, "Efficiently Teaching an Effective Dense Retriever With Balanced Topic Aware Sampling", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Clarke, C., C. Luo, and M. Smucker, "Evaluation Measures Based on Preference Graphs", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Golab, L., and D. Srivastava, "Exploring Data Using Patterns: A Survey and Open Problems", International Workshop on Data Warehousing and OLAP (DOLAP), 2021.
Jiang, K., R. Pradeep, and J. Lin, "Exploring Listwise Evidence Reasoning With T5 for Fact Verification", Association for Computational Linguistics (ACL), 2021.
Chen, H. H., S. Mohapatra, G. Michalopoulos, X. He, and I. McKillop, "Federated Deep Learning Architecture for Personalized Healthcare", Medical Informatics Europe (MIE), 2021.
Toman, D., and G. Weddell, "FO Rewritability for OMQ Using Beth Definability and Interpolation", International Workshop on Description Logics (DL), 2021.
Sahu, S., and S. Salihoglu, "Graphsurge: Graph Analytics on View Collections Using Differential Computation", ACM International Conference on Management of Data (SIGMOD), 2021.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "How Does BERT Rerank Passages? An Attribution Analysis With Information Bottlenecks", Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2021.
Lin, S-C., J-H. Yang, and J. Lin, "In-Batch Negatives for Knowledge Distillation With Tightly-Coupled Teachers for Dense Retrieval", Workshop on Representation Learning for NLP (RepL4NLP), 2021.
Farhat, O., K. Daudjee, and L. Querzoni, "Klink: Progress-Aware Scheduling for Streaming Data Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Xia, S., N. Anzum, S. Salihoglu, and J. Zhao, "KTabulator: Interactive Ad Hoc Table Creation Using Knowledge Graphs", ACM Conference on Human Factors in Computing Systems (CHI), 2021.
Zhang, Y., C. Hu, Y. Liu, H. Fang, and J. Lin, "Learning to Rank in the Age of Muppets: Effectiveness-Efficiency Tradeoffs in Multi-Stage Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, and J. Lin, "MS MARCO: Benchmarking Ranking Models in the Large-Data Regime", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Li, M., M. Li, K. Xiong, and J. Lin, "Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Langendoen, K., B. Glasbergen, and K. Daudjee, "NIR-Tree: A Non-Intersecting R-Tree", International Conference on Statistical and Scientific Database Management (SSDBM), 2021.
Lin, J., X. Ma, J. Mackenzie, and A. Mallia, "On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2021.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, and J. Lin, "Overview of the TREC 2021 Deep Learning Track", Text Retrieval Conference (TREC), 2021.
Clarke, C., M. Maistro, and M. Smucker, "Overview of the TREC 2021 Health Misinformation Track", Text Retrieval Conference (TREC), 2021.
Shafieinejad, M., F. Kerschbaum, and I. Ilyas, "PCOR: Private Contextual Outlier Release via Differentially Private Search", ACM International Conference on Management of Data (SIGMOD), 2021.
He, X., J. Rogers, J. Bater, A. Machanavajjhala, C. Wang, and X. Wang, "Practical Security and Privacy for Database Systems", ACM International Conference on Management of Data (SIGMOD), 2021.
Arabzadeh, N., X. Yan, and C. Clarke, "Predicting Efficiency/Effectiveness Trade-Offs for Dense vs. Sparse Retrieval Strategy Selection", International Conference on Information and Knowledge Management (CIKM), 2021.
Yates, A., R. Nogueira, and J. Lin, "Pretrained Transformers for Text Ranking: BERT and Beyond", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Yates, A., R. Nogueira, and J. Lin, "Pretrained Transformers for Text Ranking: BERT and Beyond", Web Search and Data Mining (WSDM), 2021.
Toman, D., and G. Weddell, "Projective Beth Definability and Craig Interpolation for Relational Query Optimization (Material to Accompany Invited Talk)", International Conference on Principles of Knowledge Representation and Reasoning (KR), 2021.
Livshits, E., R. Kochirgan, S. Tsur, I. Ilyas, B. Kimelfeld, and S. Roy, "Properties of Inconsistency Measures for Databases", ACM International Conference on Management of Data (SIGMOD), 2021.
Zhong, W., and J. Lin, "PYA0: A Python Toolkit for Accessible Math-Aware Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Lin, J., X. Ma, S-C. Lin, J-H. Yang, R. Pradeep, and R. Nogueira, "Pyserini: A Python Toolkit for Reproducible Information Retrieval Research With Sparse and Dense Representations", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Anzum, N., and S. Salihoglu, "R2GSync and Edge Views: Practical RDBMS to GDBMS Synchronization", ACM International Conference on Management of Data (SIGMOD), 2021.
Odunayo, O., N. N. Sookoo, G. Bathla, A. Cavallin, B. D. Persaud, K. Szigeti, P. Van Cappellen, and J. Lin, "Rescuing Historical Climate Observations to Support Hydrological Research: A Case Study of Solar Radiation Data", ACM Symposium on Document Engineering (DocEng), 2021.
Nemec, J., H. Davoudi, L. Golab, M. Kargar, Y. Lytvyn, P. Mierzejewski, J. Szlichta, and M. Zihayat, "RW-Team: Robust Team Formation Using Random Walk", International Conference on Information and Knowledge Management (CIKM), 2021.
Pradeep, R., X. Ma, R. Frassetto Nogueira, and J. Lin, "Scientific Claim Verification With VerT5erini", International Workshop on Health Text Mining and Information Analysis (Louhi), 2021.
Bai, H., P. Shi, J. Lin, Y. Xie, L. Tan, K. Xiong, W. Gao, and M. Li, "Segatron: Segment-Aware Transformer for Language Modeling and Understanding", AAAI Conference on Artificial Intelligence (AAAI), 2021.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, J. Liu, and M. Li, "Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation With GPT2", Association for Computational Linguistics (ACL), 2021.
Anand, M., J. Zhang, S. Ding, J. Xin, and J. Lin, "Serverless BM25 Search and BERT Reranking", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2021.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Significant Improvements Over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Ma, X., M. Li, K. Sun, J. Xin, and J. Lin, "Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Xin, J., R. Tang, Y. Yu, and J. Lin, "The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing", Association for Computational Linguistics (ACL), 2021.
Han, X., Y. Liu, and J. Lin, "The Simplest Thing That Can Possibly Work: (Pseudo-)Relevance Feedback via Text Classification", International Conference on the Theory of Information Retrieval (ICTIR), 2021.
Mitra, A., C. Gorenflo, L. Golab, and S. Keshav, "TimeFabric: Trusted Time for Permissioned Blockchains", International Symposium on Foundations and Applications of Blockchain (FAB), 2021.
Deshmukh, A. Anand, Q. Zhang, M. Li, J. Lin, and L. Mou, "Unsupervised Chunking as Syntactic Structure Induction With a Knowledge-Transfer Approach", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Abualsaud, M., K. Ghajar, L. Nhi Phan Minh, D. Zhang, I. Xiangyi Chen, M. Smucker, and A. Vakili Tahami, "UWaterlooMDS at the TREC 2021 Health Misinformation Track", Text Retrieval Conference (TREC), 2021.
Pradeep, R., X. Ma, R. Nogueira, and J. Lin, "Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2021.
Abualsaud, M., M. Smucker, and C. Clarke, "Visualizing Searcher Gaze Patterns", Conference on Human Information Interaction and Retrieval (CHIIR), 2021.
Tang, R., K. Kumar, K. Chalkley, J. Xin, L. Zhang, W. Li, G. Yang, Y. Mao, J. Shin, G. Craig Murray, et al., "Voice Query Auto Completion", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Yan, X., C. Clarke, and N. Arabzadeh, "WaterlooClarke at the TREC 2021 Conversational Assistant Track", Text Retrieval Conference (TREC), 2021.
Lin, J., "A Proposed Conceptual Framework for a Representational Approach To Information Retrieval", SIGIR Forum, vol. 55, issue 2, pp. 4:1--4:29, 2021.
Ma, X., K. Sun, R. Pradeep, and J. Lin, "A Replication Study of Dense Passage Retriever", ArXiv, vol. abs/2104.05740, 2021.
Chen, J., Y. Huang, M. Wang, S. Salihoglu, and K. Salem, "Accurate Summary-Based Cardinality Estimation Through the Lens Of Cardinality Estimation Graphs", ArXiv, vol. abs/2105.08878, 2021.
Clarke, C., A. Vtyurina, and M. Smucker, "Assessing Top-K Preferences", ACM Transactions on Information Systems (TOIS), vol. 39, issue 3, pp. 33:1--33:21, 2021.
Liu, J., K. Knopf, Y. Tan, B. Ding, and X. He, "Catch a Blowfish Alive: A Demonstration of Policy-Aware Differential Privacy for Interactive Data Exploration", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 12, pp. 2859--2862, 2021.
Parsa, M. S., L. Golab, and S. Keshav, "Climate Action During COVID-19 Recovery and Beyond: A Twitter Text Mining Study", ArXiv, vol. abs/2105.12190, 2021.
Gupta, P., A. Mhedhbi, and S. Salihoglu, "Columnar Storage and List-Based Processing for Graph Database Management Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 11, pp. 2491--2504, 2021.
Lin, S-C., J-H. Yang, and J. Lin, "Contextualized Query Embeddings for Conversational Search", ArXiv, vol. abs/2104.08707, 2021.
Shi, P., R. Zhang, H. Bai, and J. Lin, "Cross-Lingual Training With Dense Retrieval for Document Retrieval", ArXiv, vol. abs/2109.01628, 2021.
Lin, S-C., and J. Lin, "Densifying Sparse Representations for Passage Retrieval by Representational Slicing", ArXiv, vol. abs/2112.04666, 2021.
Near, J. P., and X. He, "Differential Privacy for Databases", Foundations and Trends in Databases, vol. 11, issue 2, pp. 109--225, 2021.
Zheng, Z., L. Zheng, M. Alipour Langouri, F. Chiang, L. Golab, and J. Szlichta, "Discovery and Contextual Data Cleaning With Ontology Functional Dependencies", ArXiv, vol. abs/2105.08105, 2021.
Valduriez, P., R. Jiménez-Peris, and T. Ozsu, "Distributed Database Systems: The Case for NewSQL", Transactions on Large-Scale Data- and Knowledge-Centered Systems, vol. 48, pp. 1--15, 2021.
Wagh, S., X. He, A. Machanavajjhala, and P. Mittal, "DP-Cryptography: Marrying Differential Privacy and Cryptography In Emerging Applications", Communications of the ACM, vol. 64, issue 2, pp. 84--93, 2021.
Karegar, R., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Efficient Discovery of Approximate Order Dependencies", ArXiv, vol. abs/2101.02174, 2021.
Hofstätter, S., S-C. Lin, J-H. Yang, J. Lin, and A. Hanbury, "Efficiently Teaching an Effective Dense Retriever With Balanced Topic Aware Sampling", ArXiv, vol. abs/2104.06967, 2021.
Suri, S., I. Ilyas, C. Ré, and T. Rekatsinas, "Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins", ArXiv, vol. abs/2106.01501, 2021.
Suri, S., I. Ilyas, C. Ré, and T. Rekatsinas, "Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 15, issue 3, pp. 699--712, 2021.
Li, M., and J. Lin, "Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering", ArXiv, vol. abs/2110.01599, 2021.
Pacaci, A., A. Bonifati, and T. Ozsu, "Evaluating Complex Queries on Streaming Graphs", ArXiv, vol. abs/2101.12305, 2021.
Fritz, S., I. Milligan, N. Ruest, and J. Lin, "Fostering Community Engagement Through Datathon Events: The Archives Unleashed Experience", Digital Humanities Quarterly, vol. 15, issue 1, 2021.
Chen, Y., T. Ozsu, G. Xiao, Z. Tang, and K. Li, "GSmart: An Efficient SPARQL Query Engine Using Sparse Matrix Algebra - Full Version", ArXiv, vol. abs/2106.14038, 2021.
Li, H., S. Zhuang, A. Mourad, X. Ma, J. Lin, and G. Zuccon, "Improving Query Representations for Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study", ArXiv, vol. abs/2112.06400, 2021.
Gupta, P., A. Mhedhbi, and S. Salihoglu, "Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems", ArXiv, vol. abs/2103.02284, 2021.
Nogueira, R., Z. Jiang, and J. Lin, "Investigating the Limitations of the Transformers With Simple Arithmetic Tasks", ArXiv, vol. abs/2102.13019, 2021.
Ge, C., S. Mohapatra, X. He, and I. Ilyas, "Kamino: Constraint-Aware Differentially Private Data Synthesis", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 10, pp. 1886--1899, 2021.
Jin, G., and S. Salihoglu, "Making RDBMSs Efficient on Graph Workloads Through Predefined Joins", ArXiv, vol. abs/2108.10540, 2021.
Zhang, X., X. Ma, P. Shi, and J. Lin, "Mr. TyDi: A Multi-Lingual Benchmark for Dense Retrieval", ArXiv, vol. abs/2108.08787, 2021.
Craswell, N., B. Mitra, E. Yilmaz, D. Campos, and J. Lin, "MS MARCO: Benchmarking Ranking Models in the Large-Data Regime", ArXiv, vol. abs/2105.04021, 2021.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting", ACM Transactions on Information Systems (TOIS), vol. 39, issue 4, pp. 48:1--48:29, 2021.
Peng, P., Q. Ge, L. Zou, T. Ozsu, Z. Xu, and D. Zhao, "Optimizing Multi-Query Evaluation in Federated RDF Systems", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 33, issue 4, pp. 1692--1707, 2021.
Mhedhbi, A., C. Kankanamge, and S. Salihoglu, "Optimizing One-Time and Continuous Subgraph Queries Using Worst-Case Optimal Joins", ACM Transactions on Database Systems (TODS), vol. 46, issue 2, pp. 6:1--6:45, 2021.
Shafieinejad, M., F. Kerschbaum, and I. Ilyas, "PCOR: Private Contextual Outlier Release via Differentially Private Search", ArXiv, vol. abs/2103.05173, 2021.
Arabzadeh, N., X. Yan, and C. Clarke, "Predicting Efficiency/Effectiveness Trade-Offs for Dense vs. Sparse Retrieval Strategy Selection", ArXiv, vol. abs/2109.10739, 2021.
Lin, J., X. Ma, S-C. Lin, J-H. Yang, R. Pradeep, and R. Nogueira, "Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research With Sparse and Dense Representations", ArXiv, vol. abs/2102.10073, 2021.
Saxena, H., L. Golab, S. Idreos, and I. Ilyas, "Real-Time LSM-Trees for HTAP Workloads", ArXiv, vol. abs/2101.06801, 2021.
Kato, M. P., Y. Liu, N. Kando, and C. Clarke, "Report on the 15th Round of NII Testbeds and Community for Information Access Research (NTCIR-15)", SIGIR Forum, vol. 55, issue 2, pp. 21:1--21:6, 2021.
Sheshbolouki, A., and T. Ozsu, "Scale-Invariant Strength Assortativity of Streaming Butterflies", ArXiv, vol. abs/2111.12217, 2021.
Sheshbolouki, A., and T. Ozsu, "sGrapp: Butterfly Approximation in Streaming Graphs", ArXiv, vol. abs/2101.12334, 2021.
Arabzadeh, N., A. Vtyurina, X. Yan, and C. Clarke, "Shallow Pooling for Sparse Labels", ArXiv, vol. abs/2109.00062, 2021.
Lin, J., D. Campos, N. Craswell, B. Mitra, and E. Yilmaz, "Significant Improvements Over the State of the Art? A Case Study Of the MS MARCO Document Ranking Leaderboard", ArXiv, vol. abs/2102.12887, 2021.
Yang, J-H., X. Ma, and J. Lin, "Sparsifying Sparse Representations for Passage Retrieval by Top-K Masking", ArXiv, vol. abs/2112.09628, 2021.
Grossman, M., and G. Cormack, "The eDiscovery Medicine Show", ArXiv, vol. abs/2109.13908, 2021.
Pradeep, R., R. Nogueira, and J. Lin, "The Expando-Mono-Duo Design Pattern for Text Ranking With Pretrained Sequence-to-Sequence Models", ArXiv, vol. abs/2101.05667, 2021.
Sakr, S., A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. G. Aref, M. Arenas, M. Besta, P. A. Boncz, et al., "The Future Is Big Graphs: A Community View on Graph Processing Systems", Communications of the ACM, vol. 64, issue 9, pp. 62--71, 2021.
Gauch, M., J. Mai, and J. Lin, "The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction", Environmental Modelling and Software, vol. 135, pp. 104926, 2021.
Mohapatra, S., S. Sasy, X. He, G. Kamath, and O. Thakkar, "The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection", ArXiv, vol. abs/2111.04906, 2021.
Xue, H., F. D. Salim, Y. Ren, and C. Clarke, "Translating Human Mobility Forecasting Through Natural Language Generation", ArXiv, vol. abs/2112.11481, 2021.
Covington, C., X. He, J. Honaker, and G. Kamath, "Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy", ArXiv, vol. abs/2110.14465, 2021.
Mackenzie, J., A. Trotman, and J. Lin, "Wacky Weights in Learned Sparse Representations and the Revenge Of Score-at-a-Time Query Evaluation", ArXiv, vol. abs/2110.11540, 2021.

2020

Ozsu, T., and P. Valduriez, Principles of Distributed Database Systems, 4th Edition: Springer, 2020.
Kassaie, B., and F. Tompa, "A Framework for Extracted View Maintenance", ACM Symposium on Document Engineering (DocEng), 2020.
Yilmaz, Z. Akkalyoncu, C. Clarke, and J. Lin, "A Lightweight Environment for Learning Experimental IR Research Practices", International Conference on Research and Development in Information Retrieval (SIGIR), 2020.
Zhang, X., A. Yates, and J. Lin, "A Little Bit Is Worse Than None: Ranking With Limited Training Data", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Vtyurina, A., C. Clarke, E. Law, J. R. Trippas, and H. Bota, "A Mixed-Method Analysis of Text and Audio Search Interfaces With Varying Task Complexity", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Ghenai, A., M. Smucker, and C. Clarke, "A Think-Aloud Study to Understand Factors Affecting Online Health Search", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Gauch, M., J. Bai, J. Mai, and J. Lin, "An Open-Source Interface to the Canadian Surface Prediction Archive", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Tu, Z., W. Yang, Z. Fu, Y. Xie, L. Tan, K. Xiong, M. Li, and J. Lin, "Approximate Nearest Neighbor Search and Lightweight Dense Vector Reranking In Multi-Stage Retrieval Architectures", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Wu, R., A. Zhang, I. Ilyas, and T. Rekatsinas, "Attention-Based Learning for Missing Data Imputation in HoloClean", Conference on Machine Learning and Systems (MLSys), 2020.
Yates, A., S. Arora, X. Zhang, W. Yang, K. Martin Jose, and J. Lin, "Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval", Web Search and Data Mining (WSDM), 2020.
Glasbergen, B., K. Langendoen, M. Abebe, and K. Daudjee, "ChronoCache: Predictive and Adaptive Mid-Tier Query Result Caching", ACM International Conference on Management of Data (SIGMOD), 2020.
Tao, Y., X. He, A. Machanavajjhala, and S. Roy, "Computing Local Sensitivities of Counting Queries With Joins", ACM International Conference on Management of Data (SIGMOD), 2020.
Agarwal, R. Raj, D. Kumar, L. Golab, and S. Keshav, "Consentio: Managing Consent to Data Access Using Permissioned Blockchains", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020.
Adewoye, T., X. Han, N. Ruest, I. Milligan, S. Fritz, and J. Lin, "Content-Based Exploration of Archival Images Using Neural Networks", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Zhang, E., N. Gupta, R. Tang, X. Han, R. Pradeep, K. Lu, Y. Zhang, R. Nogueira, K. Cho, H. Fang, et al., "Covidex: Neural Ranking Models and Keyword Search Infrastructure For The COVID-19 Open Research Dataset", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Shi, P., H. Bai, and J. Lin, "Cross-Lingual Training of Neural Models for Document Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Chowdhury, A. Roy, C. Wang, X. He, A. Machanavajjhala, and S. Jha, "Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers", ACM International Conference on Management of Data (SIGMOD), 2020.
Ding, S., E. Zhang, and J. Lin, "Cydex: Neural Search Infrastructure for the Scholarly Literature", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Xin, J., R. Tang, J. Lee, Y. Yu, and J. Lin, "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference", Association for Computational Linguistics (ACL), 2020.
Yang, J-H., S-C. Lin, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Designing Templates for Eliciting Commonsense Knowledge From Pretrained Sequence-to-Sequence Models", International Conference on Computational Linguistics (COLING), 2020.
Xie, Y., W. Yang, L. Tan, K. Xiong, N. Jing Yuan, B. Huai, M. Li, and J. Lin, "Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering", The Web Conference (WWW), 2020.
Nogueira, R., Z. Jiang, R. Pradeep, and J. Lin, "Document Ranking With a Pretrained Sequence-to-Sequence Model", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Ng, Y. Ki, D. J. Fraser, B. Kassaie, G. Labahn, M. S. Marzouk, F. Tompa, and K. Wang, "Dowsing for Math Answers With Tangent-L", Conference and Labs of the Evaluation Forum (CLEF), 2020.
Abebe, M., B. Glasbergen, and K. Daudjee, "DynaMast: Adaptive Dynamic Mastering for Replicated Systems", IEEE International Conference on Data Engineering (ICDE), 2020.
Xin, J., R. Nogueira, Y. Yu, and J. Lin, "Early Exiting BERT for Efficient Document Ranking", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Zhang, X., T. Ozsu, and L. Chen, "ELite: Cost-Effective Approximation of Exploration-Based Graph Analysis", ACM International Conference on Management of Data (SIGMOD), 2020.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Erratum for Discovering Order Dependencies Through Order Compatibility (EDBT 2019)", International Conference on Extending Database Technology (EDBT), 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Evaluating Pretrained Transformer Models for Citation Recommendation", International Workshop on Bibliometric-enhanced Information Retrieval (BIR), 2020.
Adhikari, A., A. Ram, R. Tang, W. L. Hamilton, and J. Lin, "Exploring the Limits of Simple Learners in Knowledge Distillation For Document Classification With DocBERT", Workshop on Representation Learning for NLP (RepL4NLP), 2020.
Toman, D., and G. Weddell, "First Order Rewritability for Ontology Mediated Querying in Horn-DLFD", International Workshop on Description Logics (DL), 2020.
Yates, A., K. Martin Jose, X. Zhang, and J. Lin, "Flexible IR Pipelines With Capreolus", International Conference on Information and Knowledge Management (CIKM), 2020.
Grand, A., R. Muir, J. Ferenczi, and J. Lin, "From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance", European Conference on Information Retrieval (ECIR), 2020.
Yan, D., G. Guo, M. Mashiur Ra Chowdhury, T. Ozsu, W-S. Ku, and J. C. S. Lui, "G-Thinker: A Distributed Framework for Mining Subgraphs in a Big Graph", IEEE International Conference on Data Engineering (ICDE), 2020.
Lin, J., C. Zhong, D. Hu, C. Rudin, and M. I. Seltzer, "Generalized and Scalable Optimal Sparse Decision Trees", International Conference on Machine Learning (ICML), 2020.
Zeng, L., L. Zou, T. Ozsu, L. Hu, and F. Zhang, "GSI: GPU-friendly Subgraph Isomorphism", IEEE International Conference on Data Engineering (ICDE), 2020.
Pradeep, R., X. Ma, X. Zhang, H. Cui, R. Xu, R. Nogueira, and J. Lin, "H2oloo at TREC 2020: When All You Got Is a Hammer... Deep Learning, Health Misinformation, and Precision Medicine", Text Retrieval Conference (TREC), 2020.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "Inserting Information Bottlenecks for Attribution in Transformers", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Kumar, D., L. Mou, L. Golab, and O. Vechtomova, "Iterative Edit-Based Unsupervised Sentence Simplification", Association for Computational Linguistics (ACL), 2020.
Farhat, O., H. Bindra, and K. Daudjee, "Leaving Stragglers at the Window: Low-Latency Stream Sampling With Accuracy Guarantees", Distributed Event-Based Systems (DEBS), 2020.
Xiang, Z., B. Ding, X. He, and J. Zhou, "Linear and Range Counting Under Metric-Based Local Differential Privacy", International Symposium on Information Theory (ISIT), 2020.
Agarwal, R. Raj, R. Cohen, L. Golab, and A. Tsang, "Locating Influential Agents in Social Networks: Budget-Constrained Seed Set Selection", Canadian Conference on Artificial Intelligence (AI), 2020.
Buchanan, G., D. McKay, C. Clarke, L. Azzopardi, and J. R. Trippas, "Made to Measure: A Workshop on Human-Centred Metrics for Information Seeking", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Li, Q., T. Ozsu, and H. Xiong, "Message From the General Chairs of DSC 2020", International Conference on Data Science in Cyberspace (DSC), 2020.
Grossman, M., G. Cormack, and B. Pham, "MRG_UWaterloo Participation in the TREC 2020 Precision Medicine Track", Text Retrieval Conference (TREC), 2020.
Clarke, C., M. Smucker, and A. Vtyurina, "Offline Evaluation by Maximum Similarity to an Ideal Ranking", International Conference on Information and Knowledge Management (CIKM), 2020.
Clarke, C., A. Vtyurina, and M. Smucker, "Offline Evaluation Without Gain", International Conference on the Theory of Information Retrieval (ICTIR), 2020.
Clarke, C., S. Rizvi, M. Smucker, M. Maistro, and G. Zuccon, "Overview of the TREC 2020 Health Misinformation Track", Text Retrieval Conference (TREC), 2020.
Meng, X., and L. Golab, "Parallel Scheduling of Data-Intensive Tasks", European Conference on Parallel Processing (Euro-Par), 2020.
Khan, A., and L. Golab, "Reddit Mining to Understand Gendered Movements", International Conference on Extending Database Technology (EDBT), 2020.
Jacobs, A., S. Chopra, and L. Golab, "Reddit Mining to Understand Women's Issues in STEM", International Conference on Extending Database Technology (EDBT), 2020.
Pacaci, A., A. Bonifati, and T. Ozsu, "Regular Path Query Evaluation on Streaming Graphs", ACM International Conference on Management of Data (SIGMOD), 2020.
Lin, J., and Q. Zhang, "Reproducibility Is a Process, Not an Achievement: The Replicability Of IR Reproducibility Experiments", European Conference on Information Retrieval (ECIR), 2020.
Guo, R. Benson, and K. Daudjee, "Research Challenges in Deep Reinforcement Learning-Based Join Query Optimization", ACM International Conference on Management of Data (SIGMOD), 2020.
Mior, M. J., and K. Salem, "ReSpark: Automatic Caching for Iterative Applications in Apache Spark", IEEE International Conference on Big Data (IEEE BigData), 2020.
Glasbergen, B., M. Abebe, K. Daudjee, D. Vogel, and J. Zhao, "Sentinel: Understanding Data Systems", ACM International Conference on Management of Data (SIGMOD), 2020.
Tang, R., J. Lee, J. Xin, X. Liu, Y. Yu, and J. Lin, "Showing Your Work Doesn't Always Work", Association for Computational Linguistics (ACL), 2020.
Satuluri, V., Y. Wu, X. Zheng, Y. Qian, B. Wichers, Q. Dai, G. Ming Tang, J. Jiang, and J. Lin, "SimClusters: Community-Based Representations for Heterogeneous Recommendations At Twitter", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020.
Parsa, M. S., and L. Golab, "Social Media Mining to Understand the Impact of Co-Operative Education On Mental Health", Educational Data Mining (EDM), 2020.
Ozsu, T., "Streaming Graph Processing and Analytics", Distributed Event-Based Systems (DEBS), 2020.
Lin, J., J. M. Mackenzie, C. Kamphuis, C. Macdonald, A. Mallia, M. Siedlaczek, A. Trotman, and A. P. de Vries, "Supporting Interoperability Between Open-Source Search Engines With The Common Index File Format", International Conference on Research and Development in Information Retrieval (SIGIR), 2020.
Naseem, S. Saad, D. Kumar, M. S. Parsa, and L. Golab, "Text Mining of COVID-19 Discussions on Reddit", IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2020.
Ruest, N., J. Lin, I. Milligan, and S. Fritz, "The Archives Unleashed Project: Technology, Process, and Community To Improve Scholarly Access to Web Archives", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
Lin, S-C., J-H. Yang, and J. Lin, "TREC 2020 Notebook: CAsT Track", Text Retrieval Conference (TREC), 2020.
Shahidi, H., M. Li, and J. Lin, "Two Birds, One Stone: A Simple, Unified Model for Text Generation From Structured and Unstructured Data", Association for Computational Linguistics (ACL), 2020.
Sequiera, R., L. Tan, Y. Zhang, and J. Lin, "Update Delivery Mechanisms for Prospective Information Needs: A Reproducibility Study", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Arabzadeh, N., and C. Clarke, "WaterlooClarke at the TREC 2020 Conversational Assistant Track", Text Retrieval Conference (TREC), 2020.
Lin, J., I. Milligan, D. W. Oard, N. Ruest, and K. Shilton, "We Could, but Should We?: Ethical Considerations for Providing Access To GeoCities and Other Historical Digital Collections", Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
Kamphuis, C., A. P. de Vries, L. Boytsov, and J. Lin, "Which BM25 Do You Mean? A Large-Scale Reproducibility Study Of Scoring Variants", European Conference on Information Retrieval (ECIR), 2020.
Gorenflo, C., L. Golab, and S. Keshav, "XOX Fabric: A Hybrid Approach to Blockchain Transaction Execution", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2020.
Gauch, M., and J. Lin, "A Data Scientist's Guide to Streamflow Prediction", ArXiv, vol. abs/2006.12975, 2020.
Lin, J., "A Prototype of Serverless Lucene", ArXiv, vol. abs/2002.01447, 2020.
Ozsu, T., "A Systematic View of Data Science", IEEE Data Engineering Bulletin, vol. 43, issue 3, pp. 3--11, 2020.
Mhedhbi, A., P. Gupta, S. Khaliq, and S. Salihoglu, "A+ Indexes: Lightweight and Highly Flexible Adjacency Lists For Graph Database Management Systems", ArXiv, vol. abs/2004.00130, 2020.
Chen, Y., G. Xiao, T. Ozsu, C. Liu, A. Y. Zomaya, and T. Li, "aeSpTV: An Adaptive and Efficient Framework for Sparse Tensor-Vector Product Kernel on a High-Performance Computing Platform", IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 31, issue 10, pp. 2329--2345, 2020.
Livshits, E., A. Heidari, I. Ilyas, and B. Kimelfeld, "Approximate Denial Constraints", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 10, pp. 1682--1695, 2020.
Livshits, E., A. Heidari, I. Ilyas, and B. Kimelfeld, "Approximate Denial Constraints", ArXiv, vol. abs/2005.08540, 2020.
Clarke, C., A. Vtyurina, and M. Smucker, "Assessing Top-K Preferences", ArXiv, vol. abs/2007.11682, 2020.
Oliveira, P. H., D. S. Kaster, C. Traina, Jr., and I. Ilyas, "Batchwise Probabilistic Incremental Data Cleaning", ArXiv, vol. abs/2011.04730, 2020.
Fritz, S., I. Milligan, N. Ruest, and J. Lin, "Building Community at Distance: A Datathon During COVID-19", Digital Library Perspectives, vol. 36, issue 4, pp. 415--428, 2020.
Khan, A., L. Golab, M. Kargar, J. Szlichta, and M. Zihayat, "Compact Group Discovery in Attributed Graphs and Social Networks", Information Processing and Management, vol. 57, issue 2, pp. 102054, 2020.
Tao, Y., X. He, A. Machanavajjhala, and S. Roy, "Computing Local Sensitivities of Counting Queries With Joins", ArXiv, vol. abs/2004.04656, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Conversational Question Reformulation via Sequence-to-Sequence Architectures And Pretrained Language Models", ArXiv, vol. abs/2004.01909, 2020.
Zhang, E., N. Gupta, R. Tang, X. Han, R. Pradeep, K. Lu, Y. Zhang, R. Nogueira, K. Cho, H. Fang, et al., "Covidex: Neural Ranking Models and Keyword Search Infrastructure For The COVID-19 Open Research Dataset", ArXiv, vol. abs/2007.07846, 2020.
Xin, J., R. Tang, J. Lee, Y. Yu, and J. Lin, "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference", ArXiv, vol. abs/2004.12993, 2020.
Kassaie, B., and F. Tompa, "Detecting Opportunities for Differential Maintenance of Extracted Views", ArXiv, vol. abs/2007.01973, 2020.
Karegar, R., M. Mirsafian, P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Discovering Domain Orders Through Order Dependencies", ArXiv, vol. abs/2005.14068, 2020.
Lin, S-C., J-H. Yang, and J. Lin, "Distilling Dense Representations for Ranking Using Tightly-Coupled Teachers", ArXiv, vol. abs/2010.11386, 2020.
Nogueira, R., Z. Jiang, and J. Lin, "Document Ranking With a Pretrained Sequence-to-Sequence Model", ArXiv, vol. abs/2003.06713, 2020.
Wagh, S., X. He, A. Machanavajjhala, and P. Mittal, "DP-Cryptography: Marrying Differential Privacy and Cryptography In Emerging Applications", ArXiv, vol. abs/2004.08887, 2020.
Zhang, H., G. Cormack, M. Grossman, and M. Smucker, "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", Information Retrieval Journal, vol. 23, issue 1, pp. 1--26, 2020.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20 000 Transactions Per Second", International Journal of Network Management, vol. 30, issue 5, 2020.
Lin, J., C. Zhong, D. Hu, C. Rudin, and M. I. Seltzer, "Generalized Optimal Sparse Decision Trees", ArXiv, vol. abs/2006.08690, 2020.
Sahu, S., and S. Salihoglu, "Graphsurge: Graph Analytics on View Collections Using Differential Computation", ArXiv, vol. abs/2004.05297, 2020.
Tang, R., J. Lee, A. Razi, J. Cambre, I. Bicking, J. Kaye, and J. Lin, "Howl: A Deployed, Open-Source Wake Word Detection System", ArXiv, vol. abs/2008.09606, 2020.
Jiang, Z., R. Tang, J. Xin, and J. Lin, "Inserting Information Bottlenecks for Attribution in Transformers", ArXiv, vol. abs/2012.13838, 2020.
Chen, S., P. K. Chrysanthis, K. Daudjee, M. Hsu, and M. Sadoghi, "Introduction to the Special Issue on Self-Managing and Hardware-Optimized Database Systems 2019", Distributed and Parallel Databases, vol. 38, issue 4, pp. 767--769, 2020.
Kumar, D., L. Mou, L. Golab, and O. Vechtomova, "Iterative Edit-Based Unsupervised Sentence Simplification", ArXiv, vol. abs/2006.09639, 2020.
Ge, C., S. Mohapatra, X. He, and I. Ilyas, "Kamino: Constraint-Aware Differentially Private Data Synthesis", ArXiv, vol. abs/2012.15713, 2020.
Li, M., H. Bai, L. Tan, K. Xiong, M. Li, and J. Lin, "Latte-Mix: Measuring Sentence Semantic Similarity With Latent Categorical Mixtures", ArXiv, vol. abs/2010.11351, 2020.
Chen, L., and L. Golab, "Micro-Journal Mining to Understand Mood Triggers", Computing, vol. 102, issue 5, pp. 1227--1244, 2020.
Abebe, M., B. Glasbergen, and K. Daudjee, "MorphoSys: Automatic Physical Design Metamorphosis for Distributed Database Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 13, pp. 3573--3587, 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Navigation-Based Candidate Expansion and Pretrained Language Models For Citation Recommendation", Scientometrics, vol. 125, issue 3, pp. 3001--3016, 2020.
Nogueira, R., Z. Jiang, K. Cho, and J. Lin, "Navigation-Based Candidate Expansion and Pretrained Language Models For Citation Recommendation", ArXiv, vol. abs/2001.08687, 2020.
Heidari, A., S. Kushagra, and I. Ilyas, "On Sampling From Data With Duplicate Records", ArXiv, vol. abs/2008.10549, 2020.
Wang, X-J., M. Grossman, and S. Gyu Hyun, "Participation in TREC 2020 COVID Track Using Continuous Active Learning", ArXiv, vol. abs/2011.01453, 2020.
Lin, J., R. Nogueira, and A. Yates, "Pretrained Transformers for Text Ranking: BERT and Beyond", ArXiv, vol. abs/2010.06467, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "Query Reformulation Using Query History for Passage Retrieval in Conversational Search", ArXiv, vol. abs/2005.02230, 2020.
Gauch, M., F. Kratzert, D. Klotz, G. Nearing, J. Lin, and S. Hochreiter, "Rainfall-Runoff Prediction at Multiple Timescales With a Single Long Short-Term Memory Network", ArXiv, vol. abs/2010.07921, 2020.
Zhang, R., W. Yang, L. Lin, Z. Tu, Y. Xie, Z. Fu, Y. Xie, L. Tan, K. Xiong, and J. Lin, "Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents", ArXiv, vol. abs/2002.01861, 2020.
Tang, R., R. Nogueira, E. Zhang, N. Gupta, P. Cam, K. Cho, and J. Lin, "Rapidly Bootstrapping a Question Answering Dataset for COVID-19", ArXiv, vol. abs/2004.11339, 2020.
Zhang, E., N. Gupta, R. Nogueira, K. Cho, and J. Lin, "Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned", ArXiv, vol. abs/2004.05125, 2020.
Heidari, A., G. Michalopoulos, S. Kushagra, I. Ilyas, and T. Rekatsinas, "Record Fusion: A Learning Approach", ArXiv, vol. abs/2006.10208, 2020.
Pacaci, A., A. Bonifati, and T. Ozsu, "Regular Path Query Evaluation on Streaming Graphs", ArXiv, vol. abs/2004.02012, 2020.
Bryson, S., H. Davoudi, L. Golab, M. Kargar, Y. Lytvyn, P. Mierzejewski, J. Szlichta, and M. Zihayat, "Robust Keyword Search in Large Attributed Graphs", Information Retrieval Journal, vol. 23, issue 5, pp. 502--524, 2020.
Bater, J., Y. Park, X. He, X. Wang, and J. Rogers, "SAQE: Practical Privacy-Preserving Approximate Query Processing For Data Federations", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 11, pp. 2691--2705, 2020.
Guo, G., D. Yan, T. Ozsu, Z. Jiang, and J. Khalil, "Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach", Proceedings of the VLDB Endowment (PVLDB), vol. 14, issue 4, pp. 573--585, 2020.
Guo, G., D. Yan, T. Ozsu, and Z. Jiang, "Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach", ArXiv, vol. abs/2005.00081, 2020.
Pradeep, R., X. Ma, R. Nogueira, and J. Lin, "Scientific Claim Verification With VERT5ERINI", ArXiv, vol. abs/2010.11930, 2020.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, and M. Li, "SegaBERT: Pre-Training of Segment-Aware BERT for Language Understanding", ArXiv, vol. abs/2004.14996, 2020.
Bai, H., P. Shi, J. Lin, L. Tan, K. Xiong, W. Gao, J. Liu, and M. Li, "Semantics of the Unwritten", ArXiv, vol. abs/2004.02251, 2020.
Glasbergen, B., M. Abebe, K. Daudjee, and A. Levi, "Sentinel: Universal Analysis and Insight for Data Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 11, pp. 2720--2733, 2020.
Tang, R., J. Lee, J. Xin, X. Liu, Y. Yu, and J. Lin, "Showing Your Work Doesn't Always Work", ArXiv, vol. abs/2004.13705, 2020.
Salem, K., "Special Issue on Best Papers of DaMoN 2018", The VLDB Journal, vol. 29, issue 2-3, pp. 755, 2020.
Boncz, P. A., and K. Salem, "Special Issue on Best Papers of VLDB 2017", The VLDB Journal, vol. 29, issue 1, pp. 483--484, 2020.
Lin, J., J. M. Mackenzie, C. Kamphuis, C. Macdonald, A. Mallia, M. Siedlaczek, A. Trotman, and A. P. de Vries, "Supporting Interoperability Between Open-Source Search Engines With The Common Index File Format", ArXiv, vol. abs/2003.08276, 2020.
Ruest, N., J. Lin, I. Milligan, and S. Fritz, "The Archives Unleashed Project: Technology, Process, and Community To Improve Scholarly Access to Web Archives", ArXiv, vol. abs/2001.05399, 2020.
Sakr, S., A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. G. Aref, M. Arenas, M. Besta, P. A. Boncz, et al., "The Future Is Big Graphs! A Community View on Graph Processing Systems", ArXiv, vol. abs/2012.06171, 2020.
Sahu, S., A. Mhedhbi, S. Salihoglu, J. Lin, and T. Ozsu, "The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey", The VLDB Journal, vol. 29, issue 2-3, pp. 595--618, 2020.
Zhang, M., L. Tan, Z. Tu, Z. Fu, K. Xiong, M. Li, and J. Lin, "To Paraphrase or Not to Paraphrase: User-Controllable Selective Paraphrase Generation", ArXiv, vol. abs/2008.09290, 2020.
Lin, S-C., J-H. Yang, R. Nogueira, M-F. Tsai, C-J. Wang, and J. Lin, "TTTTTackling WinoGrande Schemas", ArXiv, vol. abs/2003.08380, 2020.
Toman, D., and G. Weddell, "Using Feature-Based Description Logics to Avoid Duplicate Elimination In Object-Relational Query Languages", German Journal of Artificial Intelligence (KI), vol. 34, issue 3, pp. 355--363, 2020.

2019

Ilyas, I., and X. Chu, Data Cleaning: ACM, 2019.
Ilyas, I., "Data Unification at Scale: Data Tamer", Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker: ACM / Morgan & Claypool, 2019.
Salihoglu, S., and N. Yakovets, "Graph Query Processing", Encyclopedia of Big Data Technologies: Springer, 2019.
Golab, L., "Types of Stream Processing Algorithms", Encyclopedia of Big Data Technologies: Springer, 2019.
De Sa, C., I. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas, "A Formal Framework for Probabilistic Unclean Databases", International Conference on Database Theory (ICDT), 2019.
Kushagra, S., H. Saxena, I. Ilyas, and S. Ben-David, "A Semi-Supervised Framework of Clustering Selection for De-Duplication", IEEE International Conference on Data Engineering (ICDE), 2019.
Yang, H-W., Y. Zou, P. Shi, W. Lu, J. Lin, and X. Sun, "Aligning Cross-Lingual Entities With Multi-Aspect Information", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Ge, C., X. He, I. Ilyas, and A. Machanavajjhala, "APEx: Accuracy-Aware Differentially Private Data Exploration", ACM International Conference on Management of Data (SIGMOD), 2019.
Yilmaz, Z. Akkalyoncu, S. Wang, W. Yang, H. Zhang, and J. Lin, "Applying BERT to Document Retrieval With Birch", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Heidari, A., I. Ilyas, and T. Rekatsinas, "Approximate Inference in Structured Instances With Noisy Categorical Observations", Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
Rao, J., L. Liu, Y. Tay, H-W. Yang, P. Shi, and J. Lin, "Bridging the Gap Between Relevance Matching and Semantic Matching for Short Text Similarity Modeling", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Davoudi, H., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Bring Order to Data", Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2019.
Milligan, I., N. Casemajor, S. Fritz, J. Lin, N. Ruest, M. S. Weber, and N. Worby, "Building Community and Tools for Analyzing Web Archives Through Datathons", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Ilyas, I., "Building Scalable Machine Learning Solutions for Data Cleaning", Datenbanksysteme für Business, Technologie und Web (BTW), 2019.
Türe, F., J. Rao, R. Tang, and J. Lin, "Challenges and Opportunities in Understanding Spoken Queries Directed at Modern Entertainment Platforms", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, W., K. Lu, P. Yang, and J. Lin, "Critically Examining the 'Neural Hype': Weak Baselines and the Additivity of Effectiveness Gains From Neural Ranking Models", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yilmaz, Z. Akkalyoncu, W. Yang, H. Zhang, and J. Lin, "Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Neumann, T., and K. Salem, "DaMoN 19: The 15th International Workshop on Data Management on New Hardware", ACM International Conference on Management of Data (SIGMOD), 2019.
Yang, W., L. Tan, C. Lu, A. Cui, H. Li, X. Chen, K. Xiong, M. Wang, M. Li, J. Pei, et al., "Detecting Customer Complaint Escalation With Recurrent Neural Networks and Manually-Engineered Features", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Discovery of Functional Dependencies", IEEE International Conference on Data Engineering (ICDE), 2019.
Alonso, G., C. Binnig, I. Pandis, K. Salem, J. Skrzypczak, R. Stutsman, L. Thostrup, T. Wang, Z. Wang, and T. Ziegler, "DPI: The Data Processing Interface for Modern Networks", Conference on Innovative Data Systems Research (CIDR), 2019.
Cormack, G., H. Zhang, N. Ghelani, M. Abualsaud, M. Smucker, M. Grossman, S. Rahbariasl, and A. Ghenai, "Dynamic Sampling Meets Pooling", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, W., Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin, "End-to-End Open-Domain Question Answering With BERTserini", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Toman, D., and G. Weddell, "Exhaustive Query Answering via Referring Expressions", International Workshop on Description Logics (DL), 2019.
Pacaci, A., and T. Ozsu, "Experimental Analysis of Streaming Algorithms for Graph Partitioning", ACM International Conference on Management of Data (SIGMOD), 2019.
Le Guilly, M., J-M. Petit, V-M. Scuturici, and I. Ilyas, "ExplIQuE: Interactive Databases Exploration With SQL", International Conference on Information and Knowledge Management (CIKM), 2019.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions Per Second", IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2019.
Toman, D., and G. Weddell, "Finding ALL Answers to OBDA Queries Using Referring Expressions", Australian Joint Conference on Artificial Intelligence (AUS-AI), 2019.
McIntyre, S., D. Toman, and G. Weddell, "FunDL - A Family of Feature-Based Description Logics, With Applications in Querying Structured Data Sources", Description Logic, Theory Combination, and All That - Essays Dedicated to Franz Baader, 2019.
Chopra, S., A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Science and Engineering: A Data Mining Approach", International Conference on Extending Database Technology (EDBT), 2019.
Chopra, S., A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Work-Integrated Learning Assessments", Educational Data Mining (EDM), 2019.
Anzum, N., S. Salihoglu, and D. Vogel, "GraphWrangler: An Interactive Graph View on Relational Data", ACM International Conference on Management of Data (SIGMOD), 2019.
Heidari, A., J. McGrath, I. Ilyas, and T. Rekatsinas, "HoloDetect: Few-Shot Learning for Error Detection", ACM International Conference on Management of Data (SIGMOD), 2019.
Lee, J., R. Tang, and J. Lin, "Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
McCoy, A. B., D. F. Sittig, J. Lin, and A. Wright, "Identification and Ranking of Biomedical Informatics Researcher Citation Statistics Through a Google Scholar Scraper", American Medical Informatics Association Annual Symposium (AMIA), 2019.
Toman, D., and G. Weddell, "Identity Resolution in Ontology Based Data Access to Structured Data Sources", Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2019.
Liu, L., W. Yang, J. Rao, R. Tang, and J. Lin, "Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Clancy, R., J. Lee, Z. Akkalyoncu Yilmaz, and J. Lin, "Information Retrieval Meets Scalable Text Analytics: Solr Integration With Spark", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Vollmer, M., L. Golab, K. Böhm, and D. Srivastava, "Informative Summarization of Numeric Data", International Conference on Statistical and Scientific Database Management (SSDBM), 2019.
Clarke, C., "Length Normalization in the Era of Neural Rankers", International Workshop on Evaluating Information Access (EVIA), 2019.
Gorenflo, C., L. Golab, and S. Keshav, "Mitigating Trust Issues in Electric Vehicle Charging Using a Blockchain", Energy-Efficient Computing and Networking (e-Energy), 2019.
Rao, J., W. Yang, Y. Zhang, F. Türe, and J. Lin, "Multi-Perspective Relevance Matching With Hierarchical ConvNets for Social Media Search", AAAI Conference on Artificial Intelligence (AAAI), 2019.
Tang, R., Y. Lu, and J. Lin, "Natural Language Generation for Effective Knowledge Distillation", Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing (DeepLo), 2019.
McIntyre, S., A. Borgida, D. Toman, and G. Weddell, "On Limited Conjunctions and Partial Features in Parameter-Tractable Feature Logics", AAAI Conference on Artificial Intelligence (AAAI), 2019.
Borgida, A., D. Toman, and G. Weddell, "On Special Description Logics for Processes and Plans", International Workshop on Description Logics (DL), 2019.
Kumar, D., R. Cohen, and L. Golab, "Online Abuse Detection: The Value of Preprocessing and Neural Attention Models", Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), 2019.
Clancy, R., N. Ferro, C. Hauff, J. Lin, T. Sakai, and Z. Zhong Wu, "Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Abualsaud, M., and M. Smucker, "Patterns of Search Result Examination: Query to First Action", International Conference on Information and Knowledge Management (CIKM), 2019.
Kassaie, B., and F. Tompa, "Predictable and Consistent Information Extraction", ACM Symposium on Document Engineering (DocEng), 2019.
Rogers, J., J. Bater, X. He, A. Machanavajjhala, M. Suresh, and X. Wang, "Privacy Changes Everything", Very Large Data Bases Conference (VLDB), 2019.
Cormack, G., and M. Grossman, "Quantifying Bias and Variance of System Rankings", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, J-H., S-C. Lin, C-J. Wang, J. Lin, and M-F. Tsai, "Query and Answer Expansion From Conversation History", Text Retrieval Conference (TREC), 2019.
Yang, P., and J. Lin, "Reproducing and Generalizing Semantic Term Matching in Axiomatic Information Retrieval", European Conference on Information Retrieval (ECIR), 2019.
Adhikari, A., A. Ram, R. Tang, and J. Lin, "Rethinking Complex Neural Network Architectures for Document Classification", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Yang, H-W., L. Liu, I. Milligan, N. Ruest, and J. Lin, "Scalable Content-Based Analysis of Images in Web Archives With TensorFlow and the Archives Unleashed Toolkit", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Kushagra, S., S. Ben-David, and I. Ilyas, "Semi-Supervised Clustering for De-Duplication", International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
Kazhamiaka, M., B. Naveed Memon, C. Kankanamge, S. Sahu, S. Rizvi, B. Wong, and K. Daudjee, "Sift: Resource-Efficient Consensus With RDMA", Conference on Emerging Network Experiment and Technology (CoNEXT), 2019.
Shi, P., J. Rao, and J. Lin, "Simple Attention-Based Representation Learning for Ranking Short Social Media Posts", North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Yu, R., Y. Xie, and J. Lin, "Simple Techniques for Cross-Collection Relevance Feedback", European Conference on Information Retrieval (ECIR), 2019.
Clancy, R., T. Eskildsen, N. Ruest, and J. Lin, "Solr Integration in the Anserini Information Retrieval Toolkit", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yan, D., G. Guo, M. Mashiur Ra Chowdhury, T. Ozsu, J. C. S. Lui, and W. Tan, "T-Thinker: A Task-Centric Distributed Framework for Compute-Intensive Divide-and-Conquer Algorithms", ACM Symposium on Principles & Practice of Parallel Programming (PPoPP), 2019.
Deschamps, R., N. Ruest, J. Lin, S. Fritz, and I. Milligan, "The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration of Web Archives", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Deschamps, R., S. Fritz, J. Lin, I. Milligan, and N. Ruest, "The Cost of a WARC: Analyzing Web Archives in the Cloud", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Lin, J., and P. Yang, "The Impact of Score Ties on Repeatability in Document Ranking", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Clancy, R., N. Ferro, C. Hauff, J. Lin, T. Sakai, and Z. Zhong Wu, "The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Time Constrained Continuous Subgraph Search Over Streaming Graphs", IEEE International Conference on Data Engineering (ICDE), 2019.
Rahbariasl, S., and M. Smucker, "Time-Limits and Summaries for Faster Relevance Assessing", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Cormack, G., and M. Grossman, "Unbiased Low-Variance Estimators for Precision and Related Information Retrieval Effectiveness Measures", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Lee, J., R. Tang, and J. Lin, "Universal Voice-Enabled User Interfaces Using JavaScript", International Conference on Intelligent User Interfaces (IUI), 2019.
Clancy, R., Z. Akkalyoncu Yilmaz, Z. Zhong Wu, and J. Lin, "University of Waterloo Docker Images for OSIRRC at SIGIR 2019", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Deng, D., W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, G. Li, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Unsupervised String Transformation Learning for Entity Consolidation", IEEE International Conference on Data Engineering (ICDE), 2019.
Abualsaud, M., F. C. Beylunioglu, M. Smucker, and R. P. Duimering, "UWaterlooMDS at the TREC 2019 Decision Track", Text Retrieval Conference (TREC), 2019.
Ruest, N., I. Milligan, and J. Lin, "Warclight: A Rails Engine for Web Archive Discovery", ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019.
Abebe, M., B. Glasbergen, and K. Daudjee, "WatDFS: A Project for Understanding Distributed Systems in the Undergraduate Curriculum", Technical Symposium on Computer Science Education (SIGCSE), 2019.
Clarke, C., "WaterlooClarke at the TREC 2019 Conversational Assistant Track", Text Retrieval Conference (TREC), 2019.
Xin, J., J. Lin, and Y. Yu, "What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Tang, R., F. Türe, and J. Lin, "Yelling at Your TV: An Analysis of Speech Recognition Errors and Subsequent User Behavior on Entertainment Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2019.
Yang, H-W., Y. Zou, P. Shi, W. Lu, J. Lin, and X. Sun, "Aligning Cross-Lingual Entities With Multi-Aspect Information", ArXiv, vol. abs/1910.06575, 2019.
Heidari, A., I. Ilyas, and T. Rekatsinas, "Approximate Inference in Structured Instances With Noisy Categorical Observations", ArXiv, vol. abs/1907.00141, 2019.
Liu, L., H. Wang, J. Lin, R. Socher, and C. Xiong, "Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models", ArXiv, vol. abs/1911.03588, 2019.
Alway, K., E. Blais, and S. Salihoglu, "Box Covers and Domain Orderings for Beyond Worst-Case Join Processing", ArXiv, vol. abs/1909.12102, 2019.
Aluç, G., T. Ozsu, and K. Daudjee, "Building Self-Clustering RDF Databases Using Tunable-LSH", The VLDB Journal, vol. 28, issue 2, pp. 173--195, 2019.
Agarwal, R. Raj, D. Kumar, L. Golab, and S. Keshav, "Consentio: Managing Consent to Data Access Using Permissioned Blockchains", ArXiv, vol. abs/1910.07110, 2019.
Zhang, X., and T. Ozsu, "Correlation Constraint Shortest Path Over Large Multi-Relation Graphs", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 5, pp. 488--501, 2019.
Shi, P., and J. Lin, "Cross-Lingual Relevance Transfer for Document Retrieval", ArXiv, vol. abs/1911.02989, 2019.
Ehsan, N., A. Shakery, and F. Tompa, "Cross-Lingual Text Alignment for Fine-Grained Plagiarism Detection", Journal of Information Science, vol. 45, issue 4, 2019.
Yang, W., Y. Xie, L. Tan, K. Xiong, M. Li, and J. Lin, "Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering", ArXiv, vol. abs/1904.06652, 2019.
Xiang, Z., B. Ding, X. He, and J. Zhou, "Design of Algorithms Under Policy-Aware Local Differential Privacy: Utility-Privacy Trade-Offs", ArXiv, vol. abs/1909.11778, 2019.
Karyakin, A., and K. Salem, "DimmStore: Memory Power Optimization for Database Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1499--1512, 2019.
Tang, R., Y. Lu, L. Liu, L. Mou, O. Vechtomova, and J. Lin, "Distilling Task-Specific Knowledge From BERT Into Simple Neural Networks", ArXiv, vol. abs/1903.12136, 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Dependency Discovery", ArXiv, vol. abs/1903.05228, 2019.
Saxena, H., L. Golab, and I. Ilyas, "Distributed Implementations of Dependency Discovery Algorithms", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1624--1636, 2019.
Adhikari, A., A. Ram, R. Tang, and J. Lin, "DocBERT: BERT for Document Classification", ArXiv, vol. abs/1904.08398, 2019.
Nogueira, R., W. Yang, J. Lin, and K. Cho, "Document Expansion by Query Prediction", ArXiv, vol. abs/1904.08375, 2019.
Yang, W., Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin, "End-to-End Open-Domain Question Answering With BERTserini", ArXiv, vol. abs/1902.01718, 2019.
Godfrey, P., L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "Errata Note: Discovering Order Dependencies Through Order Compatibility", ArXiv, vol. abs/1905.02010, 2019.
Ram, A., J. Xin, M. Nagappan, Y. Yu, R. Cabrera Lozoya, A. Sabetta, and J. Lin, "Exploiting Token and Path-Based Representations of Code for Identifying Security-Relevant Commits", ArXiv, vol. abs/1911.07620, 2019.
Gorenflo, C., S. Lee, L. Golab, and S. Keshav, "FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions Per Second", ArXiv, vol. abs/1901.00910, 2019.
Zeng, L., L. Zou, T. Ozsu, L. Hu, and F. Zhang, "GSI: GPU-friendly Subgraph Isomorphism", ArXiv, vol. abs/1906.03420, 2019.
Heidari, A., J. McGrath, I. Ilyas, and T. Rekatsinas, "HoloDetect: Few-Shot Learning for Error Detection", ArXiv, vol. abs/1904.02285, 2019.
Liu, C., X. He, T. Chanyaswad, S. Wang, and P. Mittal, "Investigating Statistical Privacy Frameworks From the Perspective of Hypothesis Testing", Proceedings on Privacy Enhancing Technologies (PoPETs), vol. 2019, issue 3, pp. 233--254, 2019.
Teofili, T., and J. Lin, "Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors", ArXiv, vol. abs/1910.10208, 2019.
Azmy, M., P. Shi, J. Lin, and I. Ilyas, "Matching Entities Across Different Knowledge Graphs With Graph Embeddings", ArXiv, vol. abs/1903.06607, 2019.
Nogueira, R., W. Yang, K. Cho, and J. Lin, "Multi-Stage Document Ranking With BERT", ArXiv, vol. abs/1910.14424, 2019.
Mhedhbi, A., and S. Salihoglu, "Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1692--1704, 2019.
Mhedhbi, A., and S. Salihoglu, "Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins", ArXiv, vol. abs/1903.02076, 2019.
Chowdhury, A. Roy, C. Wang, X. He, A. Machanavajjhala, and S. Jha, "Outis: Crypto-Assisted Differential Privacy on Untrusted Servers", ArXiv, vol. abs/1902.07756, 2019.
Livshits, E., I. Ilyas, B. Kimelfeld, and S. Roy, "Principles of Progress Indicators for Database Repairing", ArXiv, vol. abs/1904.06492, 2019.
Kotsogiannis, I., Y. Tao, X. He, M. Fanaeepour, A. Machanavajjhala, M. Hay, and G. Miklau, "PrivateSQL: A Differentially Private SQL Query Engine", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 11, pp. 1371--1384, 2019.
Ge, C., I. Ilyas, and F. Kerschbaum, "Secure Multi-Party Functional Dependency Discovery", Proceedings of the VLDB Endowment (PVLDB), vol. 13, issue 2, pp. 184--196, 2019.
Yang, W., H. Zhang, and J. Lin, "Simple Applications of BERT for Ad Hoc Document Retrieval", ArXiv, vol. abs/1903.10972, 2019.
Shi, P., and J. Lin, "Simple BERT Models for Relation Extraction and Semantic Role Labeling", ArXiv, vol. abs/1904.05255, 2019.
Sun, J., D. Deng, I. Ilyas, G. Li, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation", ArXiv, vol. abs/1906.06574, 2019.
Lin, J., "The Neural Hype, Justified!: A Recantation", SIGIR Forum, vol. 53, issue 2, pp. 88--93, 2019.
Lin, J., L. Paniak, and G. Boerke, "The Performance Envelope of Inverted Indexing on Modern Hardware", ArXiv, vol. abs/1910.11028, 2019.
Gauch, M., J. Mai, and J. Lin, "The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction", ArXiv, vol. abs/1911.07249, 2019.
Lee, J., R. Tang, and J. Lin, "What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning", ArXiv, vol. abs/1911.03090, 2019.
Gorenflo, C., L. Golab, and S. Keshav, "XOX Fabric: A Hybrid Approach to Transaction Execution", ArXiv, vol. abs/1906.11229, 2019.

2018

Abedjan, Z., L. Golab, F. Naumann, and T. Papenbrock, Data Profiling: Morgan & Claypool, 2018.
Liu, L., and T. Ozsu, Encyclopedia of Database Systems, Second Edition: Springer, 2018.
Chomicki, J., and D. Toman, "Abstract Versus Concrete Temporal Query Languages", Encyclopedia of Database Systems: Springer, 2018.
Machanavajjhala, A., and X. He, "Analyzing Your Location Data With Provable Privacy Guarantees", Springer Handbooks: Springer, 2018.
Ozsu, T., "Client-Server Architecture", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Data Manipulation Language (DML)", Encyclopedia of Database Systems: Springer, 2018.
Golab, L., "Data Stream", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Database", Encyclopedia of Database Systems: Springer, 2018.
Ozsu, T., "Database Administrator (DBA)", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Document Databases", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Enterprise Content Management", Encyclopedia of Database Systems: Springer, 2018.
Tompa, F., "Hypertexts", Encyclopedia of Database Systems: Springer, 2018.
Toman, D., "Point-Stamped Temporal Models", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Rank-Aware Query Processing", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Rank-Join", Encyclopedia of Database Systems: Springer, 2018.
Salem, K., "Sagas", Encyclopedia of Database Systems: Springer, 2018.
Golab, L., "Stream Models", Encyclopedia of Database Systems: Springer, 2018.
Lin, J., "Summarization", Encyclopedia of Database Systems: Springer, 2018.
Chomicki, J., and D. Toman, "Temporal Logic in Database Query Languages", Encyclopedia of Database Systems: Springer, 2018.
Chomicki, J., and D. Toman, "Temporal Relational Calculus", Encyclopedia of Database Systems: Springer, 2018.
Roddick, J. F., and D. Toman, "Temporal Vacuuming", Encyclopedia of Database Systems: Springer, 2018.
Ilyas, I., "Top-K Queries", Encyclopedia of Database Systems: Springer, 2018.
Clarke, C., "Web Question Answering", Encyclopedia of Database Systems: Springer, 2018.
Zhang, H., M. Abualsaud, and M. Smucker, "A Study of Immediate Requery Behavior in Search", Conference on Human Information Interaction and Retrieval (CHIIR), 2018.
Abualsaud, M., N. Ghelani, H. Zhang, M. Smucker, G. Cormack, and M. Grossman, "A System for Efficient High-Recall Retrieval", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Koutris, P., S. Salihoglu, and D. Suciu, "Algorithmic Aspects of Parallel Query Processing", ACM International Conference on Management of Data (SIGMOD), 2018.
Tang, R., W. Wang, Z. Tu, and J. Lin, "An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Glasbergen, B., M. Abebe, K. Daudjee, S. Foggo, and A. Pacaci, "Apollo: Learning Query Correlations for Predictive Caching in Geo-Distributed Systems", International Conference on Extending Database Technology (EDBT), 2018.
Cormack, G., and M. Grossman, "Beyond Pooling", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Mansour, E., D. Deng, R. Castro Fernandez, A. Ali Qahtan, W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, et al., "Building Data Civilizer Pipelines With an Advanced Workflow Engine", IEEE International Conference on Data Engineering (ICDE), 2018.
Yan, X., L. Yang, H. Zhang, X. Charles Lin, B. Wong, K. Salem, and T. Brecht, "Carousel: Low-Latency Transaction Processing for Globally-Distributed Data", ACM International Conference on Management of Data (SIGMOD), 2018.
Fraser, D. J., A. Kane, and F. Tompa, "Choosing Math Features for BM25 Ranking With Tangent-L", ACM Symposium on Document Engineering (DocEng), 2018.
Liang, Y., Z. Tu, L. Huang, and J. Lin, "CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Lin, J., "Computing Without Servers, V8, Rocket Ships, and Other Batsh*t Crazy Ideas in Data Systems", Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES), 2018.
Langouri, M. Alipour, Z. Zheng, F. Chiang, L. Golab, and J. Szlichta, "Contextual Data Cleaning", IEEE International Conference on Data Engineering (ICDE), 2018.
Chopra, S., Y. Helen Jiang, A. Toulis, and L. Golab, "Data Analytics to Improve Co-Operative Education", International Conference on Extending Database Technology (EDBT), 2018.
Tang, R., and J. Lin, "Deep Residual Learning for Small-Footprint Keyword Spotting", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
Pacaci, A., and T. Ozsu, "Distribution-Aware Stream Partitioning for Distributed Stream Processing Systems", ACM International Conference on Management of Data (SIGMOD), 2018.
Abebe, M., K. Daudjee, B. Glasbergen, and Y. Tian, "EC-Store: Bridging the Gap Between Storage and Latency in Distributed Erasure Coded Systems", IEEE International Conference on Distributed Computing Systems (ICDCS), 2018.
Zihayat, M., A. An, L. Golab, M. Kargar, and J. Szlichta, "Effective Team Formation in Expert Networks", Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2018.
Zhang, H., M. Abualsaud, N. Ghelani, M. Smucker, G. Cormack, and M. Grossman, "Effective User Interaction for High-Recall Retrieval: Less Is More", International Conference on Information and Knowledge Management (CIKM), 2018.
Azmy, M., P. Shi, J. Lin, and I. Ilyas, "Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia", International Conference on Computational Linguistics (COLING), 2018.
Tompa, F., "Fashioning a Search Engine to Support Humanities Research", ACM Symposium on Document Engineering (DocEng), 2018.
Mihaylov, A., P. Godfrey, L. Golab, M. Kargar, D. Srivastava, and J. Szlichta, "FASTOD: Bringing Order to Data", IEEE International Conference on Data Engineering (ICDE), 2018.
Zheng, Z., M. Alipour, Z. Qu, I. Currie, F. Chiang, L. Golab, and J. Szlichta, "FastOFD: Contextual Data Cleaning With Ontology Functional Dependencies", International Conference on Extending Database Technology (EDBT), 2018.
Chopra, S., H. Gautreau, A. Khan, M. Mirsafian, and L. Golab, "Gender Differences in Undergraduate Engineering Applicants: A Text Mining Approach", Educational Data Mining (EDM), 2018.
Yu, R., Y. Xie, and J. Lin, "H2oloo at TREC 2018: Cross-Collection Relevance Transfer for the Common Core Track", Text Retrieval Conference (TREC), 2018.
Toman, D., and G. Weddell, "Identity Resolution in Conjunctive Querying Over DL-Based Knowledge Bases", International Workshop on Description Logics (DL), 2018.
Chopra, S., and L. Golab, "Job Description Mining to Understand Work-Integrated Learning", Educational Data Mining (EDM), 2018.
Grossman, M., and G. Cormack, "MRG_UWaterloo Participation in the TREC 2018 Common Core Track", Text Retrieval Conference (TREC), 2018.
Peng, P., L. Zou, T. Ozsu, and D. Zhao, "Multi-Query Optimization in Federated RDF Systems", International Conference on Database Systems for Advanced Applications (DASFAA), 2018.
Rao, J., F. Türe, and J. Lin, "Multi-Task Learning With Neural Networks for Voice Query Understanding on an Entertainment Platform", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2018.
McIntyre, S., A. Borgida, D. Toman, and G. Weddell, "On Limited Conjunctions in Polynomial Feature Logics, With Applications in OBDA", International Conference on Principles of Knowledge Representation and Reasoning (KR), 2018.
Sequiera, R., L. Tan, and J. Lin, "Overview of the TREC 2018 Real-Time Summarization Track", Text Retrieval Conference (TREC), 2018.
Tu, Z., M. Li, and J. Lin, "Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Mackenzie, J. M., S. J. Culpepper, R. Blanco, M. Crane, C. Clarke, and J. Lin, "Query Driven Algorithm Selection in Early Stage Retrieval", Web Search and Data Mining (WSDM), 2018.
Memon, B. Naveed, X. Charles Lin, A. Mufti, A. Scott Wesley, T. Brecht, K. Salem, B. Wong, and B. Cassell, "RaMP: A Lightweight RDMA Abstraction for Loosely Coupled Applications", USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2018.
Grewal, A., J. Jiang, G. Lam, T. Jung, L. Vuddemarri, Q. Li, A. Landge, and J. Lin, "RecService: Distributed Real-Time Graph Processing at Twitter", USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2018.
Ghelani, N., G. Cormack, and M. Smucker, "Refresh Strategies in Continuous Active Learning", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Mior, M. J., and K. Salem, "Renormalization of NoSQL Database Schemas", International Conference on Conceptual Modeling (ER), 2018.
Yang, P., S. Thiagarajan, and J. Lin, "Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter", ACM International Conference on Management of Data (SIGMOD), 2018.
Fernandez, R. Castro, E. Mansour, A. Ali Qahtan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang, "Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery", IEEE International Conference on Data Engineering (ICDE), 2018.
Kim, Y., and J. Lin, "Serverless Data Analytics With Flint", IEEE International Conference on Cloud Computing (CLOUD), 2018.
Aleardi, L. Castelli, S. Salihoglu, G. Singh, and M. Ovsjanikov, "Spectral Measures of Distortion for Change Detection in Dynamic Graphs", International Workshop on Complex Networks & Their Applications, 2018.
Kane, A., and F. Tompa, "Split-Lists and Initial Thresholds for WAND-based Search", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Gao, L., L. Golab, T. Ozsu, and G. Aluç, "Stream WatDiv: A Streaming RDF Benchmark", ACM International Conference on Management of Data (SIGMOD), 2018.
Mohammed, S., P. Shi, and J. Lin, "Strong Baselines for Simple Question Answering Over Knowledge Graphs With and Without Neural Networks", North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
Cormack, G., and M. Grossman, "Technology-Assisted Review in Empirical Medicine: Waterloo Participation in CLEF eHealth 2018", Conference and Labs of the Evaluation Forum (CLEF), 2018.
Grewal, A., and J. Lin, "The Evolution of Content Analysis for Personalized Recommendations at Twitter", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Cormack, G., and M. Grossman, "The Quest for Total Recall", ACM Symposium on Document Engineering (DocEng), 2018.
Ma, W., M. C. Keet, W. Oldford, D. Toman, and G. Weddell, "The Utility of the Abstract Relational Model and Attribute Paths in SQL", International Conference on Knowledge Engineering and Knowledge Management (EKAW), 2018.
Glasbergen, B., M. Abebe, and K. Daudjee, "Tutorial: Adaptive Replication and Partitioning in Data Systems", International Middleware Conference (Middleware), 2018.
Lin, J., S. Mohammed, R. Sequiera, and L. Tan, "Update Delivery Mechanisms for Prospective Information Needs: An Analysis Of Attention in Mobile Users", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Abualsaud, M., G. Cormack, N. Ghelani, A. Ghenai, M. Grossman, S. Rahbariasl, H. Zhang, and M. Smucker, "UWaterlooMDS at the TREC 2018 Common Core Track", Text Retrieval Conference (TREC), 2018.
Rao, J., F. Türe, and J. Lin, "What Do Viewers Say to Their TVs?: An Analysis of Voice Queries To Entertainment Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2018.
Korkmaz, M., M. Karsten, K. Salem, and S. Salihoglu, "Workload-Aware CPU Performance Scaling for Transactional Database Systems", ACM International Conference on Management of Data (SIGMOD), 2018.
De Sa, C., I. Ilyas, B. Kimelfeld, C. Ré, and T. Rekatsinas, "A Formal Framework for Probabilistic Unclean Databases", ArXiv, vol. abs/1801.06750, 2018.
Ren, Y., M. Tomko, F. Dilys Salim, J. Chan, C. Clarke, and M. Sanderson, "A Location-Query-Browse Graph for Contextual Recommendation", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 30, issue 2, pp. 204--218, 2018.
Tang, R., and J. Lin, "Adaptive Pruning of Neural Language Models for Mobile Devices", ArXiv, vol. abs/1809.10282, 2018.
Koutris, P., S. Salihoglu, and D. Suciu, "Algorithmic Aspects of Parallel Data Processing", Foundations and Trends in Databases, vol. 8, issue 4, pp. 239--370, 2018.
Yang, P., H. Fang, and J. Lin, "Anserini: Reproducible Ranking Baselines Using Lucene", Journal of Data and Information Quality, vol. 10, issue 4, pp. 16:1--16:20, 2018.
Tang, G., S. Keshav, L. Golab, and K. Wu, "Bikeshare Pool Sizing for Bike-and-Ride Multimodal Transit", IEEE Transactions on Intelligent Transportation Systems, vol. 19, issue 7, pp. 2279--2289, 2018.
Stonebraker, M., and I. Ilyas, "Data Integration: The Current Status and the Way Forward", IEEE Data Engineering Bulletin, vol. 41, issue 2, pp. 3--9, 2018.
Ammar, K., F. McSherry, S. Salihoglu, and M. Joglekar, "Distributed Evaluation of Subgraph Queries Using Worst-Case Optimal And Low-Memory Dataflows", Proceedings of the VLDB Endowment (PVLDB), vol. 11, issue 6, pp. 691--704, 2018.
Ammar, K., F. McSherry, S. Salihoglu, and M. Joglekar, "Distributed Evaluation of Subgraph Queries Using Worst-Case Optimal Low-Memory Dataflows", ArXiv, vol. abs/1802.03760, 2018.
Szlichta, J., P. Godfrey, L. Golab, M. Kargar, and D. Srivastava, "Effective and Complete Discovery of Bidirectional Order Dependencies Via Set-Based Axioms", The VLDB Journal, vol. 27, issue 4, pp. 573--591, 2018.
Lamb, C., D. G. Brown, and C. Clarke, "Evaluating Computational Creativity: An Interdisciplinary Tutorial", ACM Computing Surveys, vol. 51, issue 2, pp. 28:1--28:34, 2018.
Zhang, H., G. Cormack, M. Grossman, and M. Smucker, "Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval", ArXiv, vol. abs/1803.08988, 2018.
Hopfgartner, F., A. Hanbury, H. Müller, I. Eggel, K. Balog, T. Brodt, G. Cormack, J. Lin, J. Kalpathy-Cramer, N. Kando, et al., "Evaluation-as-a-Service for the Computational Sciences: Overview And Outlook", Journal of Data and Information Quality, vol. 10, issue 4, pp. 15:1--15:32, 2018.
Ammar, K., and T. Ozsu, "Experimental Analysis of Distributed Graph Systems", Proceedings of the VLDB Endowment (PVLDB), vol. 11, issue 10, pp. 1151--1164, 2018.
Ammar, K., and T. Ozsu, "Experimental Analysis of Distributed Graph Systems", ArXiv, vol. abs/1806.08082, 2018.
Gebaly, K. El, G. Feng, L. Golab, F. Korn, and D. Srivastava, "Explanation Tables", IEEE Data Engineering Bulletin, vol. 41, issue 3, pp. 43--51, 2018.
Tang, R., A. Adhikari, and J. Lin, "FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks", ArXiv, vol. abs/1811.03060, 2018.
Gebaly, K. El, and J. Lin, "In-Browser Split-Execution Support for Interactive Analytics in The Cloud", ArXiv, vol. abs/1804.08822, 2018.
Rao, J., W. Yang, Y. Zhang, F. Türe, and J. Lin, "Multi-Perspective Relevance Matching With Hierarchical ConvNets For Social Media Search", ArXiv, vol. abs/1805.08159, 2018.
Tang, R., and J. Lin, "Progress and Tradeoffs in Neural Language Models", ArXiv, vol. abs/1811.00942, 2018.
Lin, J., and P. Yang, "Repeatability Corner Cases in Document Ranking: The Impact of Score Ties", ArXiv, vol. abs/1807.05798, 2018.
Liu, Y., M. P. Kato, C. Clarke, N. Kando, and T. Sakai, "Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community For Information Access Research", SIGIR Forum, vol. 52, issue 1, pp. 102--110, 2018.
Culpepper, J. S., F. Diaz, and M. Smucker, "Research Frontiers in Information Retrieval: Report From the Third Strategic Workshop on Information Retrieval in Lorne (SWIRL 2018)", SIGIR Forum, vol. 52, issue 1, pp. 34--90, 2018.
Salihoglu, S., and T. Ozsu, "Response to "Scale Up or Scale Out for Graph Processing"", IEEE Internet Computing, vol. 22, issue 5, pp. 18--24, 2018.
El-Roby, A., K. Ammar, A. Aboulnaga, and J. Lin, "Sapphire: Querying RDF Data Made Simple", ArXiv, vol. abs/1805.11728, 2018.
Lin, J., "Scale Up or Scale Out for Graph Processing?", IEEE Internet Computing, vol. 22, issue 3, pp. 72--78, 2018.
Kushagra, S., S. Ben-David, and I. Ilyas, "Semi-Supervised Clustering for De-Duplication", ArXiv, vol. abs/1810.04361, 2018.
Kim, Y., and J. Lin, "Serverless Data Analytics With Flint", ArXiv, vol. abs/1803.06354, 2018.
Bater, J., X. He, W. Ehrich, A. Machanavajjhala, and J. Rogers, "Shrinkwrap: Differentially-Private Query Processing in Private Data Federations", ArXiv, vol. abs/1810.01816, 2018.
Bater, J., X. He, W. Ehrich, A. Machanavajjhala, and J. Rogers, "ShrinkWrap: Efficient SQL Query Processing in Differentially Private Data Federations", Proceedings of the VLDB Endowment (PVLDB), vol. 12, issue 3, pp. 307--320, 2018.
Shi, P., J. Rao, and J. Lin, "Simple Attention-Based Representation Learning for Ranking Short Social Media Posts", ArXiv, vol. abs/1811.01013, 2018.
Tang, R., G. Yang, H. Wei, Y. Mao, F. Türe, and J. Lin, "Streaming Voice Query Recognition Using Causal Convolutional Recurrent Neural Networks", ArXiv, vol. abs/1812.07754, 2018.
Lin, J., "The Neural Hype and Comparisons Against Weak Baselines", SIGIR Forum, vol. 52, issue 2, pp. 40--51, 2018.
Li, Y., L. Zou, T. Ozsu, and D. Zhao, "Time Constrained Continuous Subgraph Search Over Streaming Graphs", ArXiv, vol. abs/1801.09240, 2018.
He, X., Policy Driven Data Sharing With Provable Privacy Guarantees: Duke University, Durham, NC, USA, 2018.

2017

Shen, C., T. Shen, and J. Lin, "Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future", Algorithms for Next-Generation Sequencing Data, Techniques, Approaches, and Applications: Springer, 2017.
Crane, M., J. S. Culpepper, J. Lin, J. M. Mackenzie, and A. Trotman, "A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation", Web Search and Data Mining (WSDM), 2017.
Baruah, G., R. McCreadie, and J. Lin, "A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries", International Conference on Information and Knowledge Management (CIKM), 2017.
Fernandez, R. Castro, D. Deng, E. Mansour, A. Ali Qahtan, W. Tao, Z. Abedjan, A. K. Elmagarmid, I. Ilyas, S. Madden, M. Ouzzani, et al., "A Demo of the Data Civilizer System", ACM International Conference on Management of Data (SIGMOD), 2017.
Karyakin, A., and K. Salem, "An Analysis of Memory Power Consumption in Database Systems", International Workshop on Data Management on New Hardware (DaMoN), 2017.
Crane, M., and J. Lin, "An Exploration of Serverless Architectures for Information Retrieval", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
He, H., K. Ganjam, N. Jain, J. Lundin, R. White, and J. Lin, "An Insight Extraction System on BioMedical Literature With Deep Neural Networks", Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Toman, D., and G. Weddell, "An Interpolation-Based Compiler and Optimizer for Relational Queries (System Design Report)", International Conference on Logic Programming and Automated Reasoning (LPAR), 2017.
Yang, P., H. Fang, and J. Lin, "Anserini: Enabling the Use of Lucene for Information Retrieval Research", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Zihayat, M., A. An, L. Golab, M. Kargar, and J. Szlichta, "Authority-Based Team Discovery in Social Networks", International Conference on Extending Database Technology (EDBT), 2017.
Grossman, M., G. Cormack, and A. Roegiest, "Automatic and Semi-Automatic Document Selection for Technology-Assisted Review", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Zhang, H., J. Rao, J. Lin, and M. Smucker, "Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
He, X., A. Machanavajjhala, C. J. Flynn, and D. Srivastava, "Composing Differential Privacy and Secure Computation: A Case Study On Scaling Private Record Linkage", Conference on Computer and Communications Security (CCS), 2017.
Borgida, A., D. Toman, and G. Weddell, "Concerning Referring Expressions in Query Answers", International Joint Conference on Artificial Intelligence (IJCAI), 2017.
Abedjan, Z., L. Golab, and F. Naumann, "Data Profiling: A Tutorial", ACM International Conference on Management of Data (SIGMOD), 2017.
Bejnordi, B. Ehteshami, J. Lin, B. Glass, M. Mullooly, G. L. Gierach, M. E. Sherman, N. Karssemeijer, J. van der Laak, and A. H. Beck, "Deep Learning-Based Assessment of Tumor-Associated Stroma for Diagnosing Breast Cancer in Histopathology Images", IEEE International Symposium on Biomedical Imaging (ISBI), 2017.
Machanavajjhala, A., X. He, and M. Hay, "Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges", ACM International Conference on Management of Data (SIGMOD), 2017.
Pacaci, A., A. Zhou, J. Lin, and T. Ozsu, "Do We Need Specialized Graph Databases?: Benchmarking Real-Time Social Networking Applications", International Workshop on Graph Data Management Experiences and Systems (GRADES), 2017.
Baskaran, S., A. Keller, F. Chiang, L. Golab, and J. Szlichta, "Efficient Discovery of Ontology Functional Dependencies", International Conference on Information and Knowledge Management (CIKM), 2017.
Ghelani, N., S. Mohammed, S. Wang, and J. Lin, "Event Detection on Curated Tweet Streams", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Rao, J., H. He, and J. Lin, "Experiments With Convolutional Neural Network Models for Answer Selection", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Vtyurina, A., D. Savenkov, E. Agichtein, and C. Clarke, "Exploring Conversational Search With Humans, Assistants, and Wizards", ACM Conference on Human Factors in Computing Systems (CHI), 2017.
Sequiera, R., and J. Lin, "Finally, a Downloadable Test Collection of Tweets", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Toulis, A., and L. Golab, "Graph Mining to Characterize Competition for Employment", ACM International Conference on Management of Data (SIGMOD), 2017.
Kankanamge, C., S. Sahu, A. Mhedhbi, J. Chen, and S. Salihoglu, "Graphflow: An Active Graph Database", ACM International Conference on Management of Data (SIGMOD), 2017.
Afrati, F. N., M. R. Joglekar, C. Ré, S. Salihoglu, and J. D. Ullman, "GYM: A Multiround Distributed Join Algorithm", International Conference on Database Theory (ICDT), 2017.
Fink, S. Dominik, L. Golab, S. Keshav, and H. de Meer, "How Similar Is the Usage of Electric Cars and Electric Bicycles?", Energy-Efficient Computing and Networking (e-Energy), 2017.
Gebaly, K. El, and J. Lin, "In-Browser Interactive SQL Analytics With Afterburner", ACM International Conference on Management of Data (SIGMOD), 2017.
Lamb, C., D. G. Brown, and C. Clarke, "Incorporating Novelty, Meaning, Reaction and Craft Into Computational Poetry: A Negative Experimental Result", International Conference on Computational Creativity (ICCC), 2017.
Gorenflo, C., L. Golab, and S. Keshav, "Managing Sensor Data Streams: Lessons Learned From the WeBike Project", International Conference on Statistical and Scientific Database Management (SSDBM), 2017.
Rao, J., F. Türe, X. Niu, and J. Lin, "Mining the Temporal Statistics of Query Terms for Searching Social Media Posts", International Conference on the Theory of Information Retrieval (ICTIR), 2017.
Grossman, M., and G. Cormack, "MRG_UWaterloo and WaterlooCormack Participation in the TREC 2017 Common Core Track", Text Retrieval Conference (TREC), 2017.
Cormack, G., and M. Grossman, "Navigating Imprecision in Relevance Assessments on the Road to Total Recall: Roger and Me", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Cui, X., M. Mior, B. Wong, K. Daudjee, and S. Rizvi, "Netstore: Leveraging Network Optimizations to Improve Distributed Transaction Processing Performance", International Middleware Conference (Middleware), 2017.
Toman, D., and G. Weddell, "On Partial Features in the DLF Dialects of Description Logic With Inverse Features", International Workshop on Description Logics (DL), 2017.
Tan, L., G. Baruah, and J. Lin, "On the Reusability of "Living Labs" Test Collections: A Case Study Of Real-Time Summarization", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Roegiest, A., L. Tan, and J. Lin, "Online in-Situ Interleaved Evaluation of Real-Time Push Notification Systems", International Conference on Research and Development in Information Retrieval (SIGIR), 2017.
Meng, X., and L. Golab, "Optimal Reducer Placement to Minimize Data Transfer in MapReduce-style Processing", IEEE International Conference on Big Data (IEEE BigData), 2017.
Lin, J., S. Mohammed, R. Sequiera, L. Tan, N. Ghelani, M. Abualsaud, R. McCreadie, D. Milajevs, and E. M. Voorhees, "Overview of the TREC 2017 Real-Time Summarization Track", Text Retrieval Conference (TREC), 2017.