PhD Seminar: Logging Statement Analysis and Automation in Software Systems with Data Mining and Machine Learning Techniques

Friday, May 21, 2021 2:00 pm EDT (GMT -04:00)

Candidate: Sina Gholamian

Title: Logging Statement Analysis and Automation in Software Systems with Data Mining and Machine Learning Techniques

Date: May 21, 2021

Time: 2:00 PM

Place: REMOTE ATTENDANCE

Supervisor(s): Ward, Paul

Abstract:

Log files are widely used to record runtime information of software systems, such as the timestamp of an event, the unique ID of the log's source, and part of the task execution state. This rich information enables system developers (and operators) to monitor the runtime behavior of their systems and to track down problems in production settings. As modern systems grow in scale and complexity, log volume is increasing rapidly, e.g., at a rate of gigabytes per minute. Consequently, traditional log analysis, which relies largely on manual inspection (e.g., searching for error/warning keywords with grep), has become inefficient, labor-intensive, and error-prone. To address this challenge, many recent efforts have sought to automate log analysis using data-mining techniques. However, the logging process itself is still mostly manual, so the proper placement and content of logging statements remain open challenges. Methods that automate the prediction of log placement and content, i.e., ‘where and what to log’, are therefore of high interest.

Accordingly, this research focuses on predicting logging statements, and to this end we perform an experimental study on open-source Java projects. We introduce a log-aware code-clone detection method to predict the location and description of logging statements. We further apply deep-learning-based natural language processing (NLP) methods to improve the prediction of logging statement descriptions. We also analyze execution logs and extract their natural language characteristics to enable the use of natural language models for automated log file analysis. Finally, we propose an automated tool for analyzing log files and measure the information gain from logs for different log analysis tasks, such as anomaly detection.