Candidate: Zishuo Ding
Date: October 23, 2023
Time: 10:00 AM
Place: EV3 1408
Supervisor(s): Shang, Weiyi
Abstract:
Machine learning-based approaches have been widely used to address natural language processing (NLP) problems. Considering the similarity between natural language text and source code, researchers have been working on applying techniques from NLP to deal with code. On the other hand, source code and natural language are by nature different. For example, code is highly structured and executable. Thus, directly applying the NLP techniques may not be optimal, and how to effectively optimize these NLP techniques to adapt to software engineering (SE) tasks remains a challenge. Therefore, to tackle the challenge, in this dissertation, we focus on two research directions: 1) distributed code representations, and 2) logging statements, which are two important intersections between the natural language and source code. For distributed code representations, we first discuss the limitations of existing code embedding techniques, and then, we propose a novel approach to learn more generalizable code embeddings in a task-agnostic manner. For logging statements, we first propose an automated deep learning-based approach to automatically generate accurate logging texts by translating the related source code into short textual descriptions. Then, we make the first attempt to comprehensively study the temporal relations between logging and its corresponding source code, which is later used to detect issues in logging statements. We anticipate that our study can provide useful suggestions and support to developers in utilizing NLP techniques to assist in SE tasks.