Angshuman Ghosh, Master’s candidate
David R. Cheriton School of Computer Science
Mobile app reviews contain information relevant to developers. Developers can check whether users are complaining about a particular app issue. However, the volume of incoming reviews is huge, which makes such intelligence hard to extract. Existing research that attempts to extract this information suffers from two major issues: supervised methods are usually pre-trained and thus do not give developers the freedom to define the app issue they are interested in, whereas unsupervised methods do not guarantee that a particular app issue topic will be discovered.
In this thesis, we attempt to devise a framework that would allow developers to define topics related to app issues at any time and, with minimal effort, discover as many reviews related to the issue as possible. Scalable Continuous Active Learning (S-CAL) is an algorithm that can be used to quickly train a model to retrieve documents with high recall.
First, we investigate whether S-CAL can be used as a tool for training models to retrieve reviews about a specific app issue. We also investigate whether a model trained to retrieve reviews about a specific issue for one app can be used to do the same for a separate app facing the same issue. We further investigate transfer learning methods to improve retrieval performance for the separate apps. Through various experiments, we show that S-CAL can be used to quickly train models to retrieve reviews about a particular issue.
We show that developers can discover relevant information while training the model, and that they discover more information this way than with keyword search under similar time restrictions. Then, we show that models trained using S-CAL can indeed be reused to retrieve reviews for a separate app, and that additional training using transfer learning protocols can improve the performance of models that performed below expectation. Finally, we compare the performance of models trained by S-CAL at retrieving reviews for a separate app against that of two state-of-the-art app review analysis methods, one of which uses supervised learning and the other unsupervised learning. We show that, at the task of retrieving relevant reviews about a particular topic, models trained by S-CAL consistently outperform these existing state-of-the-art methods.