Citation:
Davoudi, H., & An, A. (2015). Ontology-Based Topic Labeling and Quality Prediction. In Foundations of Intelligent Systems (pp. 171–179). Springer International Publishing.
Abstract:
Probabilistic topic models based on Latent Dirichlet Allocation (LDA) are increasingly used to discover the hidden structure behind large text corpora. Although topic models are extremely useful tools for exploring and summarizing large text collections, most of the time the inferred topics are not easy for humans to understand and interpret. In addition, some inferred topics may be described by words that are not closely related to each other; such topics are considered low-quality topics. In this paper, we propose a novel method that not only assigns a label to each topic but also identifies low-quality topics by providing a reliability score for each topic's label. Our rationale is that a topic labeling method cannot provide a good label for a low-quality topic, and thus predicting label reliability is as important as topic labeling itself. We propose a novel measure, Ontology-Based Coherence, that can effectively assess the coherence of topics with respect to an ontology structure. Empirical results on a real dataset and our user study show that the proposed predictive model, using the defined measures, can predict label reliability better than two alternative methods.