Speaker: Zhenmei Gu
In information extraction (IE), training statistical models such as hidden Markov models (HMMs) usually requires a considerable amount of labeled data, a requirement that may not be easily met in a practical IE application. In this talk, we investigate how to adapt a fully supervised IE learner into a semi-supervised one, so that the learner can make use of unlabeled data to train a more robust model from very limited labeled data. In particular, we consider applying the Co-EM learning strategy for this purpose. We then propose a semi-supervised two-view algorithm that trains HMMs to learn extraction patterns from both the term sequences and the part-of-speech sequences of texts. Our initial experimental results show that the proposed algorithm yields better, and in some cases significantly better, extraction performance than a learner using only the labeled data.
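The two-view Co-EM loop described in the abstract can be sketched in miniature: each view (here, terms and part-of-speech tags) trains a classifier, and the probabilistic labels one view assigns to the unlabeled data are used to retrain the other. The sketch below substitutes simple one-feature Naive Bayes scorers for the talk's HMMs, and all data and function names are illustrative assumptions, not the speaker's actual setup:

```python
from collections import defaultdict

def nb_train(feats, soft_labels, alpha=1.0):
    """Train a one-feature Naive Bayes scorer from soft (probabilistic) labels.

    feats: list of feature values (e.g. terms, or POS tags)
    soft_labels: list of P(label = 1) for each feature occurrence
    Returns a function mapping a feature value to P(label = 1 | feature).
    """
    counts = {0: defaultdict(float), 1: defaultdict(float)}
    totals = {0: alpha, 1: alpha}
    vocab = set(feats)
    for f, p in zip(feats, soft_labels):
        counts[1][f] += p
        counts[0][f] += 1.0 - p
        totals[1] += p
        totals[0] += 1.0 - p
    prior1 = totals[1] / (totals[0] + totals[1])

    def predict(f):
        # Bayes rule with add-alpha smoothing for unseen feature values.
        l1 = (counts[1][f] + alpha) / (totals[1] + alpha * len(vocab)) * prior1
        l0 = (counts[0][f] + alpha) / (totals[0] + alpha * len(vocab)) * (1 - prior1)
        return l1 / (l1 + l0)
    return predict

def co_em(labeled, unlabeled, iters=5):
    """Co-EM over two views.

    labeled:   list of ((term, pos_tag), label) with label in {0, 1}
    unlabeled: list of (term, pos_tag)
    Returns a classifier combining both views' final scorers.
    """
    terms_l = [t for (t, _), _ in labeled]
    tags_l = [g for (_, g), _ in labeled]
    ys = [float(y) for _, y in labeled]
    terms_u = [t for t, _ in unlabeled]
    tags_u = [g for _, g in unlabeled]

    # Initialize view A (terms) on the labeled data alone.
    pred_a = nb_train(terms_l, ys)
    soft = [pred_a(t) for t in terms_u]
    for _ in range(iters):
        # View B (POS tags) trains on labeled data plus view A's soft labels.
        pred_b = nb_train(tags_l + tags_u, ys + soft)
        soft = [pred_b(g) for g in tags_u]
        # View A retrains on labeled data plus view B's soft labels.
        pred_a = nb_train(terms_l + terms_u, ys + soft)
        soft = [pred_a(t) for t in terms_u]
    # Combine the two views by averaging their posteriors.
    return lambda term, tag: 0.5 * (pred_a(term) + pred_b(tag))
```

The key Co-EM property shown here is that a term never seen in the labeled set (e.g. an unlabeled proper noun) can still be scored correctly, because the tag view propagates label information to it through the shared unlabeled pool.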
Food: Dana Wilkinson