An unsupervised method for lexical acquisition based on Bootstrapping
In this paper, we present an unsupervised method for lexical acquisition called the Mutual Screening Graph Algorithm based on Bootstrapping (MSGA-Bootstrapping). Bootstrapping is a weakly supervised algorithm that has attracted attention in many areas of Natural Language Processing (NLP) and Information Extraction (IE), especially in learning semantic lexicons. Our approach needs only unannotated corpora to learn new words for each semantic category. MSGA-Bootstrapping hypothesizes the semantic class of a word from collective information over a large body of extraction pattern contexts, and the extraction patterns and words mutually reinforce each other. Earlier algorithms exist for this task, but their precision and stability can be improved. We improve on the earlier bootstrapping algorithms by taking into account both the quality and the quantity information of words and patterns when scoring the words and patterns they generate. We also make MSGA-Bootstrapping run as an unsupervised method by changing the order of its processing. Experiments show that MSGA can outperform the previous bootstrapping algorithms Basilisk and GMR (Graph Mutual Reinforcement based Bootstrapping), and that the results of using MSGA-Bootstrapping as an unsupervised method are acceptable.
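The mutual reinforcement between words and extraction patterns described in the abstract can be illustrated with a minimal sketch. This is not the MSGA scoring, which the abstract does not specify; it is a generic bipartite reinforcement loop under stated assumptions. The function name `bootstrap_lexicon`, its parameters, the averaging-based scores, and the toy pattern strings are all hypothetical. In particular, the sketch starts from seed words and uses only an average ("quality") score; it does not model the quantity information or the unsupervised processing order mentioned above.

```python
from collections import defaultdict


def bootstrap_lexicon(extractions, seeds, rounds=1, words_per_round=1, iterations=10):
    """Grow a lexicon for one semantic category (illustrative only).

    extractions: iterable of (pattern, word) pairs taken from an
                 unannotated corpus.
    seeds:       initial words for the category (an assumption of this
                 sketch, not a statement about MSGA).
    """
    # Build the bipartite pattern/word graph.
    words_of = defaultdict(set)      # pattern -> words it extracts
    patterns_of = defaultdict(set)   # word -> patterns that extract it
    for pattern, word in extractions:
        words_of[pattern].add(word)
        patterns_of[word].add(pattern)

    lexicon = set(seeds)
    for _ in range(rounds):
        # Known category members start at 1.0, everything else at 0.0.
        word_score = {w: 1.0 if w in lexicon else 0.0 for w in patterns_of}
        for _ in range(iterations):
            # A pattern is as good as the average of the words it extracts.
            pattern_score = {
                p: sum(word_score[w] for w in ws) / len(ws)
                for p, ws in words_of.items()
            }
            # A word is as good as the average of the patterns extracting it;
            # lexicon members stay pinned at 1.0.
            word_score = {
                w: 1.0 if w in lexicon
                else sum(pattern_score[p] for p in ps) / len(ps)
                for w, ps in patterns_of.items()
            }
        # Add the best-scoring new words (ties broken alphabetically).
        candidates = sorted(
            (w for w in word_score if w not in lexicon),
            key=lambda w: (-word_score[w], w),
        )
        lexicon.update(candidates[:words_per_round])
    return lexicon


# Toy data: "X" marks the slot filled by the extracted word.
demo_pairs = [
    ("cities such as X", "paris"),
    ("cities such as X", "london"),
    ("cities such as X", "berlin"),
    ("mayor of X", "paris"),
    ("mayor of X", "berlin"),
    ("ate a X", "apple"),
    ("ate a X", "banana"),
]
demo_lexicon = bootstrap_lexicon(demo_pairs, seeds={"paris", "london"})
```

With this toy data, "berlin" shares two patterns with the seed words while "apple" and "banana" share none, so it is the word added in the single round.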
Lexical acquisition; unsupervised method; Bootstrapping
Yuhan Zhang, Yanquan Zhou
Research Center of Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
International conference
Dalian
English
1-7
2009-09-24 (date first made available online on the Wanfang platform; not the publication date of the paper)