Conference paper

An unsupervised method for lexical acquisition based on Bootstrapping

In this paper, we present an unsupervised method for lexical acquisition called the Mutual Screening Graph Algorithm based on Bootstrapping (MSGA-Bootstrapping). Bootstrapping is a weakly supervised technique that has received considerable attention in Natural Language Processing (NLP) and Information Extraction (IE), especially for learning semantic lexicons. Our approach needs only unannotated corpora to learn new words for each semantic category. MSGA-Bootstrapping hypothesizes the semantic class of a word from collective information over a large body of extraction-pattern contexts, so that extraction patterns and words mutually reinforce each other. Earlier algorithms exist for this task, but their precision and stability leave room for improvement. We improve the earlier bootstrapping algorithm by taking into account both the quality and the quantity information of words and patterns when scoring the words and patterns they generate. We also make MSGA-Bootstrapping run as an unsupervised method by changing its processing order. Experiments show that MSGA outperforms the previous bootstrapping algorithms Basilisk and GMR (Graph Mutual Reinforcement based Bootstrapping), and that the results of using MSGA-Bootstrapping as an unsupervised method are acceptable.
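The abstract does not give MSGA's scoring formulas. The sketch below therefore illustrates only the generic mutual-reinforcement loop that MSGA builds on, using the published scores of the Basilisk baseline the abstract names. A pattern's RlogF score is (F/N) * log2(F), where N is the number of distinct words the pattern extracts and F is the number of those already in the lexicon. This multiplies a quality term by a quantity term. A candidate word's AvgLog score is the mean of log2(F_j + 1) over the patterns j that extract it. The toy extractions, the seed words, and the parameters `pool_size` and `words_per_iter` are hypothetical. The sketch learns a single category and is not the authors' MSGA-Bootstrapping.

```python
import math
from collections import defaultdict


def rlogf(extracted, lexicon):
    """Basilisk RlogF pattern score: (F / N) * log2(F).

    F / N rewards precision (quality); log2(F) rewards volume (quantity).
    """
    n = len(extracted)            # N: distinct words the pattern extracts
    f = len(extracted & lexicon)  # F: how many of them are already in the lexicon
    return (f / n) * math.log2(f) if f > 0 else 0.0


def avg_log(patterns, extractions, lexicon):
    """Basilisk AvgLog word score: mean of log2(F_j + 1) over patterns j extracting the word."""
    total = sum(math.log2(len(extractions[p] & lexicon) + 1) for p in patterns)
    return total / len(patterns)


def bootstrap(extractions, seeds, iterations=5, pool_size=2, words_per_iter=2):
    """Grow one semantic category from seed words (hypothetical fixed parameters)."""
    lexicon = set(seeds)
    # Invert pattern -> words into word -> patterns for word scoring.
    patterns_of = defaultdict(set)
    for pattern, words in extractions.items():
        for word in words:
            patterns_of[word].add(pattern)

    for _ in range(iterations):
        # Words -> patterns: re-score every pattern against the current lexicon.
        pool = sorted(extractions, key=lambda p: (-rlogf(extractions[p], lexicon), p))[:pool_size]
        # Patterns -> words: unknown words extracted by the pooled patterns.
        candidates = {w for p in pool for w in extractions[p]} - lexicon
        if not candidates:
            break
        # Score each candidate over all patterns that extract it; ties break alphabetically.
        ranked = sorted(candidates,
                        key=lambda w: (-avg_log(patterns_of[w], extractions, lexicon), w))
        lexicon.update(ranked[:words_per_iter])
    return lexicon


# Hypothetical toy extractions for a WEAPON category.
extractions = {
    "armed with <np>": {"rifle", "pistol", "knife", "shotgun"},
    "fired a <np>": {"rifle", "pistol", "shotgun", "gun"},
    "went to <np>": {"market", "school", "pistol"},
    "<np> exploded": {"bomb", "car"},
}
result = bootstrap(extractions, seeds={"rifle", "pistol"})
print(sorted(result))  # ['gun', 'knife', 'pistol', 'rifle', 'shotgun']
```

The loop re-scores every pattern against the grown lexicon on each iteration, which is what lets words and patterns reinforce each other. The noisy pattern "went to <np>" never enters the pool. Only one of its extractions is a known seed, so its RlogF score is (1/3) * log2(1) = 0.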

Lexical acquisition; unsupervised method; Bootstrapping

Yuhan Zhang, Yanquan Zhou

Research Center of Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China

International Conference

International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian

English

pp. 1-7

2009-09-24 (date first posted online on the Wanfang platform; not the paper's publication date)