Eztending WordNet with compound nouns for semi-automatic annotation in data integration systems

摘要：

The focus of data integration systems is on producing a comprehensive global schema successfully integrating data from heterogeneous data sources (heterogeneous in format and in structure). Starting from the “meanings associated to schema elements (i.e. class/attribute labels) and exploiting the structural knowledge of sources, it is possible to discover relationships among the elements of different schemata. Lexical annotation is the explicit inclusion of the “meaning of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationships. In this paper we propose a new method for the annotation of non-dictionary compound nouns, which draws its inspiration from works in the natural language disambiguation area. The method extends the lexical annotation module of the MOMIS data integration system.

关键词： Compound noun WordNet data integration annotation

作者: Domenico Beneventano Sonia Bergamaschi Serena Sorrentino

作者单位: DII, University of Modena and Reggio Emilia. Modena, Italy ICT Doctorate School, University of Modena and Reggio Emilia. Modena, Italy

会议类型: 国际会议

会议名称: International Conference on Natural Language Processing and Knowledge Engineering(IEEE自然语言处理与知识工程国际会议 IEEE NLP-KE 2009)

会议地点: 大连

会议语种:英文

页码: 1-8

在线出版日期: 2009-09-24（万方平台首次上网日期，不代表论文的发表时间）

会议专题

Eztending WordNet with compound nouns for semi-automatic annotation in data integration systems