Synonyms Extraction Using Web Content Focused Crawling
Documents and Web pages collected from the World Wide Web are among the most important sources of information. Retrieving documents with search engines can harvest a great deal of information, including foreign information, and facilitates information exchange and knowledge sharing. However, to be better understood by local readers, foreign words, such as English words, are often translated into the local language, such as Chinese. Because translators differ and no translation standard exists, a foreign word may receive several different transliterations, particularly for proper nouns such as personal names and geographical names. For example, Bin Laden is translated as 賓拉登 (binladeng) or 本拉登 (benladeng); both are valid synonymous transliterations. In this research, we propose an approach to determining synonymous transliterations by mining Web pages retrieved by a search engine. Experiments show that the proposed approach can effectively extract synonymous transliterations given an input transliteration.
Transliteration; Associated Word; Unknown Words; Focused Crawling; Speech Sound Comparison
Chien-Hsing Chen, Chung-Chian Hsu
National Yunlin University of Science and Technology, Taiwan
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
286-297
2008-01-16 (date first posted on the Wanfang platform; not the paper's publication date)