Conference topic

Chinese New Word Detection from Query Logs

Most existing work on new word detection relies on web pages or other author-centric resources, which require highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine. Because query logs are generated by search engine users and are naturally segmented, this complex text processing can be avoided. Using dynamic time warping, the paper proposes a new word detection algorithm based on trajectory similarity to distinguish new words in the query logs. Experiments on real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithm.
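The abstract names dynamic time warping (DTW) as the basis of the trajectory similarity measure but does not define the trajectories themselves. As a rough illustration only, the following is a minimal sketch of the textbook DTW distance between two numeric series. The function name `dtw_distance`, the absolute-difference cost, and the idea of feeding it per-day query frequencies are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance (absolute-difference cost).

    Illustrative sketch only; the paper's actual trajectory definition and
    similarity measure are not given in this record.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical daily query counts for two terms (made-up numbers).
term_a = [1.0, 2.0, 3.0]
term_b = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(term_a, term_b)
```

In this sketch a smaller distance means the two series have a more similar shape even when one is stretched in time, which is the property that makes DTW a natural fit for comparing frequency trajectories of different terms.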

new word detection; dynamic time warping; query logs; search engine

Yan Zhang, Maosong Sun, Yang Zhang

State Key Laboratory on Intelligent Technology and Systems Technology Department of Computer Science a; State Key Laboratory on Intelligent Technology and Systems Technology Department of Computer Science a; Sohu Inc. R&D Center, Beijing 100084, China

International conference

6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Chongqing

English

233-243

2010-11-19 (date first made available online on the Wanfang platform; not the paper's publication date)