Conference paper

Extracting Domain Words from Chinese Video Titles Using Search Engine

Video titles are the essential source of information used to describe video content in text-based video retrieval and recommendation systems. Correctly extracting domain words from video titles is one of the most important tasks in understanding the titles' meaning for accurate video recommendation. Because of the characteristics of title naming and the habits of Chinese video uploaders, domain words are hard to recognize in video titles, which hinders understanding of the titles for semantic-based video retrieval and recommendation. This paper presents an automatic, unsupervised method that uses a title's search results from a web search engine to identify and extract the domain words in the title. In experiments, 62,305 Chinese automotive video titles were crawled from the web to test the method. The results show that domain words can be recognized, and that many of the extracted words are informal domain words that are not listed in domain dictionaries.

video title; domain words; information extraction; mutual information; clustering

Quan Qi, Jing Dong, Fangfang Li

School of Computer, Beijing Institute of Technology, Beijing, China

International conference

2011 IEEE International Conference on Information System and Computational Intelligence (ICISCI 2011)

Harbin

English

362-366

2011-01-18 (date first made available online on the Wanfang platform; not the paper's publication date)