Extracting Domain Words from Chinese Video Titles Using Search Engine
Video titles are the essential source of information for describing video content in text-based video retrieval and recommendation systems. Correctly extracting domain words from video titles is one of the most important tasks in understanding a title's meaning for accurate video recommendation. Because of the way titles are named and the habits of Chinese video uploaders, domain words are hard to recognize in video titles, which hinders title understanding for semantic-based video retrieval and recommendation. This paper presents an automatic, unsupervised method that uses a title's results from a web search engine to identify and extract the domain words in the title. In the experiments, 62,305 Chinese automotive video titles were crawled from the web to test the method. The results show that domain words can be recognized, and that many of the extracted words are informal domain words that are not listed in domain dictionaries.
video title; domain words; information extraction; mutual information; clustering
Quan Qi, Jing Dong, Fangfang Li
School of Computer, Beijing Institute of Technology, Beijing, China
International conference
Harbin
English
362-366
2011-01-18 (date first made available online on the Wanfang platform; does not represent the paper's publication date)
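
The abstract describes using a title's search results to find domain words, and the keywords mention mutual information, but this record does not give the actual algorithm. The following is only a minimal sketch of one way such a step could look, under stated assumptions: the search-result snippets for a title have already been fetched as plain strings, candidate words are contiguous character n-grams of the title, and cohesion is scored with pointwise mutual information estimated from snippet counts. All function names and parameters (`candidate_substrings`, `doc_freq`, `pmi`, `rank_candidates`, `min_df`) are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def candidate_substrings(title: str, min_len: int = 2, max_len: int = 4) -> List[str]:
    """Enumerate distinct contiguous character n-grams of the title (assumed candidates)."""
    seen: List[str] = []
    for n in range(min_len, max_len + 1):
        for i in range(len(title) - n + 1):
            gram = title[i:i + n]
            if gram not in seen:
                seen.append(gram)
    return seen


def doc_freq(term: str, snippets: List[str]) -> int:
    """Count the snippets that contain the term as a substring."""
    return sum(1 for s in snippets if term in s)


def pmi(candidate: str, snippets: List[str]) -> float:
    """Return the smallest pointwise mutual information over all two-way splits.

    Probabilities are estimated as the fraction of snippets containing a string.
    A candidate that never appears in the snippets scores negative infinity.
    """
    total = len(snippets)
    joint = doc_freq(candidate, snippets)
    if total == 0 or joint == 0:
        return float("-inf")
    p_joint = joint / total
    scores = []
    for k in range(1, len(candidate)):
        # Both halves occur wherever the whole candidate occurs, so neither count is zero.
        p_left = doc_freq(candidate[:k], snippets) / total
        p_right = doc_freq(candidate[k:], snippets) / total
        scores.append(math.log(p_joint / (p_left * p_right)))
    return min(scores)


def rank_candidates(title: str, snippets: List[str], min_df: int = 2) -> List[Tuple[str, float]]:
    """Keep candidates seen in at least min_df snippets and sort them by PMI, highest first."""
    scored = [
        (cand, pmi(cand, snippets))
        for cand in candidate_substrings(title)
        if doc_freq(cand, snippets) >= min_df
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A caller would pass one video title together with the snippets returned for it, for example `rank_candidates(title, snippets)`, and treat the top-ranked strings as domain-word candidates for later filtering or clustering. The `min_df` threshold and the n-gram length limits are illustrative choices, not values reported by the authors.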