Extracting Domain Words from Chinese Video Titles Using Search Engine

Abstract:

Video titles are an essential source of information for describing video content in text-based video retrieval and recommendation systems. Correctly extracting domain words from video titles is one of the most important tasks in understanding a title's meaning for accurate video recommendation. Because of the way titles are named and the habits of Chinese video uploaders, domain words are hard to recognize in video titles, which hinders the understanding of titles for semantic-based video retrieval and recommendation. This paper presents an automatic, unsupervised method that uses a title's search results from a web search engine to identify and extract the domain words in the title. In the experiments, 62,305 Chinese automotive video titles were crawled from the web to test the method. The results show that domain words can be recognized, and that many of the extracted words are informal domain words that are not listed in domain dictionaries.

Keywords: video title; domain words; information extraction; mutual information; clustering

Authors: Quan Qi, Jing Dong, Fangfang Li

Affiliation: School of Computer, Beijing Institute of Technology, Beijing, China

Conference type: International conference

Conference name: 2011 International Conference on Information System and Computational Intelligence (2011 IEEE International Conference on Information System and Computational Intelligence, ICISCI 2011)

Conference location: Harbin

Conference language: English

Pages: 362-366

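Online publication date: 2011-01-18 (the date the record first went online on the Wanfang platform; it does not indicate when the paper was published)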
