A Topic Model Integrating Patent Classification Information for Patent Analysis
As an important text mining technique, topic models have been used increasingly in patent analysis. However, owing to specific characteristics of patent text, such as terminology consisting of multiple words and numerous synonyms, topics extracted by traditional topic models such as LDA are often hard to interpret. In this paper we propose a new topic model, Patent Classification LDA, which exploits the patent classification taxonomy and the class codes of patents to improve topic interpretability. Gibbs sampling is then used to estimate the corresponding parameters. Finally, an experiment is conducted on patents on hard disk drive heads to demonstrate the feasibility and effectiveness of Patent Classification LDA. The results show that it not only provides key words for class codes from the patent classification taxonomy, but also achieves lower perplexity than LDA during the fold-in query process, which indicates that Patent Classification LDA has better generalization performance and predictive ability than LDA.
Topic model; patent analysis; Gibbs sampling; perplexity; hard disk drive
CHEN Liang, SHANG Weijiao, YANG Guancan, ZHANG Jing, LEI Xiaoping
National Engineering Research Center of Science and Technology Information, Institute of Scientific a Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091, Ch
International conference
Wuhan
English
123-126
2016-10-21 (date first posted online on the Wanfang platform; not the paper's publication date)