Generating and Mixing Feature Sets from Language Models for Sentiment Classification

Abstract:

This paper presents methods for mixing feature sets in sentence-level sentiment analysis, where a sentence is classified into one of three classes: positive, negative, or neutral. Motivated by the need to classify sentences in Korean, whose sentiment-revealing expressions tend to have different effects according to their syntactic categories, we employed a language modeling (LM) approach with 162 different LMs based on syntactic categories, which are effectively combined with a Logistic Regression classifier. The experimental results show that this approach significantly outperforms clue-based SVM classifiers. The enumeration of feature types arising from the LMs for the Logistic Regression classifier allowed us to show that domain-specific models can be smoothed with a general model and that attaching a syntactic category to a feature helps improve effectiveness. The classification results are further improved by applying a clue-based classifier. The rationale behind this two-step process is to first classify sentences with a classifier that is relatively conservative in picking positive and negative sentences, and then to apply a high-precision classifier to the sentences in the neutral class.
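The two-step process described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the function names, the toy keyword rules, and the clue lists are hypothetical stand-ins, not the paper's LM-based Logistic Regression classifier or its actual clue-based classifier. The sketch only shows the control flow, in which the second classifier is applied solely to sentences the first one leaves in the neutral class.

```python
from typing import Callable, List

Label = str  # one of "positive", "negative", "neutral"
Classifier = Callable[[str], Label]


def two_step_classify(
    sentences: List[str],
    first_stage: Classifier,
    second_stage: Classifier,
) -> List[Label]:
    """Label each sentence with first_stage; re-examine only the
    sentences it calls neutral using second_stage."""
    results = []
    for sentence in sentences:
        label = first_stage(sentence)
        if label == "neutral":
            # The second stage may promote a neutral sentence to
            # positive or negative, or leave it neutral.
            label = second_stage(sentence)
        results.append(label)
    return results


# Hypothetical stand-in for the conservative first-stage classifier.
def toy_first_stage(sentence: str) -> Label:
    text = sentence.lower()
    if "love" in text:
        return "positive"
    if "hate" in text:
        return "negative"
    return "neutral"


# Hypothetical clue lists for the stand-in clue-based second stage.
POSITIVE_CLUES = {"great", "excellent"}
NEGATIVE_CLUES = {"terrible", "awful"}


def toy_clue_stage(sentence: str) -> Label:
    tokens = set(sentence.lower().split())
    has_pos = bool(tokens & POSITIVE_CLUES)
    has_neg = bool(tokens & NEGATIVE_CLUES)
    # Commit to a polarity only when the clues are unambiguous.
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    examples = ["I love it", "the screen is great", "it arrived on monday"]
    print(two_step_classify(examples, toy_first_stage, toy_clue_stage))
```

In this sketch the second stage never overrides a positive or negative decision from the first stage, which mirrors the rationale stated in the abstract; how the paper's classifiers are actually trained and thresholded is not specified here.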

Keywords: text categorization; sentiment analysis; polarity classification

Authors: Yoonjae Jeong, Youngho Kim, Seongchan Kim, Sung-Hyon Myaeng, Hyo-Jung Oh

Affiliations: Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea; Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-8

Online publication date: 2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)
