Generating and Mixing Feature Sets from Language Models for Sentiment Classification
This paper presents methods for mixing feature sets in sentence-level sentiment analysis, where a sentence is classified into one of three classes: positive, negative, or neutral. Motivated by the need to classify sentences in Korean, whose sentiment-revealing expressions tend to have different effects according to their syntactic categories, we employed a language modeling (LM) approach with 162 different LMs based on syntactic categories, which are effectively combined with a Logistic Regression classifier. The experimental results show that this approach significantly outperforms clue-based SVM classifiers. Enumerating the feature types arising from the LMs for the Logistic Regression classifier allowed us to show that domain-specific models can be smoothed with a general model and that attaching a syntactic category to a feature helps improve effectiveness. The classification results are further improved by applying a clue-based classifier. The rationale behind this two-step process is to first classify sentences with a classifier that is relatively conservative in picking positive and negative sentences, and then to apply a high-precision classifier to the sentences in the neutral class.
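The two-step process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy classifiers, and the 0.7 confidence threshold are all assumptions made for the example, and the abstract does not say how the first classifier is made conservative.

```python
from typing import Callable, Dict

Label = str  # "positive", "negative", or "neutral"


def two_step_classify(
    sentence: str,
    first_stage: Callable[[str], Dict[Label, float]],
    clue_stage: Callable[[str], Label],
    threshold: float = 0.7,  # assumed value, not taken from the paper
) -> Label:
    """Label a sentence with a conservative first stage, then re-check neutrals."""
    probs = first_stage(sentence)
    # Step 1: accept positive/negative only when the first stage is confident.
    best = max(("positive", "negative"), key=lambda c: probs.get(c, 0.0))
    if probs.get(best, 0.0) >= threshold:
        return best
    # Step 2: the sentence is provisionally neutral; the clue-based
    # classifier may still move it to positive or negative.
    return clue_stage(sentence)


# Toy stand-ins (hypothetical, for illustration only).
def toy_first_stage(sentence: str) -> Dict[Label, float]:
    if "excellent" in sentence:
        return {"positive": 0.9, "negative": 0.05, "neutral": 0.05}
    return {"positive": 0.3, "negative": 0.3, "neutral": 0.4}


def toy_clue_stage(sentence: str) -> Label:
    if "terrible" in sentence:
        return "negative"
    return "neutral"
```

In this sketch the first stage stands in for the LM-based Logistic Regression classifier and the second for the clue-based classifier; raising the threshold sends more sentences to the second stage.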
Text categorization; sentiment analysis; polarity classification
Yoonjae Jeong, Youngho Kim, Seongchan Kim, Sung-Hyon Myaeng, Hyo-Jung Oh
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea; Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea
International conference
Dalian
English
1-8
2009-09-24 (date first posted online on the Wanfang platform; does not represent the paper's publication date)