Conference Papers

Generating and Mixing Feature Sets from Language Models for Sentiment Classification

This paper presents methods for mixing feature sets in sentence-level sentiment analysis, where each sentence is classified into one of three classes: positive, negative, or neutral. Motivated by the need to classify Korean sentences, whose sentiment-revealing expressions tend to have different effects depending on their syntactic categories, we employed a language modeling (LM) approach with 162 different LMs based on syntactic categories, which are effectively combined using a Logistic Regression classifier. The experimental results show that this approach significantly outperforms clue-based SVM classifiers. Enumerating the feature types that the LMs supply to the Logistic Regression classifier allowed us to show that domain-specific models can be smoothed with a general model and that attaching a syntactic category to a feature helps improve effectiveness. The classification results are further improved by applying a clue-based classifier. The rationale behind this two-step process is to first classify sentences with a classifier that is relatively conservative in picking positive and negative sentences, and then to apply a high-precision classifier to the sentences left in the neutral class.
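The abstract does not specify the smoothing scheme, the 162 syntactic-category LMs, the Logistic Regression features, or the clue lexicon, so the following is only a minimal Python sketch under stated assumptions. It illustrates three ideas from the abstract: features tagged with a syntactic category (the `word/TAG` tokens), per-class unigram LMs smoothed with a general LM by linear interpolation (Jelinek-Mercer style, an assumed choice with a hypothetical weight `lam=0.7`), and the two-step process in which sentences the first classifier leaves as neutral are passed to a high-precision clue-based classifier. The first stage here is a simple score-margin rule standing in for the paper's Logistic Regression classifier. `first_stage`, `clue_stage`, `two_step`, the tags, the margin, and the toy data are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch only: the smoothing scheme, tags, margin, and clue
# lexicon below are assumptions, not the paper's actual configuration.

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: token -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolated_log_prob(sentence, domain_lm, general_lm, lam=0.7, floor=1e-9):
    """Sentence log-likelihood under a domain LM linearly interpolated with a
    general LM (assumed Jelinek-Mercer smoothing); `floor` avoids log(0)."""
    total = 0.0
    for w in sentence:
        p = lam * domain_lm.get(w, 0.0) + (1.0 - lam) * general_lm.get(w, 0.0)
        total += math.log(max(p, floor))
    return total

# Toy data; each token carries a hypothetical syntactic-category tag.
general = unigram_lm(["good/ADJ", "bad/ADJ", "movie/NOUN", "the/DET"])
positive = unigram_lm(["good/ADJ", "good/ADJ", "great/ADJ", "movie/NOUN"])
negative = unigram_lm(["bad/ADJ", "bad/ADJ", "awful/ADJ", "movie/NOUN"])

def first_stage(sentence, margin=1.0):
    """Stand-in for the Logistic Regression step: a conservative rule that
    returns neutral unless one smoothed class LM wins by `margin` nats."""
    diff = (interpolated_log_prob(sentence, positive, general)
            - interpolated_log_prob(sentence, negative, general))
    if abs(diff) < margin:
        return "neutral"
    return "positive" if diff > 0 else "negative"

def clue_stage(sentence):
    """Hypothetical high-precision clue lexicon; None means no decision."""
    clues = {"recommend/VERB": "positive", "disappoint/VERB": "negative"}
    for w in sentence:
        if w in clues:
            return clues[w]
    return None

def two_step(sentence, first, second):
    """Apply the first classifier; only its neutral outputs go to the second."""
    label = first(sentence)
    if label != "neutral":
        return label
    override = second(sentence)
    return override if override is not None else label

print(two_step(["the/DET", "good/ADJ", "movie/NOUN"], first_stage, clue_stage))  # positive
print(two_step(["the/DET", "movie/NOUN"], first_stage, clue_stage))              # neutral
print(two_step(["the/DET", "recommend/VERB"], first_stage, clue_stage))          # positive
```

Because the first stage abstains (returns neutral) whenever the margin is small, it is conservative about assigning positive and negative labels; the clue stage only ever reassigns sentences the first stage left neutral, which mirrors the division of labor the abstract describes.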

Text categorization; sentiment analysis; polarity classification

Yoonjae Jeong, Youngho Kim, Seongchan Kim, Sung-Hyon Myaeng, Hyo-Jung Oh

Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea; Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea

International conference

IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian

English

pp. 1-8

2009-09-24 (date first posted online on the Wanfang platform; not the paper's publication date)