Outlier Detection from Massive Short Documents using Domain Ontology

Abstract:

With the rapid development of information technology, huge volumes of data are being accumulated. A vast amount of this data takes the form of short documents, such as paper summaries or conversations in open chat rooms. Detecting outliers among such documents is useful in intelligence analysis applications. However, traditional outlier detection methods based on the vector space model cannot achieve acceptable accuracy, because keywords occur at low frequency in short documents. Moreover, traditional outlier detection algorithms become very inefficient, or even unusable, when processing massive data. This paper presents a density-based outlier detection method that uses a domain ontology. The algorithm uses the domain ontology to calculate the semantic distance between short documents, which improves accuracy. A parallel method is also used to improve performance and scalability.
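
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy ontology (`PARENT`), measures the distance between two concepts as the number of edges between them in that tree, and substitutes a simple k-nearest-neighbour distance score for the paper's density measure. The names `concept_distance`, `document_distance`, and `knn_outlier_scores` are illustrative, and the parallel processing mentioned in the abstract is omitted.

```python
# Hypothetical toy ontology (an assumption): child concept -> parent concept.
PARENT = {
    "entity": None,
    "finance": "entity",
    "sports": "entity",
    "bank": "finance",
    "loan": "finance",
    "stock": "finance",
    "football": "sports",
    "goal": "sports",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def concept_distance(a, b):
    """Number of edges on the tree path between two concepts."""
    path_a = ancestors(a)
    steps_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(path_a):
        if c in steps_in_b:  # first shared ancestor on the way up
            return i + steps_in_b[c]
    raise ValueError("concepts share no common ancestor")


def document_distance(doc_a, doc_b):
    """Symmetric mean distance from each concept to its closest match."""
    def directed(src, dst):
        return sum(min(concept_distance(s, d) for d in dst) for s in src) / len(src)
    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2


def knn_outlier_scores(docs, k=2):
    """Mean distance to the k nearest neighbours; a larger score means a sparser region."""
    scores = []
    for i, doc in enumerate(docs):
        dists = sorted(
            document_distance(doc, other)
            for j, other in enumerate(docs)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Each short document is represented by the ontology concepts it mentions.
docs = [
    ["bank", "loan"],
    ["loan", "stock"],
    ["bank", "stock"],
    ["football", "goal"],
]
scores = knn_outlier_scores(docs, k=2)
outlier_index = max(range(len(docs)), key=scores.__getitem__)
print(scores, outlier_index)
```

In this toy data, the first and fourth documents share no keywords with each other, and neither do the first and second on the term "stock", so a pure term-overlap comparison has little to work with. The ontology distance still places the three finance documents close together and the sports document far from all of them.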

Keywords: massive short document; outlier detection; density; domain ontology

Authors: Yongheng Wang, Shenghong Yang

Affiliation: School of Computer and Communication, Hunan University, Changsha, China

Conference type: International conference

Conference name: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2010)

Conference location: Xiamen

Conference language: English

Pages: 558-562

Online publication date: 2010-10-29 (date first posted on the Wanfang platform; this is not the paper's publication date)
