Improving Latent Semantic Indexing with Concepts Mapping Based on Domain Ontology
The curse of dimensionality is a common problem in information retrieval. It has been shown that if points in a vector space are projected onto a random subspace of suitably high dimension, the distances between the points are approximately preserved. Although such a random projection can reduce the dimension of the document space, it does not bring semantically related documents together. Latent Semantic Indexing (LSI) uses singular value decomposition (SVD) to project documents from the high-dimensional term space into a lower-dimensional LSI space, both reducing the dimension of the document space and bringing semantically related documents together. However, the computation time of SVD becomes a bottleneck because of the high dimensionality of the documents. This paper presents a novel dimension-reduction method for improving LSI. The method builds a term-to-concept projection matrix based on a domain ontology and uses it to project documents into a lower-dimensional concept space. The LSI pre-computation is then performed not on the original term-by-document matrix but on the lower-dimensional concept-by-document matrix, at great computational savings. Experiments indicate that this method improves the efficiency of LSI without disturbing the similarity judgments between documents.
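The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the two concepts, the binary term-to-concept matrix, and the names `concept_lsi` and `cosine` are all assumptions made for illustration, and a real domain ontology would supply a much richer (possibly weighted) mapping.

```python
import numpy as np

# Hypothetical toy vocabulary (rows of the term-by-document matrix):
# 0 = car, 1 = automobile, 2 = engine, 3 = flower, 4 = rose
# Columns are documents d0, d1, d2.
term_doc = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 0, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # rose
], dtype=float)

# Assumed term-to-concept projection matrix (concepts x terms).
# Row 0 = "vehicle", row 1 = "plant"; a 1 means the term maps to the concept.
term_to_concept = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
], dtype=float)


def concept_lsi(term_doc, term_to_concept, k):
    """Project documents to concept space, then run a rank-k SVD there."""
    # The concept-by-document matrix has far fewer rows than the
    # term-by-document matrix, so the SVD below is cheaper.
    concept_doc = term_to_concept @ term_doc
    _, s, vt = np.linalg.svd(concept_doc, full_matrices=False)
    # Document coordinates in the k-dimensional LSI space, one row each.
    return (s[:k, None] * vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


doc_vecs = concept_lsi(term_doc, term_to_concept, k=2)

print("term space  d0~d1:", cosine(term_doc[:, 0], term_doc[:, 1]))
print("concept LSI d0~d1:", cosine(doc_vecs[0], doc_vecs[1]))
print("concept LSI d0~d2:", cosine(doc_vecs[0], doc_vecs[2]))
```

In this toy case d0 and d1 share no terms, so they are orthogonal in term space, yet they become collinear after the concept projection because "car", "engine", and "automobile" all map to the same assumed concept. The SVD runs on a 2-row matrix instead of a 5-row one, which is the source of the computational savings the abstract refers to.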
Latent Semantic Indexing; LSI; dimension reduction; domain ontology
Jingmin HAO, Lejian LIAO, Xiujie DONG
Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, PRC
International conference
Beijing
English
2008-10-19 (date first posted online on the Wanfang platform; not the paper's publication date)