Themes Discovery with a Generalized Dictionary Model
Discovering patterns and functional modules in observed data sets is one of the most important problems in data mining and bioinformatics. In this paper, we propose an approach for discovering patterns and functional modules in categorical and text data sets. The potential patterns hidden in a data set are regarded as themes. Using a probabilistic model, we build a dictionary of these themes and then identify the themes based on the likelihood. To evaluate the approach, we present simulation studies and then apply it to traditional Chinese medicine, Chinese text mining, and genome data. Compared with other approaches, the advantages of the proposed approach are that it can find smaller and weaker modules, which may overlap heavily, with a very low false positive rate, and that it can represent more complex relationships among variables.
dictionary model; pattern identification; text mining; themes discovering
Ke Deng, Jun S. Liu, Zhi Geng, Delin Liu
School of Mathematical Sciences, Peking University, Beijing, China; Department of Statistics, Harvard University, Cambridge, MA, USA; China Academy of Chinese Medicine, Beijing, China
International conference
Beijing
English
11-37
2008-08-10 (date first posted online on the Wanfang platform; not the paper's publication date)