Conference topic

Themes Discovery with a Generalized Dictionary Model

Discovering patterns and functional modules in observed data sets is one of the most important problems in data mining and bioinformatics. In this paper, we propose an approach for discovering patterns and functional modules in categorical and text data sets. The potential patterns hidden in a data set are regarded as themes. Using a probabilistic model, we build a dictionary of these themes and then identify them based on the likelihood. To evaluate the approach, we present simulations and then apply it to traditional Chinese medicine data, Chinese text mining, and genome data. Compared with other approaches, the proposed approach has two advantages: it can find smaller and weaker modules, which may overlap heavily, with a very low false positive rate, and it can represent more complex relationships among variables.

dictionary model; pattern identification; text mining; themes discovering

Ke Deng, Jun S. Liu, Zhi Geng, Delin Liu

School of Mathematical Sciences, Peking University, Beijing, China; Department of Statistics, Harvard University, Cambridge, MA, USA; China Academy of Chinese Medicine, Beijing, China

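International conference

2008 Beijing-Tianjin Area Youth Symposium on Probability and Statistics (2008年京津地区青年概率统计研讨会)

Beijing

English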

11-37

2008-08-10 (date first posted online on the Wanfang platform; not the paper's publication date)