Conference topic

COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection

Detecting and exploiting correlations among columns in relational databases is of great value to query optimizers, because it helps them generate better query execution plans (QEPs). We propose a more robust and informative metric, the entropy correlation coefficient, as an alternative to the chi-square test for detecting correlations among columns in large datasets. We also introduce a novel yet simple kind of multidimensional synopsis, named COCA-Hist, to cope with the different correlations found in databases. With the aid of the entropy correlation coefficient, correlations of various degrees can be detected effectively; when the coefficients indicate that columns are mutually independent, the AVI (attribute value independence) assumption can be adopted with confidence. Like CORDS, COCA can also serve as a data-mining tool with superior qualities. We demonstrate the effectiveness and accuracy of our approach through several experiments.
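The abstract does not give the formula for the entropy correlation coefficient, so the following is only a minimal sketch of the general idea of an entropy-based correlation score between two columns. It assumes one common normalization of mutual information, the symmetric uncertainty 2·I(X;Y)/(H(X)+H(Y)), which may differ from the paper's definition. The function names `entropy` and `normalized_mutual_information` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_mutual_information(xs, ys):
    """Symmetric uncertainty 2*I(X;Y) / (H(X) + H(Y)), a value in [0, 1].

    Assumption: this normalization stands in for the paper's entropy
    correlation coefficient, whose exact definition is not in the abstract.
    """
    h_x = entropy(xs)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy of the column pair
    if h_x + h_y == 0:
        return 0.0  # both columns are constant, so there is nothing to correlate
    mutual_information = h_x + h_y - h_xy
    return 2 * mutual_information / (h_x + h_y)
```

Under this assumed definition, a score near 0 suggests that the two columns can be treated as independent (the case in which the AVI assumption applies), while a score near 1 suggests that one column largely determines the other, which would favor a joint (multidimensional) synopsis for that pair.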

CAO Wei, QIN Xiongpai, WANG Shan

Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), MOE, Beijing 100872, P.R. China; School of Information, Renmin University of China, Beijing 100872, P.R. China

International conference

The Ninth International Conference on Web-Age Information Management (WAIM 2008)

Zhangjiajie

English

2008-07-20 (date first made available online on the Wanfang platform; not the paper's publication date)