Conference Topic

DQFIRD: Towards Data Quality-based Filtering and Ranking of Datasets for Data Portals

  The Data on the Web Best Practices Working Group, as part of the W3C Data Activity, is standardizing the Data Quality Vocabulary (DQV) for expressing the data quality of datasets published on the Web. By exploiting such DQV-based quality metadata associated with the datasets in a data portal, data consumers can apply data quality-based filtering and ranking to the portal's conventional search results to obtain the desired datasets with high data quality. Despite the significant progress in standardization, there is a lack of systematic research on approaches and tools for data quality-based filtering and ranking of Web-published datasets. This paper therefore proposes a generic software framework for Data Quality-based Filtering and Ranking of Datasets (DQFIRD) in data portals. DQFIRD adopts faceted search (or faceted exploration) techniques to filter the search results of a data portal based on quality metadata about the resulting datasets, and then ranks the filtered datasets according to the numeric values of quality measurements in the metadata. We designed the main algorithms of DQFIRD and implemented a prototype of DQFIRD using Java and the Jena API. Furthermore, we used the prototype to conduct case study experiments and a time-efficiency test on the Faceted Taxonomy Materialization (FTM) algorithm, the most time-consuming online operation algorithm in DQFIRD. The results indicate that the proposed DQFIRD approach is implementable and effective, and that it has low time complexity: the run time of the FTM algorithm grows approximately linearly as the size of the relevant dataset quality metadata increases.
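The filter-then-rank idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation (the prototype uses Java and the Jena API) and it is not the FTM algorithm; it does not parse DQV or RDF. The `Dataset` class, the metric names (`completeness`, `availability`), the sample values, and the use of minimum thresholds as facet selections are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A search result with hypothetical quality measurements attached."""
    title: str
    # metric name -> numeric quality measurement value (illustrative only)
    quality: dict = field(default_factory=dict)


def filter_by_facets(datasets, thresholds):
    """Keep datasets that report every selected metric at or above its minimum."""
    return [
        d for d in datasets
        if all(
            metric in d.quality and d.quality[metric] >= minimum
            for metric, minimum in thresholds.items()
        )
    ]


def rank_by_metric(datasets, metric):
    """Sort datasets by one metric, highest first; missing values sort last."""
    return sorted(
        datasets,
        key=lambda d: d.quality.get(metric, float("-inf")),
        reverse=True,
    )


# Hypothetical search results from a data portal.
results = [
    Dataset("A", {"completeness": 0.9, "availability": 0.99}),
    Dataset("B", {"completeness": 0.6, "availability": 0.95}),
    Dataset("C", {"completeness": 0.95}),
    Dataset("D", {"completeness": 0.8, "availability": 0.97}),
]

# Step 1: filter on a quality facet; step 2: rank what remains.
filtered = filter_by_facets(results, {"availability": 0.96})
ranked = rank_by_metric(filtered, "completeness")
print([d.title for d in ranked])
```

In this sketch, datasets that carry no measurement for a selected facet are excluded rather than treated as zero; a real portal might handle missing quality metadata differently.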

data quality-based filtering and ranking; datasets; faceted search; Data Quality Vocabulary (DQV); quality metadata; data portal

Wenze Xia, Zhuoming Xu, Jie Wei, Haimei Tian

College of Computer and Information, Hohai University, Nanjing 210098, China

International conference

The 13th Web Information Systems and Applications Conference (WISA 2016); The 1st Symposium on Big Data Processing and Analysis (BDPA 2016); The 1st Workshop on Information System Security (ISS 2016)

Wuhan

English

18-23

2016-09-23 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)