DQFIRD: Towards Data Quality-based Filtering and Ranking of Datasets for Data Portals
The Data on the Web Best Practices Working Group, as part of the W3C Data Activity, is standardizing the Data Quality Vocabulary (DQV) for expressing the data quality of datasets published on the Web. By exploiting DQV-based quality metadata associated with the datasets in a data portal, data consumers can apply data quality-based filtering and ranking to the portal's conventional search results and so obtain the desired datasets of high data quality. Despite the significant progress in standardization, there is a lack of systematic research on approaches and tools for data quality-based filtering and ranking of Web-published datasets. This paper therefore proposes a generic software framework for Data Quality-based Filtering and Ranking of Datasets (DQFIRD) in data portals. DQFIRD adopts faceted search (or faceted exploration) techniques to filter the search results of a data portal based on quality metadata about the resulting datasets, and then ranks the filtered datasets according to the numeric values of the quality measurements in the metadata. We designed the main algorithms of DQFIRD and implemented a prototype using Java and the Jena API. Furthermore, we used the prototype to conduct case-study experiments and a time-efficiency test on the Faceted Taxonomy Materialization (FTM) algorithm, the most time-consuming online operation in DQFIRD. The results indicate that the proposed DQFIRD approach is implementable and effective, and that it has low time complexity: the run time of the FTM algorithm grows approximately linearly with the size of the relevant dataset quality metadata.
data quality-based filtering and ranking; datasets; faceted search; Data Quality Vocabulary (DQV); quality metadata; data portal
Wenze Xia, Zhuoming Xu, Jie Wei, Haimei Tian
College of Computer and Information, Hohai University, Nanjing 210098, China
International conference
Wuhan
English
18-23
2016-09-23 (date first posted online on the Wanfang platform; not the paper's publication date)