Access Restriction Subscribed

Author Debattista, Jeremy ♦ Auer, Sören ♦ Lange, Christoph
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2016
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data quality ♦ Linked data ♦ Quality assessment
Abstract The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data quality assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm taking into account user-defined weights. We show that Luzzu scales linearly against the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
Description Author Affiliation: Enterprise Information Systems, University of Bonn & Fraunhofer IAIS, Bonn, Germany (Debattista, Jeremy; Auer, Sören; Lange, Christoph)
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2016-10-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 8
Issue Number 1
Page Count 32
Starting Page 1
Ending Page 32

