Access Restriction

Author Bleiholder, Jens ♦ Naumann, Felix
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2008
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data cleansing ♦ Data conflicts ♦ Data consolidation ♦ Data integration ♦ Data merging ♦ Data quality
Abstract The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.
ISSN 03600300
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2009-01-01
Publisher Place New York
e-ISSN 15577341
Journal ACM Computing Surveys (CSUR)
Volume Number 41
Issue Number 1
Page Count 41
Starting Page 1
Ending Page 41

Open content in new tab

   Open content in new tab
Source: ACM Digital Library