Thumbnail
Access Restriction
Subscribed

Author Nuray-Turan, Rabia ♦ Kalashnikov, Dmitri V ♦ Mehrotra, Sharad
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Entity resolution ♦ Connection strength ♦ Graph-based disambiguation ♦ Lookup ♦ Reference disambiguation ♦ Relationship analysis
Abstract Entity Resolution (ER) is a data quality challenge that deals with ambiguous references in data and whose task is to identify all references that co-refer. Due to practical significance of the ER problem, many creative ER techniques have been proposed in the past, including those that analyze relationships that exist among entities in data. Such approaches view the database as an entity-relationship graph, where direct and indirect relationships correspond to paths in the graph. These techniques rely on measuring the connection strength among various nodes in the graph by using a connection strength (CS) model. While such approaches have demonstrated significant advantage over traditional ER techniques, currently they also have a significant limitation: the CS models that they use are intuition-based fixed models that tend to behave well in general, but are very generic and not tuned to a specific domain, leading to suboptimal result quality. Hence, in this article we propose an approach that employs supervised learning to adapt the connection strength measure to the given domain using the available past/training data. The adaptive approach has several advantages: it increases both the quality and efficiency of ER and it also minimizes the domain analyst participation needed to tune the CS model to the given domain. The extensive empirical evaluation demonstrates that the proposed approach reaches up to 8% higher accuracy than the graph-based ER methods that use fixed and intuition-based CS models.
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-03-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 4
Issue Number 2
Page Count 22
Starting Page 1
Ending Page 22


Open content in new tab

   Open content in new tab
Source: ACM Digital Library