Thumbnail
Access Restriction
Subscribed

Author Yakout, Mohamed ♦ Atallah, Mikhail J ♦ Elmagarmid, Ahmed
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2012
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Record linkage ♦ Integration ♦ Linkage ♦ Privacy ♦ Private information retrieval ♦ Private linkage ♦ Secure scalar product
Abstract Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how common their customer bases are so that they may better assess the benefits of the merger. Another example is a database of people who are forbidden from a certain activity by regulators, may need to be compared to a list of people engaged in that activity. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data; they fear losing control over its subsequent dissemination and usage, or they want to insure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of their execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching, by carrying out a very fast (but not accurate) matching between such pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair (without any expensive cryptographic operations such as modular exponentiations). Our experimental evaluation of our approach validates these claims.
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2012-08-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 3
Issue Number 3
Page Count 28
Starting Page 1
Ending Page 28


Open content in new tab

   Open content in new tab
Source: ACM Digital Library