Access Restriction

Author Fan, Xiaoming ♦ Wang, Jianyong ♦ Pu, Xu ♦ Zhou, Lizhu ♦ Lv, Bing
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2011
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Name disambiguation ♦ Clustering ♦ Graph ♦ Similarity
Abstract Name ambiguity stems from the fact that many people or objects share identical names in the real world. Such name ambiguity decreases the performance of document retrieval, Web search, information integration, and may cause confusion in other applications. Due to the same name spellings and lack of information, it is a nontrivial task to distinguish them accurately. In this article, we focus on investigating the problem in digital libraries to distinguish publications written by authors with identical names. We present an effective framework named GHOST (abbreviation for GrapHical framewOrk for name diSambiguaTion), to solve the problem systematically. We devise a novel similarity metric, and utilize only one type of attribute (i.e., coauthorship) in GHOST. Given the similarity matrix, intermediate results are grouped into clusters with a recently introduced powerful clustering algorithm called Affinity Propagation. In addition, as a complementary technique, user feedback can be used to enhance the performance. We evaluated the framework on the real DBLP and PubMed datasets, and the experimental results show that GHOST can achieve both high $\textit{precision}$ and $\textit{recall}.$
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2011-02-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 2
Issue Number 2
Page Count 23
Starting Page 1
Ending Page 23

Open content in new tab

   Open content in new tab
Source: ACM Digital Library