|Author||Ramadan, Banda ♦ Christen, Peter ♦ Liang, Huizhi ♦ Gayler, Ross W.|
|Source||ACM Digital Library|
|Publisher||Association for Computing Machinery (ACM)|
|Subject Domain (in DDC)||Computer science, information & general works ♦ Data processing & computer science|
|Subject Keyword||Braided tree ♦ Data matching ♦ Dynamic indexing ♦ Real-time query ♦ Record linkage|
|Abstract||Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record, and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in both record insertion and query times, and that using multiple trees gives noticeable improvements on matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy.|
|Age Range||18 to 22 years ♦ above 22 year|
|Education Level||UG and PG|
|Learning Resource Type||Article|
|Publisher Place||New York|
|Journal||Journal of Data and Information Quality (JDIQ)|
Ministry of Human Resource Development (MHRD) under its National Mission on Education through Information and Communication Technology (NMEICT) has initiated the National Digital Library of India (NDLI) project to develop a framework of virtual repository of learning resources with a single-window search facility. Filtered and federated searching is employed to facilitate focused searching so that learners can find out the right resource with least effort and in minimum time. NDLI is designed to hold content of any language and provides interface support for leading vernacular languages, (currently Hindi, Bengali and several other languages are available). It is designed to provide support for all academic levels including researchers and life-long learners, all disciplines, all popular forms of access devices and differently-abled learners. It is being developed to help students to prepare for entrance and competitive examinations, to enable people to learn and prepare from best practices from all over the world and to facilitate researchers to perform inter-linked exploration from multiple sources. It is being developed at Indian Institute of Technology Kharagpur.
NDLI is a conglomeration of freely available or institutionally contributed or donated or publisher managed contents. Almost all these contents are hosted and accessed from respective sources. The responsibility for authenticity, relevance, completeness, accuracy, reliability and suitability of these contents rests with the respective organization and NDLI has no responsibility or liability for these. Every effort is made to keep the NDLI portal up and running smoothly unless there are some unavoidable technical issues.
Ministry of Human Resource Development (MHRD), through its National Mission on Education through Information and Communication Technology (NMEICT), has sponsored and funded the National Digital Library of India (NDLI) project.
For any issue or feedback, please write to email@example.com