Access Restriction Open

Author Zaka, Bilal
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Abstract The measurement of similarity between different objects is the fundamental function of any information retrieval, management, or data mining application. There are a number of ways to compute similarity or dissimilarity among various object representations. In a simplified classification, these can be categorized as distance-based, geometric, structural, feature-based, and knowledge-based techniques. Which technique is used depends on the characteristics of the data and the scope of the application. Examples of information systems that make collective use of similarity measures of different types are few and far between. A lot remains to be done to realize a semantics-aware system. The majority of current systems in this realm exploit only pattern discovery techniques based on basic similarity measures. This dissertation explores the architecture and design of a framework that supports more elaborate and enhanced systems. A number of case studies are included in the research to demonstrate the various aspects of this framework. This dissertation provides an introduction to different types of similarity detection techniques, possible enhancements, and their applications in various public and corporate environments. The primary objective of this work is to help improve conventional information management and retrieval tools by adding to them the practical elements of semantic and distributed processing. The initial part of the dissertation describes the broader categories of similarity detection and commonly used techniques, along with an introduction to the data processing approach. The following parts of the dissertation cover the case studies that exemplify the extended use and applications of these techniques. These parts shed light on the use of enhanced similarity detection approaches in the plagiarism detection and intellectual property rights (IPR) areas.
The work suggests a commonly accessible platform that allows the use and integration of different similarity detection services. The findings of the research are presented in the form of a successful implementation of a collaborative plagiarism detection and prevention network. In the second set of experiments, a successful integration of such services is described to aid personalized content delivery. It illustrates the use of similarity detection in semantic media adaptation and user interest profiling. Finally, the work covers the applications of similarity measurement techniques for content organization, re-usability, and object de-duplication in heterogeneous data collections.
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Thesis
Publisher Date 2009-01-01