Access Restriction Subscribed

Author Shixi Chen ♦ Haixun Wang ♦ Shuigeng Zhou
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2009
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword History ♦ Web search ♦ Venture capital ♦ Scattering ♦ Supervised learning ♦ Data engineering ♦ USA Councils ♦ Training data ♦ Unsupervised learning ♦ Euclidean distance
Abstract In Web search, a user refines his search several times before he finds the information he needs. It is very likely that, in the search log, similar sequences of searches appear many times, as many users have searched the Web with the same intent. Precisely interpreting the intent of the user is difficult, even with the help of the search log: there might be numerous instances of such intent scattered in small pieces in the log, but none of them is comprehensive enough to describe the concept precisely. This scenario occurs in many applications. For example, patterns in Web search, Internet traffic, program execution traces, network events, etc., are often non-stationary, yet the same patterns recur over time. In this paper, we argue that visible patterns are generated by hidden intent or hidden concepts, and that precisely characterizing such concepts is only possible if we cluster as much data generated by such concepts as possible and learn from the clustered data as a whole, instead of learning from a single episode of such a concept. The benefit is clear: it enables us not only to better understand the underlying system that generates the data, but also to recognize future instances of a concept as soon as they occur. To achieve this, we introduce a clustering-based approach, where we adopt a novel clustering criterion, validation error minimization, to ensure that the found concepts are unique and precise. We propose a two-step algorithm, which uses enhanced dynamic programming and EM-like methods for clustering. Experiments show that on benchmark datasets, our approach achieves the highest accuracy at the lowest cost in comparison with the current best approaches.
ISBN 9781424434220
ISSN 10844627
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2009-03-29
Publisher Place China
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 229.54 kB
Page Count 4
Starting Page 1327
Ending Page 1330