Access Restriction Subscribed

Author Chidlovskii, B.
Sponsorship IEEE Comput. Soc. ♦ Inf. Technol. Res. Inst. ♦ Wright State Univ
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2002
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Special computer methods
Subject Keyword Data mining ♦ Transducers ♦ Application software ♦ Maintenance ♦ HTML ♦ Europe ♦ Electronic switching systems ♦ Web sites ♦ Information resources ♦ Humans
Abstract We address the problem of automatic maintenance of Web wrappers used in data integration systems to encapsulate access to Web information providers. The maintenance of Web wrappers is critical, as providers often change the page format and/or structure, making wrappers inoperable. The solution we propose extends the conventional wrapper architecture with a novel component for automatic maintenance and recovery. We treat automatic recovery as a special type of classification problem and use ensemble methods of machine learning to build alternative views of provider pages. We combine the extraction rules of conventional wrappers with content features of the extracted information to recover accurately from three types of format changes, namely content, context and structural changes. We report results of the recovery performance for format changes at widely used Web providers.
Description Author affiliation: Xerox Res. Centre Eur., Meylan, France (Chidlovskii, B.)
ISBN 0769518494
ISSN 10823409
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2002-11-04
Publisher Place USA
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 263.59 kB
Page Count 8
Starting Page 399
Ending Page 406

