Access Restriction Subscribed

Author Tari, L. ♦ Phan Huy Tu ♦ Hakenberg, J. ♦ Yi Chen ♦ Son, T.C. ♦ Gonzalez, G. ♦ Baral, C.
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2010
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data mining ♦ Computer science ♦ Text processing ♦ Tagging ♦ Database languages ♦ Biomedical engineering ♦ Data engineering ♦ Biomedical informatics ♦ Pipelines ♦ Proposals
Abstract Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs using queries, which can be evaluated and optimized by databases. Compared with the existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, such an approach also poses a lot of technical challenges, such as language design, optimization and automatic query generation. We will present the opportunities and challenges that we met when building GenerIE, a system that implements this paradigm.
Description Author affiliation: Department of Biomedical Informatics, Arizona State University, Phoenix, 85004, USA (Gonzalez, G.) || Department of Computer Science, New Mexico State University, Las Cruces, 88003, USA (Son, T.C.) || Department of Computer Science and Engineering, Arizona State University, Tempe, 85287, USA (Tari, L.; Phan Huy Tu; Hakenberg, J.; Yi Chen; Baral, C.)
ISBN 9781424454457
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2010-03-01
Publisher Place USA
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
e-ISBN 9781424454464
Size 191.20 kB
Page Count 4
Starting Page 1121
Ending Page 1124
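
Illustration The paradigm summarized in the abstract (store parse trees in a database, then express extraction goals as queries) can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not GenerIE's actual storage format or query language: the `node` table, its columns, the `extract_svo` helper, and the two toy sentences are all hypothetical, and plain SQL over SQLite stands in for whatever query language the system uses.

```python
import sqlite3

# Hypothetical schema: one row per parse-tree node (an assumption for
# illustration, not the storage format described in the paper).
SCHEMA = """
CREATE TABLE node (
    sent_id   INTEGER,
    node_id   INTEGER,
    parent_id INTEGER,   -- NULL for the root of a sentence
    label     TEXT,      -- constituent or part-of-speech label
    word      TEXT,      -- surface word for leaf nodes, else NULL
    PRIMARY KEY (sent_id, node_id)
)
"""

# Toy, made-up parses for "ProteinA activates ProteinB" and
# "ProteinC inhibits ProteinD". Parsing happens once, before storage.
PARSES = [
    (1, 0, None, "S", None),
    (1, 1, 0, "NP", "ProteinA"),
    (1, 2, 0, "VP", None),
    (1, 3, 2, "VB", "activates"),
    (1, 4, 2, "NP", "ProteinB"),
    (2, 0, None, "S", None),
    (2, 1, 0, "NP", "ProteinC"),
    (2, 2, 0, "VP", None),
    (2, 3, 2, "VB", "inhibits"),
    (2, 4, 2, "NP", "ProteinD"),
]

# An extraction goal written as a query: subject NP and object NP around
# a given verb, matched structurally on the stored tree.
SVO_QUERY = """
SELECT subj.word, verb.word, obj.word
FROM node AS s
JOIN node AS subj ON subj.sent_id = s.sent_id
                 AND subj.parent_id = s.node_id AND subj.label = 'NP'
JOIN node AS vp   ON vp.sent_id = s.sent_id
                 AND vp.parent_id = s.node_id AND vp.label = 'VP'
JOIN node AS verb ON verb.sent_id = vp.sent_id
                 AND verb.parent_id = vp.node_id AND verb.label = 'VB'
JOIN node AS obj  ON obj.sent_id = vp.sent_id
                 AND obj.parent_id = vp.node_id AND obj.label = 'NP'
WHERE s.label = 'S' AND verb.word = ?
ORDER BY s.sent_id
"""


def build_store():
    """Create an in-memory database and load the stored parse trees."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", PARSES)
    return conn


def extract_svo(conn, verb):
    """Return (subject, verb, object) triples for the given verb."""
    return conn.execute(SVO_QUERY, (verb,)).fetchall()


if __name__ == "__main__":
    store = build_store()
    print(extract_svo(store, "activates"))
    # A new extraction goal is just another query over the same stored
    # parses; the text itself is not re-parsed.
    print(extract_svo(store, "inhibits"))
```

In this sketch a new extraction goal only changes the query parameter (or the query), while the parsed corpus stays in place, which is the reprocessing saving the abstract refers to.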

