Access Restriction

Author Oro, E. ♦ Ruffolo, M.
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2008
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Special computer methods
Subject Keyword Ontologies ♦ Data mining ♦ Competitive intelligence ♦ Pattern recognition ♦ Artificial intelligence ♦ Intelligent structures ♦ Encoding ♦ Visualization ♦ Wrapping ♦ HTML ♦ ontology ♦ Information Extraction ♦ PDF format ♦ Knowledge representation and reasoning ♦ attribute grammars
Abstract Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence. Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from unstructured PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that allow ontology objects contained in PDF documents to be automatically recognized and extracted, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
Description Author affiliation: DEIS, Univ. of Calabria, Rende (Oro, E.) || ICAR-CNR, Univ. of Calabria, Rende (Ruffolo, M.)
ISBN 9780769534404
ISSN 1082-3409
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2008-11-03
Publisher Place USA
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 555.74 kB
Page Count 8
Starting Page 118
Ending Page 125
