Access Restriction Subscribed

Author Cadot, Martine ♦ di Martino, Joseph
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Text mining ♦ Data cleaning ♦ Perl scripts ♦ Latex files ♦ Statistical optimization
Abstract In this paper, we present our solution for Task 2 of the KDD Cup 2003 competition. Our approach is based on a data cleaning methodology using Perl scripts. These scripts contain regular expressions that automatically extract relevant information from the 35,472 LaTeX texts. The expressions were optimized through statistical investigations of the texts. Our solution allowed us to obtain 144,087 associations.
Description Affiliation: LORIA, Vandoeuvre-les-Nancy, France (Cadot, Martine; di Martino, Joseph)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2000-06-01
Publisher Place New York
Journal ACM SIGKDD Explorations Newsletter (SKDD)
Volume Number 5
Issue Number 2
Page Count 2
Starting Page 158
Ending Page 159

