Access Restriction Open

Author Deiana, Antonio ♦ Giansanti, Andrea
Source arXiv.org
Content type Text
File Format PDF
Date of Submission 2008-06-30
Language English
Subject Domain (in DDC) Natural sciences & mathematics ♦ Life sciences; biology
Subject Keyword Quantitative Biology - Biomolecules ♦ q-bio
Abstract The performance of single folding predictors and combination scores is critically evaluated. We test mean packing, mean pairwise energy and the new index gVSL2 on a dataset of 743 folded proteins and 81 natively unfolded proteins. The individual performance of these predictors is comparable to, or even better than, that of other proposed methods. We introduce here a strictly unanimous score S_{SU} that combines them but leaves undecided any sequence that two single predictors classify differently. The performance of the single predictors increases significantly on a dataset purged of the proteins left unclassified by S_{SU}, indicating that unclassified proteins are mainly false predictions. Amino acid composition is the main determinant considered by these predictors; unclassified proteins therefore have a composition compatible with both folded and unfolded status. This is why purging a dataset of these ambiguous proteins increases the performance of single predictors. The percentages of proteins predicted as natively unfolded by S_{SU} in the three kingdoms are 4.1% for Bacteria, 1.0% for Archaea and 20.0% for Eukarya, compatible with previous determinations. Evidence is given of a scaling law relating the number of natively unfolded proteins to the total number of proteins in a genome; a first estimate of the critical exponent is 1.95 ± 0.21.
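The combination rule described in the abstract can be illustrated with a minimal Python sketch. It assumes each single predictor (mean packing, mean pairwise energy, gVSL2) has already produced a binary call for a sequence; the label strings, the function names `strictly_unanimous` and `purge_undecided`, and the protein identifiers in the usage note are hypothetical and are not taken from the paper, which may implement S_{SU} differently.

```python
from typing import Dict, Sequence

# Hypothetical labels; the paper does not prescribe these strings.
FOLDED = "folded"
UNFOLDED = "unfolded"
UNDECIDED = "undecided"


def strictly_unanimous(calls: Sequence[str]) -> str:
    """Return the shared call if all predictors agree, else UNDECIDED."""
    if not calls:
        raise ValueError("at least one predictor call is required")
    for call in calls:
        if call not in (FOLDED, UNFOLDED):
            raise ValueError(f"unexpected predictor call: {call!r}")
    # Any disagreement between two predictors leaves the sequence undecided.
    if all(call == calls[0] for call in calls):
        return calls[0]
    return UNDECIDED


def purge_undecided(calls_by_protein: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Keep only proteins with a unanimous call, mapped to that call."""
    decided = {}
    for protein_id, calls in calls_by_protein.items():
        verdict = strictly_unanimous(calls)
        if verdict != UNDECIDED:
            decided[protein_id] = verdict
    return decided
```

For example, with made-up identifiers, `purge_undecided({"p1": [FOLDED, FOLDED, FOLDED], "p2": [FOLDED, UNFOLDED, FOLDED]})` keeps only `p1`, since the predictors disagree on `p2`. Evaluating a single predictor on the purged set rather than the full set is the comparison the abstract refers to.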
Educational Use Research
Learning Resource Type Article

