Neural network prediction of translation initiation sites
in eukaryotes: perspectives for EST and genome analysis.
A. G. Pedersen and H. Nielsen.
ISMB 5: 226-233, 1997.
Translation in eukaryotes does not always start at the first
AUG in an mRNA, implying that context information also plays a role.
This makes prediction of translation initiation sites a non-trivial
task, especially when analysing EST and genome data where the entire
mature mRNA sequence is not known. In this paper, we employ artificial
neural networks to predict which AUG triplet in an mRNA sequence is the
start codon. The trained networks correctly classified 88 % of
Arabidopsis and 85 % of vertebrate AUG triplets. We find that our
trained neural networks use a combination of local start codon context
and global sequence information. Furthermore, analysis of false
predictions shows that AUGs in frame with the actual start codon are
more frequently selected than out-of-frame AUGs, suggesting that our
networks use reading frame detection. A number of conflicts between
neural network predictions and database annotations are analysed in
detail, leading to identification of possible database errors.
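The abstract does not say how sequences are presented to the networks or how false predictions are split by reading frame. The sketch below illustrates both ideas under stated assumptions. It finds every AUG candidate in a sequence and one-hot encodes a fixed window around each candidate, which is one common way to feed local context to a neural network. It then labels a wrongly selected AUG as in-frame or out-of-frame relative to the annotated start codon. The function names and the default window sizes (`upstream=12`, `downstream=12`) are hypothetical and are not taken from the paper.

```python
from typing import List

# Nucleotide alphabet for the one-hot encoding (assumption: RNA letters; T is mapped to U).
NUCLEOTIDES = "ACGU"


def find_aug_positions(mrna: str) -> List[int]:
    """Return the 0-based start positions of every AUG triplet (candidate start codon)."""
    seq = mrna.upper().replace("T", "U")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


def one_hot_window(mrna: str, pos: int,
                   upstream: int = 12, downstream: int = 12) -> List[List[int]]:
    """One-hot encode the local context around the AUG starting at `pos`.

    The window covers `upstream` bases before the AUG, the AUG itself, and
    `downstream` bases after it (hypothetical sizes). Positions that fall
    outside the sequence, or hold an unknown symbol, are encoded as all zeros.
    """
    seq = mrna.upper().replace("T", "U")
    window = []
    for i in range(pos - upstream, pos + 3 + downstream):
        vec = [0, 0, 0, 0]
        if 0 <= i < len(seq) and seq[i] in NUCLEOTIDES:
            vec[NUCLEOTIDES.index(seq[i])] = 1
        window.append(vec)
    return window


def frame_relation(aug_pos: int, true_start: int) -> str:
    """Classify an AUG as in-frame or out-of-frame relative to the true start codon.

    Two AUGs share a reading frame exactly when their distance is a multiple of 3.
    Python's % returns a non-negative result, so upstream AUGs are handled too.
    """
    return "in-frame" if (aug_pos - true_start) % 3 == 0 else "out-of-frame"


if __name__ == "__main__":
    mrna = "CCAUGGCAUGAAUGCC"
    candidates = find_aug_positions(mrna)      # [2, 7, 11]
    true_start = candidates[0]                 # pretend the first AUG is annotated as the start
    for p in candidates[1:]:
        print(p, frame_relation(p, true_start))
    print(len(one_hot_window(mrna, true_start)))  # 12 + 3 + 12 = 27 window positions
```

In this toy sequence the AUG at position 11 lies 9 bases downstream of the annotated start and is therefore in-frame, while the AUG at position 7 is out-of-frame. Counting these two categories among a network's false predictions is the kind of tally behind the reading-frame observation above.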
Anders Gorm Pedersen