Abstract
Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information.
S.M. Hebsgaard, P.G. Korning, N. Tolstrup, J. Engelbrecht, P. Rouze and S. Brunak,
Nucleic Acids Research, 1996, Vol. 24, No. 17, 3439-3452.
ABSTRACT
Artificial neural networks have been combined with a
rule-based system to predict intron splice sites in the dicot plant
Arabidopsis thaliana. A two-step prediction scheme, where a
global prediction of the coding potential regulates a cutoff level for
a local prediction of splice sites, is refined by rules based on splice
site confidence values, prediction scores, coding context, and
distances between potential splice sites. In this approach, the
predictions of splice sites mutually affect each other in a non-local
manner. The combined approach drastically reduces the large number of
false-positive splice sites that normally haunt splice site prediction.
An analysis of the errors made by the networks in the first step of the
method revealed a previously unknown feature, a frequent T-tract
prolongation containing cryptic acceptor sites in the 5' end of exons.
The method presented here has been compared with three other approaches:
GeneFinder, GeneMark, and Grail. Overall, the method presented here is
an order of magnitude better. We show that the new method is able to
find a donor site in the coding sequence of the jellyfish Green
Fluorescent Protein exactly at the position that was experimentally
observed in Arabidopsis thaliana transformants. Predictions for
alternatively spliced genes are also presented, together with examples
of genes from other dicots, monocots, and algae. The method has been
made available through electronic mail (NetPlantGene@cbs.dtu.dk) or
the WWW at http://www.cbs.dtu.dk/NetPlantGene.html.
Keywords: Arabidopsis thaliana; splice site prediction;
splice site pairing; plant biotechnology; neural networks; rule-based
systems.
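The two-step idea in the first abstract uses a global coding-potential signal to move the acceptance threshold for a local splice-site score. The sketch below illustrates that idea for donor sites only. It is not the NetPlantGene implementation. The window size, the linear cutoff rule, and the parameter names (`window`, `base_cutoff`, `slope`, `min_cutoff`) are hypothetical. Both inputs are assumed to be per-position scores in [0, 1], and the rule-based refinement step is omitted.

```python
from typing import List, Sequence


def window_mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a window; 0.0 for an empty window (sequence start)."""
    return sum(values) / len(values) if values else 0.0


def predict_donors(
    local_scores: Sequence[float],
    coding: Sequence[float],
    window: int = 3,
    base_cutoff: float = 0.9,
    slope: float = 0.5,
    min_cutoff: float = 0.3,
) -> List[int]:
    """Return positions accepted as donor sites (hypothetical two-step rule).

    Step 1 (global): a donor marks an exon -> intron boundary, so coding
    potential should drop from upstream (exon) to downstream (intron).
    Position i is treated here as the first intronic position.
    Step 2 (local): the sharper that drop, the lower the cutoff the local
    splice-site score must reach to be accepted.
    """
    if len(local_scores) != len(coding):
        raise ValueError("local_scores and coding must have equal length")
    hits = []
    for i, score in enumerate(local_scores):
        upstream = coding[max(0, i - window):i]
        downstream = coding[i:i + window]
        # Positive only when coding potential falls across position i.
        drop = max(0.0, window_mean(upstream) - window_mean(downstream))
        cutoff = max(min_cutoff, base_cutoff - slope * drop)
        if score >= cutoff:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy, made-up inputs: an exon (high coding potential) followed by an intron.
    local = [0.1, 0.2, 0.1, 0.6, 0.1, 0.6, 0.1, 0.95, 0.1]
    coding = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    print(predict_donors(local, coding))  # [3, 7]
```

In the toy input, positions 3 and 5 have the same local score of 0.6. Only position 3 is accepted, because it sits on a sharp coding-to-non-coding transition. Position 7 passes on its local score alone.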
Prediction of Human mRNA Donor and Acceptor Sites from the DNA Sequence.
Brunak, S., Engelbrecht, J., and Knudsen, S.,
Journal of Molecular Biology, 1991, 220, 49-65.
ABSTRACT
Artificial neural networks have been applied to the prediction of
splice site location in human pre-mRNA. A joint prediction scheme,
in which prediction of transition regions between introns and exons
regulates a cutoff level for splice site assignment, was able to
predict splice site locations with confidence levels far better than
previously reported in the literature. The problem of predicting
donor and acceptor sites in human genes is hampered by the presence
of large numbers of false positives; in this paper, the
distribution of these false splice sites is examined and linked to a
possible scenario for the splicing mechanism in vivo. When the
presented method detects 95% of the true donor and acceptor sites, it
makes less than 0.1% false donor site assignments and less than 0.4%
false acceptor site assignments. For the large data set used in this
study, this means that on average there are one and a half false
donor sites per true donor site and six false acceptor sites per true
acceptor site. With the joint assignment method, more than a fifth of
the true donor sites and around one fourth of the true acceptor sites
could be detected without any accompanying false-positive
predictions. Highly confident splice sites could not be isolated
with a widely used weight matrix method or by separate splice site
networks. A complementary relation between the confidence levels of
the coding/non-coding and the separate splice site networks was
observed, with many weak splice sites having sharp transitions in the
coding/non-coding signal and many stronger splice sites having more
ill-defined transitions between coding and non-coding.
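The false-positive figures in the second abstract can be cross-checked with simple arithmetic. The sketch below makes one assumption that the abstract does not spell out: "false sites per true site" means false predictions divided by correctly detected true sites. The function name is hypothetical. Under that assumption, the donor figures and the acceptor figures give the same result. Each implies at least about 1,425 non-site positions scored per true site. If the ratio is instead taken per true site in the data set, the bound becomes about 1,500.

```python
def implied_candidates_per_true_site(
    false_per_true: float,
    false_positive_rate: float,
    sensitivity: float,
) -> float:
    """Ratio of scored non-site positions to true sites implied by the figures.

    Assumed model (an interpretation, not stated in the abstract):
      false predictions = false_positive_rate * n_non_sites
      true predictions  = sensitivity * n_true_sites
      false_per_true    = false predictions / true predictions
    Solving for n_non_sites / n_true_sites gives the expression below.
    """
    return false_per_true * sensitivity / false_positive_rate


if __name__ == "__main__":
    sensitivity = 0.95  # 95% of true sites detected (from the abstract)
    # The abstract gives the rates as upper bounds ("less than"),
    # so the ratios computed here are lower bounds.
    donor = implied_candidates_per_true_site(1.5, 0.001, sensitivity)
    acceptor = implied_candidates_per_true_site(6.0, 0.004, sensitivity)
    print(f"non-sites per true donor:    > {donor:.0f}")
    print(f"non-sites per true acceptor: > {acceptor:.0f}")
```

A false-positive rate below 0.1% still produces more false sites than true ones. This is because non-site positions outnumber real splice sites by roughly three orders of magnitude.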
CORRESPONDENCE
Søren Brunak,