FORMAT OF NetGene2 PREDICTION OUTPUT
The prediction output for both the server and the mailserver consists
of the predictions for both the direct (+) and complementary (-)
strands. The output lists the predictions for donor and acceptor sites
in the submitted sequence, as well as branchpoint predictions (for A.
thaliana only).
- The position of the splice site in your sequence, given as the first
  (donor) or last (acceptor) nucleotide in the intron. The numbering of
  the direct (+) strand proceeds from the 5' end to the 3' end. For the
  complement (-) strand the numbering is given in both directions.
- The predicted frame offset (1, 2 or 3) of the acceptor/donor site.
- The sequence strand (direct or complement).
- The level of confidence for the sites (relative to the cutoff used to
  find nearly all true sites). Sites found by using cutoff values for
  highly confident sites are marked by the symbol H.
- 20 bases of sequence around the predicted site.
- The predicted branchpoint for an acceptor site (for A. thaliana
  only): acceptor site and branchpoint site, as well as a window
  around the predicted site.
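Since positions on the complement (-) strand are reported in both
directions, it can help to see how the two numberings relate. The
sketch below assumes both numberings are 1-based and that one is simply
the mirror image of the other over the sequence length; the function
name is hypothetical and not part of NetGene2.

```python
def mirror_position(pos: int, seq_len: int) -> int:
    """Convert a 1-based position counted from one end of the sequence
    into the 1-based position counted from the opposite end.

    Assumption: a position p on the complement strand (counted 5' to 3'
    along that strand) corresponds to seq_len - p + 1 on the direct strand.
    """
    if not 1 <= pos <= seq_len:
        raise ValueError(f"position {pos} is outside 1..{seq_len}")
    return seq_len - pos + 1
```

Applying the function twice returns the original position, so the same
helper converts in either direction.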
Please note that the lists contain predictions made at TWO detection
levels for true sites: one level where around 50% of the true sites are
detected with very few false positives, and another level where nearly
all true sites are found, but with more false predictions as well.
Sites marked (H) are highly confident and very seldom represent a false
positive prediction, while sites from the level comprising nearly all
sites are not marked. The confidence values for the predictions can be
compared within each type only. This means that confidence values not
marked by (H) can in some cases be larger than those for the (H) marked
sites.
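Because confidence values are only comparable within one type, any
ranking of sites should be done per group rather than across the whole
list. The following is a minimal sketch, assuming that "type" means the
detection level (H or unmarked) and that donor and acceptor sites are
also kept apart. The Site record and its field names are hypothetical
and do not reflect the actual layout of the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    kind: str          # "donor" or "acceptor"
    pos: int           # position of the site in the sequence
    strand: str        # "+" (direct) or "-" (complement)
    confidence: float  # confidence value as reported
    high: bool         # True if the site is marked H


def rank_sites(sites: List[Site], kind: str, high: bool) -> List[Site]:
    """Return the sites of one kind and one detection level,
    ordered from highest to lowest confidence."""
    chosen = [s for s in sites if s.kind == kind and s.high == high]
    return sorted(chosen, key=lambda s: s.confidence, reverse=True)
```

Ranking the H-marked donors and the unmarked donors separately avoids
reading an unmarked site with a larger number as "better" than an
H-marked one.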
FORMAT OF NetGene2 GRAPHICS OUTPUT
The output from the prediction is displayed on the output page of
the prediction server.
The PostScript files can be retrieved directly by Netscape by
selecting one of the two references at the bottom of the prediction
output. If your viewer is set up to handle PostScript, it will
display the graphs. Otherwise you can retrieve the compressed
PostScript files directly to your computer using Netscape.
The top part of the figure, designated "Coding", is the activity of an
ensemble of coding predicting networks: values close to 0.0 indicate
intron regions, while values close to 1.0 indicate exons. In the
"Donor" panel the activity of the ensemble of donor site predicting
networks is shown as impulses. An impulse with a height close to 1.0
indicates a strong A. thaliana donor site. A cyan impulse is a
prediction that has been discarded during the refinement, and a
magenta impulse is a prediction that has been changed by the
rule-based system. The variable threshold computed from the coding
predicting ensemble output is used to select donor and acceptor site
FORMAT OF NetGene2 PREDICTION SCORE FILES
The predictions in numerical form may be downloaded from the output
page of the prediction server. They are useful for detailed analysis
of a sequence. The file contains a line starting with the symbol `>'
followed by the name and the length of the sequence. This is followed
by twelve columns, with the following information given by column
number below.
1. Position in the sequence numbered from 1 to the length of the sequence.
2. Nucleotides of the sequence.
3. Neural network donor site score.
4. Neural network acceptor site score.
5. Neural network coding score.
6. Neural network frame score.
7. 90% sensitivity level cutoff value for donor site predictions.
8. 90% sensitivity level cutoff value for acceptor site predictions.
9. Confidence of the donor site prediction.
10. Confidence of the acceptor site prediction.
11. HMM acceptor site branchpoint score.
12. Branchpoint position.
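The twelve-column layout described above can be read with a few lines
of Python. This is a minimal sketch and not an official parser: it
assumes the columns are separated by whitespace and that each data row
has exactly twelve fields. The column labels, function names and the
score-versus-cutoff comparison are illustrative choices, since the file
itself carries no column headers.

```python
from typing import Dict, List, Optional, Tuple

# Labels chosen here for the twelve columns, in the order listed above.
COLUMNS = [
    "position", "nucleotide", "donor_score", "acceptor_score",
    "coding_score", "frame_score", "donor_cutoff", "acceptor_cutoff",
    "donor_confidence", "acceptor_confidence", "branchpoint_score",
    "branchpoint_position",
]

Row = Dict[str, str]


def parse_score_text(text: str) -> List[Tuple[str, List[Row]]]:
    """Split score-file text into (header, rows) pairs.

    The header is the text after `>' (name and length of the sequence).
    Field values are kept as strings; lines that do not have twelve
    whitespace-separated fields are skipped (assumption).
    """
    records: List[Tuple[str, List[Row]]] = []
    rows: Optional[List[Row]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            rows = []
            records.append((line[1:].strip(), rows))
            continue
        fields = line.split()
        if rows is None or len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return records


def as_float(value: str) -> Optional[float]:
    """Convert a field to float, or return None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def donor_positions_at_cutoff(rows: List[Row]) -> List[int]:
    """List positions whose donor score reaches the donor cutoff
    (assumption: a site passes when score >= cutoff)."""
    hits = []
    for row in rows:
        score = as_float(row["donor_score"])
        cutoff = as_float(row["donor_cutoff"])
        if score is not None and cutoff is not None and score >= cutoff:
            hits.append(int(row["position"]))
    return hits
```

Keeping the fields as strings until they are needed avoids guessing how
missing or non-numeric entries are written in the file.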