World Wide Web Prediction Server
Center for Biological Sequence Analysis
As an example of a large-scale application of the finished method, we used the Haemophilus influenzae Rd genome, the first genome of a free-living organism to be completed. The first 60 positions of each sequence were analyzed with networks trained on the Gram-negative data. The distribution of the mean S-score (from position 1 to the position with maximal Y-score) is shown above.
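The mean S-score described above can be sketched as follows. This is a minimal illustration, not the server's own code: the function name `mean_s_score`, the plain Python lists of per-position scores, and the choice to include the position of maximal Y-score in the average are all assumptions made for the example.

```python
def mean_s_score(s_scores, y_scores, window=60):
    """Average the S-score from position 1 through the position of maximal Y-score.

    s_scores, y_scores: per-position network outputs for one sequence
    (hypothetical inputs; the real network output format is not shown here).
    window: number of N-terminal positions analyzed (60 in the text).
    """
    if len(s_scores) != len(y_scores):
        raise ValueError("S-score and Y-score lists must have the same length")
    s = s_scores[:window]
    y = y_scores[:window]
    if not y:
        raise ValueError("at least one position is required")
    # Zero-based index of the maximal Y-score (first occurrence on ties).
    y_max_index = max(range(len(y)), key=y.__getitem__)
    # Assumption: the position of maximal Y-score is included in the mean.
    segment = s[: y_max_index + 1]
    return sum(segment) / len(segment)


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    s = [0.9, 0.8, 0.7, 0.1]
    y = [0.1, 0.2, 0.6, 0.3]
    print(round(mean_s_score(s, y), 3))
```

Truncating to the first 60 positions before locating the maximal Y-score mirrors the analysis window stated in the text; whether the boundary position belongs in the mean is a convention that should be checked against the actual method.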
From this analysis, we estimate that 330 of the 1680 H. influenzae proteins (about 20%) are secretory, with cleavable signal peptides.
However, when the maximal Y-score (combined cleavage site score) is also taken into account, the estimate drops to approximately 15%. Some of the 330 sequences may be signal-anchor-like sequences of type II (single-spanning) or type IV (multi-spanning) membrane proteins. We therefore estimate that 15-20% of the H. influenzae proteins are secretory.
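The two estimates can be illustrated with a short sketch that counts proteins passing a mean S-score cutoff, optionally combined with a maximal Y-score cutoff. The function name `secretory_fraction` and the cutoff values are hypothetical: the thresholds actually used in the analysis are not given in the text, and the toy scores below are invented.

```python
def secretory_fraction(scores, s_cutoff, y_cutoff=None):
    """Fraction of proteins predicted to be secretory.

    scores: list of (mean_s, max_y) pairs, one per protein.
    s_cutoff: hypothetical cutoff on the mean S-score.
    y_cutoff: optional hypothetical cutoff on the maximal Y-score;
              when None, only the mean S-score is used.
    """
    if not scores:
        raise ValueError("at least one protein is required")
    hits = 0
    for mean_s, max_y in scores:
        passes_s = mean_s > s_cutoff
        passes_y = y_cutoff is None or max_y > y_cutoff
        if passes_s and passes_y:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [(0.9, 0.8), (0.7, 0.2), (0.1, 0.1), (0.3, 0.9)]
    print(secretory_fraction(toy, s_cutoff=0.5))                # S-score only
    print(secretory_fraction(toy, s_cutoff=0.5, y_cutoff=0.5))  # S and Y combined

    # Arithmetic behind the reported figure: 330 of 1680 proteins.
    print(f"{330 / 1680:.1%}")  # 19.6%
```

Adding the second criterion can only lower the count, which is consistent with the estimate dropping from about 20% to about 15% once the maximal Y-score is considered.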
The sequences (translated predicted coding regions) were downloaded from The H. influenzae Rd Genome Database at The Institute for Genomic Research.