TMHMM1.0 User's guide

This server is for prediction of transmembrane helices in proteins. The method is described in

Erik L.L. Sonnhammer, Gunnar von Heijne, and Anders Krogh:
A hidden Markov model for predicting transmembrane helices in protein sequences.
In Proc. of Sixth Int. Conf. on Intelligent Systems for Molecular Biology, p 175-182
Ed J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen
Menlo Park, CA: AAAI Press, 1998

Download compressed postscript file

Download pdf file

Please cite.

Other material (model, training data, etc.) can be found here.
 
 

Input

The program takes proteins in FASTA format on standard input. It recognizes the 20 amino acids plus B, Z, and X; these three are all treated equally as unknown.

This is an example (one protein):

>5H2A_CRIGR you can have comments after the ID
MEILCEDNTSLSSIPNSLMQVDGDSGLYRNDFNSRDANSSDASNWTIDGENRTNLSFEGYLPPTCLSILHL
QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWP
LPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGVSMPIPVF
GLQDDSKVFKQGSCLLADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFL
PQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNE
HVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILVNTIPALAYKSSQLQA
GQNKDSKEDAEPTDNDCSMVTLGKQQSEETCTDNINTVNEKVSCV
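
For checking a file before submitting it, the input format described above can be read with a few lines of Python. This is only a minimal sketch, not part of TMHMM: the names read_fasta and unexpected_symbols are made up for illustration, and it assumes upper-case one-letter codes and that the ID is the first word after the `>'.

```python
# Standard residues plus the three symbols the guide says are treated as unknown.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
UNKNOWN = set("BZX")


def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first word; the rest of the header is a comment.
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def unexpected_symbols(sequence):
    """Return symbols that are neither amino acids nor B, Z, or X."""
    return sorted(set(sequence) - STANDARD - UNKNOWN)
```

To read from standard input, pass sys.stdin to read_fasta (after importing sys) and report any sequence for which unexpected_symbols returns a non-empty list.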

 

How to run it

Either enter the name of the local file that holds your proteins in the top half of the window, or paste the sequence into the lower half of the window. Then press `Submit'.
 
 

Output

The server gives a list of the locations of the predicted transmembrane helices and of the intervening loop regions.

Here is an example:

# ID 5H2A_CRIGR
# Length: 471
# Log-odds: 37.647490 bits

5H2A_CRIGR      TMHMM1.0        outside     1      78
5H2A_CRIGR      TMHMM1.0        TMhelix    79     101
5H2A_CRIGR      TMHMM1.0        inside    102     107
5H2A_CRIGR      TMHMM1.0        TMhelix   108     130
5H2A_CRIGR      TMHMM1.0        outside   131     148
5H2A_CRIGR      TMHMM1.0        TMhelix   149     171
5H2A_CRIGR      TMHMM1.0        inside    172     192
5H2A_CRIGR      TMHMM1.0        TMhelix   193     215
5H2A_CRIGR      TMHMM1.0        outside   216     233
5H2A_CRIGR      TMHMM1.0        TMhelix   234     256
5H2A_CRIGR      TMHMM1.0        inside    257     325
5H2A_CRIGR      TMHMM1.0        TMhelix   326     348
5H2A_CRIGR      TMHMM1.0        outside   349     356
5H2A_CRIGR      TMHMM1.0        TMhelix   357     379
5H2A_CRIGR      TMHMM1.0        inside    380     471

If the whole sequence is labeled as inside or outside, the prediction is that it contains no transmembrane helices. It is probably not wise to interpret such a result as a prediction of location.
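
If you want to process the listing further, it can be read into simple records. The following is a rough sketch that assumes the layout is exactly as in the example above (comment lines starting with `#', then five whitespace-separated columns). The names Segment, parse_prediction and count_tm_helices are illustrative only.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "seq_id source label start end")


def parse_prediction(text):
    """Parse the listing into Segment records, skipping '#' and blank lines."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # not a segment line in the assumed layout
        seq_id, source, label, start, end = fields
        segments.append(Segment(seq_id, source, label, int(start), int(end)))
    return segments


def count_tm_helices(segments):
    """Number of segments labeled TMhelix; 0 means no helices were predicted."""
    return sum(1 for s in segments if s.label == "TMhelix")
```

A count of zero corresponds to the case described above, where the whole sequence is labeled inside or outside and the label itself should not be read as a location.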

The prediction gives the most probable location and orientation of transmembrane helices in the sequence. It is found by an algorithm called N-best (or 1-best in this case) that sums over all paths through the model with the same location and direction of the helices.
 
The log-odds score of the predicted structure with respect to the null model is given (in bits). The higher it is, the more confident the prediction (there have been no systematic studies of this).
 
 

Alternative output format

If you click on the sequence ID, you will get the prediction in this format:

>5H2A_CRIGR
   MEILCEDNTSLSSIPNSLMQVDGDSGLYRNDFNSRDANSSDASNWTIDGENRTNLSFEGYLPPTCLSILHLQ
?0 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

   EKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWPLP
?0 ooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiMMMMMMMMMMMMMMMMMMMMMMMoooooooooooooo

   SKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGVSMPIPVFGLQ
?0 ooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMo

   DDSKVFKQGSCLLADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFLPQSS
?0 oooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

   LSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNEHVIGA
?0 iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiMMMMMMMMMMMMMMMMMMMMMMMooooooooMMMM

   LLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILVNTIPALAYKSSQLQAGQNKDS
?0 MMMMMMMMMMMMMMMMMMMiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

   KEDAEPTDNDCSMVTLGKQQSEETCTDNINTVNEKVSCV
?0 iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
 

Predictions are:

i: Inside (cytoplasmic).
M: Transmembrane helix.
O: Outside or non-cytoplasmic, for long outside loops (typically longer than 100 residues).
o: Outside, short loop region.

Usually you will not have to distinguish between long and short outside loops. See the paper for more details.
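
The per-residue labels carry the same information as the segment list, so one can be turned into the other. Here is a small sketch, assuming the label rows are the lines starting with `?0' as in the example above (the meaning of that prefix is not covered here). The function names are made up, and `O' and `o' are merged into a single outside label.

```python
from itertools import groupby

LABEL_NAMES = {"i": "inside", "M": "TMhelix", "o": "outside", "O": "outside"}


def join_label_rows(text):
    """Concatenate the label rows (lines starting with '?0') into one string."""
    rows = []
    for line in text.splitlines():
        if line.startswith("?0"):
            rows.append(line[2:].strip())
    return "".join(rows)


def labels_to_segments(labels):
    """Turn a label string into (name, start, end) tuples, 1-based inclusive."""
    segments = []
    position = 1
    for name, run in groupby(LABEL_NAMES[c] for c in labels):
        length = sum(1 for _ in run)
        segments.append((name, position, position + length - 1))
        position += length
    return segments
```

For instance, 78 `o' labels followed by 23 `M' labels give an outside segment 1-78 and a helix 79-101, which is how the first two rows of the segment list relate to the label rows.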
 
 

Plot of probabilities

The plots show the posterior probabilities of inside/outside/TM helix. Here one can see possible weak TM helices that were not predicted, and one can get an idea of the certainty of each segment in the prediction.

The plot is obtained by calculating the total probability that a residue sits in a helix, inside, or outside, summed over all possible paths through the model. Sometimes the plot and the prediction seem contradictory, but that is because the plot shows probabilities for each residue, whereas the prediction is the overall most probable structure. Therefore the plot should be seen as a complementary source of information.
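
To illustrate what "summed over all possible paths" means, here is a toy forward-backward calculation of per-position posterior probabilities. It is emphatically not the TMHMM model: the three states, the two-letter alphabet (h for hydrophobic, p for polar) and all numbers are invented for the sketch, and no scaling is done, so it is only suitable for short, non-empty sequences.

```python
# Toy hidden Markov model; every parameter below is made up for illustration.
STATES = ("inside", "helix", "outside")
START = {"inside": 0.5, "helix": 0.0, "outside": 0.5}
TRANS = {
    "inside": {"inside": 0.9, "helix": 0.1, "outside": 0.0},
    "helix": {"inside": 0.05, "helix": 0.9, "outside": 0.05},
    "outside": {"inside": 0.0, "helix": 0.1, "outside": 0.9},
}
EMIT = {
    "inside": {"h": 0.3, "p": 0.7},
    "helix": {"h": 0.8, "p": 0.2},
    "outside": {"h": 0.3, "p": 0.7},
}


def posteriors(obs):
    """Return, for each position, P(state | whole sequence) for every state."""
    n = len(obs)
    # Forward pass: probability of the prefix up to t, ending in state s.
    fwd = [{} for _ in range(n)]
    for s in STATES:
        fwd[0][s] = START[s] * EMIT[s][obs[0]]
    for t in range(1, n):
        for s in STATES:
            fwd[t][s] = EMIT[s][obs[t]] * sum(
                fwd[t - 1][r] * TRANS[r][s] for r in STATES)
    # Backward pass: probability of the suffix after t, given state s at t.
    bwd = [{} for _ in range(n)]
    for s in STATES:
        bwd[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in STATES:
            bwd[t][s] = sum(
                TRANS[s][r] * EMIT[r][obs[t + 1]] * bwd[t + 1][r] for r in STATES)
    total = sum(fwd[n - 1][s] for s in STATES)  # probability of the sequence
    return [{s: fwd[t][s] * bwd[t][s] / total for s in STATES} for t in range(n)]
```

Each position gets three probabilities that sum to one. They describe that residue alone, which is why a plot of such values can disagree with the single most probable overall structure.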
 
Below the plot there are links to

 
 

Final remarks

Predicted TM segments in the N-terminal region sometimes turn out to be signal peptides.

One of the most common mistakes by the program is to reverse the direction of proteins with one TM segment.

It is possible that the log-odds score can be used to distinguish between TM proteins and other proteins, but it is not obvious, and we have not looked into it yet.

Do not use the program to predict whether a non-membrane protein is cytoplasmic or not.