1. Providing input sequences
The sequences should be in FASTA format. You can submit either a single sequence (see section 1.1) or an alignment of sequences (see section 1.2).
Please pay attention to the following requirements:
1) All sequence headers/names MUST be different and contain only alphanumeric characters (the letters [a..z], the numbers [0..9], and the underscore "_"). Any other characters, even spaces, will be converted to underscores. It is very important not to start the header name with a character that will be converted to an underscore. Sequence headers will be limited to
2) All input sequences must be in the one-letter amino acid code: A C D E F G H I K L M N P Q R S T U V W Y -
3) Gaps should be represented only by "-".
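The header rules above can be checked locally before submission. The following is a minimal sketch, not the server's own code: the function names are hypothetical, and it assumes that both upper-case and lower-case letters count as allowed characters.

```python
import re


def sanitize_header(header: str) -> str:
    """Replace every character outside letters, digits and '_' with '_'.

    Assumption: upper-case and lower-case letters are both accepted.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", header)


def check_headers(headers):
    """Return a list of human-readable problems for FASTA headers (given without '>')."""
    problems = []
    seen = set()
    for header in headers:
        clean = sanitize_header(header)
        if clean != header:
            problems.append(f"{header!r} would be converted to {clean!r}")
        # A leading underscore created by conversion would be indistinguishable
        # from a deliberately placed one.
        if clean.startswith("_") and not header.startswith("_"):
            problems.append(f"{header!r} would start with an underscore after conversion")
        if clean in seen:
            problems.append(f"{header!r} is not unique after conversion")
        seen.add(clean)
    return problems


if __name__ == "__main__":
    for problem in check_headers(["AC0023", "sp|P12345 human", " seq1"]):
        print(problem)
```

An empty list from check_headers means that no header would be altered and that all headers are distinct.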
Paste an alignment or single sequence in FASTA format into the upper box of the main server page.
Alternatively, select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.
We have set a limit of 1000 sequences and a length limit of 4000 residues.
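As an illustration, the alphabet and size limits can be verified with a short script before uploading. This is a sketch under stated assumptions: the helper names are hypothetical, the 4000-residue limit is assumed to apply to each sequence (gaps included), and lower-case residues are assumed to be equivalent to upper-case ones.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTUVWY-")  # one-letter codes plus the gap character
MAX_SEQUENCES = 1000
MAX_LENGTH = 4000  # assumption: applied per sequence, gaps included


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_input(text):
    """Return a list of problems with respect to the stated limits and alphabet."""
    records = read_fasta(text)
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}")
    for header, seq in records:
        if len(seq) > MAX_LENGTH:
            problems.append(f"{header}: length {len(seq)} exceeds {MAX_LENGTH}")
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{header}: disallowed characters {''.join(bad)}")
    return problems


if __name__ == "__main__":
    print(check_input(">seq1\nACDE-FGH\n>seq2\nACDXFGH\n"))
```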
You can press "Submit" at this point to run the query using default parameters.
1.1. Submitting a single sequence
The input can be a single sequence in FASTA format, like this one:
When only one sequence is submitted, an alignment is generated automatically. The query protein sequence is compared to UniProt+TrEMBL via BLASTP, and all matching sequences that cover a minimum of X% of the protein length are tested for compatibility. X can be set by the user on the front page, under "options for alignment generation". All compatible homologues are aligned using either MAFFT (default), MUSCLE or ClustalW. The parameters X and Y (the second parameter on the page) both have default values of 50%. If your alignment does not seem satisfactory, you can run it again with other values. For instance, if your alignment is too conserved, try reducing the value of Y; if it becomes too divergent, increase it. Similarly, if your original sequence is larger than the protein you are interested in, you should decrease X; if proteins that do not belong in the alignment start appearing in it because they match only particular domains, you should increase it.
For the best results, we advise you to generate your own carefully manually curated alignment. Alternatively, you can run InterMap3D on a single sequence to create the alignment as described above, then inspect the sequences carefully and remove any you
1.2. Submitting a sequence alignment
Alternatively, a sequence alignment in FASTA format can be submitted:
In this case, only one sequence is used to search for a 3D structure. To define which one, mark it by adding the underscore character, "_", at the beginning of its FASTA sequence header, for example: >_AC0023. The first sequence is used by default. Keep in mind that any disallowed characters, including spaces, are automatically converted to underscores, so it is very important that all other sequence headers start with a numerical or alphabetic character.
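The selection rule described above can be sketched in a few lines. The function name is hypothetical, and the behaviour when several headers carry a leading underscore is an assumption (here the first marked header wins); the source does not describe that case.

```python
def pick_structure_query(headers):
    """Return the index of the sequence used for the 3D structure search.

    headers: FASTA headers without the leading '>'.
    A header starting with '_' marks the chosen sequence; otherwise the
    first sequence is used. Assumption: with several marked headers, the
    first marked one is taken.
    """
    for index, header in enumerate(headers):
        if header.startswith("_"):
            return index
    return 0


if __name__ == "__main__":
    headers = ["AC0001", "_AC0023", "AC0042"]
    print(headers[pick_structure_query(headers)])
```

This also shows why an accidental leading underscore matters: a header such as " AC0042" would be converted to "_AC0042" and could then be mistaken for the marked sequence.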
Please note that the alignment will be parsed by MaxAlign in order to maximize the alignment area by selecting an optimal subset of sequences.
InterMap3D offers options for alignment generation, for coevolution analysis and for results visualization. The default values of these options will satisfy most users, but for some it might be important to change them.
2.1 Options on alignment generation
This set of options only applies when the user submits a single sequence.
Please read the section 1.1 on this page for more information.
The option refers to the minimum length of the protein to be matched. When looking for homologues, only proteins whose homology to the query protein stretches over a region larger than the value inserted in this option (measured in % of the query protein length) are included in the alignment. The default value is 50%. A value of 80%, for instance, would mean that only proteins homologous to the user's protein over at least 80% of the user's protein length would be included.
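The length criterion amounts to a simple coverage test. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes the threshold is inclusive ("at least"), which the description does not state unambiguously.

```python
def passes_length_filter(matched_length, query_length, min_percent=50.0):
    """Return True if the homologous region covers enough of the query.

    matched_length: number of query residues covered by the match.
    query_length:   total length of the query protein.
    min_percent:    the option value (default 50%).
    Assumption: the comparison is inclusive (coverage >= min_percent).
    """
    coverage = 100.0 * matched_length / query_length
    return coverage >= min_percent


if __name__ == "__main__":
    # A 300-residue query with a match spanning 240 residues has 80% coverage.
    print(passes_length_filter(240, 300, min_percent=80.0))
    print(passes_length_filter(239, 300, min_percent=80.0))
```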
2.2 Options on coevolution analysis
The coevolution analysis in InterMap3D can be done by three different methods (RCW MI, MI/Entropy, DEPENDENCY) or by the intersection of the results of any of these. The results are presented as an ordered list, with the pairs of sites with the highest scores appearing at the top of the list. The length of the list can be specified by the user. It is the first option under "Options on coevolution analysis".
The user can also choose which method to use for the coevolution analysis. The default is RCW MI. Checking a single method will cause that method alone to perform the analysis. Checking several methods will make all of them perform the analysis, and the user is provided with the intersection of their results only. The intersection of methods works as follows: each method outputs a large list of results (not presented to the user), and these lists are intersected. The score of a pair in the intersection is the average of its ranks in the original, single-method result lists.
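The intersection step can be sketched as follows. This is a minimal illustration and not InterMap3D's implementation: the function name is hypothetical, ranks are assumed to start at 1, and each pair of sites is assumed to be written in the same order in every list.

```python
def intersect_by_rank(*ranked_lists):
    """Intersect ranked result lists and score each shared pair by its mean rank.

    Each argument is a list of site pairs ordered from best to worst.
    Returns a list of (pair, mean_rank) tuples, best (lowest mean rank) first.
    """
    # Map every pair to its 1-based rank within each method's list.
    rank_maps = [
        {pair: rank for rank, pair in enumerate(results, start=1)}
        for results in ranked_lists
    ]
    # Keep only pairs reported by every method.
    common = set(rank_maps[0]).intersection(*rank_maps[1:])
    scored = [
        (pair, sum(ranks[pair] for ranks in rank_maps) / len(rank_maps))
        for pair in common
    ]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored


if __name__ == "__main__":
    method_a = [(1, 5), (2, 9), (3, 7)]
    method_b = [(3, 7), (1, 5), (4, 8)]
    print(intersect_by_rank(method_a, method_b))
```

In this example the pair (1, 5) has ranks 1 and 2, giving a mean rank of 1.5, and therefore precedes (3, 7), which has ranks 3 and 1.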
The final option on coevolution analysis applies only when RCW MI is used. It is an algorithmic detail and will only very rarely affect the results. Nevertheless, it can be tuned if the user so desires.
RCW MI calculates the Mutual Information of all sites against all others, and then normalizes the Mutual Information between a pair of interest by the average background Mutual Information of the two sites in question. To avoid including real interactions in the background, the average background Mutual Information of each site is its average Mutual Information (across all other sites) excluding the N highest Mutual Information values (the rationale being that these top hits probably result from real interactions). The user can specify the value of N+1 in the last box of the coevolution analysis options. That is, if the user inserts the value "2" in the box, this means that coevolving pairs, but not triplets, are expected in the dataset. N will then be 1, so that only the top hit of each particular site is excluded from the background MI. This does not, however, prevent triplets from occurring in the results. These result from site A interacting with site B, site B interacting with site C and site C with site A.
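The background correction described above can be sketched in plain Python. This is an illustration under assumptions, not the server's code: the function names are hypothetical, the MI values are taken as a precomputed symmetric matrix, and the "average background of the two sites" is assumed to be the arithmetic mean of the two per-site backgrounds.

```python
def background_mi(mi, site, n_excluded):
    """Mean MI of `site` against all other sites, ignoring its n_excluded highest values."""
    values = sorted(mi[site][other] for other in range(len(mi)) if other != site)
    kept = values[: len(values) - n_excluded] if n_excluded > 0 else values
    if not kept:
        raise ValueError("n_excluded leaves no values for the background")
    return sum(kept) / len(kept)


def rcw_score(mi, i, j, box_value=2):
    """MI of the pair (i, j) divided by the mean background MI of sites i and j.

    box_value is the number entered by the user, i.e. N+1, so N = box_value - 1.
    """
    n_excluded = box_value - 1
    background = (
        background_mi(mi, i, n_excluded) + background_mi(mi, j, n_excluded)
    ) / 2
    if background == 0:
        # Degenerate case: no background signal at either site.
        return float("inf") if mi[i][j] > 0 else 0.0
    return mi[i][j] / background


if __name__ == "__main__":
    # Made-up symmetric MI matrix for four alignment columns.
    mi = [
        [0.0, 0.9, 0.1, 0.2],
        [0.9, 0.0, 0.3, 0.1],
        [0.1, 0.3, 0.0, 0.2],
        [0.2, 0.1, 0.2, 0.0],
    ]
    print(rcw_score(mi, 0, 1, box_value=2))
```

With box_value=2, the strongest partner of each site (here the 0.9 between sites 0 and 1) is left out of both backgrounds, so a genuine interaction does not inflate the value it is normalized by.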
2.3 Options on visualization of results
In this set of options, the user can choose the layout of the results. The results comprise a figure of the 3D structure of the protein depicting the pairs of predicted coevolving sites. The user can choose not to obtain this picture, or to obtain it with either a black or a white background. While the black background is good for screen viewing, most users will prefer the white background for printing.
The other option concerns the residue numbering to be followed. The results contain an alignment (on which the coevolutionary analysis was performed) and a single protein (whose homologue is depicted in the figure). The numbering of the two does not have to be the same. The user can therefore choose which of the two numbering schemes is preferable.
For visualizing other properties of positions in your alignment in the 3D structure, you might want to try the server FeatureMap3D, also at the CBS home page.
3. Submit the job
Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until completion, when your results will appear in the browser window.
At any time during job processing you may enter your e-mail address and close
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.