
How to use InterMap3D



1. Providing input sequences

The sequences should be in FASTA format. You can submit either one sequence (more details on section 1.1) or an alignment of sequences (more details on section 1.2).

Please pay attention to the following requirements:

1) All sequence headers/names MUST be different and contain only alphanumeric characters (the letters [a..z, A..Z] and the digits [0..9]) and the underscore "_". Any other character, including a space, will be converted to an underscore. It is very important not to start a header with a character that will be converted to an underscore, because a leading underscore has a special meaning (see section 1.2). Sequence headers are limited to 100 characters.

2) All input sequences must be in one-letter amino acid code: A C D E F G H I K L M N P Q R S T U V W Y -

3) Gaps should be represented only by "-".

The input can be provided in either of two ways:

  • Paste an alignment or a single sequence in FASTA format into the upper box of the main server page.

  • Select a FASTA file on your local disk, either by typing the file name into the lower box or by browsing the disk.


We have set a limit of 1000 sequences and a length limit of 4000 residues.
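
The requirements above can be checked locally before submitting. The following is a minimal sketch and not the server's own code: the function names are hypothetical, and it assumes that lowercase residues are rejected, that the 4000-residue limit is counted per sequence including gaps, and that headers are truncated after the character conversion. None of these details is stated on this page.

```python
import re

ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTUVWY-")  # alphabet listed in requirement 2
MAX_SEQUENCES = 1000
MAX_LENGTH = 4000   # assumption: counted per sequence, gaps included
MAX_HEADER = 100


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sanitize_header(header):
    """Mimic the documented rule: disallowed characters become underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", header)[:MAX_HEADER]


def check_input(text):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    records = parse_fasta(text)
    if not records:
        problems.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"too many sequences: {len(records)}")
    seen = set()
    for header, seq in records:
        clean = sanitize_header(header)
        if clean in seen:
            problems.append(f"duplicate header after conversion: {clean}")
        seen.add(clean)
        if clean.startswith("_") and not header.startswith("_"):
            problems.append(f"header starts with a converted character: {header}")
        bad = sorted(set(seq) - ALLOWED_RESIDUES)
        if bad:
            problems.append(f"invalid characters in {clean}: {''.join(bad)}")
        if len(seq) > MAX_LENGTH:
            problems.append(f"sequence too long: {clean} ({len(seq)})")
    return problems
```

For example, `sanitize_header("Q8GVD8_HELTU/1-294")` gives `Q8GVD8_HELTU_1_294`, and `check_input` reports two headers such as `a b` and `a_b` as duplicates because they become identical after conversion.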

You can press "Submit" at this point to run the query using default parameters.

1.1. Submitting a single sequence

The input can be a single sequence in FASTA format, like this one:

	>CDK2_HUMAN
	MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIR
	EISLLKELNHPNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIP
	LPLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGAIKLADFGLAR
	AFGVPVRTYTHEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRR
	ALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSK
	VVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL

When only one sequence is submitted, an alignment is generated automatically. The query protein sequence is compared to UniProt+TrEMBL via BLASTP, and all sequences that match over at least X% of the query protein length are tested for compatibility. X can be set by the user on the front page, under "options for alignment generation". All compatible homologues are aligned using either MAFFT (default), MUSCLE or ClustalW.

Both X and the second parameter on the page (called Y below) have a default value of 50%. If the alignment does not seem satisfactory, you can run the query again with other values. For instance, if your alignment is too conserved, try reducing the value of Y; if it becomes too divergent, increase it. Similarly, if your original sequence is longer than the protein you are interested in, you should decrease X; if proteins that do not belong in the alignment start appearing in it because they match only particular domains, you should increase X.

For the best results, we advise you to submit your own carefully, manually curated alignment. Alternatively, you can run InterMap3D on a single sequence to create the alignment as described above, then inspect the sequences carefully and remove any you consider inappropriate.

1.2. Submitting a sequence alignment

As an alternative, a sequence alignment in FASTA format can be submitted:

>Q8GVD8_HELTU/1-294
MEQYEKVEKIGEGTYGVVYKARDKVTNETIALKKIR------LEQEDEGVPSTAIREISL
LKEMQHGNIVRLQDVVHSDKRLYLVFEYLDLDLKKHMD-SCPEFSKDPRLVKTFLYQILR
GIAYCHSHRVLHRDLKPQNLLIDR----RTNALKLADFGLARAFGIPVRTFTHEVVTLWY
RAPEILLGSRHYSTPVDVWSVGCIFAEMV-NQRPLFPGDSEIDELFKIFRIMGTPN---E
ETWPGVTSLPDFKSAFPKWSSKD--------LATVVPN---LEKAGLDLLCKMLWLDPSK
RITARTALEHEYFKDIGFVP
>Q9AUH4_9ROSI/1-294
MDQYEKVEKIGEGTYGVVYKARDRVTNETIALKKIR------LEQEDEGVPSTAIREISL
LKEMQHGNIVRLQDVVHSEKRLYLVFEYLDLDLKKHMD-SSPEFAKDPRLVKTFLYQILR
GIAYCHSHRVLHRDLKPQNLLIDR----RTNALKLADFGLARAFGIPVRTFTHEVVTLWY
RAPEILLGSRHYSTPVDVWSVGCIFAEMV-NQKPLFPGDSEIDELFKIFRILGTPN---E
DTWPGVTSLPDFKSAFPKWPSKD--------LATVVPT---LEKAGVDLLSKMLFLDPTK
RITARSALEHEYFKDIGFVP
>(...)

When an alignment is submitted, only one of its sequences is used to search for a 3D structure. By default this is the first sequence. To choose a different one, mark it by adding the underscore character "_" at the beginning of its FASTA header, for example: >_AC0023. Keep in mind that any disallowed character, including a space, is automatically converted to an underscore, so it is very important that the headers of all other sequences start with a letter or a digit.
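
The selection rule can be illustrated with a short sketch. It is an assumption-laden illustration, not the server's code: `pick_reference` is a hypothetical name, and the sketch assumes that the first marked sequence wins if several headers start with an underscore.

```python
import re


def sanitize_header(header):
    """Convert every character outside [A-Za-z0-9_] to an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", header)


def pick_reference(headers):
    """Return the index of the sequence used for the 3D structure search.

    The first header that starts with "_" after conversion is chosen;
    if none does, the first sequence (index 0) is used.
    """
    if not headers:
        raise ValueError("no sequences given")
    for index, header in enumerate(headers):
        if sanitize_header(header).startswith("_"):
            return index
    return 0


# Explicit marking: the second sequence is chosen.
print(pick_reference(["Q8GVD8_HELTU/1-294", "_AC0023"]))
# Accidental marking: "(B)" becomes "_B_" and is chosen instead of "_C".
print(pick_reference(["A", "(B)", "_C"]))
```

The second call shows why unmarked headers should start with a letter or a digit: a header beginning with a bracket, a space or any other converted character would be treated as marked.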

Please note that the alignment will be parsed by MaxAlign in order to maximize the alignment area by selecting an optimal subset of sequences.

2. Options

InterMap3D offers options for alignment generation, for coevolution analysis and for visualization of results. The default values of these options will satisfy most users, but in some cases it may be important to change them.

2.1 Options on alignment generation

This set of options applies only when the user submits a single sequence. Please read section 1.1 on this page for more information.

The option refers to the minimum length of the protein to be matched. When looking for homologues, only proteins whose homology to the query protein stretches over a region at least as large as the value entered in this option (measured in % of the query protein length) are included in the alignment. The default value is 50%. A value of, for instance, 80% means that only proteins that are homologous to the user's protein over at least 80% of its length are included.
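
As a worked illustration of the threshold, the sketch below computes the percentage of the query covered by a hit. The function names are hypothetical, and the sketch assumes that a hit is described by a single contiguous region of the query (1-based, inclusive) and that a hit exactly at the threshold is accepted; the page does not specify how the server measures the matched region.

```python
def coverage_percent(query_start, query_end, query_length):
    """Percentage of the query covered by a hit spanning query_start..query_end."""
    return 100.0 * (query_end - query_start + 1) / query_length


def passes_length_filter(query_start, query_end, query_length, min_percent=50.0):
    """True if the hit covers at least min_percent of the query length."""
    return coverage_percent(query_start, query_end, query_length) >= min_percent


QUERY_LENGTH = 298  # length of the CDK2_HUMAN example in section 1.1

# A hit over residues 1-294 covers about 98.7% of the query and is kept.
print(passes_length_filter(1, 294, QUERY_LENGTH))
# A hit over residues 100-200 covers about 33.9% and is dropped at 50%.
print(passes_length_filter(100, 200, QUERY_LENGTH))
# The same hit is kept if the threshold is lowered to 30%.
print(passes_length_filter(100, 200, QUERY_LENGTH, min_percent=30.0))
```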

2.2 Options on coevolution analysis

The coevolution analysis in InterMap3D can be done by three different methods (RCW MI, MI/Entropy, DEPENDENCY) or by the intersection of the results of any of these. The results are presented as an ordered list, with the highest-scoring pairs of sites at the top. The length of the list can be specified by the user; it is the first option under "Options on coevolution analysis".

The user can also choose which method to use for the coevolution analysis. The default is RCW MI. Checking a single method causes that method alone to perform the analysis. Checking several methods causes all of them to perform the analysis, and the user is given only the intersection of their results. The intersection works as follows: each method outputs a large list of results (not presented to the user) and these lists are intersected. The score of a pair in the intersection is the average of its ranks in the original, single-method result lists.
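
The intersection step can be sketched as follows. This is an illustration under assumptions, not the server's implementation: `intersect_by_rank` is a hypothetical name, each input list is assumed to hold unique site pairs ordered best first, and a lower average rank is assumed to mean a better hit.

```python
def intersect_by_rank(*ranked_lists, top=None):
    """Intersect ranked lists of site pairs and score by average rank.

    Each list holds (site_i, site_j) tuples, best first. Returns a list of
    (average_rank, pair) tuples sorted so that the lowest average comes first.
    """
    # Rank 1 is the top hit of each method.
    rank_maps = [
        {pair: rank for rank, pair in enumerate(pairs, start=1)}
        for pairs in ranked_lists
    ]
    common = set(rank_maps[0]).intersection(*rank_maps[1:])
    scored = [
        (sum(ranks[pair] for ranks in rank_maps) / len(rank_maps), pair)
        for pair in common
    ]
    scored.sort()
    return scored[:top]


# Made-up result lists from two methods.
rcw_mi = [(10, 45), (3, 77), (12, 90), (5, 6)]
dependency = [(3, 77), (5, 6), (10, 45), (8, 9)]

for average_rank, pair in intersect_by_rank(rcw_mi, dependency):
    print(pair, average_rank)
```

In this made-up example the pair (3, 77) has ranks 2 and 1 and therefore the best average (1.5), while (12, 90) and (8, 9) are dropped because each appears in only one list.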

The final option on coevolution analysis applies only when RCW MI is used. It is an algorithmic detail and will only very rarely affect a user's results. Nevertheless, it can be tuned if the user so desires.

RCW MI calculates the Mutual Information of all sites against all others, and then normalizes the Mutual Information of a pair of interest by the average background Mutual Information of the two sites in question. To avoid including real interactions in the background, the average background Mutual Information of each site is its average Mutual Information across all other sites, excluding the N highest Mutual Information values (the rationale being that these top hits probably result from real interactions). The user can specify the value of N+1 in the last box of the coevolution analysis options. That is, entering the value "2" in the box means that the user expects coevolving pairs, but not triplets, to occur in the dataset. N is then 1, so that only the top hit of each particular site is excluded from the background MI. This does not, however, prevent triplets from occurring in the results. These arise when site A interacts with site B, site B with site C, and site C with site A.
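
The description above can be turned into a small sketch. It is not the InterMap3D implementation: all function names are hypothetical, Mutual Information is computed in bits from raw column frequencies, and combining the two sites' backgrounds by taking their mean is an assumption, since this page does not give the exact formula.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual Information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def mi_matrix(columns):
    """Symmetric matrix of Mutual Information for all pairs of columns."""
    size = len(columns)
    mi = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            mi[i][j] = mi[j][i] = mutual_information(columns[i], columns[j])
    return mi


def background_mi(mi, site, box_value=2):
    """Average MI of a site against all others, without its N highest values.

    box_value is the number entered by the user, that is N + 1.
    """
    n_excluded = box_value - 1
    others = sorted(value for k, value in enumerate(mi[site]) if k != site)
    if n_excluded < 0 or n_excluded >= len(others):
        raise ValueError("N must be between 0 and the number of other sites - 1")
    kept = others[: len(others) - n_excluded]
    return sum(kept) / len(kept)


def rcw_score(mi, i, j, box_value=2):
    """MI of the pair divided by the mean background MI of its two sites."""
    background = (background_mi(mi, i, box_value) + background_mi(mi, j, box_value)) / 2
    return mi[i][j] / background if background > 0 else 0.0


# Made-up MI values for four sites; sites 0 and 1 share a strong signal.
mi = [
    [0.0, 4.0, 1.0, 1.0],
    [4.0, 0.0, 2.0, 2.0],
    [1.0, 2.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]
print(rcw_score(mi, 0, 1, box_value=2))  # top hit of each site excluded (N = 1)
print(rcw_score(mi, 0, 1, box_value=1))  # nothing excluded (N = 0)
```

With the value "2", the strong 0-1 signal is left out of both backgrounds, so the pair scores higher than when nothing is excluded.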

2.3 Options on visualization of results

In this set of options, the user can choose the layout of the results. The results include a figure of the 3D structure of the protein on which the pairs of predicted coevolving sites are depicted. The user can choose not to obtain this picture, or to obtain it with either a black or a white background. While the black background is good for screen viewing, most users will prefer the white background for printing.

The other option concerns the residue numbering to be followed. The results contain an alignment (on which the coevolution analysis was performed) and a single protein (whose homologue is depicted in the figure). The numbering of the two does not have to be the same. The user can therefore choose which of the two numbering schemes is preferable.
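
The two numbering schemes differ because alignment columns include gaps while a protein's residue numbers do not. The sketch below illustrates this for one row of an alignment; the function name is hypothetical, and the page does not state how the server itself derives either numbering.

```python
def column_to_residue_numbers(aligned_sequence, first_residue=1):
    """Map 1-based alignment columns to residue numbers of one aligned sequence.

    Gap columns ("-") map to None because the sequence has no residue there.
    """
    mapping = {}
    residue = first_residue
    for column, symbol in enumerate(aligned_sequence, start=1):
        if symbol == "-":
            mapping[column] = None
        else:
            mapping[column] = residue
            residue += 1
    return mapping


# Fragment of the first sequence in section 1.2, starting at its residue 31.
fragment = "ALKKIR------LEQ"
mapping = column_to_residue_numbers(fragment, first_residue=31)
print(mapping[6])   # last residue before the gap
print(mapping[7])   # gap column
print(mapping[13])  # first residue after the gap
```

In this fragment, column 13 corresponds to residue 37: the six gap columns shift the alignment numbering relative to the protein numbering.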

For visualizing other properties of positions in your alignment in the 3D structure, you might want to try the server FeatureMap3D, also at the CBS home page.

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until completion, when your results will appear in the browser window.

At any time during job processing you may enter your e-mail address and close the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.



