deFUME is an easy-to-use web-server for trimming, assembly and functional annotation of Sanger sequencing data derived from functional selection experiments. As input the user simply provides raw Sanger sequencing chromatograms or pre-assembled sequencing projects. Upon submission the web-server processes the information by integrating multiple analysis steps into one single workflow: read trimming, assembly of reads into contigs, open reading frame prediction, BLAST and enrichment with available metadata. As output, deFUME delivers a comprehensive sequence overview that includes functional annotations and sequence statistics. The following section provides instructions for using the deFUME web-server.
As input, you can either upload raw chromatograms in ab1 file format (.ab1) or provide pre-assembled contigs as plain sequence in FASTA format. In the latter case, deFUME skips chromatogram trimming and assembly, and the submitted sequence is passed directly to functional annotation by BLAST and InterPro.
Input options for raw chromatogram reads
Chromatograms (.ab1 format) must be compressed into a single archive file (zip or tar). From the deFUME interface, select the archive on your local disk and upload it. The archive may contain multiple .ab1 files. To bundle multiple chromatogram (.ab1) files into one compressed archive, you can use the following command:
tar -cvzf YourCompressedFiles.tar.gz *.ab1
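Before uploading, it can help to confirm that the archive really contains the chromatograms. The sketch below assumes a gzip-compressed tar archive such as the one produced by the tar command; `count_ab1_in_archive` is a hypothetical helper name and is not part of deFUME.

```shell
# Hypothetical helper (not part of deFUME): count the .ab1 entries
# in a gzip-compressed tar archive before uploading it.
count_ab1_in_archive() {
    # -t lists the archive members, -z reads gzip, -f names the archive
    tar -tzf "$1" | grep -c '\.ab1$'
}

# Usage: count_ab1_in_archive YourCompressedFiles.tar.gz
```

If the reported number differs from the number of reads you expect, rebuild the archive before submitting.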
Input options for pre-assembled projects
As an alternative to raw sequencing data, the user can provide pre-assembled sequence by loading or copy-pasting it in FASTA format into the specified input window. When this option is chosen, deFUME skips the phred assembly step. This option is useful for a variety of functional annotation analyses and extends the input to sequencing techniques other than Sanger sequencing, such as next-generation sequencing (NGS).
Recommended input options
Specification of sequencing primer directionality
As a useful option, deFUME allows the user to specify the directionality of the primers used for Sanger sequencing. If the user specifies an identifier that matches part of the name of the chromatograms generated with a forward primer, this directionality will be visible in the output. Example: if a user's chromatograms are named FORW_01.ab1, FORW_02.ab1, FORW_03.ab1, ..., REV_01.ab1, REV_02.ab1, REV_03.ab1, ..., then specifying a "Forward primer identifier" of "FORW_" informs deFUME that all chromatograms with this identifier as part of their name were generated with a forward primer. This produces a more intuitive visualization of the output. If the user enters an identifier that does not match any chromatogram name or leaves the field empty, deFUME will choose the directionality in the output at random.
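The matching rule described above can be illustrated with a small shell sketch. It only demonstrates the idea of substring matching on file names; the function name `classify_read` and the label "unspecified" are assumptions made for this example, not deFUME's actual implementation.

```shell
# Hypothetical illustration of identifier matching (not deFUME's actual code).
# Prints "forward" when the identifier occurs anywhere in the file name,
# otherwise "unspecified".
classify_read() {
    name=$1         # chromatogram file name, e.g. FORW_01.ab1
    identifier=$2   # forward primer identifier, e.g. FORW_
    # An empty identifier never counts as a match
    if [ -z "$identifier" ]; then
        echo "unspecified"
        return
    fi
    case "$name" in
        *"$identifier"*) echo "forward" ;;
        *)               echo "unspecified" ;;
    esac
}

classify_read FORW_01.ab1 FORW_   # prints: forward
classify_read REV_02.ab1 FORW_    # prints: unspecified
```

Because the identifier may match anywhere in the name, choose one that cannot also occur in the names of your reverse reads.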
If you have one folder with reads from the forward primer and another folder containing .ab1 files from the reverse primer, you can run the following command (on a Mac or Unix machine) in the directory containing the forward primer reads. It adds the suffix _FOR to all your .ab1 files.
for file in *.ab1; do mv "$file" "$(basename "$file" .ab1)_FOR.ab1"; done
The same can be done in the directory containing all the reverse reads, adding the suffix _REV to all your .ab1 files:
for file in *.ab1; do mv "$file" "$(basename "$file" .ab1)_REV.ab1"; done
On a Windows system the following commands, typed at the command prompt, will do the job by adding a prefix to the file names (inside a batch file, write %%a instead of %a). For the forward reads:
for /f "tokens=*" %a in ('dir *.ab1 /b') do @ren "%a" "FORWARD_%a"
and for the reverse reads:
for /f "tokens=*" %a in ('dir *.ab1 /b') do @ren "%a" "REVERSE_%a"
Email for InterPro queries
Assembled open reading frames are further annotated using the InterPro server. In order to use this service at EMBL-EBI a valid email address is required.
Advanced input options
Trimming of cloning vector sequences
For sequencing data generated from a cloning vector, loading the vector sequence in this field (in FASTA format) enables deFUME to remove vector sequence from the user's sequencing data. This is performed prior to assembly and improves the accuracy of the assembly process.
Base calling error rate
Accuracy of base calls, expressed as an error probability. The standard probability is 0.01, which corresponds to a base call accuracy of 99% (or 1 error in 100 bases). Read more in the Wiki article or in the phred publications on accuracy assessment and error probabilities.
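For readers more familiar with Phred quality scores than with error probabilities, the two are related by the standard formula Q = -10 * log10(P). The sketch below converts a probability into a score; `phred_quality` is a hypothetical helper name, and the deFUME field itself expects the probability, not the score.

```shell
# Convert a base-calling error probability P into a Phred quality score,
# using the standard relation Q = -10 * log10(P).
phred_quality() {
    # awk's log() is the natural logarithm, so divide by log(10)
    awk -v p="$1" 'BEGIN { printf "%.0f\n", -10 * log(p) / log(10) }'
}

phred_quality 0.01    # prints: 20
```

An error probability of 0.01 therefore corresponds to the commonly used quality threshold of Q20.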
The Visual Output Page
After submitting the job the output page will load when the processing is completed. While processing, it is possible to type in an email address and get notified when the job is complete.
The deFUME output page is a table with one assembled contig per row. It includes a visual and interactive overview of each assembled contig, specifying chromatogram areas, predicted open reading frames, BLAST results and InterPro hits.
deFUME uses the D3.js, jQuery and jqGrid libraries to render the data.
Example output page
Note that a small sample set containing a few assembled contigs has been prepared so that the user can try out the different filter capabilities of deFUME. To view the results of this sample set directly, click here.
Each contig is represented by a thick green line at the top, followed by the Open Reading Frames (ORFs) found by MetaGeneMark. The BlastP hits of each ORF are represented by red lines (5 individual hits are represented by 1 line). In parallel, the ORFs are analyzed by InterPro and the individual hits are visualized using yellow lines.
The reads (extracted from the ab1 files) that make up a contig are represented by green arrows.
Expand a contig
By clicking on the + sign, the contig will expand and show the BlastP hit with the lowest E-value (the most significant hit) as a representative for the ORF.
Expand an ORF
By clicking on the + sign of the ORF a new table will open showing the 25 most significant BLAST hits.
On the BlastP level more information is shown for the individual BlastP hits.
- E-value: The E-value is calculated by the BLASTP algorithm. A smaller value represents a more significant hit.
- Coverage %: The percentage of the amino acids of the BlastP database hit that is covered by this particular Open Reading Frame.
- Hit id [%]: The sequence identity as output by BlastP
- Hit length: The total length of the ORF in amino acids
The Open Reading Frames are enriched with InterPro data shown as yellow lines. To inspect the detailed InterPro results, click on the link "InterPro" in the designated column to open a popup with the page as rendered by the InterPro server. If the Open Reading Frame did not return an InterPro hit, a "-" is shown.
Associated GO Terms
Since InterPro can associate multiple GO terms with an ORF, these are reported in the "Associated GO terms" column. Hovering over the corresponding cell shows all the associated GO terms. To investigate the GO annotations in more depth, click on the "InterPro" link. If there are no InterPro hits for this ORF, or the InterPro data do not contain any GO annotations, a "-" is shown.
The GO terms here are used to retrieve the top-level GO term as shown in the menu box to the right of the main results table.
Exploring your data
Left menu box
The menu on the right side contains additional filtering options. The following visual cues can be turned on and off to ease browsing:
- Red BlastP lines
- Yellow InterPro lines
- Green AB1 read arrows
- Removal of all the hits that are annotated with "hypothetical" or "unknown"
Furthermore, the E-value cutoff can be adjusted interactively so that only hits with an E-value below this cutoff are shown.
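The effect of these two filters can be reproduced on exported data with a short shell sketch. The three-column tab-separated layout (hit id, E-value, description) and the function name `filter_hits` are assumptions made for this example; they do not describe a deFUME export format or deFUME's own code.

```shell
# Hypothetical sketch (not deFUME's code): keep hits whose E-value is below
# a cutoff and whose description is not "hypothetical" or "unknown".
# Input: tab-separated lines of hit id, E-value, description (assumed layout).
filter_hits() {
    awk -F '\t' -v cutoff="$1" \
        '($2 + 0) < (cutoff + 0) && tolower($3) !~ /hypothetical|unknown/'
}

printf 'hit1\t1e-50\tbeta-lactamase\nhit2\t0.5\tABC transporter\nhit3\t1e-30\tHypothetical protein\n' |
    filter_hits 1e-5
# prints only the hit1 line
```

Here hit2 is dropped because its E-value is above the cutoff, and hit3 is dropped because of its "hypothetical" annotation.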
An important feature of deFUME is the interactive browsing of the GO terms. The GO annotations are composed of three main categories: "Molecular Function", "Biological Process" and "Cellular Component". Each ORF is annotated using InterPro with zero or more GO terms. Of these GO terms the top-level GO term is extracted from the GO hierarchy and shown as a histogram in this menu.
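The counting behind such a histogram can be sketched in a few lines of shell. The input layout (one tab-separated line per annotation, holding an ORF id and its top-level category) and the function name `go_histogram` are assumptions for illustration only; deFUME computes its histogram internally.

```shell
# Hypothetical sketch: count ORF annotations per top-level GO category.
# Input: tab-separated lines of ORF id and top-level category (assumed layout).
go_histogram() {
    awk -F '\t' '{ count[$2]++ }
                 END { for (c in count) printf "%s\t%d\n", c, count[c] }' |
        sort   # awk's array order is unspecified, so sort for stable output
}

printf 'orf1\tMolecular Function\norf2\tBiological Process\norf3\tMolecular Function\n' |
    go_histogram
```

Each output line holds a category name and the number of annotations that fall into it.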
Clicking on one of the bars in the bar plot adjusts the deFUME tables to show only contigs that are annotated with the selected GO term. This updates both the interactive table and the GO term chart.
To inspect the individual GO terms associated with an ORF, hover over the cell in the "Associated GO terms" column at the ORF level.
To reset the filtering on a particular GO term just click on the current GO term filter that is active.
By clicking on the small up and down arrows a deFUME list can be sorted ascending or descending. These features are also available on the ORF and BlastP level.
When you type in (part of) a particular read name or contig name, the deFUME tables automatically update to match the search criteria. For example, typing "cont" finds only contigs that are not composed of single reads.
Exporting your data
Right menu box
The total set of assembled reads can be exported in three formats:
- GenBank (a zip file containing the individual contigs as .gb files)
This allows for further manipulation and use in sequence analysis programs like Vector NTI, CLC, etc.
On Contig and ORF level
Clicking on one of the export buttons exports the current individual contig or BlastP hit to a GenBank or FASTA file.
Submit the job
Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified via e-mail when it has completed. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect.
deFUME is compatible with the major browsers available; however, be sure to use the latest version. deFUME was successfully tested with Chrome 39.0.2171.95, Firefox 4.0.5, Safari Version 6.1.3 and Internet Explorer 11.0.15. For example, deFUME will not render properly in Internet Explorer 10.
Please read the CBS access policies for information about limitations on the daily number of submissions.