
FeatureExtract 1.2 Server

The FeatureExtract server extracts sequence and feature annotation, such as intron/exon structure, from GenBank entries and other GenBank format files.

New in version 1.2: Placeholder GenBank entries are now expanded into subentries automatically. New options are available for spliced genes. Download: a command-line version of FeatureExtract (any platform) is now available as open source.


Paste in GenBank IDs

Upload list of GenBank IDs


Paste in one or more GenBank file(s)

Upload file containing one or more GenBank entries


Notice: 1) Multiple GenBank format files can be concatenated. 2) Using a GenBank accession ID will work for all main GenBank entries plus a selected set of eukaryotic genomes (see details: databases). A comprehensive source for GenBank files is the NCBI web-site: http://www.ncbi.nlm.nih.gov/.

Jan 12th, 2006: IDs for eukaryotic genomes now work again.

Instructions: Basic usage - Paste in or upload a list of GenBank entry IDs (or, alternatively, GenBank files) and press submit. By default, the FeatureExtract server then extracts all protein-coding genes with full intron/exon annotation.

Please read the CBS access policies for information about limits on the daily number of submissions. For processing large datasets (e.g. the Human Genome builds from NCBI), it is recommended to download the command-line version of FeatureExtract and do the processing locally.


Basic options

Select type of features to extract

Alternatively, enter the desired feature type(s) below:

Example: CDS,rRNA,tRNA

Include intergenic regions.
[details]

Naming preferences

1) Gene name
2) Systematic name
3) EntryId + distance

If the desired type of naming is not available, fall back to the level below: 1 -> 2 -> 3.
[details]
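
The fallback rule can be illustrated with a minimal Python sketch. This is not the FeatureExtract source code: the function name choose_name, its arguments, and the underscore used to join the entry ID and the distance are assumptions made for illustration only.

```python
def choose_name(gene, systematic, entry_id, distance, preference=1):
    """Pick a name starting at the preferred level, falling back 1 -> 2 -> 3.

    Level 1 is the gene name, level 2 the systematic name, and level 3 a
    name built from the entry ID and a distance. The "<id>_<distance>"
    format used here is an assumption, not the server's documented format.
    """
    candidates = [gene, systematic, f"{entry_id}_{distance}"]
    # Start at the preferred level and move down until a name is available.
    for name in candidates[preference - 1:]:
        if name:
            return name
    raise ValueError("no name available")


# A feature without a gene name falls back to its systematic name.
print(choose_name(None, "b0344", "U00096", 1234))
```

Level 3 can always be constructed, so the fallback chain always ends with a usable name.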

Flanking regions

bp : Upstream (5')
bp : Downstream (3')

Optional: Define flanking regions
[details]
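
As a rough illustration of what the flanking options mean, the following Python sketch cuts a feature out of a sequence together with a number of upstream and downstream bases. It is a sketch under stated assumptions, not the server's implementation: the function names, the 1-based inclusive coordinates, and the handling of features on the minus strand (reverse complement, with upstream taken relative to the feature's own orientation) are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(genome, start, end, upstream=0, downstream=0, strand="+"):
    """Return feature start..end (1-based, inclusive) plus flanking bases.

    Upstream (5') and downstream (3') are interpreted relative to the
    feature's strand, so they swap sides for a minus-strand feature.
    Flanks are truncated at the ends of the sequence.
    """
    if strand == "+":
        left, right = upstream, downstream
    else:
        left, right = downstream, upstream
    lo = max(start - 1 - left, 0)
    hi = min(end + right, len(genome))
    region = genome[lo:hi]
    return region if strand == "+" else reverse_complement(region)


# Feature at positions 4..6 with 2 bp upstream and 1 bp downstream.
print(extract_with_flanks("AACCGGTTAC", 4, 6, upstream=2, downstream=1))
```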


Advanced options

Frameshifts

(bp): Frameshift cutoff

"Introns" shorter than this length are considered annotated frameshifts
[details]
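
The cutoff rule can be sketched in Python as follows. The function classify_gaps and its input format (a sorted list of 1-based, inclusive exon coordinates) are hypothetical and chosen only to illustrate the rule; the server's own logic may differ in detail.

```python
def classify_gaps(exons, cutoff):
    """Label each gap between consecutive exons.

    exons  : sorted list of (start, end) tuples, 1-based and inclusive
    cutoff : frameshift cutoff in bp
    Returns a list of (gap_length, label) tuples, where gaps shorter than
    the cutoff are labelled "frameshift" and all others "intron".
    """
    result = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        gap_length = next_start - prev_end - 1
        label = "frameshift" if gap_length < cutoff else "intron"
        result.append((gap_length, label))
    return result


# A 1 bp gap is treated as a frameshift, a 300 bp gap as a real intron.
print(classify_gaps([(1, 100), (102, 200), (501, 600)], cutoff=15))
```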

Custom defined annotation

Example: snRNA=(N),promoter={P},unknown=QQQ
[details]
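
For readers who build such strings programmatically, the example above can be read as a comma-separated list of feature=annotation pairs. The Python sketch below only parses that syntax; the helper name is hypothetical, and the meaning of the individual annotation characters is not stated on this page and is not assumed here.

```python
def parse_custom_annotation(spec):
    """Parse 'type=annotation,type=annotation,...' into a dictionary."""
    mapping = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so the annotation part is kept intact.
        feature_type, sep, annotation = item.partition("=")
        if not sep or not feature_type or not annotation:
            raise ValueError(f"malformed entry: {item!r}")
        mapping[feature_type] = annotation
    return mapping


print(parse_custom_annotation("snRNA=(N),promoter={P},unknown=QQQ"))
```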

Splicing (new in 1.2)

Splice all intron-containing sequences
Full-length sequences are kept in the comments field

Only output intron-containing sequences
Can be used in combination with the "splice all..." option

[details]
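
How the two splicing options combine can be sketched in Python. This is an illustration under assumptions, not the server's code: the record layout (name, sequence, exon list with 1-based inclusive coordinates) and the "full_length=" comment format are invented for the example.

```python
def splice(sequence, exons):
    """Concatenate the exon segments (1-based, inclusive coordinates)."""
    return "".join(sequence[start - 1:end] for start, end in exons)


def apply_splicing_options(records, splice_all=False, only_intron_containing=False):
    """Return (name, sequence, comment) tuples according to the two options."""
    output = []
    for name, sequence, exons in records:
        has_introns = len(exons) > 1
        if only_intron_containing and not has_introns:
            continue  # skip sequences without introns
        if splice_all and has_introns:
            # Keep the unspliced sequence in the comment field.
            output.append((name, splice(sequence, exons), "full_length=" + sequence))
        else:
            output.append((name, sequence, ""))
    return output


records = [
    ("g1", "AAACCCGGGTTT", [(1, 3), (7, 12)]),  # one intron at 4..6
    ("g2", "ATGTAA", [(1, 6)]),                 # no introns
]
print(apply_splicing_options(records, splice_all=True, only_intron_containing=True))
```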

Feature types to annotate in flanking regions

Alternatively, enter the desired feature type(s) below:

Example: MOST,polyA
[details]

Flanking region annotation scheme

Full annotation
Uppercase = same strand, Lowercase = opposite strand.
Presence/absence annotation
+ = same strand, - = opposite strand, # = overlapping
[details]
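
The presence/absence scheme can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than documented behaviour: the function name, the use of "." for positions without any feature, and the reading of "overlapping" as a position covered by more than one feature.

```python
def presence_absence(length, features, gene_strand, empty="."):
    """Build a presence/absence annotation string for a flanking region.

    length      : length of the flanking region in bp
    features    : list of (start, end, strand), 1-based inclusive,
                  in coordinates relative to the flanking region
    gene_strand : strand of the extracted gene ("+" or "-")
    """
    counts = [0] * length
    marks = [empty] * length
    for start, end, strand in features:
        symbol = "+" if strand == gene_strand else "-"
        # Clip the feature to the boundaries of the flanking region.
        for i in range(max(start, 1) - 1, min(end, length)):
            counts[i] += 1
            marks[i] = symbol if counts[i] == 1 else "#"
    return "".join(marks)


# Two features that share position 4 in a 10 bp flank.
print(presence_absence(10, [(2, 4, "+"), (4, 6, "-")], gene_strand="+"))
```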

Troubleshooting

Produce verbose information

Verbose: Output additional information about the contents of the GenBank files and the general progress of the extraction.
[details]

Restrictions: A maximum of 100 MB of GenBank files will be processed in each run.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

FeatureExtract - extraction of sequence annotation made easy.
Rasmus Wernersson.
Nucleic Acids Research, 2005, Vol. 33, Web Server issue W567-W569

View the abstract.


PORTABLE VERSION

The command-line version of FeatureExtract is open source software (GPL license) and is available for download.

If you require FeatureExtract under a commercial license, please contact software@cbs.dtu.dk.


GETTING HELP

Scientific and technical problems: