
FeatureExtract 1.2L (light) Server

The FeatureExtract server extracts sequence and feature annotation, such as intron/exon structure, from GenBank entries and other GenBank format files.

New in version 1.2: Placeholder GenBank entries are expanded into subentries automatically. New options for handling spliced genes. A command-line version (any platform) of FeatureExtract is now available as open source. Download:

Light version (2017): Automatic look up of GenBank IDs disabled due to stability issues.


Paste in one or more GenBank file(s)

Upload file containing one or more GenBank entries

View example GenBank file
Notice: Multiple GenBank format files can be concatenated. A comprehensive source for GenBank files is the NCBI website:

Mar 13th, 2017: Light version (automatic look-up of GenBank IDs disabled).

Instructions: Basic usage - Paste in or upload a set of GenBank format files and hit submit. The FeatureExtract server will then by default extract all protein coding genes with full intron/exon annotation.

Please read the CBS access policies for information about limitations on the daily number of submissions. For processing large datasets (e.g. the Human Genome builds from NCBI), it is recommended to download the command-line version of FeatureExtract and do the processing locally.

Basic options

Select type of features to extract

Alternatively, enter the desired feature type(s) below:

Example: CDS,rRNA,tRNA

Include intergenic regions.

Naming preferences

1) Gene name
2) Systematic name
3) EntryId + distance

If the desired type of naming is not available, fall back to the level below: 1 -> 2 -> 3.
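
The fallback rule above can be sketched as follows. This is only an illustration, not the FeatureExtract implementation: the function name `choose_name`, its parameters, and the `_` separator used for the "EntryId + distance" name are all assumptions.

```python
from typing import Optional


def choose_name(gene: Optional[str], systematic: Optional[str],
                entry_id: str, distance: int, preferred: int = 1) -> str:
    """Return the first available name, starting at the preferred level.

    Levels: 1 = gene name, 2 = systematic name, 3 = EntryId + distance.
    The "_" separator in level 3 is an assumption made for this sketch.
    """
    if preferred not in (1, 2, 3):
        raise ValueError("preferred must be 1, 2 or 3")
    candidates = [gene, systematic, f"{entry_id}_{distance}"]
    # Level 3 is never empty, so a name is always found.
    return next(name for name in candidates[preferred - 1:] if name)
```

For example, `choose_name(None, "b2699", "U00096", 1200)` skips the missing gene name and returns the systematic name.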

Flanking regions

bp : Upstream (5')
bp : Downstream (3')

Optional: Define flanking regions

Advanced options


(bp): Frameshift cutoff

"Introns" shorter than this length are considered annotated frameshifts

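As a rough sketch of how the frameshift cutoff could be applied, the gaps between consecutive exons can be compared against the cutoff. The function name `classify_gaps` and the coordinate convention (1-based, inclusive, sorted exons) are assumptions for illustration, not taken from FeatureExtract itself.

```python
def classify_gaps(exons, cutoff):
    """Label each gap between consecutive exons.

    exons:  list of (start, end) pairs, 1-based inclusive, sorted by start
            (an assumed convention for this sketch).
    cutoff: gaps shorter than this many bp are labelled "frameshift".
    Returns a list of (gap_length, label) tuples.
    """
    labelled = []
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        gap_length = start2 - end1 - 1
        label = "frameshift" if gap_length < cutoff else "intron"
        labelled.append((gap_length, label))
    return labelled
```

With a cutoff of 10 bp, a 1 bp gap would be treated as an annotated frameshift, while a 300 bp gap would be kept as an intron.
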
Custom defined annotation

Example: snRNA=(N),promoter={P},unknown=QQQ
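
To illustrate the syntax of the example above, the string can be read as comma-separated `feature=symbols` pairs. The sketch below only splits the string into a mapping; the helper name `parse_custom_annotation` is hypothetical, and how the individual symbol characters are applied along a feature is not stated here, so the sketch does not interpret them.

```python
def parse_custom_annotation(spec):
    """Split a string like 'snRNA=(N),promoter={P}' into a dict.

    Keys are feature types, values are the annotation symbols exactly
    as written. Malformed items raise ValueError.
    """
    mapping = {}
    for item in spec.split(","):
        feature, sep, symbols = item.strip().partition("=")
        if not sep or not feature or not symbols:
            raise ValueError(f"malformed item: {item!r}")
        mapping[feature] = symbols
    return mapping
```

Applied to the example string, this yields one entry each for `snRNA`, `promoter` and `unknown`.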

Splicing (new in 1.2)

Splice all intron-containing sequences
Full length sequences are kept in the comments field

Only output intron-containing sequences
Can be used in combination with the "splice all..." option


Feature types to annotate in flanking regions

Alternatively, enter the desired feature type(s) below:

Example: MOST,polyA

Flanking region annotation scheme

Full annotation
Uppercase = same strand, Lowercase = opposite strand.
Presence/absence annotation
+ = same strand, - = opposite strand, # = overlapping
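
The presence/absence scheme can be illustrated with a small sketch. Several details are assumptions, not taken from the server: the function name `presence_absence`, the 0-based half-open coordinates, the `.` used for unannotated positions, and the reading of `#` as "covered by features on both strands".

```python
def presence_absence(length, features, blank="."):
    """Build a presence/absence string for a flanking region.

    features: iterable of (start, end, same_strand) with 0-based,
              end-exclusive coordinates relative to the flank (assumed).
    '+' = same strand, '-' = opposite strand, '#' = both (assumed meaning
    of "overlapping"), blank = no annotated feature (assumed symbol).
    """
    same = [False] * length
    opposite = [False] * length
    for start, end, same_strand in features:
        track = same if same_strand else opposite
        # Clip each feature to the flank boundaries.
        for i in range(max(start, 0), min(end, length)):
            track[i] = True
    chars = []
    for s, o in zip(same, opposite):
        if s and o:
            chars.append("#")
        elif s:
            chars.append("+")
        elif o:
            chars.append("-")
        else:
            chars.append(blank)
    return "".join(chars)
```

The full annotation scheme would instead keep each feature's own symbol, switching between uppercase and lowercase by strand.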

Troubleshooting

Produce verbose information

Verbose: Output additional information about the contents of the GenBank files and the general progress of the extraction.

Restrictions: A maximum of 100 MB of GenBank files will be processed in each run.

The sequences are kept confidential and will be deleted after processing.


For publication of results, please cite:

FeatureExtract - extraction of sequence annotation made easy.
Rasmus Wernersson.
Nucleic Acids Research, 2005, Vol. 33, Web Server issue W567-W569

View the abstract.


The command-line version of FeatureExtract is open source software (GPL) and can be downloaded here.

If you require FeatureExtract under a commercial license, please contact


Scientific and technical problems: