Quick guide (Query)
Short walk through
In order to use OligoWiz 2 for microarray probe design,
the first step is to specify the target sequences (transcripts).
This is done by supplying OligoWiz 2 with a file containing the
intended target sequence in FASTA or TAB format.
The TAB file may contain sequence annotation,
such as exon/intron annotation. A TAB file containing annotation can be custom
made or generated from a GenBank file using the
FeatureExtract 1.0 server.
Next, a species
database and a series of parameters must be set (a set of default
settings can be loaded). When the desired parameters are set, a query can be
launched by pressing the "submit" button.
After calculating scores for all possible probes
in the input sequences, an *.owz.gz file is returned from the server and saved to disk.
The probe scores, together with the Total score (red curve) and sequence
annotation can be viewed in the graphical interface. The next step is to place
the probes, using the "Oligo placement" dialog. The number of probes per input
sequence and the distance between probes, as well as minimum score criteria, can be
set. In addition, the probes can be placed with respect to the sequence annotation
using regular expressions.
Finally, the probes can be exported as either sense or antisense probes, with
or without mismatch probes and in either tab or FASTA format.
Getting started
The first thing you must do is download the client program from the OligoWiz 2 web page and, if necessary, Java 1.4 or newer.
On a Windows PC or Macintosh, just click the program icon and wait
a few seconds.
On a UNIX or Linux operating system you must
type: java -jar OligoWiz2.0.jar
Setting up a query
The input for OligoWiz 2.0 contains the nucleotide sequences of the targets against which probes are to be designed. Optionally, sequence annotation can be included in the input file. OligoWiz 2.0 accepts two input formats, FASTA and tab. Only the tab format may contain sequence annotation.
FASTA
A FASTA file entry begins with a ">" followed by a sequence description, which ends at the line break; the following lines contain the nucleotide sequence. Each target sequence must be given as a separate entry in the FASTA file. The FASTA file must be in ASCII format, i.e. a plain text file, not an MS Word .doc file or the like. An example with 30 Bacillus subtilis coding regions can be found
here
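As a minimal sketch of the FASTA layout described above (not part of OligoWiz itself), the following Python function splits FASTA text into entries. The function name read_fasta and the example entries are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    entries = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A ">" line closes the previous entry and opens a new one.
            if description is not None:
                entries.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        else:
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line.upper())
    if description is not None:
        entries.append((description, "".join(chunks)))
    return entries


# Hypothetical two-entry input; the sequence of the first entry spans two lines.
example = ">seq_a demo entry\nATGTCT\nACATAT\n>seq_b\nGGTATGTAA\n"
for desc, seq in read_fasta(example):
    print(desc, len(seq))
```

A check like this can be useful before submitting a query, since a file saved in a word-processor format will not start with a ">" line.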
Annotation containing tab file
The tab file format is a tab-delimited file with one target sequence per line, containing: a sequence id, the nucleotide sequence followed by an optional sequence annotation string and comments.
Example:
Seq_x ATGTCTACATATGAAGGTATGTAA (EEEEEEEEEEEEEE)DIIIIIII /comment
A tab file can be generated automatically from a GenBank file using the FeatureExtract 1.0 server.
A detailed description of the file format can be found
here.
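The tab layout can be checked with a few lines of Python. This is only a sketch under the assumptions that the fields appear in the order listed above and that the annotation string, when present, has the same length as the sequence; parse_tab_line is a hypothetical helper, not an OligoWiz function.

```python
def parse_tab_line(line):
    """Split one tab-delimited line into (id, sequence, annotation, comment)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least a sequence id and a sequence")
    seq_id, sequence = fields[0], fields[1]
    # Annotation and comment are optional.
    annotation = fields[2] if len(fields) > 2 and fields[2] else None
    comment = fields[3] if len(fields) > 3 else None
    if annotation is not None and len(annotation) != len(sequence):
        raise ValueError(
            f"{seq_id}: annotation length {len(annotation)} "
            f"differs from sequence length {len(sequence)}"
        )
    return seq_id, sequence, annotation, comment


# Build an annotation of matching length: one exon block followed by intron start.
sequence = "ATGTCTACATATGAAGGTATGTAA"
annotation = "(" + "E" * 14 + ")" + "D" + "I" * (len(sequence) - 17)
record = parse_tab_line("\t".join(["Seq_x", sequence, annotation, "/comment"]))
print(record[0], len(record[1]), len(record[2]))
```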
The input file must be specified under "Main/Input FASTA or TAB file:"
You may activate a browse dialog function by pressing the "..." button.
When the input file has been specified you will be prompted to specify an output OWZ file,
for the server output. The OWZ file may be used to load the query result from
the "File/open file ..." menu at a later time.
Setting parameters
OligoWiz 2 requires a series of parameters in order to calculate scores for the probes.
This involves setting the "Species" parameter as well as the Score parameters.
The most important parameters to set are the species parameter and the oligo length.
OligoWiz 2.0 also offers some default parameter settings that can be loaded
(see Loading defaults), but the species parameter must always be set manually.
Species database
In order to calculate an appropriate cross-hybridization score as well as
low-complexity score, OligoWiz 2 requires information about the species in question.
A number of databases are available and can be found in the tree structure under
"Species:". The species option is set by highlighting a leaf in the tree structure,
e.g. "Eukaryote/Fungi/S. cerevisiae" as shown in the Quick guide (query).
A more detailed description of the databases can be found
here.
For some higher organisms a UniGene database is available, marked "(UniGene)";
otherwise the database is a genome database.
Databases are continually added to the OligoWiz 2 server and the available databases are
automatically read from the server every time the OligoWiz 2 client is started,
or you press the "Connect" button in the top right corner of the query page.
Loading defaults
Default score parameters can be loaded from the "Pre-defined parameter sets" drop-down menu
(point 3 in the quick guide (query) diagram). The defaults are loaded with the "Load" button
and can then be viewed in the four pages "General", "Tm", "Cross-hybridization"
and "Position", below. However, the user must always set the "Species" parameter.
Score parameters
The OligoWiz 2 client allows you to customize a series
of parameters for the calculations of the scores. These parameters can be set in
the "Score parameters/info" field. These are set from the four "page selectors" named:
General, Tm, Cross-hybridization and Position, which are described below.
General
Here you may:
- Specify the length of the oligonucleotides.
If "Optimize oligo length to fit Tm" is selected, the aim length is used to
calculate the optimal Tm for the delta-Tm score, i.e. the mean Tm of all oligos
of this length is the aim Tm.
- Specify the Min and Max oligo length to define the length interval within which
OligoWiz selects oligos.
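The idea of an aim Tm as "the mean Tm of all oligos of the aim length" can be sketched as follows. The Tm formula used here is a simple GC-content approximation chosen only as a stand-in; the text does not state which Tm model OligoWiz uses, and simple_tm and aim_tm are hypothetical names.

```python
def simple_tm(oligo):
    """Rough Tm estimate in deg C from GC content (stand-in, not OligoWiz's model)."""
    gc = sum(oligo.count(base) for base in "GC")
    return 64.9 + 41.0 * (gc - 16.4) / len(oligo)


def aim_tm(sequences, aim_length):
    """Mean Tm over every oligo of length aim_length in the input sequences."""
    tms = [
        simple_tm(seq[start:start + aim_length])
        for seq in sequences
        for start in range(len(seq) - aim_length + 1)
    ]
    if not tms:
        raise ValueError("no sequence is long enough for the aim length")
    return sum(tms) / len(tms)


# Hypothetical input sequences and aim length.
print(round(aim_tm(["ATGTCTACATATGAAGGTATGTAA", "GGCCGGTTAACCGGCC"], 10), 2))
```

An oligo whose Tm is close to this mean would then receive a good delta-Tm score; how the difference is converted into a score is not described here, so it is left out of the sketch.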
Tm
Here an optimal Tm may be specified, but we recommend you leave that to OligoWiz 2.0.
When the server returns the result the aim Tm can be read from here.
The hybridization chemistry may also be set to either DNA:DNA or RNA:RNA.
Cross-hybridization
The criteria for BLAST hits to be included in the Homology score calculation may
be set.
- "Minimum homology" (%) specifies the lowest degree of similarity to be taken into
account for the homology calculation.
- "Minimum length of homology stretch" (bp) specifies the shortest overlap a BLAST hit
can have with an oligo without being ignored from the score calculation.
Kane et al. (2000) show that for 50mers more than 75-80% homology or stretches over 15bp
can cause cross-hybridization.
- "Maximum homology" and "(total) Max length cutoff" are used to filter BLAST hits
caused by homology to the database version of the transcript in question, or homology
to very closely related paralogs.
The "Max length cutoff" sets the maximum fraction of the entire query sequence length that may be covered by an accepted BLAST hit exceeding "Maximum homology". A BLAST hit is rejected only if it exceeds both "Maximum homology" and "(total) Max length cutoff".
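The interplay of the four cut-offs can be written out as a small decision function. This is a sketch, not the server's implementation: the Hit fields, the function name and the default values are placeholders, and it is an assumption that a hit falling below either minimum is ignored. Only the rule that both maxima must be exceeded for rejection is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identity: float        # percent homology of the BLAST hit
    overlap: int           # length (bp) of the hit's overlap with the oligo
    total_fraction: float  # fraction of the whole query sequence covered by the hit


def hit_counts(hit, min_homology=75.0, min_length=15,
               max_homology=97.0, max_length_cutoff=0.8):
    """Return True if the hit should contribute to the homology score.

    All default values are placeholders, not OligoWiz defaults.
    """
    # Assumption: a hit below either minimum is ignored.
    if hit.identity < min_homology or hit.overlap < min_length:
        return False
    # From the text: a hit is rejected only when BOTH maxima are exceeded
    # (typically a hit to the transcript itself or a very close paralog).
    if hit.identity > max_homology and hit.total_fraction > max_length_cutoff:
        return False
    return True


print(hit_counts(Hit(identity=90.0, overlap=20, total_fraction=0.1)))
```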
Position
Position refers to the position within the query sequence. The score models the reverse transcription process that produces cDNA.
There are five mutually exclusive options.
- "Poly-T primer" specifies a score based on a 3' poly-T primer. The further a probe is placed from the 3' end, the lower the score.
- "Random primer" specifies a score based on a random primer that anneals at various places along the transcript. The chance of having a primer annealing upstream is larger toward the 5' end of the transcript, still taking into account that the reverse transcriptase can drop off the transcript and that the transcript may be degraded.
- "5' preference" specifies a score that is one at the 5' end and decreases linearly toward the 3' end.
- "3' preference" specifies a score that is one at the 3' end and decreases linearly toward the 5' end.
- "Mid preference" specifies a score that is one in the middle of the sequence and decreases toward the ends.
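The three simple preference options can be sketched as functions of the relative position along the sequence. This sketch assumes that each score falls to zero at the far end and that "Mid preference" decreases linearly; the text only states where the score is one. The poly-T and random primer models are not covered, since their formulas are not given here. The function name and mode labels are hypothetical.

```python
def position_score(position, length, mode):
    """Score in [0, 1] for a 0-based position in a sequence of the given length."""
    if length < 2:
        return 1.0
    rel = position / (length - 1)  # 0.0 at the 5' end, 1.0 at the 3' end
    if mode == "5prime":
        return 1.0 - rel
    if mode == "3prime":
        return rel
    if mode == "mid":
        return 1.0 - abs(2.0 * rel - 1.0)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("5prime", "3prime", "mid"):
    print(mode, [position_score(p, 101, mode) for p in (0, 25, 50, 100)])
```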
Launching a job
When the desired parameters have been set, a job may be launched at the OligoWiz server.
The job is submitted by selecting a query in the query list and pressing the submit button.
The query status can be followed in the status column. Furthermore, clicking the "www ..." button in the "inspect query" column will
open a web page with server status for the query. This function may also be used to inspect
error messages.
To prepare a new query, press the new button, or press the clone button to duplicate the previous query.
Interpreting the results/ placing probes
View results
When the server has returned the result, the text in the status column will change to "Completed - click to view".
Double clicking the query row will open a result page. The result page/interface is described below.
Result interface
A: Graphs represent scores (y-axis) along the input sequence (x-axis).
The different graphs represent different score types indicated by their color (see H).
For all scores, 1 is optimal and 0 is worst.
B: Color bar representing the Total (weighted) score. The Total score is also
represented as a graph (thick red line). On the color bar, red represents total
scores higher than 0.8.
C: Sequence bar. This bar represents the currently selected sequence entry (see I).
The bar is color coded if sequence annotation has been supplied. Here green indicates
exons and blue indicates introns.
D: Sequence inspection box / probe movement tool. The box represents the length of a
probe and can be moved up and down the sequence bar (see C) using the mouse. The
sequence and annotation at the current position are shown below (see E). The eye (J)
and hand (K) icons switch between sequence inspection and probe movement.
E: Sequence and feature annotation of the currently selected oligos, or the sequence
position being inspected (see D).
F: "Place oligos" button. Opens the dialog for automatic placement of probes.
G: A selection of options for manual manipulation of the probes.
H: Score management bar. Score weights and visual representation can be adjusted here.
I: Table of all sequence entries. One row for each entry in the original FASTA or TAB file.
The columns can be sorted by clicking the header.
J: Eye icon. Puts the selection box (D) in sequence inspection mode.
K: Hand icon. Puts the selection box (D) in probe adjustment mode.
L: Visual representation of where the probes are located. Click on a probe to
select it.
M: Export oligos. Open the dialog where probes can be exported - several file formats and
advanced options are available.
N: Oligo Table. Shows all probes for the currently selected sequence entry (see I).
Score Weights
The weight of the scores can be adjusted individually (see section H above).
For example, this can be used to ignore certain scores completely, or to
put more weight on a parameter that is important for a particular study.
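The effect of the weights can be illustrated with a weighted mean. Note that this is an assumption: the exact formula OligoWiz uses to combine the scores into the Total score is not given here, and the score names in the example are hypothetical labels.

```python
def total_score(scores, weights):
    """Weighted mean of per-type scores; a weight of 0 ignores that score type."""
    weight_sum = sum(weights[name] for name in scores)
    if weight_sum == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(scores[name] * weights[name] for name in scores) / weight_sum


# Hypothetical scores for one probe (1 is optimal, 0 is worst).
scores = {"cross_hyb": 0.9, "delta_tm": 0.6, "low_complexity": 1.0, "position": 0.3}

equal = {name: 1.0 for name in scores}
no_position = dict(equal, position=0.0)  # ignore the position score completely

print(round(total_score(scores, equal), 3))
print(round(total_score(scores, no_position), 3))
```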
Place oligos/probes
The probe placement dialog
The oligo placement dialog is the main tool for probe placement in
OligoWiz 2. The dialog is opened by pressing the "Place oligos..." button
(see section F above) or by using the "Oligos" menu.
General oligo placement
This box in the placement dialog governs the basic options
for the distance between the probes and the Total score cut-off value.
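How a minimum distance and a Total score cut-off interact can be sketched with a simple greedy selection. This is an illustrative assumption, not the placement algorithm OligoWiz actually uses; in particular, the distance is taken here to be the gap between the end of one probe and the start of the next, and place_probes is a hypothetical name.

```python
def place_probes(scores, probe_length, max_probes, min_distance, min_score):
    """Greedily pick probe start positions from best to worst Total score.

    scores[i] is the Total score of the probe starting at position i.
    min_distance is the minimum gap (bp) between neighbouring probes.
    """
    chosen = []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for start in ranked:
        # Stop once enough probes are placed or the remaining scores are too low.
        if len(chosen) >= max_probes or scores[start] < min_score:
            break
        # Gap between two probes = distance between starts minus probe length.
        if all(abs(start - other) - probe_length >= min_distance for other in chosen):
            chosen.append(start)
    return sorted(chosen)


# Hypothetical Total scores for eight possible probe start positions.
demo_scores = [0.1, 0.9, 0.8, 0.2, 0.2, 0.2, 0.7, 0.95]
print(place_probes(demo_scores, probe_length=2, max_probes=3,
                   min_distance=1, min_score=0.5))
```

In this example only two probes are placed even though three were requested, because the remaining high-scoring positions overlap the chosen ones and everything else falls below the cut-off.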
Replacement behavior
This option governs whether all existing probes should be discarded
prior to applying the search rules, or whether they should be kept.
If old probes are kept they will be evaluated according to the
search criteria before placing new probes, thus disallowing new
probes in certain positions.
Filters
One of the main new features of OligoWiz 2 is the ability
to take sequence feature annotation like intron/exon structure into account.
The annotation is in the form of an annotation string of the same length
as the DNA sequence. Each position in the annotation string describes
the annotation for the corresponding position in the DNA sequence, e.g.
"E" for exon and "I" for intron.
TAB files created using the FeatureExtract server
contain the following type of annotation:
Sequence: ATGTCTACATATGAAGGTATGTAATG ... TCATCATTAGATTAGAGGAACATGGAATACAACAAAACT ... ATTGGGGTATGTACGGTTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIIIII ... IIIIIIIIIIIIIIA(EEEEEEEEEEEEEEEEEEEEEEE ... EEEEEEEEEEEEEEEEEEE)
(EEEE) blocks = exons. DIIIIA blocks = introns.
Some positions have been skipped ( ... ) for readability.
For details please refer to the FeatureExtract server:
output format.
Using the filters it is possible to restrict the selection of probes
to regions of the input genes that have a certain type of annotation.
By default the search is restricted to exonic regions - the regions
annotated as (EE .. EE).
Custom filters can be defined using regular expressions - advanced
string matching.
About regular expressions
(PERL style)
A regular expression is an advanced description of a text string to search for.
Several closely
related flavors exist. Here we focus on regular expressions as the programming
language PERL interprets them, since this is the most generally accepted definition.
In its simplest form, a regular expression is just a word or phrase, for example
"human".
But sometimes an alternative word is just as good, then you may use
"human|Homo sapiens".
Where the sign "|" (the pipe character) means OR.
Several alternatives can be listed separated by series of "|"'s.
Sometimes however, there are too many alternatives, or the alternatives may not be known,
and using a wildcard becomes handy. The wildcard in regular expressions is
"." (dot).
Sometimes the number of times a certain expression is repeated is uncertain.
In this case, use the "*" (star / multiply) sign to indicate zero or more repeats.
For example "TA*" will match all of the following: "T", "TA" and
"TAAAA".
If an expression is expected to occur a number of times within an interval, putting "{n,m}" after the
expression can be used to specify this:
n is the minimum match count and m the maximum. Alternatively "{n}" means
exactly n times, and "{n,}" means at least n times.
If more than a single character is to be repeated in the expression it must be enclosed
by "(" and ")".
For example "(TA){2,4}" will match "TATA", "TATATA"
and "TATATATA".
Regular expression has a series of special characters with special meanings,
including "(, ), [, ], ., *, {, }, ^, !, \".
However, it may be necessary to search
for these characters in the sequence annotation,
for example in order to target the exon start and end characters "(" and
")". In order to do so, the special
meaning of these characters needs to be bypassed. This can be done by putting a "\"
(backslash) in front of the special character.
Example: "\(|\)" will search for "(" or ")".
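The constructs introduced above can be tried out in any language with Perl-style regular expressions. The short sketch below uses Python's standard re module, which behaves the same way for these particular patterns; the test strings are made up for illustration.

```python
import re

# Alternatives with "|": either phrase matches.
assert re.search(r"human|Homo sapiens", "Homo sapiens mRNA")

# "." matches any single character.
assert re.fullmatch(r"T.G", "TAG") and re.fullmatch(r"T.G", "TCG")

# "*" allows zero or more repeats of the preceding character.
for text in ("T", "TA", "TAAAA"):
    assert re.fullmatch(r"TA*", text)

# "{n,m}" bounds the repeat count of the parenthesised group.
assert re.fullmatch(r"(TA){2,4}", "TATATA")
assert not re.fullmatch(r"(TA){2,4}", "TA")

# A backslash removes the special meaning of "(" and ")".
brackets = re.findall(r"\(|\)", "(EEEE)DIIA(EE)")
print(brackets)
```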
Oligo or Regional search unit
When a string search is done, the program that performs the search will return a "TRUE"
when it finds the string that matched the regular expression. Therefore, it is important
to note which target units the search space is broken into. If the target unit is a line
in a file, one will be able to retrieve the line(s) that had a match to the regular
expression.
In OligoWiz 2 two target units are available: oligo and regional. In the
"Oligo Include"/"Oligo Exclude" fields the unit is the oligo. The regular expression
written here includes or excludes whole oligos.
In the "Region Include"/"Region Exclude" fields each position in the input sequence is
evaluated. The subsequent oligo selection is then done among oligos for which all bases
were evaluated as included.
Both criteria can be used at the same time. In this case an oligo is required to meet
both criteria.
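The difference between the two search units can be made concrete with a small sketch. It assumes that a region pattern marks every position covered by one of its matches, and that an oligo pattern is searched within the annotation substring of each candidate oligo; the function names and the toy annotation are hypothetical, and the exclude fields are left out for brevity.

```python
import re


def region_mask(annotation, include_pattern):
    """Mark every position covered by a match of the region pattern."""
    mask = [False] * len(annotation)
    for match in re.finditer(include_pattern, annotation):
        for pos in range(match.start(), match.end()):
            mask[pos] = True
    return mask


def allowed_starts(annotation, oligo_length, region_include=None, oligo_include=None):
    """Start positions of oligos passing both (optional) include criteria."""
    if region_include:
        mask = region_mask(annotation, region_include)
    else:
        mask = [True] * len(annotation)
    starts = []
    for start in range(len(annotation) - oligo_length + 1):
        window = annotation[start:start + oligo_length]
        # Regional unit: every base of the oligo must lie in an included region.
        if not all(mask[start:start + oligo_length]):
            continue
        # Oligo unit: the pattern must match somewhere inside the oligo itself.
        if oligo_include and not re.search(oligo_include, window):
            continue
        starts.append(start)
    return starts


toy = "(EEEE)DIIA(EEE)"
print(allowed_starts(toy, 4, region_include=r"\(E+\)"))
print(allowed_starts(toy, 4, oligo_include=r"\)D"))
```

With the region pattern, only oligos lying entirely inside an exon block remain; with the oligo pattern, only oligos spanning the exon end and the donor site remain.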
A more detailed description of regular expression can be found
here
Examples of probe placement
Example 1: Targeting exons
Region include: \(E+\)
Example 2: Targeting introns
Region include: DI+A
Example 3: Targeting long exons
Region include: \(E{200,}\)
Targets exons of length 202 or more (200 + exon start and end)
Example 4: Exons near splice junctions
Region include: E{1,100}\)D|A\(E{1,100}
Oligo exclude: D|A
Targets exonic regions up to 100 bp upstream of donor sites and up to 100 bp downstream of acceptor sites
Example 5: Cover Exon/Intron junction
Oligo include: E{4,}\)DI{4,}|I{4,}A\(E{4,}
Ensures that at least 5 bp of exon and 5 bp of intron are both included.
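To see which stretches of an annotation string the region patterns from Examples 1 to 3 select, they can be run against a synthetic annotation. The annotation below is made up for illustration, and Python's re module is used as a stand-in for the Perl-style matching in the client.

```python
import re

# Hypothetical annotation: a 12 bp exon block, a 32 bp intron, then a 250 bp
# exon block (lengths include the "(" ")" and "D" "A" boundary characters).
annotation = "(" + "E" * 10 + ")" + "D" + "I" * 30 + "A" + "(" + "E" * 248 + ")"


def spans(pattern):
    """Return (start, end) pairs, end exclusive, for every match of the pattern."""
    return [match.span() for match in re.finditer(pattern, annotation)]


exons = spans(r"\(E+\)")            # Example 1: all exons
introns = spans(r"DI+A")            # Example 2: all introns
long_exons = spans(r"\(E{200,}\)")  # Example 3: exons of 202 bp or more

print("exons:", exons)
print("introns:", introns)
print("long exons:", long_exons)
```

Only the second exon block passes the Example 3 pattern, since the first block contains far fewer than 200 "E" characters.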
Export oligos/ probes
When probes have been selected they may be exported as either sense or antisense
probes, with or without mismatch probes and in either tab or FASTA format.
By default a materials and methods section is auto-generated and included. This
section describes all the parameters used.
The export dialog can be opened by clicking on the "Export oligos ..." button
in the main interface (see section M above) or by using the menu shortcut:
File -> Export oligos ...
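The relation between a sense and an antisense probe can be sketched in a few lines, assuming that a sense probe is identical to the input sequence and an antisense probe is its reverse complement. The function name is hypothetical, and the way OligoWiz constructs mismatch probes is not described here, so it is not covered.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


sense_probe = "ATGTCTACATATGAAGG"  # hypothetical probe taken from the input strand
antisense_probe = reverse_complement(sense_probe)
print(sense_probe, antisense_probe)
```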
GETTING HELP
Scientific problems:
Technical problems: