Exam in course 27013, May 28, 2003
Perl and Unix for Bioinformaticians
Preface:
Trivial syntax errors do not count against you. However, using functions or
language structures (loops, conditional statements) in a nonsensical manner
does. Appendix 6 contains a short reminder (not complete coverage) of Perl
structure and functions.
If you want to refer to the data in the appendices as files, then use the
filenames "appendix1", "appendix2" and so forth.
You can answer the questions in Danish or English.
Assignment 1 (50%):
During your research on ion channels you stumble upon the SwissProt entry
(CIQ3_HUMAN) in appendix 1. You notice in the feature table (FT) that several
variants and mutations of this gene exist. You want to take a closer look at
this and decide that the first step is to extract the original amino acid
sequence and all variations thereof (the full sequence with the appropriate
amino acid changed) and put the result in a fasta file (see appendix 2 for the
fasta file format). Since you are probably going to do this on a lot of
SwissProt entries, you decide to make a program in Perl (surprise).
a) Describe which keywords/patterns you will be looking for when parsing the
file in search of the variants and the sequence.
b) Describe a method to extract the sequences. You can use pseudo code, a
diagram or whatever you find suitable in your description.
c) Implement your method in Perl (on paper).
d) What kind of error checking could/should you include in your program? Here
you should name every check that is relevant to the task, not every check that
it is possible to make.
e) In what way could you generalize or extend the program?
Assignment 2 (50%):
Earlier in your career you made a splendid program that calculates various
scores based on amino acid sequence features. The output from your program can
be seen in appendix 3; each line consists of an accession number followed by
6 numbers between 0 and 1 (tab separated). You want to find the accession
numbers with the highest and lowest average scores (the average of the 6
numbers). However, you want to exclude any genes on your negative list from
your calculations. These genes are listed as SwissProt IDs in appendix 4.
Since GenBank accession numbers and SwissProt IDs are not identical, you need
to translate between them in order to solve your problem. Fortunately you have
a file that does just that, see appendix 5, where the first item on each line
is a SwissProt ID, the second item is irrelevant, and the third is the
corresponding GenBank accession number.
a) Describe a method to find the data. You can use pseudo code, a diagram or
whatever you find suitable in your description.
b) Implement your method in Perl (on paper).
c) Have you made any assumptions about the data in your algorithm? Which? Why?
Are they reasonable assumptions (explain)? Could/should you do away with them
(by changing the code)?
d) Usually, when you have this kind of problem, you want the 10 highest and
10 lowest average scores, not just the top and bottom average score. How would
you solve this problem? Will it change any of the assumptions in c)?
Appendix 1
ID   CIQ3_HUMAN     STANDARD;      PRT;   872 AA.
AC   O43525;
DT   15-JUL-1999 (Rel. 38, Created)
DT   15-JUL-1999 (Rel. 38, Last sequence update)
DT   28-FEB-2003 (Rel. 41, Last annotation update)
DE   Potassium voltage-gated channel subfamily KQT member 3 (Potassium
DE   channel KQT-like 3).
GN   KCNQ3.
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A., AND MUTAGENESIS OF GLY-310 AND GLY-318.
RC   TISSUE=Brain;
RX   MEDLINE=99087323; PubMed=9872318;
RA   Schroeder B.C., Kubisch C., Stein V., Jentsch T.J.;
RT   "Moderate loss of function of cyclic-AMP-modulated KCNQ2/KCNQ3 K+
RT   channels causes epilepsy.";
RL   Nature 396:687-690(1998).
RN   [2]
RP   VARIANT BFNC2 ARG-309.
RX   MEDLINE=20309392; PubMed=10852552;
RA   Hirose S., Zenri F., Akiyoshi H., Fukuma G., Iwata H., Inoue T.,
RA   Yonetani M., Tsutsumi M., Muranaka H., Kurokawa T., Hanai T., Wada K.,
RA   Kaneko S., Mitsudome A.;
RT   "A novel mutation of KCNQ3 (c.925T-->C) in a Japanese family with
RT   benign familial neonatal convulsions.";
RL   Ann. Neurol. 47:822-826(2000).
CC   -!- FUNCTION: PROBABLY IMPORTANT IN THE REGULATION OF NEURONAL
CC       EXCITABILITY. ASSOCIATES WITH KCNQ2 OR KCNQ5 TO FORM A POTASSIUM
CC       CHANNEL WITH ESSENTIALLY IDENTICAL PROPERTIES TO THE CHANNEL
CC       UNDERLYING THE NATIVE M-CURRENT, A SLOWLY ACTIVATING AND
CC       DEACTIVATING POTASSIUM CONDUCTANCE WHICH PLAYS A CRITICAL ROLE IN
CC       DETERMINING THE SUBTHRESHOLD ELECTRICAL EXCITABILITY OF NEURONS AS
CC       WELL AS THE RESPONSIVENESS TO SYNAPTIC INPUTS.
CC   -!- SUBUNIT: HETEROMULTIMER WITH KCNQ2 OR KCNQ5. MAY ASSOCIATE WITH
CC       KCNE2.
CC   -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC   -!- TISSUE SPECIFICITY: PREDOMINANTLY EXPRESSED IN BRAIN.
CC   -!- DOMAIN: THE SEGMENT S4 IS PROBABLY THE VOLTAGE-SENSOR AND IS
CC       CHARACTERIZED BY A SERIES OF POSITIVELY CHARGED AMINO ACIDS AT
CC       EVERY THIRD POSITION (BY SIMILARITY).
CC   -!- DISEASE: DEFECTS IN KCNQ3 ARE THE CAUSE OF BENIGN FAMILIAL
CC       NEONATAL CONVULSIONS TYPE 2 (BFNC2); ALSO KNOWN AS EPILEPSY,
CC       BENIGN NEONATAL TYPE 2 (EBN2); BFNC2 IS AN AUTOSOMAL DOMINANT FORM
CC       OF EPILEPSY IN THE NEWBORN THAT CLEARS SPONTANEOUSLY AFTER A FEW
CC       WEEKS AND IS FOLLOWED BY NORMAL PSYCHOMOTOR DEVELOPMENT.
CC   -!- MISCELLANEOUS: MUTAGENESIS EXPERIMENTS WERE CARRIED OUT IN XENOPUS
CC       OOCYTES BY CO-EXPRESSION OF EITHER KCNQ3(MUT) AND KCNQ2 AT THE
CC       RATIO OF 1:1, OR OF KCNQ3(MUT), KCNQ3(WT) AND KCNQ2 AT THE RATIO
CC       OF 1:1:2, TO MIMIC THE SITUATION IN A HETEROZYGOUS PATIENT WITH
CC       BFNC2 DISEASE.
CC   -!- SIMILARITY: BELONGS TO THE POTASSIUM CHANNEL FAMILY. KQT
CC       SUBFAMILY.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   HSSP; Q54397; 1BL8.
DR   Genew; HGNC:6297; KCNQ3.
DR   MIM; 602232; -.
DR   MIM; 121201; -.
DR   GO; GO:0008076; C:voltage-gated potassium channel complex; TAS.
DR   GO; GO:0005249; F:voltage-gated potassium channel activity; TAS.
DR   GO; GO:0006813; P:potassium ion transport; TAS.
DR   GO; GO:0007268; P:synaptic transmission; TAS.
DR   InterPro; IPR005821; Ion_trans.
DR   InterPro; IPR001622; K+channel_pore.
DR   InterPro; IPR003091; K_channel.
DR   InterPro; IPR003937; KCNQ_channel.
DR   InterPro; IPR005820; M+channel_nlg.
DR   Pfam; PF00520; ion_trans; 1.
DR   Pfam; PF03520; KCNQ_channel; 1.
DR   PRINTS; PR00169; KCHANNEL.
KW   Transport; Ion transport; Ionic channel; Voltage-gated channel;
KW   Potassium channel; Potassium transport; Potassium; Transmembrane;
KW   Multigene family; Disease mutation.
FT   TRANSMEM    122    142       SEGMENT S1 (POTENTIAL).
FT   TRANSMEM    153    173       SEGMENT S2 (POTENTIAL).
FT   TRANSMEM    197    217       SEGMENT S3 (POTENTIAL).
FT   TRANSMEM    226    247       SEGMENT S4 (POTENTIAL).
FT   TRANSMEM    262    282       SEGMENT S5 (POTENTIAL).
FT   DOMAIN      304    324       SEGMENT H5 (PORE-FORMING) (POTENTIAL).
FT   TRANSMEM    331    351       SEGMENT S6 (POTENTIAL).
FT   DOMAIN       13     24       POLY-GLY.
FT   VARIANT     309    309       W -> R (IN BFNC2).
FT                                /FTId=VAR_010935.
FT   VARIANT     310    310       G -> V (IN BFNC2).
FT                                /FTId=VAR_001546.
FT   MUTAGEN     310    310       G->V: ABOUT 50% REDUCTION OF WT
FT                                HETEROMERIC CURRENT; RATIO OF 1:1; OR
FT                                20%; RATIO OF 1:1:2.
FT   MUTAGEN     318    318       G->S: >50% REDUCTION OF WT HETEROMERIC
FT                                CURRENT; RATIO OF 1:1 AND 1:1:2.
SQ   SEQUENCE   872 AA;  96742 MW;  BB79C69EE8591A84 CRC64;
     MGLKARRAAG AAGGGGDGGG GGGGAANPAG GDAAAAGDEE RKVGLAPGDV EQVTLALGAG
     ADKDGTLLLE GGGRDEGQRR TPQGIGLLAK TPLSRPVKRN NAKYRRIQTL IYDALERPRG
     WALLYHALVF LIVLGCLILA VLTTFKEYET VSGDWLLLLE TFAIFIFGAE FALRIWAAGC
     CCRYKGWRGR LKFARKPLCM LDIFVLIASV PVVAVGNQGN VLATSLRSLR FLQILRMLRM
     DRRGGTWKLL GSAICAHSKE LITAWYIGFL TLILSSFLVY LVEKDVPEVD AQGEEMKEEF
     ETYADALWWG LITLATIGYG DKTPKTWEGR LIAATFSLIG VSFFALPAGI LGSGLALKVQ
     EQHRQKHFEK RRKPAAELIQ AAWRYYATNP NRIDLVATWR FYESVVSFPF FRKEQLEAAS
     SQKLGLLDRV RLSNPRGSNT KGKLFTPLNV DAIEESPSKE PKPVGLNNKE RFRTAFRMKA
     YAFWQSSEDA GTGDPMAEDR GYGNDFPIED MIPTLKAAIR AVRILQFRLY KKKFKETLRP
     YDVKDVIEQY SAGHLDMLSR IKYLQTRIDM IFTPGPPSTP KHKKSQKGSA FTFPSQQSPR
     NEPYVARPST SEIEDQSMMG KFVKVERQVQ DMGKKLDFLV DMHMQHMERL QVQVTEYYPT
     KGTSSPAEAE KKEDNRYSDL KTIICNYSET GPPEPPYSFH QVTIDKVSPY GFFAHDPVNL
     PRGGPSSGKV QATPPSSATT YVERPTVLPI LTLLDSRVSC HSQADLQGPY SDRISPRQRR
     SITRDSDTPL SLMSVNHEEL ERSPSGFSIS QDRDDYVFGP NGGSSWMREK RYLAEGETDT
     DTDPFTPSGS MPLSSTGDGI SDSVWTPSNK PI
//
Appendix 2
Description of FASTA file format:
Every sequence starts with a header line whose very first character is a >,
followed immediately by a unique sequence id (unique within the file, at
least). Optionally, the id can be followed by whitespace and some relevant
text, but all of that text has to be on the header line. The lines following
the header line hold the sequence, which can be a nucleotide or amino acid
sequence. Usually a sequence line contains 60 units (or fewer if it is the
last line), but there is no fixed limit. Whitespace in the sequence is allowed
but ignored.
See example below:
>SequenceID One line of text describing the sequence
MFLRRAAVAPQRAPILRPAFVPHVLQRADSALSSAAAGPRPMALRPPHQALVGPPLPGPP
GPPMMLPPMARAPGPPLGSMAALRPPLEEPAAPRELGLGLGLGLKEKEEAVVAAAAGLEE
ASAAVAVGAGGAPAGPAVIGPSLPLALAMPLPEPEPLPLPLEVVRGLLPPLRIPELLSLR
PRPRPPRPEPPPGLMALEVPEPLGEDKKKGKPEKLKRCIRTAAG
>NewSequenceID One line of text describing the sequence
MAELKYISGFGNECSSEDPRCPGSLPEGQNNPQVCPYNLYAEQLSGSAFTCPRSTNKRSW
LYRILPSVSHKPFESIDEGHVTHNWDEVDPDPNQLRWKPFEIPKASQKKVDFVSGLHTLC
GAGDIKSNNGLAIHIFLCNTSMENRCFYNSDGDFLIVPQKGNLLIYTEFGKMLVQPNEIC
VIQRGMRFSIDVFEETRGYILEVYGVHFELPDLGPIGANGLANPRDFLIPI
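As an illustration of the format described above, a record can be written out in Perl roughly as follows. This is only a sketch and not part of the exam text: the subroutine name format_fasta and its arguments are hypothetical, and the default width of 60 is simply the usual convention mentioned in the description.

```perl
use strict;
use warnings;

# Hypothetical helper: build one FASTA record from an id, an optional
# description and a sequence. Whitespace in the sequence is removed,
# since the format ignores it anyway.
sub format_fasta {
    my ($id, $description, $sequence, $width) = @_;
    $width ||= 60;                      # usual line width, not a hard limit
    $sequence =~ s/\s+//g;              # drop any whitespace in the sequence

    my $header = ">$id";
    $header .= " $description" if defined $description && length $description;

    my $record = "$header\n";
    # Cut the sequence into chunks of at most $width units, one per line.
    while (length $sequence) {
        $record .= substr($sequence, 0, $width, '') . "\n";
    }
    return $record;
}

# Example call with a short, made-up sequence and a width of 10.
print format_fasta('SequenceID', 'One line of text describing the sequence',
                   'MFLRRAAVAP QRAPILRPAF', 10);
```

The four-argument form of substr both returns the leading chunk and removes it from the string, so the loop ends when the sequence is used up.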
Appendix 3
U01120.CDS.1	0.96254	0.48773	0.91830	0.98988	0.10537	0.62475
D25328.CDS.1	0.04034	0.42409	0.43538	0.52913	0.63754	0.79602
X15573.CDS.1	0.13059	0.65310	0.63434	0.69388	0.92635	0.03285
K03515.CDS.1	0.65147	0.03256	0.01210	0.92373	0.25138	0.04894
L44140.CDS.10	0.57916	0.67875	0.64902	0.11068	0.97844	0.40458
U24183.CDS.1	0.15529	0.94098	0.89230	0.07359	0.93086	0.99767
M97347.CDS.1	0.69834	0.97120	0.42177	0.13373	0.50034	0.05931
U05259.CDS.1	0.92974	0.63092	0.71241	0.56408	0.32481	0.63875
M62486.CDS.1	0.59694	0.97628	0.67132	0.60904	0.90001	0.92270
L11244.CDS.1	0.65798	0.47916	0.60145	0.30699	0.58984	0.57989
D38293.CDS.1	0.71157	0.74513	0.52088	0.60387	0.81872	0.45174
M86400.CDS.1	0.60154	0.51706	0.42294	0.02331	0.65079	0.92327
X56468.CDS.1	0.08261	0.58053	0.55420	0.79502	0.14462	0.87900
U54778.CDS.1	0.43378	0.74155	0.85528	0.10510	0.35059	0.75528
D78577.CDS.1	0.02779	0.00857	0.23445	0.62924	0.31556	0.82429
X57346.CDS.1	0.20913	0.02713	0.56942	0.73001	0.63100	0.38814
X77567.CDS.1	0.18175	0.23254	0.90520	0.60469	0.25584	0.55599
M74161.CDS.1	0.52796	0.33846	0.13653	0.08215	0.13348	0.28114
M32313.CDS.1	0.96116	0.56726	0.02270	0.81643	0.67235	0.37329
.
.
.
(long list continues here)
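To make the line layout concrete, here is a small Perl sketch that reads one line of this kind and computes the average of its scores. It is not part of the exam text: the subroutine name average_score is hypothetical, and the code assumes the fields are separated by tabs, as stated in Assignment 2.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: split one line of the score file into the
# accession number and the average of the scores that follow it.
# Assumes tab-separated fields, as stated in Assignment 2.
sub average_score {
    my ($line) = @_;
    chomp $line;
    my ($accession, @scores) = split /\t/, $line;
    die "No scores on line: $line\n" unless @scores;
    return ($accession, sum(@scores) / @scores);
}

# Example call with the first line of the appendix.
my ($accession, $average) = average_score(
    "U01120.CDS.1\t0.96254\t0.48773\t0.91830\t0.98988\t0.10537\t0.62475\n");
printf "%s %.5f\n", $accession, $average;
```

Dividing by the number of scores actually found, rather than by a fixed 6, is one possible design choice; a stricter version could instead reject lines that do not hold exactly 6 numbers.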
Appendix 4
OGG1_HUMAN
HGD_HUMAN
CRAR_HUMAN
SN25_HUMAN
INA2_HUMAN
TBB1_HUMAN
ADT2_HUMAN
FOL2_HUMAN
CBG_HUMAN
MYCM_HUMAN
PYR5_HUMAN
GLUC_HUMAN
SY04_HUMAN
PPA5_HUMAN
FGF2_HUMAN
COXR_HUMAN
GTM3_HUMAN
SPCB_HUMAN
MM08_HUMAN
Appendix 5
OGG1_HUMAN O15527     AB000410.CDS.1
HGD_HUMAN  Q93099     AF000573.CDS.1
CN37_HUMAN P09543     D13146.CDS.1
GCST_HUMAN P48728     D14686.CDS.1
CRAR_HUMAN P48740     D17525.CDS.1
SN25_HUMAN P13795     D21267.CDS.1
APM1_HUMAN Q15848     D45371.CDS.1
CNCG_HUMAN Q13956     D45399.CDS.1
INA2_HUMAN P01563     J00207.CDS.1
TBB1_HUMAN P07437     J00314.CDS.1
IF2A_HUMAN P05198     J02645.CDS.1
ADT2_HUMAN P05141     J02683.CDS.1
FOL2_HUMAN P14207     J02876.CDS.1
2AAA_HUMAN P30153     J02902.CDS.1
C2F1_HUMAN P24903     J02906.CDS.1
CBG_HUMAN  P08185     J02943.CDS.1
MYCM_HUMAN P12525     J03069.CDS.1
GBAZ_HUMAN P19086     J03260.CDS.1
LKHA_HUMAN P09960     J03459.CDS.1
PYR5_HUMAN P11172     J03626.CDS.1
GLUC_HUMAN P01275     J04040.CDS.1
CALM_HUMAN P02593     J04046.CDS.1
C1S_HUMAN  P09871     J04080.CDS.1
SY04_HUMAN P13236     J04130.CDS.1
PPA5_HUMAN P13686     J04430.CDS.1
.
.
.
(long list continues here)
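The three-item layout of these lines can be read in Perl along the following lines. This is only a sketch and not part of the exam text: the subroutine name read_mapping and the hash name are hypothetical, the items are assumed to be separated by whitespace, and keying the hash on the GenBank accession number is just one possible choice.

```perl
use strict;
use warnings;

# Hypothetical helper: turn lines of the translation file into a hash that
# maps a GenBank accession number to its SwissProt ID.
sub read_mapping {
    my (@lines) = @_;
    my %swissprot_for;
    for my $line (@lines) {
        # Three whitespace-separated items; the second one is not needed.
        my ($swissprot_id, undef, $genbank_acc) = split ' ', $line;
        next unless defined $genbank_acc;     # skip blank or short lines
        $swissprot_for{$genbank_acc} = $swissprot_id;
    }
    return %swissprot_for;
}

# Example call with the first two lines of the appendix.
my %swissprot_for = read_mapping(
    "OGG1_HUMAN O15527     AB000410.CDS.1\n",
    "HGD_HUMAN  Q93099     AF000573.CDS.1\n",
);
print "$swissprot_for{'AF000573.CDS.1'}\n";
```

Splitting on ' ' treats any run of whitespace as one separator and ignores leading whitespace, so the sketch does not depend on the exact column spacing.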