
Phylogenetic Trees



    Before you start: please install the FigTree viewer on your computer.

    In this exercise you will analyze the evolutionary relationships between HIV-related viruses from humans and non-human primates:

    Acquired Immune Deficiency Syndrome (AIDS) is caused by two divergent viruses, Human Immunodeficiency Virus type 1 (HIV-1) and Human Immunodeficiency Virus type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, ...) and have been named Simian Immunodeficiency Virus (SIV). HTLV-1 is another, more distantly related, member of the family of retroviruses to which HIV and SIV belong.

    The "Pol" gene, which is present in the genome of all these viruses, encodes three different polypeptides important for the viral life cycle: integrase, reverse transcriptase, and protease. It is expressed as a single polyprotein, which is subsequently cleaved by the protease into its three separate parts. In this exercise you will use a data set consisting of 20 different Pol polyprotein sequences from HIV-1, HIV-2, chimpanzee SIV, sooty mangabey SIV, and HTLV-1:

    >HTLV  P03362 POL_HTL1A POL polyprotein (HTLV-I).
    GKKAACNLANTGASRPWARTPPKAPRNQPVPFKPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANG
    TWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTIDLRDAFFQIPLPKQFQPYFAFTVPQQCNYGP
    GTRYAWKVLPQGFKNSPTLFEMQLAHILQPIRQAFPQCTILQYMDDILLASPSHEDLLLLSEATMASLISHG
    LPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYC
    ALQRHTDPRDQIYLNPSQVQSLVQLRQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKEQWPLVWLHA
    PLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTSDHPSVPILLHHSHRFKNLGA
    QTGELWNTFLKTAAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSRAAYILWDKQILSQRSFPLPPPHKSA
    QRAELLGLLHGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSH
    TNLPDPISRLNALTDALLITPVLQLSPAELHSFTHCGQTALTLQGATTTEASNILRSCHACRGGNPQHQMPR
    GHIRRGLLPNHIWQGDITHFKYKNTLYRLHVWVDTFSGAISATQKRKETSSEAISSLLQAIAHLGKPSYINT
    DNGPAYISQDFLNMCTSLAIRHTTHVPYNPTSSGLVERSNGILKTLLYKYFTDKPDLPMDNALSIALWTINH
    LNVLTNCHKTRWQLHHSPRLQPIPETRSLSNKQTHWYYFKLPGLNSRQWKGPQEALQEAAGAALIPVSASSA
    QWIPWRLLKRAACPRPVGGPADPKEKDLQHHG
    >HIV1B5  P04587 POL polyprotein [Contains: Protease (Retro
    FFREDLAFLQGKAREFSSEQTRANSPTISSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQIT
    LWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVL
    VGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISK
    IGPENPYNTPVFAIKKKDSTKWRKLVDFRELNRRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLD
    EDFRKYTAFTIPSINNETPGSGYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDL
    EIGQHRTKIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTIQPIVLPEKDSWTVNDIQKLVGKLN
    WASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQW
    TYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQAT
    WIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAASRETKLGKAGYVTNRGRQKVVTLTHTTNQKTELQA
    IHLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVS
    AGIRKILFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDC
    THLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTSATVKAACWWAGIKQ
    EFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQ
    TKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCV
    ASRQDED
    >HIV1H2  P04585 POL polyprotein [Contains: Protease (Retro
    FFREDLAFLQGKAREFSSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQVTLWQRPLVTIKIG
    GQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
    LLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVF
    AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIP
    SINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
    RQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVR
    QLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNL
    KTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPP
    LVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEV
    NIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGI
    DKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAV
    HVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQG
    VVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
    NFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
    >HIV1MN  P05961 POL polyprotein [Contains: Protease (Retro
    FFREDLAFLQGKAEFSSEQNRANSPTRRELQVWGRDNNSLSEAGEEAGDDRQGPVSFSFPQITLWQRPIVTI
    KIGGQLKEALLDTGADDTVLGEMNLPRRWKPKMIGGIGGFIKVRQYDQITIGICGHKAIGTVLVGPTPVNII
    GRNLLTQLGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALIEICTEMEKEGKISKIGPENPYNT
    PVFAIKKKDSTKWRKLVDFRELNKKTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAF
    TIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRAKI
    EELRRHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGI
    KVKQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEVQKQGQGQWTYQIYQEPF
    KNLKTGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFRLPIQKETWETWWTEYTXATWIPEWEVVN
    TPPLVKLWYQLEKEPIVGAETFYVDGAANRETKKGKAGYVTNRGRQKVVSLTDTTNQKTELQAIHLALQDSG
    LEVNIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFL
    DGIDKAQEDHEKYHSNWRAMASDFNLPPIVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVIL
    VAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGPNFTSTTVKAACWWTGIKQEFGIPYNPQ
    SQGVIESMNKELKKIIGQVRDQAEHLKRAVQMAVFIHNFKRKGGIGGYSAGERIVGIIATDIQTKELQKQIT
    KIQNFRVYYRDSRDPLWKGPAKLLWKGEGAVVIQDNNDIKVVPRRKAKVIRDYGKQTAGDDCVASRQDED
    >HIV1N5  P12497 POL polyprotein [Contains: Protease (Retro
    FFREDLAFPQGKAREFSSEQTRANSPTRRELQVWGRDNNSLSEAGADRQGTVSFSFPQITLWQRPLVTIKIG
    GQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
    LLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVF
    AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKQKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIP
    SINNETPGIRYQYNVLPQGWKGSPAIFQCSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
    RQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVR
    QLCKLLRGTKALTEVVPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNL
    KTGKYARMKGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPP
    LVKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPLTDTTNQKTELQAIHLALQDSGLEV
    NIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDGLVSAGIRKVLFLDGI
    DKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAV
    HVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTVHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQG
    VIESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
    NFRVYYRDSRDPVWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
    >HIV1ND  P18802 POL polyprotein [Contains: Protease (Retro
    FFREDLAFPQGKAGEFSSEQTRANSPTSRELRVWGGDNPLSETGAERQGTVSFSFPQITLWQRPLVTIKIGG
    QLKEALLDTGADDTVLEEINLPGKWKPKMIGGIGGFIKVRQYDQILIEICGYKAMGTVLVGPTPVNIIGRNL
    LTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALTEICTEMEKEGKISRIGPENPYNTPIFA
    IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPS
    INNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPEIVIYQYMDDLYVGSDLEIGQHRTKIEELR
    EHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPINLPEKESWTVNDIQKLVGKLNWASQIYAGIKVKQ
    LCKLLRGTKALTEVVPLTEEAELELAENREILKEPVHGVYYDPSKDLIAELQKQGDGQWTYQIYQEPFKNLK
    TGKYARTRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWETWWIEYWQATWIPEWEFVNTPPL
    VKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPFTDTTNQKTELQAINLALQDSGLEVN
    IVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSQGIRKVLFLDGID
    KAQEEHEKYHNNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
    VASGYIEAEVIPAETGQETAYFLLKLAGRWPVKVVHTDNGSNFTSATVKAACWWAGIKQEFGIPYNPQSQGV
    VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTRELQKQIIKIQN
    FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKVKIIRDYGKQMAGDDCVASRQDED
    >HIV1OY  P20892 POL polyprotein [Contains: Protease (Retro
    FFREDLAFPQGKAREFSSEQTRANSPTSRELRVWGRDNNSPSEAGADRQGTVSFNLPQITLWQRPIVTIKIG
    GQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
    LLTQLGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKVLIEICTEMEKEGKISKVGPENPYNTPVF
    AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIP
    SINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
    RQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIMLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVK
    NLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLVAELQKQGQGQWTYQIYQEPFKNL
    KTGKYARMRGAHTNDVKQLTEAVQKITQESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPP
    LVKLWYQLEKDPIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEV
    NIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGI
    DKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKIILVAV
    HVASGYIEAEVIPAETGQETAYFILKLAGRWPVKTIHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQG
    VVESMNNELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
    NFRVYYRDSREPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
    >HIV1PV  P03368 POL polyprotein [Contains: Protease (Retro
    FFREDLAFLQGKAREFSSEQTRANSPTISSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQIT
    LWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVL
    VGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISK
    IGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLD
    EDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDL
    EIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLN
    WASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQW
    TYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQAT
    WIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETRLGKAGYLTNKGRQKVVPLTNTTNQKTELQA
    IYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKQKVYLAWVPAHKGIGGNEQVDKLVS
    AGIRKILFLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDC
    THLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTSATVKAACWWAGIKQ
    EFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQ
    TKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCV
    ASRQDED
    >HIV1U4  P24740 POL polyprotein [Contains: Protease (Retro
    FFRENLAFQQGEAREFSSEQTRANSPTSRNLWDGGKDDLPCETGAERQGTDSFSFPQITLWQRPLVTVKIGG
    QLIEALLDTGADDTVLEDINLPGKWKPKIIGGIGGFIKVRQYDQILIEICGKKTIGTVLVGPTPVNIIGRNM
    LTQIGCTLNFPISPIETVPVKLKPEMDGPKVKQWPLTEEKIKALTEICNEMEKEGKISKIGPENPYNTPVFA
    IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHTAGLKKKKSVTVLDVGDAYFSVPLDESFRKYTAFTIPS
    INNETPGVRYQYNVLPQGWKGSPSIFQSSMTKILEPFRSQHPDIVIYQYMDDLYVGSDLEIGQHRAKIEELR
    AHLLSWGFITPDKKHQKEPPFLWMGYELHPDKWTVQPIQLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQ
    LCKLLRGAKALTDIVTLTEEAELELAENREILKDPVHGVYYDPSKDLVAEIQKQGQDQWTYQIYQEPFKNLK
    TGKYARKRSAHTNDVKQLTEVVQKVSTESIVIWGKIPKFRLPIQKETWEAWWMEYWQATWIPEWEFVNTPPL
    VKLWYQLEKDPIAGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTETTNQKTELHAIHLALQDSGSEVN
    IVTDSQYALGIIQAQPDRSESEIVNQIIEKLIEKEKVYLSWVPAHKGIGGNEQVDKLVSSGIRKVLFLDGID
    KAQEDHEKYHCNWRAMASDFNLPPVVAKEIVASCNKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
    VASGYIEAEVIPAETGQETAYFILKLAGRWPVKVIHTDNGSNFTSAAVKAVCWWANIQQEFGIPYNPQSQGV
    VESMNKELKKIIGQVREQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTKELQKQISKIQN
    FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCMAGRQDED
    >HIV1Z2  P12499 POL polyprotein [Contains: Protease (Retro
    FFREDLAFPQGKAGELSSEQTRANSPTSRELRVWGRDNPLSETGAERQGTVSFNCPQITLWQRPLVTIKIGG
    QLKEALLDTGADDTVLEEMNLPGKWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNL
    LTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALTEICTEMEKEGKISRVGPENPYNTPIFA
    IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPS
    INNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPEIVIYQYMDDLYVGSDLEIGQHRTKIEELR
    EHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQSIKLPEKESWTVNDIQKLVGKLNWASQIYPGIKVRQ
    LCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGHGQWTYQIYQEPFKNLK
    TGKYARMRGAHTNDVKQLAEVVQKISTESIVIWGKTPKFRLPIQKETWETWWVEYWQATWIPEWEFVNTPPL
    VKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPFTDTTNQKTELQAINLALQDSGLEVN
    IVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSQGIRKVLFLDGID
    KAQEEHEKYHNNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
    VASGYIEAEVIPAETGQETAYFILKLAGRWPVKIVHTDNGSNFTSAAVKAACWWAGIKQEFGIPYNPQSQGV
    VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTKELQKQITKIQN
    FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKVKIIRDYGKQMAGDDCVASRQDED
    >HIV2CA  P24107 POL polyprotein [Contains: Protease (Retro
    TGGFFRDWPLGKEAPQFPRGPSSTGANTNSTPIGSSSGSTGEIYAAREKAEGAETETIQRGDRGLTAPRTRR
    GPMQGDNRGLAAPQFSLWKRPVVTAHIEGQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
    NVEIEVLGKRVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPIKIMLKPGKDGPRLRQWPLTKEKIEA
    LKEICEKMEKEGQLEEAPPTNPYNTPTFAIRKKDKNKWRMLIDFRELNKVTQDFTEIQLGIPHPAGLAKKRR
    ITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRQVLEPFRKANSD
    VIIIQYMDDILIASDRTDLEHDKVVLQLKELLNNLGFSTPDEKFQKDPPYRWMGYELWPTKWKLQKIQLPQK
    EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILSQEQEGHYYQE
    EKELEATVQKDQDNQWTYKIHQEEKILKVGKYAKIKHTHTNGVKLLAQVVQKIGKEALVIGRIPKFHLPVER
    EVWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVGDPIPGTETFYTDGSCNRQSKEGKAGYVTDRGRDKVK
    ILEQTTNQQAELEAFAMALTDSGPKANIIVDSQYVMGIVAGQPTESENRIVNQIIEEMIKKEAIYVAWVPAH
    KGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHTNVKELCHKFDIPQLVARQIVNTCAQYQQKGEAIH
    GQVNAEVGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANFTS
    QEVKMVAWWVGIEQTFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTVETIVLMAVHCMNFKRRGGIGDMT
    PSERLINMITTEQEIQFLQAKNSKLKNFRVYFREGRDQLWKGPGELLWKGDGAVIVKVGTDIKIIPRRKAKI
    IRDYGGRQELDSSSHLEGARENGEVA
    >HIV2D1  P17757 POL polyprotein [Contains: Protease (Retro
    VLELWKGGTLGETVPSTQKTGLLEVWQVRTHHGKLPGKTGRFFRDGPTGKAAPQLPRGPSSSGADTNSTPNR
    SSSGPVGEIYAAREKAERAEGETIQGGDGGLTAPRAGRDAPQRGDRGLATPQFSLWKRPVVTAFIEDQPVEV
    LLDTGADDSIVAGIELGDNYTPKIVGGIGGFINTKEYKNVEIKVLNKRVRATIMTGDTPINIFGRNILATLG
    MSLNLPVAKLDPIKVTLKPGKDGPRLKQWPLTKEKIEALKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKD
    KNKWRMLIDFRELNRVTQDFTEIQLGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAE
    PEKRYVYKVLPQGWKGSPAIFQFMMRQILEPFRKANPDVILIQYMDDILIASDRTGLEHDKVVLQLKELLNG
    LGFSTPDEKFQKDPPFQWMGYELWPTKWKLQKIQLPQKEIWTVNDIQKLVGVLNWAAQIYPGIKTKHLCKLI
    RGKMTLTEEVQWTELAEAELEENKIILSQEQEGSYYQEEEELEATVIKSQDNQWAYKIHQGERVLKVGKYAK
    IKNTHTNGVRLLAQVVQKIGKEALVIWGRVPKFHLPVERDTWEQWWDNYWQVTWVPEWDFVSTPPLVRLTFN
    LVGDPIPGTETFYTDGSCNRQSKEGKAGYVTDRGRDRVRVLEQTSNQQAELEAFAMALADSGPKVNIIVDSQ
    YVMGIVAGQPTESENRIVNQIIEDMIKKEAVYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEH
    EKYHSNIKELTHKFGIPQLVARQIVNTCAQCQQKGEAIHGQVNAEIGVWQMDCTHLEGKIIIVAVHVASGFI
    EAEVIPQESGRQTALFLLKLASRWPITHLHTDNGPNFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNH
    HLKNQISRIREQANTIETIVLMAVHCMNFKRRGGIGDMTPAERLINMITTEQEIQFLQRKNSNFKKFQVYYR
    EGRDQLWKGPGELLWKGDGAVIVKVGADIKVVPRRKAKIIRDYGGRQELDSSSHLEGAREDGEVA
    >HIV2G1  P18042 POL polyprotein [Contains: Protease (Retro
    MWQDRTRHGKMPRKTGRFFRDGSMGKEAPQLPRGPSSSGADTNSTPSRSSSGSIGKIYAAGERAEGAEGETI
    QRGDGRLTAPRAGKSTSQRGDRGLAAPQFSLWKRPVVTAYIEVQPVEVLLDTGADDSIVAGIQLGDNYVPKI
    VGGIGGFINTKEIKNIEIKVLNKRVRATIMTGDTPINIFGRNILTALGMSLNLPIAKIEPIKVTLKPGKDGP
    RLRQWPLTKEKIEALREICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNRVTQDFTEIQ
    LGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHT
    MRQVLEPFRKANPDVILIQYMDDILIASDRTGLEHDKVVLQLKELLNGLGFSTPDEKFQKDPPLQWMGYELW
    PTKWKLQKLQLPQKEIWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIKGKMTLTEEVQWTELAEAELEENK
    IILSQEQEGYYYQEEKELEATIQKNQDNQWTYKIHQEEKILKVGKYAKIKNTHTNGVRLLAQVVQKIGKEAL
    VIWGRIPKFHLPVERETWEQWWDNYWQVTWIPEWDFVSTPPLVRLTFNLVGDPIPGAETFYTDGSCNRQSKE
    GKARYVTDRGRDKVRVLERTTNQQAELEAFAMTLTDSGPKVNIIVDSQYVMGIVVGQPTESESRIVNQIIED
    MIKKEAVYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLERIEPAQEEHEKYHSNMKELTHKFGIPQLVARQI
    VNTCAQCQQKGEAIHGQVNAEIGVWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRW
    PITHLHTDNGSNFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTIETIVLMAV
    HCMNFKRRGGIGDMTPAERLINMITTEQEIQFLQRKNSNFKNFQVYYREGRDQLWKGPGELLWKGDGAVIVK
    VGADIKVIPRRKAKIIRDYGGRQELDSSHLEGAREEDGEVA
    >HIV2KR  Q74120 POL polyprotein [Contains: Protease (Retro
    TGWFFRDWPMGKEASQLPRDPSPAGADTNSTPSRPSSRPAREVLAAREEAERAENETIQGGDRGLTAPRTRR
    DTTQRGDRGFAAPQFSLWKRPVVTAYVEGQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
    NVEIKVLNKKVKATIMTGDTPINIFGRNILTALGMSLNLPVAKVDPIKVILKPGKDGPKVRQWPLTKEKIEA
    LKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQEFTEIQLGIPHPAGLAKKRR
    ITVLDIGDAYFSIPLHEDFRQYTAFTLPTVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRQVLEPFRKANPD
    VILVQYMDDILIASDRTDLEHDRTVLQLKELLNGLGFSTPDEKFQKDPPYKWMGYELWPTKWKLQKIQLPQK
    EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENKIILSQEQEGCYYQE
    EKELEATVQKDQDNQWTYKIHQGEKILKVGKYAKIKNTHTNGVRLLAHVVQKIGKEALVIWGRIPKFHLPVE
    RETWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVKDPIPGEETFYTDGSCNRQSKEGKAGYITDRGRDKV
    RILEQTTNQQAELEAFAMALTDSGPKANIIVDSQYVMGIVAGQPTESESKLVNQIIEEMIKKETLYVAWVPA
    HKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGLPKLVARQIVNTCAQCQQKGEAI
    HGQVDAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLASRWPITHLHTDNGANFT
    SQEVKMVAWWTGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTMETIVLMAVHCMNFKRRGGIGDM
    TPAERLINMITTEQEIQFLHAKNSKLKNFRVYFREGRDQLWKGPGELLWKGDGAVIVKVGTDIKIVPRRKAK
    IIRDYGGRREVDSSSHLEGTREDGEVA
    >HIV2RO  P04584 POL polyprotein [Contains: Protease (Retro
    TGRFFRTGPLGKEAPQLPRGPSSAGADTNSTPSGSSSGSTGEIYAAREKTERAERETIQGSDRGLTAPRAGG
    DTIQGATNRGLAAPQFSLWKRPVVTAYIEGQPVEVLLDTGADDSIVAGIELGNNYSPKIVGGIGGFINTKEY
    KNVEIEVLNKKVRATIMTGDTPINIFGRNILTALGMSLNLPVAKVEPIKIMLKPGKDGPKLRQWPLTKEKIE
    ALKEICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEIQLGIPHPAGLAKKR
    RITVLDVGDAYFSIPLHEDFRPYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRQVLEPFRKANK
    DVIIIQYMDDILIASDRTDLEHDRVVLQLKELLNGLGFSTPDEKFQKDPPYHWMGYELWPTKWKLQKIQLPQ
    KEIWTVNDIQKLVGVLNWAAQLYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILSQEQEGHYYQ
    EEKELEATVQKDQENQWTYKIHQEEKILKVGKYAKVKNTHTNGIRLLAQVVQKIGKEALVIWGRIPKFHLPV
    EREIWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVGDPIPGAETFYTDGSCNRQSKEGKAGYVTDRGKDK
    VKKLEQTTNQQAELEAFAMALTDSGPKVNIIVDSQYVMGISASQPTESESKIVNQIIEEMIKKEAIYVAWVP
    AHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGIPNLVARQIVNSCAQCQQKGEA
    IHGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANF
    TSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTIETIVLMAIHCMNFKRRGGIGD
    MTPSERLINMITTEQEIQFLQAKNSKLKDFRVYFREGRDQLWKGPGELLWKGEGAVLVKVGTDIKIIPRRKA
    KIIRDYGGRQEMDSGSHLEGAREDGEMA
    >HIV2SB  P12451 POL polyprotein [Contains: Protease (Retro
    TGWFFRAWTMGKEAPQLPRGPKFAGANTNSTPNGSSSGPTGEVHAAREKTERAETKTIQRSDRGLAASRARR
    DTTQRDDRGLAAPQFSLWKRPVVTAYIEDQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
    DVEIRVLNKKVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPVKVTLKPGKDGPKQRQWPLTREKIEA
    LREICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKKRR
    ITVLDVGDAYFSIPLYEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRQVLEPFRKANPD
    VIIVQYMDDILIASDRTDLEHDKVVLQLKELLNGLGFSTPDEKFQKDPPYQWMGYELWPTKWKLQKIQLPQK
    EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCKLIRGKMTPTEEVQWTELAEAELEENKIILSQEQEGHYYQE
    EKELEATVQKDQDNQWTYKVHQGEKILKVGKYAKIKNTHTNGVRLLAQVVQKIGKEALVIWGRIPKFHLPVE
    RETWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVKDPIPGAETFYTDGSCNRQSKEGKAGYITDRGKDKV
    RILEQTTNQQAELEAFAMAVTDSGPKVNIVVDSQYVMGIVTGQPAESESRIVNKIIEEMIKKEAIYVAWVPA
    HKGIGGNQEIDHLVSQGIRQVLFLERIEPAQEEHGKYHSNVKELAHKFGLPNLVARQIVNTCAQCQQKGEAI
    HGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANFT
    SQEVKMVAWWVGIEQSFGVPYNPQSQGVVEAMNHHLKNQIERIREQANTMETIVLMAVHCMNFKRRGGIGDM
    TPVERLVNMITTEQEIQFLQAKNSKLKNFRVYFREGRNQLWQGPGELLWKGDGAVIVKVGTDIKVIPRRKAK
    IIRDYGPRQEMDSGSHLEGAREDGEMA
    >HIV2ST  P20876 POL polyprotein [Contains: Protease (Retro
    KTRLLEMWQGRTHHGKMPRKTGGFFRVGPMGKEAPQFPCGPNPAGADTNSTPDRPSRGPTREVHAAREKAER
    AEREAIQRSDRGLPAARETRDTMQRDDRGLAAPQFSLWKRPVVTAHVEGQPVEVLLDTGADDSIVAGVELGS
    NYSPKIVGGIGGFINTKEYKNVEIRVLNKRVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPIKIMLK
    PGKDGPKLRQWPLTKEKIEALKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQ
    DFTEIQLGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSINNAEPGKRYIYKVSPQGWKGSP
    AIFQYTMRQVLEPFRKANPDIILIQYMDDILIASDRTDLEHDRVVLQLKELLNGLGFSTPDEKFQKDPPYQW
    MGYELWPTKWKLQRIQLPQKEVWTVNDIQKLVGVLNWAAQIYPGIKTRNLCRLIRGKMTLTEEVQWTELAEA
    ELEENKIILSQEQEGCYYQEEKELEATVQKDQDNQWTYKIHQGGKILKVGKYAKVKNTHTNGVRLLAQVVQK
    IGKEALVIWGRIPKFHLPVERDTWEQWWDNYWQVTWIPDWDFISTPPLVRLVFNLVKDPILGAETFYTDGSC
    NKQSREGKAGYITDRGRDKVRLLEQTTNQQAELEAFAMAVTDSGPKANIIVDSQYVMGIVAGQPTESESKIV
    NQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGLPK
    LVARQIVNTCTQCQQKGEAIHGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLL
    KLASRWPITHLHTDNGANFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTVET
    IVLMAVHCMNFKRRGGIGDMTPAERLINMVTAEQEIQFLQAKNSKLQNFRVYFREGRDQLWKGPGELLWKGD
    GAVIVKVGADIKIIPRRKAKIIKDYGGRQEMDSGSNLEGAREDGEVA
    >SIVCZ  P17283 POL polyprotein [Contains: Protease (Retro
    STKKKRLLAVWARGTPNERLHRKTGEFFRERLAFPQREARQLCAEQNRTNGPTDRELWVPGGREEPGEERGR
    EQSISTNLPQITLWQRPLIPVKVEGQLCEALLDTGADDTVIERIQLQGLWKPKMIGGIGGFIKVKQFDNVHI
    EIEGRKVVGTVLVGPTPVNIIGRNILTQLGCTLVFPISSIETVPVKLKPGMDGPKVKQWPLSAEKIKALTEI
    CQEMEKEGKISKIGPENPYNTPIFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVL
    DVGDAYFSCPLDKDFRKYTAFTIPSINNETPGVRYQYNVLPQGWKGSPSIFQSSMTKILEPFREKNPDITIY
    QYMDDLYVGSDLEIDQHRKKVEELRQHLLKWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIQLPEKEVWT
    VNDIQKLIGKLNWASQIYPGIKIKQLCKLIRGTKKLTDVVPLTPEAELELAENREIVSTPVHGVYYDPDKEL
    IAEIQKQGNCQWTYQIFQEPHKNLKTGKYARQRSAHTNDIRQLAEAVQKIATESIVIWGKTPKFRLPVQKES
    WEAWWAEYWQATWIPEWEFINTPPLVKLWYSLETEPIPTTDTYYVDGAANRETKTGKAGYVTDKGKQKIISL
    ENTTNQQAELKALLLALQDSDQQVNIVTDSQYVLGIIQSQPDHSESELVNQIIEELIKKEKIYLSWVPAHKG
    IGGNEQVDKLVSAGIRKVLFLDGIDRAQEEHERYHSNWKAMASDFNLPPIVAKEIVAHCDKCQVKGEAMHGQ
    VDCSPGIWQVDCTHLEGKVIIVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGPNFTSAA
    VKAACWWADIKQEFGIPYNPQSQGVVESLNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYTAG
    ERIIDIIATDIQTSELQKQILKVQKFRVYYRDSRDPIWKGPATLLWKGEGAVVIQDQGELKVVPRRKAKIIR
    DYGKQMAGDDCVASRQNED
    >Smanga_S4  P12502 POL polyprotein [Contains: Protease (Retro
    KTGGFFRAWPMGKEAPQFPHGPDASGADTNCSPRGSSCGSTEELHEDGQKAEGEQRETLQGGDRGFAAPQFS
    LWRRPVVTAYIEEQPVEVLLDTGADDSIVAGIELGPNYTPKIVGGIGGFINTKEYKDVKIKVLGKVIKGTIM
    TGDTPINIFGRNLLTAMGMSLNLPIAKVEPIKVTLKPGKEGPKLRQWPLSKEKIIALREICEKMEKDGQLEE
    APPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKRRRITVLDVGDAYFSIPLD
    EEFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRNVLEPFRKANPDVTLIQYMDDILIASDR
    TDLEHDRVVLQLKELLNGIGFSTPEEKFQKDPPFQWMGYELWPTKWKLQKIELPQRETWTVNDIQKLVGVLN
    WAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTEMAEAEYEENKIILSQEQEGCYYQEGKPIEATVIKSQDNQW
    SYKIHQEDKVLKVGKFAKVKNTHTNGVRLLAHVVQKIGKEALVIWGEVPKFHLPVEREIWEQWWTDYWQVTW
    IPDWDFVSTPPLVRLVFNLVKEPIQGAETFYVDGSCNRQSREGKAGYVTDRGRDKAKLLEQTTNQQAELEAF
    YLALADSGPKANIIVDSQYVMGIIAGQPTESESRLVNQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHLVSQ
    GIRQVLFLKKIEPAQEEHEKYHSNVKELVFKFGLPRLVAKQIVDTCDKCHQKGEAIHGQVNAELGTWQMDCT
    HLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLAGRWPITHLHTDNGANFTSQEVKMVAWWAGIEQT
    FGVPYNPQSQGVVEAMNHHLKTQIDRIREQANSIETIVLMAVHCMNFKRRGGIGDMTPAERLVNMITTEQEI
    QFQQSKNSKFKNFRVYYREGRDQLWKGPGELLWKGEGAVILKVGTEIKVVPRRKAKIIKDYGGGKELDSGSH
    LEDTGEAREVA
    >Smanga_SP  P19505 POL polyprotein [Contains: Protease (Retro
    MPRKTSGFFRAWPMGKEAPQFPHGPDASGADTNCSPRGSSCGSTEELHEDGQKAEGEQRETLQGGNGGFAAP
    QFSLWRRPIVTAYIEEQPVEVLLDTGADDSIVAGIELGPNYTPKIVGGIGGFINTKEYKDVKIKVLGKVIKG
    TIMTGDTPINIFGRNLLTAMGMSLNLPIAKVEPIKVTLKPGKDGPKLRQWPLSKEKIIALREICEKMEKDGQ
    LEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKRRRITVLDVGDAYFSI
    PLDEEFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRNVLEPFRKANPDVTLIQYMDDILIA
    SDRTDLEHDRVVLQLKELLNSIGFSTPEEKFQKDPPFQWMGYELWPTKWKLQKIELPQRETWTVNDIQKLVG
    VLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTEMAEAEYEENKIILSQEQEGCYYQEGKPLEATVIKSQD
    NQWSYKIHQEDKILKVGKFAKIKNTHTNGVRLLAHVVQKIGKEAIVIWGQVPRFHLPVEREIWEQWWTDYWQ
    VTWIPEWDFVSTPPLVRLVFNLVKEPIQGAETFYVDGSCNRQSREGKAGYVTDRGRDKAKLLEQTTNQQAEL
    EAFYLALADSGPKANIIVDSQYVMGIVAGQPTESESRLVNQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHL
    VSQGIRQVLFLEKIEPAQEEHEKYHSNVKELVFKFGLPRLVAKQIVDTCDKCHQKGEAIHGQVNAELGTWQM
    DCTHLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLASRWPITHLHTDNGANFTSQEVKMVAWWAGI
    EQTFGVPYNPQSQGVVEAMNHHLKTQIDRIREQANSIETIVLMAVHCMNFKRRGGIGDMTPAERLVNMITTE
    QEIQFQQSKNSKFKNFRVYYREGRDQLWKGPGELLWKGEGAVILKVGTEIKVVPRRKAKIIKDYGGGKELDS
    GSHLEDTGEAREVA
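
    Before aligning, it can be useful to check that the data set was copied completely. The following is a minimal Python sketch, not part of the original exercise, that reads FASTA-formatted text like the block above and reports the name and length of each sequence. The function name read_fasta and the two toy records are illustrative assumptions; in practice you would pass in the contents of your own sequence file.

```python
def read_fasta(text):
    """Return a dict mapping each sequence name to its residue string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect the pieces.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy input (hypothetical records, not taken from the Pol data set).
example = """>seqA demo header
MKV
LLA
>seqB another demo
MKI
"""

for seq_name, seq in read_fasta(example).items():
    print(seq_name, len(seq))
```

    Counting the entries of the returned dict tells you how many sequences were read, and unusually short lengths hint at a truncated copy.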
    

  1. Align the Pol sequences using the MAFFT server at EBI with default settings.

    Once the alignment is done, save the resulting alignment as a FASTA file: right-click the "Download alignment file" button on the MAFFT output page, and then save the file using "Save linked file as" (or whatever it is called in your particular browser). Make sure you can find the file again!

  2. Open the TreeHugger web server. (The TreeHugger server constructs a neighbor-joining tree from an aligned set of sequences.)
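
    Neighbor joining works from a matrix of pairwise distances between the aligned sequences. As a rough illustration of what such a matrix contains, the Python sketch below computes the simplest possible distance, the fraction of aligned positions at which two sequences differ (often called the p-distance). This is an assumption made for illustration: the exercise does not state which distance measure TreeHugger uses, and the names p_distance, distance_matrix, and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared alignment columns where a and b differ.

    Columns with a gap ("-") in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Toy alignment (hypothetical sequences, not the Pol data).
toy = {"A": "MKV-LA", "B": "MKI-LA", "C": "MRIQLA"}

for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 2))
```

    A neighbor-joining program repeatedly joins the pair of sequences (or clusters) that best satisfies its joining criterion and recomputes the matrix, until a single tree remains.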

  3. Select the option to upload a file (see figure below), then choose the Pol protein alignment file you just saved on your hard disk, and finally click "Submit Query" to construct the neighbor-joining tree:

  4. When the run is done, right-click the "Download data in Newick/Phylip format" link to save the tree file as a text file on your hard disk (again, make sure you can find it later). You will notice that the tree file is in the parenthesis-based format we discussed previously in the lecture:
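
    To make the parenthesis-based (Newick) format more concrete, here is a minimal Python sketch that parses a simple Newick string into nested tuples and lists its leaves. It is only an illustration under stated assumptions: it handles plain names and optional branch lengths, but not quoted names, comments, or embedded whitespace, and the functions parse_newick and leaf_names as well as the toy tree are hypothetical, not part of TreeHugger or FigTree.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # skip ")"
        # Optional node name (empty for unnamed internal nodes).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Optional branch length after a colon.
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return node()


def leaf_names(tree):
    """Return the names of all leaves, left to right."""
    name, _, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


# Toy tree (hypothetical labels and branch lengths).
toy_tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print(leaf_names(toy_tree))
```

    Each pair of parentheses encloses one internal node, the comma-separated items inside it are its children, and the number after a colon is the length of the branch leading to that node.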

  5. Open the FigTree tree viewer that you have previously installed on your own computer and use File->Open to open the tree file you just saved.

  6. The view that you will see first is presumably a rooted view similar to the one below. However, it is important to realize that we have not explicitly rooted the tree yet, so the position of the root in this view is arbitrary. A more realistic view can be seen by clicking the unrooted view button (see figures below):

  7. The last figure above shows the unrooted tree. For now, however, go back to the (pseudo)rooted view you started out with. We will now place the root by using the HTLV Pol sequence as a so-called outgroup. Click the branch leading to the HTLV sequence so that it is selected (see figure below). Then click the "Reroot" button, which will root the tree on the selected outgroup:

    The rationale for using an outgroup to place the root of the tree is as follows: our data set consists of sequences from HIV-1, HIV-2, SIV, and HTLV. We know from other evidence that the lineage leading to HTLV branched off before any of the remaining viruses diverged from each other. The root of the tree connecting the viruses investigated here must therefore be located between the HTLV sequence (the "outgroup") and the rest (the "ingroup"). This way of finding a root is called "outgroup rooting".
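
    The idea of outgroup rooting can be sketched in a few lines of Python. The example below takes an unrooted tree given as a list of edges, places the root on the branch leading to the outgroup leaf, and writes the result as a Newick string without branch lengths. It is a simplified illustration, not a description of how FigTree implements its "Reroot" button; the function root_on_outgroup, the internal node labels n1 to n3, and the leaf labels are all hypothetical.

```python
from collections import defaultdict


def root_on_outgroup(edges, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    edges is a list of (node, node) pairs; the result is a Newick string
    in which the root has two children: the outgroup and the ingroup.
    """
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    if len(neighbours[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf of the tree")

    def subtree(node, parent):
        # Walk away from the parent, so every edge is visited once.
        children = [subtree(n, node) for n in neighbours[node] if n != parent]
        if not children:
            return node  # a leaf: just its name
        return "(" + ",".join(children) + ")"

    # The single node the outgroup is attached to leads to the ingroup.
    attachment = neighbours[outgroup][0]
    return "(" + outgroup + "," + subtree(attachment, outgroup) + ");"


# Toy unrooted tree with five leaves (hypothetical labels).
toy_edges = [
    ("OUT", "n1"), ("n1", "n2"), ("n1", "n3"),
    ("n2", "A"), ("n2", "B"),
    ("n3", "C"), ("n3", "D"),
]

print(root_on_outgroup(toy_edges, "OUT"))  # prints (OUT,((A,B),(C,D)));
```

    Note that the relationships within the ingroup are unchanged by rooting; only the direction in which the tree is read (from the root towards the tips) is fixed.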

  8. Inspect the rooted tree that you get as a result of rerooting and consider what this tells you about the origin of HIV viruses.

    When you have pondered the problem for a while you can read this short explanation that I have prepared: Origin of HIV1 and HIV2.