Before you start: please install the FigTree viewer on your computer.
In this exercise you will analyze the evolutionary relationships between HIV-related viruses from humans and other primates:
Acquired Immune Deficiency Syndrome (AIDS) is caused by two divergent viruses, Human Immunodeficiency Virus type 1 (HIV-1) and Human Immunodeficiency Virus type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, ...) and have been named Simian Immunodeficiency Virus (SIV). HTLV-1 is another, more distantly related, member of the family of retroviruses to which HIV and SIV belong.
The "Pol" gene, which is present in the genome of all these viruses, encodes three different polypeptides important for the viral life cycle: integrase, reverse transcriptase, and protease. It is expressed as a single polyprotein and is subsequently cleaved by the viral protease into its three separate parts. In this exercise you will use a data set consisting of the 20 different Pol polyprotein sequences listed below, from HIV-1, HIV-2, chimpanzee SIV, sooty mangabey SIV, and HTLV-1:
>HTLV P03362 POL_HTL1A POL polyprotein (HTLV-I).
GKKAACNLANTGASRPWARTPPKAPRNQPVPFKPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANG
TWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTIDLRDAFFQIPLPKQFQPYFAFTVPQQCNYGP
GTRYAWKVLPQGFKNSPTLFEMQLAHILQPIRQAFPQCTILQYMDDILLASPSHEDLLLLSEATMASLISHG
LPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYC
ALQRHTDPRDQIYLNPSQVQSLVQLRQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKEQWPLVWLHA
PLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTSDHPSVPILLHHSHRFKNLGA
QTGELWNTFLKTAAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSRAAYILWDKQILSQRSFPLPPPHKSA
QRAELLGLLHGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSH
TNLPDPISRLNALTDALLITPVLQLSPAELHSFTHCGQTALTLQGATTTEASNILRSCHACRGGNPQHQMPR
GHIRRGLLPNHIWQGDITHFKYKNTLYRLHVWVDTFSGAISATQKRKETSSEAISSLLQAIAHLGKPSYINT
DNGPAYISQDFLNMCTSLAIRHTTHVPYNPTSSGLVERSNGILKTLLYKYFTDKPDLPMDNALSIALWTINH
LNVLTNCHKTRWQLHHSPRLQPIPETRSLSNKQTHWYYFKLPGLNSRQWKGPQEALQEAAGAALIPVSASSA
QWIPWRLLKRAACPRPVGGPADPKEKDLQHHG
>HIV1B5 P04587 POL polyprotein [Contains: Protease (Retro
FFREDLAFLQGKAREFSSEQTRANSPTISSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQIT
LWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVL
VGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISK
IGPENPYNTPVFAIKKKDSTKWRKLVDFRELNRRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLD
EDFRKYTAFTIPSINNETPGSGYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDL
EIGQHRTKIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTIQPIVLPEKDSWTVNDIQKLVGKLN
WASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQW
TYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQAT
WIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAASRETKLGKAGYVTNRGRQKVVTLTHTTNQKTELQA
IHLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVS
AGIRKILFLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDC
THLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTSATVKAACWWAGIKQ
EFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQ
TKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCV
ASRQDED
>HIV1H2 P04585 POL polyprotein [Contains: Protease (Retro
FFREDLAFLQGKAREFSSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQVTLWQRPLVTIKIG
GQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
LLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVF
AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIP
SINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
RQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVR
QLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNL
KTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPP
LVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEV
NIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGI
DKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAV
HVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQG
VVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
NFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
>HIV1MN P05961 POL polyprotein [Contains: Protease (Retro
FFREDLAFLQGKAEFSSEQNRANSPTRRELQVWGRDNNSLSEAGEEAGDDRQGPVSFSFPQITLWQRPIVTI
KIGGQLKEALLDTGADDTVLGEMNLPRRWKPKMIGGIGGFIKVRQYDQITIGICGHKAIGTVLVGPTPVNII
GRNLLTQLGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALIEICTEMEKEGKISKIGPENPYNT
PVFAIKKKDSTKWRKLVDFRELNKKTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAF
TIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRAKI
EELRRHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGI
KVKQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEVQKQGQGQWTYQIYQEPF
KNLKTGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFRLPIQKETWETWWTEYTXATWIPEWEVVN
TPPLVKLWYQLEKEPIVGAETFYVDGAANRETKKGKAGYVTNRGRQKVVSLTDTTNQKTELQAIHLALQDSG
LEVNIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFL
DGIDKAQEDHEKYHSNWRAMASDFNLPPIVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVIL
VAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGPNFTSTTVKAACWWTGIKQEFGIPYNPQ
SQGVIESMNKELKKIIGQVRDQAEHLKRAVQMAVFIHNFKRKGGIGGYSAGERIVGIIATDIQTKELQKQIT
KIQNFRVYYRDSRDPLWKGPAKLLWKGEGAVVIQDNNDIKVVPRRKAKVIRDYGKQTAGDDCVASRQDED
>HIV1N5 P12497 POL polyprotein [Contains: Protease (Retro
FFREDLAFPQGKAREFSSEQTRANSPTRRELQVWGRDNNSLSEAGADRQGTVSFSFPQITLWQRPLVTIKIG
GQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
LLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVF
AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKQKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIP
SINNETPGIRYQYNVLPQGWKGSPAIFQCSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
RQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVR
QLCKLLRGTKALTEVVPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNL
KTGKYARMKGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPP
LVKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPLTDTTNQKTELQAIHLALQDSGLEV
NIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDGLVSAGIRKVLFLDGI
DKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAV
HVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTVHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQG
VIESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
NFRVYYRDSRDPVWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
>HIV1ND P18802 POL polyprotein [Contains: Protease (Retro
FFREDLAFPQGKAGEFSSEQTRANSPTSRELRVWGGDNPLSETGAERQGTVSFSFPQITLWQRPLVTIKIGG
QLKEALLDTGADDTVLEEINLPGKWKPKMIGGIGGFIKVRQYDQILIEICGYKAMGTVLVGPTPVNIIGRNL
LTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALTEICTEMEKEGKISRIGPENPYNTPIFA
IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPS
INNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPEIVIYQYMDDLYVGSDLEIGQHRTKIEELR
EHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPINLPEKESWTVNDIQKLVGKLNWASQIYAGIKVKQ
LCKLLRGTKALTEVVPLTEEAELELAENREILKEPVHGVYYDPSKDLIAELQKQGDGQWTYQIYQEPFKNLK
TGKYARTRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWETWWIEYWQATWIPEWEFVNTPPL
VKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPFTDTTNQKTELQAINLALQDSGLEVN
IVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSQGIRKVLFLDGID
KAQEEHEKYHNNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
VASGYIEAEVIPAETGQETAYFLLKLAGRWPVKVVHTDNGSNFTSATVKAACWWAGIKQEFGIPYNPQSQGV
VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTRELQKQIIKIQN
FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKVKIIRDYGKQMAGDDCVASRQDED
>HIV1OY P20892 POL polyprotein [Contains: Protease (Retro
FFREDLAFPQGKAREFSSEQTRANSPTSRELRVWGRDNNSPSEAGADRQGTVSFNLPQITLWQRPIVTIKIG
GQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRN
LLTQLGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKVLIEICTEMEKEGKISKVGPENPYNTPVF
AIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIP
SINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEEL
RQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIMLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVK
NLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLVAELQKQGQGQWTYQIYQEPFKNL
KTGKYARMRGAHTNDVKQLTEAVQKITQESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPP
LVKLWYQLEKDPIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEV
NIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGI
DKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKIILVAV
HVASGYIEAEVIPAETGQETAYFILKLAGRWPVKTIHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQG
VVESMNNELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQ
NFRVYYRDSREPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
>HIV1PV P03368 POL polyprotein [Contains: Protease (Retro
FFREDLAFLQGKAREFSSEQTRANSPTISSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQIT
LWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVL
VGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISK
IGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLD
EDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDL
EIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLN
WASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQW
TYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQAT
WIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETRLGKAGYLTNKGRQKVVPLTNTTNQKTELQA
IYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKQKVYLAWVPAHKGIGGNEQVDKLVS
AGIRKILFLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDC
THLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTSATVKAACWWAGIKQ
EFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQ
TKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCV
ASRQDED
>HIV1U4 P24740 POL polyprotein [Contains: Protease (Retro
FFRENLAFQQGEAREFSSEQTRANSPTSRNLWDGGKDDLPCETGAERQGTDSFSFPQITLWQRPLVTVKIGG
QLIEALLDTGADDTVLEDINLPGKWKPKIIGGIGGFIKVRQYDQILIEICGKKTIGTVLVGPTPVNIIGRNM
LTQIGCTLNFPISPIETVPVKLKPEMDGPKVKQWPLTEEKIKALTEICNEMEKEGKISKIGPENPYNTPVFA
IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHTAGLKKKKSVTVLDVGDAYFSVPLDESFRKYTAFTIPS
INNETPGVRYQYNVLPQGWKGSPSIFQSSMTKILEPFRSQHPDIVIYQYMDDLYVGSDLEIGQHRAKIEELR
AHLLSWGFITPDKKHQKEPPFLWMGYELHPDKWTVQPIQLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQ
LCKLLRGAKALTDIVTLTEEAELELAENREILKDPVHGVYYDPSKDLVAEIQKQGQDQWTYQIYQEPFKNLK
TGKYARKRSAHTNDVKQLTEVVQKVSTESIVIWGKIPKFRLPIQKETWEAWWMEYWQATWIPEWEFVNTPPL
VKLWYQLEKDPIAGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTETTNQKTELHAIHLALQDSGSEVN
IVTDSQYALGIIQAQPDRSESEIVNQIIEKLIEKEKVYLSWVPAHKGIGGNEQVDKLVSSGIRKVLFLDGID
KAQEDHEKYHCNWRAMASDFNLPPVVAKEIVASCNKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
VASGYIEAEVIPAETGQETAYFILKLAGRWPVKVIHTDNGSNFTSAAVKAVCWWANIQQEFGIPYNPQSQGV
VESMNKELKKIIGQVREQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTKELQKQISKIQN
FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCMAGRQDED
>HIV1Z2 P12499 POL polyprotein [Contains: Protease (Retro
FFREDLAFPQGKAGELSSEQTRANSPTSRELRVWGRDNPLSETGAERQGTVSFNCPQITLWQRPLVTIKIGG
QLKEALLDTGADDTVLEEMNLPGKWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNL
LTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALTEICTEMEKEGKISRVGPENPYNTPIFA
IKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPS
INNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPEIVIYQYMDDLYVGSDLEIGQHRTKIEELR
EHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQSIKLPEKESWTVNDIQKLVGKLNWASQIYPGIKVRQ
LCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGHGQWTYQIYQEPFKNLK
TGKYARMRGAHTNDVKQLAEVVQKISTESIVIWGKTPKFRLPIQKETWETWWVEYWQATWIPEWEFVNTPPL
VKLWYQLEKEPIIGAETFYVDGAANRETKLGKAGYVTDRGRQKVVPFTDTTNQKTELQAINLALQDSGLEVN
IVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSQGIRKVLFLDGID
KAQEEHEKYHNNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVH
VASGYIEAEVIPAETGQETAYFILKLAGRWPVKIVHTDNGSNFTSAAVKAACWWAGIKQEFGIPYNPQSQGV
VESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIIDIIATDIQTKELQKQITKIQN
FRVYYRDSRDPIWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKVKIIRDYGKQMAGDDCVASRQDED
>HIV2CA P24107 POL polyprotein [Contains: Protease (Retro
TGGFFRDWPLGKEAPQFPRGPSSTGANTNSTPIGSSSGSTGEIYAAREKAEGAETETIQRGDRGLTAPRTRR
GPMQGDNRGLAAPQFSLWKRPVVTAHIEGQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
NVEIEVLGKRVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPIKIMLKPGKDGPRLRQWPLTKEKIEA
LKEICEKMEKEGQLEEAPPTNPYNTPTFAIRKKDKNKWRMLIDFRELNKVTQDFTEIQLGIPHPAGLAKKRR
ITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRQVLEPFRKANSD
VIIIQYMDDILIASDRTDLEHDKVVLQLKELLNNLGFSTPDEKFQKDPPYRWMGYELWPTKWKLQKIQLPQK
EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILSQEQEGHYYQE
EKELEATVQKDQDNQWTYKIHQEEKILKVGKYAKIKHTHTNGVKLLAQVVQKIGKEALVIGRIPKFHLPVER
EVWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVGDPIPGTETFYTDGSCNRQSKEGKAGYVTDRGRDKVK
ILEQTTNQQAELEAFAMALTDSGPKANIIVDSQYVMGIVAGQPTESENRIVNQIIEEMIKKEAIYVAWVPAH
KGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHTNVKELCHKFDIPQLVARQIVNTCAQYQQKGEAIH
GQVNAEVGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANFTS
QEVKMVAWWVGIEQTFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTVETIVLMAVHCMNFKRRGGIGDMT
PSERLINMITTEQEIQFLQAKNSKLKNFRVYFREGRDQLWKGPGELLWKGDGAVIVKVGTDIKIIPRRKAKI
IRDYGGRQELDSSSHLEGARENGEVA
>HIV2D1 P17757 POL polyprotein [Contains: Protease (Retro
VLELWKGGTLGETVPSTQKTGLLEVWQVRTHHGKLPGKTGRFFRDGPTGKAAPQLPRGPSSSGADTNSTPNR
SSSGPVGEIYAAREKAERAEGETIQGGDGGLTAPRAGRDAPQRGDRGLATPQFSLWKRPVVTAFIEDQPVEV
LLDTGADDSIVAGIELGDNYTPKIVGGIGGFINTKEYKNVEIKVLNKRVRATIMTGDTPINIFGRNILATLG
MSLNLPVAKLDPIKVTLKPGKDGPRLKQWPLTKEKIEALKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKD
KNKWRMLIDFRELNRVTQDFTEIQLGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAE
PEKRYVYKVLPQGWKGSPAIFQFMMRQILEPFRKANPDVILIQYMDDILIASDRTGLEHDKVVLQLKELLNG
LGFSTPDEKFQKDPPFQWMGYELWPTKWKLQKIQLPQKEIWTVNDIQKLVGVLNWAAQIYPGIKTKHLCKLI
RGKMTLTEEVQWTELAEAELEENKIILSQEQEGSYYQEEEELEATVIKSQDNQWAYKIHQGERVLKVGKYAK
IKNTHTNGVRLLAQVVQKIGKEALVIWGRVPKFHLPVERDTWEQWWDNYWQVTWVPEWDFVSTPPLVRLTFN
LVGDPIPGTETFYTDGSCNRQSKEGKAGYVTDRGRDRVRVLEQTSNQQAELEAFAMALADSGPKVNIIVDSQ
YVMGIVAGQPTESENRIVNQIIEDMIKKEAVYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEH
EKYHSNIKELTHKFGIPQLVARQIVNTCAQCQQKGEAIHGQVNAEIGVWQMDCTHLEGKIIIVAVHVASGFI
EAEVIPQESGRQTALFLLKLASRWPITHLHTDNGPNFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNH
HLKNQISRIREQANTIETIVLMAVHCMNFKRRGGIGDMTPAERLINMITTEQEIQFLQRKNSNFKKFQVYYR
EGRDQLWKGPGELLWKGDGAVIVKVGADIKVVPRRKAKIIRDYGGRQELDSSSHLEGAREDGEVA
>HIV2G1 P18042 POL polyprotein [Contains: Protease (Retro
MWQDRTRHGKMPRKTGRFFRDGSMGKEAPQLPRGPSSSGADTNSTPSRSSSGSIGKIYAAGERAEGAEGETI
QRGDGRLTAPRAGKSTSQRGDRGLAAPQFSLWKRPVVTAYIEVQPVEVLLDTGADDSIVAGIQLGDNYVPKI
VGGIGGFINTKEIKNIEIKVLNKRVRATIMTGDTPINIFGRNILTALGMSLNLPIAKIEPIKVTLKPGKDGP
RLRQWPLTKEKIEALREICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNRVTQDFTEIQ
LGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHT
MRQVLEPFRKANPDVILIQYMDDILIASDRTGLEHDKVVLQLKELLNGLGFSTPDEKFQKDPPLQWMGYELW
PTKWKLQKLQLPQKEIWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIKGKMTLTEEVQWTELAEAELEENK
IILSQEQEGYYYQEEKELEATIQKNQDNQWTYKIHQEEKILKVGKYAKIKNTHTNGVRLLAQVVQKIGKEAL
VIWGRIPKFHLPVERETWEQWWDNYWQVTWIPEWDFVSTPPLVRLTFNLVGDPIPGAETFYTDGSCNRQSKE
GKARYVTDRGRDKVRVLERTTNQQAELEAFAMTLTDSGPKVNIIVDSQYVMGIVVGQPTESESRIVNQIIED
MIKKEAVYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLERIEPAQEEHEKYHSNMKELTHKFGIPQLVARQI
VNTCAQCQQKGEAIHGQVNAEIGVWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRW
PITHLHTDNGSNFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTIETIVLMAV
HCMNFKRRGGIGDMTPAERLINMITTEQEIQFLQRKNSNFKNFQVYYREGRDQLWKGPGELLWKGDGAVIVK
VGADIKVIPRRKAKIIRDYGGRQELDSSHLEGAREEDGEVA
>HIV2KR Q74120 POL polyprotein [Contains: Protease (Retro
TGWFFRDWPMGKEASQLPRDPSPAGADTNSTPSRPSSRPAREVLAAREEAERAENETIQGGDRGLTAPRTRR
DTTQRGDRGFAAPQFSLWKRPVVTAYVEGQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
NVEIKVLNKKVKATIMTGDTPINIFGRNILTALGMSLNLPVAKVDPIKVILKPGKDGPKVRQWPLTKEKIEA
LKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQEFTEIQLGIPHPAGLAKKRR
ITVLDIGDAYFSIPLHEDFRQYTAFTLPTVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRQVLEPFRKANPD
VILVQYMDDILIASDRTDLEHDRTVLQLKELLNGLGFSTPDEKFQKDPPYKWMGYELWPTKWKLQKIQLPQK
EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENKIILSQEQEGCYYQE
EKELEATVQKDQDNQWTYKIHQGEKILKVGKYAKIKNTHTNGVRLLAHVVQKIGKEALVIWGRIPKFHLPVE
RETWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVKDPIPGEETFYTDGSCNRQSKEGKAGYITDRGRDKV
RILEQTTNQQAELEAFAMALTDSGPKANIIVDSQYVMGIVAGQPTESESKLVNQIIEEMIKKETLYVAWVPA
HKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGLPKLVARQIVNTCAQCQQKGEAI
HGQVDAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLASRWPITHLHTDNGANFT
SQEVKMVAWWTGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTMETIVLMAVHCMNFKRRGGIGDM
TPAERLINMITTEQEIQFLHAKNSKLKNFRVYFREGRDQLWKGPGELLWKGDGAVIVKVGTDIKIVPRRKAK
IIRDYGGRREVDSSSHLEGTREDGEVA
>HIV2RO P04584 POL polyprotein [Contains: Protease (Retro
TGRFFRTGPLGKEAPQLPRGPSSAGADTNSTPSGSSSGSTGEIYAAREKTERAERETIQGSDRGLTAPRAGG
DTIQGATNRGLAAPQFSLWKRPVVTAYIEGQPVEVLLDTGADDSIVAGIELGNNYSPKIVGGIGGFINTKEY
KNVEIEVLNKKVRATIMTGDTPINIFGRNILTALGMSLNLPVAKVEPIKIMLKPGKDGPKLRQWPLTKEKIE
ALKEICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEIQLGIPHPAGLAKKR
RITVLDVGDAYFSIPLHEDFRPYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRQVLEPFRKANK
DVIIIQYMDDILIASDRTDLEHDRVVLQLKELLNGLGFSTPDEKFQKDPPYHWMGYELWPTKWKLQKIQLPQ
KEIWTVNDIQKLVGVLNWAAQLYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILSQEQEGHYYQ
EEKELEATVQKDQENQWTYKIHQEEKILKVGKYAKVKNTHTNGIRLLAQVVQKIGKEALVIWGRIPKFHLPV
EREIWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVGDPIPGAETFYTDGSCNRQSKEGKAGYVTDRGKDK
VKKLEQTTNQQAELEAFAMALTDSGPKVNIIVDSQYVMGISASQPTESESKIVNQIIEEMIKKEAIYVAWVP
AHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGIPNLVARQIVNSCAQCQQKGEA
IHGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANF
TSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTIETIVLMAIHCMNFKRRGGIGD
MTPSERLINMITTEQEIQFLQAKNSKLKDFRVYFREGRDQLWKGPGELLWKGEGAVLVKVGTDIKIIPRRKA
KIIRDYGGRQEMDSGSHLEGAREDGEMA
>HIV2SB P12451 POL polyprotein [Contains: Protease (Retro
TGWFFRAWTMGKEAPQLPRGPKFAGANTNSTPNGSSSGPTGEVHAAREKTERAETKTIQRSDRGLAASRARR
DTTQRDDRGLAAPQFSLWKRPVVTAYIEDQPVEVLLDTGADDSIVAGIELGSNYSPKIVGGIGGFINTKEYK
DVEIRVLNKKVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPVKVTLKPGKDGPKQRQWPLTREKIEA
LREICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKKRR
ITVLDVGDAYFSIPLYEDFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRQVLEPFRKANPD
VIIVQYMDDILIASDRTDLEHDKVVLQLKELLNGLGFSTPDEKFQKDPPYQWMGYELWPTKWKLQKIQLPQK
EVWTVNDIQKLVGVLNWAAQIYPGIKTKHLCKLIRGKMTPTEEVQWTELAEAELEENKIILSQEQEGHYYQE
EKELEATVQKDQDNQWTYKVHQGEKILKVGKYAKIKNTHTNGVRLLAQVVQKIGKEALVIWGRIPKFHLPVE
RETWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVKDPIPGAETFYTDGSCNRQSKEGKAGYITDRGKDKV
RILEQTTNQQAELEAFAMAVTDSGPKVNIVVDSQYVMGIVTGQPAESESRIVNKIIEEMIKKEAIYVAWVPA
HKGIGGNQEIDHLVSQGIRQVLFLERIEPAQEEHGKYHSNVKELAHKFGLPNLVARQIVNTCAQCQQKGEAI
HGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLLKLASRWPITHLHTDNGANFT
SQEVKMVAWWVGIEQSFGVPYNPQSQGVVEAMNHHLKNQIERIREQANTMETIVLMAVHCMNFKRRGGIGDM
TPVERLVNMITTEQEIQFLQAKNSKLKNFRVYFREGRNQLWQGPGELLWKGDGAVIVKVGTDIKVIPRRKAK
IIRDYGPRQEMDSGSHLEGAREDGEMA
>HIV2ST P20876 POL polyprotein [Contains: Protease (Retro
KTRLLEMWQGRTHHGKMPRKTGGFFRVGPMGKEAPQFPCGPNPAGADTNSTPDRPSRGPTREVHAAREKAER
AEREAIQRSDRGLPAARETRDTMQRDDRGLAAPQFSLWKRPVVTAHVEGQPVEVLLDTGADDSIVAGVELGS
NYSPKIVGGIGGFINTKEYKNVEIRVLNKRVRATIMTGDTPINIFGRNILTALGMSLNLPVAKIEPIKIMLK
PGKDGPKLRQWPLTKEKIEALKEICEKMEREGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQ
DFTEIQLGIPHPAGLAKKKRITVLDVGDAYFSIPLHEDFRQYTAFTLPSINNAEPGKRYIYKVSPQGWKGSP
AIFQYTMRQVLEPFRKANPDIILIQYMDDILIASDRTDLEHDRVVLQLKELLNGLGFSTPDEKFQKDPPYQW
MGYELWPTKWKLQRIQLPQKEVWTVNDIQKLVGVLNWAAQIYPGIKTRNLCRLIRGKMTLTEEVQWTELAEA
ELEENKIILSQEQEGCYYQEEKELEATVQKDQDNQWTYKIHQGGKILKVGKYAKVKNTHTNGVRLLAQVVQK
IGKEALVIWGRIPKFHLPVERDTWEQWWDNYWQVTWIPDWDFISTPPLVRLVFNLVKDPILGAETFYTDGSC
NKQSREGKAGYITDRGRDKVRLLEQTTNQQAELEAFAMAVTDSGPKANIIVDSQYVMGIVAGQPTESESKIV
NQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHLVSQGIRQVLFLEKIEPAQEEHEKYHSNVKELSHKFGLPK
LVARQIVNTCTQCQQKGEAIHGQVNAELGTWQMDCTHLEGKIIIVAVHVASGFIEAEVIPQESGRQTALFLL
KLASRWPITHLHTDNGANFTSQEVKMVAWWIGIEQSFGVPYNPQSQGVVEAMNHHLKNQISRIREQANTVET
IVLMAVHCMNFKRRGGIGDMTPAERLINMVTAEQEIQFLQAKNSKLQNFRVYFREGRDQLWKGPGELLWKGD
GAVIVKVGADIKIIPRRKAKIIKDYGGRQEMDSGSNLEGAREDGEVA
>SIVCZ P17283 POL polyprotein [Contains: Protease (Retro
STKKKRLLAVWARGTPNERLHRKTGEFFRERLAFPQREARQLCAEQNRTNGPTDRELWVPGGREEPGEERGR
EQSISTNLPQITLWQRPLIPVKVEGQLCEALLDTGADDTVIERIQLQGLWKPKMIGGIGGFIKVKQFDNVHI
EIEGRKVVGTVLVGPTPVNIIGRNILTQLGCTLVFPISSIETVPVKLKPGMDGPKVKQWPLSAEKIKALTEI
CQEMEKEGKISKIGPENPYNTPIFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVL
DVGDAYFSCPLDKDFRKYTAFTIPSINNETPGVRYQYNVLPQGWKGSPSIFQSSMTKILEPFREKNPDITIY
QYMDDLYVGSDLEIDQHRKKVEELRQHLLKWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIQLPEKEVWT
VNDIQKLIGKLNWASQIYPGIKIKQLCKLIRGTKKLTDVVPLTPEAELELAENREIVSTPVHGVYYDPDKEL
IAEIQKQGNCQWTYQIFQEPHKNLKTGKYARQRSAHTNDIRQLAEAVQKIATESIVIWGKTPKFRLPVQKES
WEAWWAEYWQATWIPEWEFINTPPLVKLWYSLETEPIPTTDTYYVDGAANRETKTGKAGYVTDKGKQKIISL
ENTTNQQAELKALLLALQDSDQQVNIVTDSQYVLGIIQSQPDHSESELVNQIIEELIKKEKIYLSWVPAHKG
IGGNEQVDKLVSAGIRKVLFLDGIDRAQEEHERYHSNWKAMASDFNLPPIVAKEIVAHCDKCQVKGEAMHGQ
VDCSPGIWQVDCTHLEGKVIIVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGPNFTSAA
VKAACWWADIKQEFGIPYNPQSQGVVESLNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYTAG
ERIIDIIATDIQTSELQKQILKVQKFRVYYRDSRDPIWKGPATLLWKGEGAVVIQDQGELKVVPRRKAKIIR
DYGKQMAGDDCVASRQNED
>Smanga_S4 P12502 POL polyprotein [Contains: Protease (Retro
KTGGFFRAWPMGKEAPQFPHGPDASGADTNCSPRGSSCGSTEELHEDGQKAEGEQRETLQGGDRGFAAPQFS
LWRRPVVTAYIEEQPVEVLLDTGADDSIVAGIELGPNYTPKIVGGIGGFINTKEYKDVKIKVLGKVIKGTIM
TGDTPINIFGRNLLTAMGMSLNLPIAKVEPIKVTLKPGKEGPKLRQWPLSKEKIIALREICEKMEKDGQLEE
APPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKRRRITVLDVGDAYFSIPLD
EEFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQYTMRNVLEPFRKANPDVTLIQYMDDILIASDR
TDLEHDRVVLQLKELLNGIGFSTPEEKFQKDPPFQWMGYELWPTKWKLQKIELPQRETWTVNDIQKLVGVLN
WAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTEMAEAEYEENKIILSQEQEGCYYQEGKPIEATVIKSQDNQW
SYKIHQEDKVLKVGKFAKVKNTHTNGVRLLAHVVQKIGKEALVIWGEVPKFHLPVEREIWEQWWTDYWQVTW
IPDWDFVSTPPLVRLVFNLVKEPIQGAETFYVDGSCNRQSREGKAGYVTDRGRDKAKLLEQTTNQQAELEAF
YLALADSGPKANIIVDSQYVMGIIAGQPTESESRLVNQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHLVSQ
GIRQVLFLKKIEPAQEEHEKYHSNVKELVFKFGLPRLVAKQIVDTCDKCHQKGEAIHGQVNAELGTWQMDCT
HLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLAGRWPITHLHTDNGANFTSQEVKMVAWWAGIEQT
FGVPYNPQSQGVVEAMNHHLKTQIDRIREQANSIETIVLMAVHCMNFKRRGGIGDMTPAERLVNMITTEQEI
QFQQSKNSKFKNFRVYYREGRDQLWKGPGELLWKGEGAVILKVGTEIKVVPRRKAKIIKDYGGGKELDSGSH
LEDTGEAREVA
>Smanga_SP P19505 POL polyprotein [Contains: Protease (Retro
MPRKTSGFFRAWPMGKEAPQFPHGPDASGADTNCSPRGSSCGSTEELHEDGQKAEGEQRETLQGGNGGFAAP
QFSLWRRPIVTAYIEEQPVEVLLDTGADDSIVAGIELGPNYTPKIVGGIGGFINTKEYKDVKIKVLGKVIKG
TIMTGDTPINIFGRNLLTAMGMSLNLPIAKVEPIKVTLKPGKDGPKLRQWPLSKEKIIALREICEKMEKDGQ
LEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEVQLGIPHPAGLAKRRRITVLDVGDAYFSI
PLDEEFRQYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRNVLEPFRKANPDVTLIQYMDDILIA
SDRTDLEHDRVVLQLKELLNSIGFSTPEEKFQKDPPFQWMGYELWPTKWKLQKIELPQRETWTVNDIQKLVG
VLNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTEMAEAEYEENKIILSQEQEGCYYQEGKPLEATVIKSQD
NQWSYKIHQEDKILKVGKFAKIKNTHTNGVRLLAHVVQKIGKEAIVIWGQVPRFHLPVEREIWEQWWTDYWQ
VTWIPEWDFVSTPPLVRLVFNLVKEPIQGAETFYVDGSCNRQSREGKAGYVTDRGRDKAKLLEQTTNQQAEL
EAFYLALADSGPKANIIVDSQYVMGIVAGQPTESESRLVNQIIEEMIKKEAIYVAWVPAHKGIGGNQEVDHL
VSQGIRQVLFLEKIEPAQEEHEKYHSNVKELVFKFGLPRLVAKQIVDTCDKCHQKGEAIHGQVNAELGTWQM
DCTHLEGKIIIVAVHVASGFIEAEVIPQETGRQTALFLLKLASRWPITHLHTDNGANFTSQEVKMVAWWAGI
EQTFGVPYNPQSQGVVEAMNHHLKTQIDRIREQANSIETIVLMAVHCMNFKRRGGIGDMTPAERLVNMITTE
QEIQFQQSKNSKFKNFRVYYREGRDQLWKGPGELLWKGEGAVILKVGTEIKVVPRRKAKIIKDYGGGKELDS
GSHLEDTGEAREVA
- Align the Pol sequences using the MAFFT server at EBI with default settings.
Once the alignment is done, save the resulting alignment as a FASTA file: right-click the "Download alignment file" button on the MAFFT output page, and then save the file using "Save linked file as" (or whatever it is called in your particular browser). Make sure you can find the file again!
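If you want to check the saved alignment before moving on, the following is a minimal Python sketch, not part of the exercise itself. It assumes the file is plain FASTA with gaps written as "-"; the function names `read_fasta` and `alignment_length` are made up for this illustration. In a correct alignment every sequence has the same length, because gaps pad the shorter ones.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; the name is the first word after '>'."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]    # e.g. "HTLV" from ">HTLV P03362 ..."
            records[name] = ""
        elif name is not None:
            records[name] += line         # sequence may span several lines
    return records


def alignment_length(records: Dict[str, str]) -> int:
    """Return the common length of all sequences, or raise if they differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


# Toy input for illustration only; these are not real Pol sequences.
demo = """>seqA toy example
MK-LV
>seqB toy example
MKALV
"""
records = read_fasta(demo.splitlines())
print(len(records), "sequences, alignment length", alignment_length(records))
```

To use it on your own file, pass an open file handle to `read_fasta` instead of `demo.splitlines()`, then compare the number of sequences with the number of entries in the data set above.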
- Open the TreeHugger web server. (The TreeHugger server constructs a neighbor-joining tree from a set of aligned sequences.)
- Select the option to upload a file (see figure below), then choose the Pol protein alignment file you just saved on your hard disk, and finally click "Submit Query" to construct the neighbor-joining tree:
- When the run is done, right-click the "Download data in Newick/Phylip format" link to save the tree as a text file on your hard disk (again, make sure you can find it later). You will notice that the tree file is in the parenthesis-based format we discussed previously in the lecture:
- Open the FigTree tree viewer that you previously installed on your computer and use File->Open to open the tree file you just saved.
- The view that you will see first is presumably a rooted view similar to the one below. However, it is important to realize that we have not explicitly rooted the tree yet, so the position of the root in this view is arbitrary and carries no evolutionary meaning. A more realistic view can be seen by clicking the unrooted view button (see figures below):
- The last figure above shows the unrooted tree. For now, however, go back to the (pseudo)rooted view you started out with. We will now place the root by using the HTLV Pol sequence as a so-called outgroup. Click the branch leading to the HTLV sequence so that it is selected (see figure below). Then click the "Reroot" button, which will root the tree on the selected outgroup:
The rationale for using an outgroup to place the root of the tree is as follows: our data set consists of sequences from HIV-1, HIV-2, SIV and HTLV. We know from other evidence that the lineage leading to HTLV branched off before any of the remaining viruses diverged from each other. The root of the tree connecting the viruses investigated here must therefore be located on the branch between the HTLV sequence (the "outgroup") and the rest (the "ingroup"). This way of finding a root is called "outgroup rooting".
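The idea of outgroup rooting can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of what FigTree does internally: the function names `parse_newick` and `root_on_outgroup` are invented here, leaf names must be unique and free of whitespace, labels on internal nodes are ignored, and the new root is simply placed halfway along the outgroup branch. The tree is read as an unrooted graph, and a new root is then inserted on the branch leading to the outgroup.

```python
import itertools
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]   # node -> {neighbour: branch length}


def parse_newick(text: str) -> Graph:
    """Read a Newick string into an undirected graph, forgetting the old root."""
    text = "".join(text.split()).rstrip(";") + ";"   # drop whitespace, ensure ';'
    graph: Graph = {}
    counter = itertools.count()
    pos = 0

    def node() -> Tuple[str, float]:
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                                  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                                  # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Internal nodes get a unique id; any label on them is ignored.
        label = f"#{next(counter)}" if children else name
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        graph.setdefault(label, {})
        for child, child_len in children:
            graph[label][child] = child_len
            graph[child][label] = child_len
        return label, length

    node()
    return graph


def root_on_outgroup(graph: Graph, outgroup: str) -> str:
    """Return a Newick string rooted on the branch leading to the leaf `outgroup`."""
    (neighbour, length), = graph[outgroup].items()    # a leaf has exactly one branch

    def branch(child: str, parent: str, dist: float) -> str:
        kids = [k for k in graph[child] if k != parent]
        while len(kids) == 1:                         # collapse a degree-2 node (old root)
            dist += graph[child][kids[0]]
            parent, child = child, kids[0]
            kids = [k for k in graph[child] if k != parent]
        if not kids:
            return f"{child}:{dist:g}"
        inner = ",".join(branch(k, child, graph[child][k]) for k in kids)
        return f"({inner}):{dist:g}"

    half = length / 2      # assumption: root placed at the midpoint of the outgroup branch
    return f"({outgroup}:{half:g},{branch(neighbour, outgroup, half)});"


# Toy tree for illustration; the names and branch lengths are made up.
toy = "(A:1,B:2,(C:3,HTLV:4):5);"
print(root_on_outgroup(parse_newick(toy), "HTLV"))
```

In the output, the outgroup is one of the two children of the root and everything else (the ingroup) forms the other child, which is exactly the arrangement described above. Branch lengths are printed with limited precision (`:g`), which is fine for inspection but not for further analysis.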
- Inspect the rooted tree that results from rerooting and consider what it tells you about the origin of HIV-1 and HIV-2.
When you have pondered the problem for a while you can read this short explanation that I have prepared: Origin of HIV1 and HIV2.