LOCUS       HUMINS01     4044 bp    DNA             PRI       06-JAN-1995
DEFINITION  Human insulin gene, complete cds.
ACCESSION   J00265
VERSION     J00265.1  GI:186429
KEYWORDS    GC rich region; insulin; polymorphic variation; tandem repeat.
SEGMENT     1 of 2
SOURCE      Human cDNA ([1],[3]) and DNA ([2],[4],[5],[6]).
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 2414 to 2610)
  AUTHORS   Bell,G.I., Swain,W.F., Pictet,R., Cordell,B., Goodman,H.M. and
            Rutter,W.J.
  TITLE     Nucleotide sequence of a cDNA clone encoding human preproinsulin
  JOURNAL   Nature 282 (5738), 525-527 (1979)
  MEDLINE   80054779
REFERENCE   2  (bases 1925 to 3715)
  AUTHORS   Bell,G.I., Pictet,R.L., Rutter,W.J., Cordell,B., Tischer,E. and
            Goodman,H.M.
  TITLE     Sequence of the human insulin gene
  JOURNAL   Nature 284 (5751), 26-32 (1980)
  MEDLINE   80120725
REFERENCE   3  (bases 2411 to 2610)
  AUTHORS   Sures,I., Goeddel,D.V., Gray,A. and Ullrich,A.
  TITLE     Nucleotide sequence of human preproinsulin complementary DNA
  JOURNAL   Science 208 (4439), 57-59 (1980)
  MEDLINE   80147417
REFERENCE   4  (bases 1928 to 3651)
  AUTHORS   Ullrich,A., Dull,T.J., Gray,A., Brosius,J. and Sures,I.
  TITLE     Genetic variation in the human insulin gene
  JOURNAL   Science 209 (4456), 612-615 (1980)
  MEDLINE   80236313
REFERENCE   5  (bases 1 to 2227)
  AUTHORS   Bell,G.I., Selby,M.J. and Rutter,W.J.
  TITLE     The highly polymorphic region near the human insulin gene is
            composed of simple tandemly repeating sequences
  JOURNAL   Nature 295 (5844), 31-35 (1982)
  MEDLINE   82125365
REFERENCE   6  (bases 3615 to 4044; 917 to 1428; 1828 to 2185)
  AUTHORS   Ullrich,A., Dull,T.J., Gray,A., Philips,J.A. III. and Peter,S.
  TITLE     Variation in the sequence and modification state of the human
            insulin gene flanking regions
  JOURNAL   Nucleic Acids Res. 10 (7), 2225-2240 (1982)
  MEDLINE   82221404
COMMENT     The human insulin gene region consists of three exons and two
            introns coding for a signal peptide, a b-chain, a c-peptide, and
            an a-chain.
            Present evidence favors a single insulin gene per haploid genome;
            however, allelic and polymorphic variation are conspicuous. The
            two major alleles studied thus far are denoted alpha and beta. The
            5' flanks for these are so different, largely because of the
            presence of tandem repeats not found elsewhere in the human
            genome, that separate entries have been made for this region (see
            J00266 and J00267). Thus differences in the first 2000 bases are
            not annotated below. This sequence heterogeneity is generated
            largely, though not exclusively, by a family of G+C-rich
            oligonucleotides whose consensus sequence is ACAGGGGTGTGGGG. In
            the 5' sequence reported below (from [5]), these occur most
            obviously between bases 1340 and 1823. While the variation in the
            5' flank may be significant for gene expression, it has not been
            associated to date with diabetic conditions. [4],[5],[6] discuss
            this variation in detail. Variation in the form of base
            modification is observed in the 3' flanking sequence ([6]).
            Conflicts between [5],[6] in this region may ultimately prove to
            be polymorphic variations. This sequence of 4044 bases (which most
            closely represents the beta allele) was communicated with
            revisions by G.I.Bell. An additional stretch of about 950 bases in
            the 3' flank, which has not been published, is available through
            G.I.Bell or this library. See other loci beginning and other loci
            with ins as the 4th-6th characters of the locus name.
FEATURES             Location/Qualifiers
     source          1..4044
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /dev_stage="foetus"
                     /tissue_type="liver"
                     /map="11p15.5"
     exon            2186..2227
                     /gene="INS"
                     /note="G00-119-349"
                     /number=1
     intron          2228..2406
                     /gene="INS"
                     /note="G00-119-349"
                     /number=1
     variation       2401
                     /gene="INS"
                     /note="a in alpha-allele; t in beta allele ([4])"
     CDS             join(2424..2610,3397..3542)
                     /gene="INS"
                     /note="precursor"
                     /codon_start=1
                     /db_xref="GDB:G00-119-349"
                     /product="insulin"
                     /protein_id="AAA59172.1"
                     /db_xref="GI:386828"
                     /translation="MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCG
                     ERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSL
                     YQLENYCN"
     sig_peptide     2424..2495
                     /gene="INS"
                     /note="G00-119-349"
     mat_peptide     join(2496..2585,2586..2610,3397..3476,3477..3539)
                     /gene="INS"
                     /note="c peptide; G00-119-349"
     intron          2611..3396
                     /gene="INS"
                     /note="G00-119-349"
                     /number=2
     variation       3229
                     /gene="INS"
                     /note="c in alpha-allele; g in beta-allele ([4])"
     exon            3397..3615
                     /partial
                     /gene="INS"
                     /note="G00-119-349"
                     /number=2
     variation       3551
                     /gene="INS"
                     /note="c in alpha-allele; t in beta-allele ([4])"
     variation       3564
                     /gene="INS"
                     /note="c in alpha-allele; a in beta-allele ([4])"
BASE COUNT      680 a   1239 c   1417 g    708 t
ORIGIN      Chromosome 11p15.5; near an AvaI site([Nature 295, 31-35 (1982)]).
        1 ctcgaggggc ctagacattg ccctccagag agagcaccca acaccctcca ggcttgaccg
       61 gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
      121 tgggacacca gctggccttc aaggtctctg cctccctcca gccaccccac tacacgctgc
      181 tgggatcctg gatctcagct ccctggccga caacactggc aaactcctac tcatccacga
      241 aggccctcct gggcatggtg gtccttccca gcctggcagt ctgttcctca cacaccttgt
      301 tagtgcccag cccctgaggt tgcagctggg ggtgtctctg aagggctgtg agcccccagg
      361 aagccctggg gaagtgcctg ccttgcctcc ccccggccct gccagcgcct ggctctgccc
      421 tcctacctgg gctcccccca tccagcctcc ctccctacac actcctctca aggaggcacc
      481 catgtcctct ccagctgccg ggcctcagag cactgtggcg tcctggggca gccaccgcat
      541 gtcctgctgt ggcatggctc agggtggaaa gggcggaagg gaggggtcct gcagatagct
      601 ggtgcccact accaaacccg ctcggggcag gagagccaaa ggctgggtgt gtgcagagcg
      661 gccccgagag gttccgaggc tgaggccagg gtgggacata gggatgcgag gggccggggc
      721 acaggatact ccaacctgcc tgcccccatg gtctcatcct cctgcttctg ggacctcctg
      781 atcctgcccc tggtgctaag aggcaggtaa ggggctgcag gcagcagggc tcggagccca
      841 tgccccctca ccatgggtca ggctggacct ccaggtgcct gttctgggga gctgggaggg
      901 ccggaggggt gtaccccagg ggctcagccc agatgacact atgggggtga tggtgtcatg
      961 ggacctggcc aggagagggg agatgggctc ccagaagagg agtgggggct gagagggtgc
     1021 ctggggggcc aggacggagc tgggccagtg cacagcttcc cacacctgcc cacccccaga
     1081 gtcctgccgc cacccccaga tcacacggaa gatgaggtcc gagtggcctg ctgaggactt
     1141 gctgcttgtc cccaggtccc caggtcatgc cctccttctg ccaccctggg gagctgaggg
     1201 cctcagctgg ggctgctgtc ctaaggcagg gtgggaacta ggcagccagc agggagggga
     1261 cccctccctc actcccactc tcccaccccc accaccttgg cccatccatg gcggcatctt
     1321 gggccatccg ggactgggga caggggtcct ggggacaggg gtccggggac agggtcctgg
     1381 ggacaggggt gtggggacag gggtctgggg acaggggtgt ggggacaggg gtgtggggac
     1441 aggggtctgg ggacaggggt gtggggacag gggtccgggg acaggggtgt ggggacaggg
     1501 gtctggggac aggggtgtgg ggacaggggt gtggggacag gggtctgggg acaggggtgt
     1561 ggggacaggg gtcctgggga caggggtgtg gggacagggg tgtggggaca ggggtgtggg
     1621 gacaggggtg tggggacagg ggtcctgggg ataggggtgt ggggacaggg gtgtggggac
     1681 aggggtcccg gggacagggg tgtggggaca ggggtgtggg gacaggggtc ctggggacag
     1741 gggtctgagg acaggggtgt gggcacaggg gtcctgggga caggggtcct ggggacaggg
     1801 gtcctgggga caggggtctg gggacagcag cgcaaagagc cccgccctgc agcctccagc
     1861 tctcctggtc taatgtggaa agtggcccag gtgagggctt tgctctcctg gagacatttg
     1921 cccccagctg tgagcaggga caggtctggc caccgggccc ctggttaaga ctctaatgac
     1981 ccgctggtcc tgaggaagag gtgctgacga ccaaggagat cttcccacag acccagcacc
     2041 agggaaatgg tccggaaatt gcagcctcag cccccagcca tctgccgacc cccccacccc
     2101 gccctaatgg gccaggcggc aggggttgac aggtagggga gatgggctct gagactataa
     2161 agccagcggg ggcccagcag ccctcagccc tccaggacag gctgcatcag aagaggccat
     2221 caagcaggtc tgttccaagg gcctttgcgt caggtgggct cagggttcca gggtggctgg
     2281 accccaggcc ccagctctgc agcagggagg acgtggctgg gctcgtgaag catgtggggg
     2341 tgagcccagg ggccccaagg cagggcacct ggccttcagc ctgcctcagc cctgcctgtc
     2401 tcccagatca ctgtccttct gccatggccc tgtggatgcg cctcctgccc ctgctggcgc
     2461 tgctggccct ctggggacct gacccagccg cagcctttgt gaaccaacac ctgtgcggct
     2521 cacacctggt ggaagctctc tacctagtgt gcggggaacg aggcttcttc tacacaccca
     2581 agacccgccg ggaggcagag gacctgcagg gtgagccaac cgcccattgc tgcccctggc
     2641 cgcccccagc caccccctgc tcctggcgct cccacccagc atgggcagaa gggggcagga
     2701 ggctgccacc cagcaggggg tcaggtgcac ttttttaaaa agaagttctc ttggtcacgt
     2761 cctaaaagtg accagctccc tgtggcccag tcagaatctc agcctgagga cggtgttggc
     2821 ttcggcagcc ccgagataca tcagagggtg ggcacgctcc tccctccact cgcccctcaa
     2881 acaaatgccc cgcagcccat ttctccaccc tcatttgatg accgcagatt caagtgtttt
     2941 gttaagtaaa gtcctgggtg acctggggtc acagggtgcc ccacgctgcc tgcctctggg
     3001 cgaacacccc atcacgcccg gaggagggcg tggctgcctg cctgagtggg ccagacccct
     3061 gtcgccagcc tcacggcagc tccatagtca ggagatgggg aagatgctgg ggacaggccc
     3121 tggggagaag tactgggatc acctgttcag gctcccactg tgacgctgcc ccggggcggg
     3181 ggaaggaggt gggacatgtg ggcgttgggg cctgtaggtc cacacccagt gtgggtgacc
     3241 ctccctctaa cctgggtcca gcccggctgg agatgggtgg gagtgcgacc tagggctggc
     3301 gggcaggcgg gcactgtgtc tccctgactg tgtcctcctg tgtccctctg cctcgccgct
     3361 gttccggaac ctgctctgcg cggcacgtcc tggcagtggg gcaggtggag ctgggcgggg
     3421 gccctggtgc aggcagcctg cagcccttgg ccctggaggg gtccctgcag aagcgtggca
     3481 ttgtggaaca atgctgtacc agcatctgct ccctctacca gctggagaac tactgcaact
     3541 agacgcagcc tgcaggcagc cccacacccg ccgcctcctg caccgagaga gatggaataa
     3601 agcccttgaa ccagccctgc tgtgccgtct gtgtgtcttg ggggccctgg gccaagcccc
     3661 acttcccggc actgttgtga gcccctccca gctctctcca cgctctctgg gtgcccacag
     3721 gtgccaacgc cggccaggcc cagcatgcag tggctctccc caaagcggcc atgcctgttg
     3781 gctgcctgct gcccccaccc tgtggctcag ggtccagtat gggagcttcg ggggtctctg
     3841 aggggccagg gatggtgggg ccactgagaa gtgacttctt gttcagtagc tctggactct
     3901 tggagtcccc agagaccttg ttcaggaaag ggaatgagaa cattccagca attttccccc
     3961 cacctagccc tcccaggttc tatttttaga gttatttctg atggagtccc tgtggaggga
     4021 ggaggctggg ctgagggagg gggt
//