Proc Natl Acad Sci U S A 1996 Sep 17;93(19):10268-10273
A minimal gene set for cellular life derived by comparison of complete bacterial genomes.
Mushegian AR, Koonin EVNational Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
The recently sequenced genome of the parasitic bacterium Mycoplasma genitalium contains only 468 identified protein-coding genes that have been dubbed a minimal gene complement [Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., et al. (1995) Science 270, 397-403]. Although the M. genitalium gene complement is indeed the smallest among known cellular life forms, there is no evidence that it is the minimal self-sufficient gene set. To derive such a set, we compared the 468 predicted M. genitalium protein sequences with the 1703 protein sequences encoded by the other completely sequenced small bacterial genome, that of Haemophilus influenzae. M. genitalium and H. influenzae belong to two ancient bacterial lineages, i.e., Gram-positive and Gram-negative bacteria, respectively. Therefore, the genes that are conserved in these two bacteria are almost certainly essential for cellular function. It is this category of genes that is most likely to approximate the minimal gene set. We found that 240 M. genitalium genes have orthologs among the genes of H. influenzae. This collection of genes falls short of comprising the minimal set as some enzymes responsible for intermediate steps in essential pathways are missing. The apparent reason for this is the phenomenon that we call nonorthologous gene displacement when the same function is fulfilled by nonorthologous proteins in two organisms. We identified 22 nonorthologous displacements and supplemented the set of orthologs with the respective M. genitalium genes. After examining the resulting list of 262 genes for possible functional redundancy and for the presence of apparently parasite-specific genes, 6 genes were removed. We suggest that the remaining 256 genes are close to the minimal gene set that is necessary and sufficient to sustain the existence of a modern-type cell. Most of the proteins encoded by the genes from the minimal set have eukaryotic or archaeal homologs but seven key proteins of DNA replication do not. We speculate that the last common ancestor of the three primary kingdoms had an RNA genome. Possibilities are explored to further reduce the minimal set to model a primitive cell that might have existed at a very early stage of life evolution.