Prediction of human protein function from post-translational modifications
and localization features.
L. Juhl Jensen (1), R. Gupta (1), N. Blom (1),
D. Devos (2), J. Tamames (2), C. Kesmir (1),
H. Nielsen (1), H. H. Stærfeldt (1),
C. Workman (1), C. A. F. Andersen (1),
S. Knudsen (1), A. Krogh (1),
A. Valencia (2) and S. Brunak (1).
J. Mol. Biol., 319:1257-1265, 2002.
(1) Center for Biological Sequence Analysis, The Technical University of Denmark,
DK-2800 Lyngby, Denmark
(2) Protein Design Group, National Center for Biotechnology,
CNB-CSIC, Cantoblanco, Madrid E-28049, Spain.
We have developed an entirely sequence-based method that
identifies and integrates relevant features that can be used to assign
orphan proteins to functional classes and, for enzymes, to enzyme
categories. We show that strategies for the elucidation of orphan
protein function may benefit from a number of functional attributes
which are more directly related to the linear sequence of amino acids
-- and hence easier to predict -- than protein structure. These
attributes include features associated with post-translational
modifications and protein sorting, but also much simpler aspects such as
the length, isoelectric point and composition of the polypeptide chain.
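The simplest attributes named above (length, isoelectric point and amino acid composition) can be computed directly from the sequence. The following is a minimal sketch of that idea, not the authors' implementation: the function names are hypothetical, and the pKa values are an assumed, commonly used set that the abstract does not specify. The isoelectric point is estimated by bisection on the net charge given by the Henderson-Hasselbalch equation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed side-chain and terminal pKa values (one commonly used set);
# these are NOT taken from the paper and other sets give slightly different pI.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def net_charge(seq, ph):
    """Approximate net charge at a given pH (Henderson-Hasselbalch)."""
    counts = Counter(seq)
    # Free N-terminus and basic side chains carry positive charge.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_POSITIVE.items():
        positive += counts[aa] / (1.0 + 10 ** (ph - pka))
    # Free C-terminus and acidic side chains carry negative charge.
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_NEGATIVE.items():
        negative += counts[aa] / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH.
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def simple_features(seq):
    """Collect the three simple sequence-derived attributes in one dict."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "isoelectric_point": isoelectric_point(seq),
        "composition": composition(seq),
    }
```

Calling `simple_features("MKKLLDE")` returns a dictionary with the length, an estimated isoelectric point and the per-residue fractions. In a feature-integration setting, such values would be only a few of many inputs alongside predicted modifications and sorting signals.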
Prediction of human protein function according to Gene Ontology
Lars Juhl Jensen, Hans-Henrik Stærfeldt
and Søren Brunak.
Bioinformatics, 19:635-642, 2003.
The human genome project has led to the discovery of many previously
unknown human protein-coding genes. As a large fraction of these
are functionally uncharacterized, it is of interest to develop methods for
predicting their molecular function from sequence.
We have developed a method for prediction of protein function for a subset
of classes from the Gene Ontology
classification scheme. This subset includes several pharmaceutically
interesting categories: transcription factors, receptors, ion channels,
stress and immune response proteins, hormones and growth factors.
Although the method relies on protein sequences as the sole input,
it does not rely on sequence similarity, but instead on sequence-derived
protein features such as predicted post-translational modifications (PTMs),
protein sorting signals and physical/chemical properties calculated from the
amino acid composition. This allows the function of orphan proteins to be
predicted even when no homologs can be found. Using this method we propose two novel
receptors in the human genome, and further demonstrate chromosomal clustering
of related proteins.