Article abstracts

Prediction of human protein function from post-translational modifications and localization features.
L. Juhl Jensen1, R. Gupta1, N. Blom1, D. Devos2, J. Tamames2, C. Kesmir1, H. Nielsen1, H. H. Stærfeldt1,
K. Rapacki1, C. Workman1, C. A. F. Andersen1, S. Knudsen1, A. Krogh1, A. Valencia2 and S. Brunak1.

J. Mol. Biol., 319:1257-1265, 2002.

1 Center for Biological Sequence Analysis, The Technical University of Denmark, DK-2800 Lyngby, Denmark
2 Protein Design Group, National Center for Biotechnology, CNB-CSIC, Cantoblanco, Madrid E-28049, Spain.

We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign orphan proteins to functional classes and, for enzymes, to enzyme categories. We show that strategies for the elucidation of orphan protein function may benefit from a number of functional attributes which are more directly related to the linear sequence of amino acids -- and hence easier to predict -- than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects such as the length, isoelectric point and composition of the polypeptide chain.
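
As an illustration of the "much simpler aspects" mentioned in the abstract, the sketch below computes the length and amino acid composition of a polypeptide chain, plus the fractions of positively and negatively charged residues. It is a minimal example and not the authors' implementation: the function name, the returned keys and the charged-residue groupings (K/R and D/E) are assumptions made for illustration. The isoelectric point is left out because it requires choosing a pKa table.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_sequence_features(sequence):
    """Return length, composition and charge fractions for a protein sequence.

    Hypothetical helper for illustration only; it covers just the simplest
    sequence-derived attributes (no PTM or sorting-signal predictions).
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)
    length = len(seq)

    # Fraction of each standard residue; non-standard letters are ignored here
    # but still count towards the total length.
    composition = {aa: counts.get(aa, 0) / length for aa in AMINO_ACIDS}

    return {
        "length": length,
        "composition": composition,
        "fraction_positive": (counts.get("K", 0) + counts.get("R", 0)) / length,
        "fraction_negative": (counts.get("D", 0) + counts.get("E", 0)) / length,
    }


if __name__ == "__main__":
    features = simple_sequence_features("MKKDE")
    print(features["length"], features["composition"]["K"])
```

A feature vector of this kind could then be combined with predicted modification and sorting features as input to a classifier; how the features are encoded and weighted is not specified in the abstract.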


Prediction of human protein function according to Gene Ontology categories.
Lars Juhl Jensen, Hans-Henrik Stærfeldt and Søren Brunak.
Bioinformatics, 19:635-642, 2003.


The human genome project has led to the discovery of many previously unknown human protein-coding genes. As a large fraction of these are functionally uncharacterized, it is of interest to develop methods for predicting their molecular function from sequence.


We have developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories: transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence-derived protein features such as predicted post-translational modifications (PTMs), protein sorting signals and physical/chemical properties calculated from the amino acid composition. This allows the function of orphan proteins to be predicted where no homologs can be found. Using this method we propose two novel receptors in the human genome, and further demonstrate chromosomal clustering of related proteins.


Lars Juhl Jensen,