Additional information


DISTRIBUTIONS OVER CHROMOSOMES

The two figures below show the distribution of cellular roles and subcellular localizations for the proteins belonging to each chromosome in the human genome. Each spot represents the normalised frequency of proteins in a given category that are transcribed from a particular chromosome. This value is obtained by dividing the absolute number of proteins in a spot by the number of proteins transcribed from the chromosome and by the number of proteins occurring in the category:

Normalised freq. = Absolute freq. / (# chromosome * # category)
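
The normalisation above can be written out as a small function. This is only a minimal sketch: the function name, the argument names and the counts in the example are hypothetical and are not taken from the ProtFun data.

```python
def normalised_frequency(absolute, n_chromosome, n_category):
    """Return Absolute freq. / (# chromosome * # category).

    absolute     -- proteins in the category that come from the chromosome
    n_chromosome -- all proteins transcribed from the chromosome
    n_category   -- all proteins occurring in the category
    """
    if n_chromosome <= 0 or n_category <= 0:
        raise ValueError("chromosome and category counts must be positive")
    return absolute / (n_chromosome * n_category)


# Hypothetical counts, chosen for illustration only.
value = normalised_frequency(absolute=12, n_chromosome=400, n_category=300)
```

Dividing by both totals makes spots comparable across chromosomes carrying very different numbers of proteins and across categories of very different sizes.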




FEATURE REPRESENTATIONS

In ProtFun many different predicted protein features are used as input. These features are represented in a number of different ways:
  • Single value
    The feature consists of a single value for each protein.
    Example: The gravy feature consists of one number, the hydrophobicity.
  • Several numbers
    The feature consists of several different scores, the names of which are listed.
    Example: The signalp feature consists of a fixed number of values. These are the meanS score, the maxY score and the logarithm of the length of the predicted signal peptide (all of which are obtained from SignalP).
  • N bins
    A prediction method outputs a value per residue. These are encoded as a fixed number of values by dividing each sequence into N bins and averaging over the residue scores within each bin.
    Example: The phosY feature consists of the average tyrosine phosphorylation potential within each of 10 sequence bins of equal size.
  • Several values in N bins
    Some prediction methods output several values per residue. Each of these values is binned as described above.
    Example: The psipred feature consists of a score for helix, sheet and coil each averaged within five bins, giving a total of 15 input values.
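
The binned encodings described above can be sketched as follows. This is an illustration rather than the ProtFun implementation: the function names are hypothetical, and the handling of sequence lengths that are not a multiple of N (integer bin boundaries), the rejection of sequences shorter than N, and the order in which the binned tracks are concatenated are all assumptions.

```python
def bin_average(scores, n_bins):
    """Average per-residue scores within n_bins bins of (nearly) equal size."""
    length = len(scores)
    if length < n_bins:
        # Assumption: the source does not say how very short sequences are handled.
        raise ValueError("sequence is shorter than the number of bins")
    values = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        chunk = scores[start:end]
        values.append(sum(chunk) / len(chunk))
    return values


def encode_tracks(tracks, n_bins):
    """Bin several per-residue tracks and concatenate the results."""
    encoded = []
    for track in tracks:
        encoded.extend(bin_average(track, n_bins))
    return encoded


# Toy 10-residue example with three tracks (helix, sheet, coil) in five bins.
helix = [0.2] * 10
sheet = [0.5] * 10
coil = [0.3] * 10
encoded = encode_tracks([helix, sheet, coil], 5)
```

With three tracks and five bins the encoded vector has 15 entries, matching the psipred example.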

The following table summarizes the encoding scheme for all features actually used in the ProtFun method.

Abbreviation  Encoding                             Description
ec            single value                         Extinction coefficient predicted by ExPASy ProtParam
gravy         single value                         Hydrophobicity predicted by ExPASy ProtParam
nneg          single value                         Number of negatively charged residues counted by ExPASy ProtParam
npos          single value                         Number of positively charged residues counted by ExPASy ProtParam
nglyc         potential in 5 bins                  N-glycosylation sites predicted by NetNGlyc
oglyc         potential-threshold in 10 bins       GalNAc O-glycosylations predicted by NetOGlyc
pest          fraction in 10 bins                  PEST-rich regions identified by PESTfind
phosST        potential in 10 bins                 Serine and threonine phosphorylations predicted by NetPhos
phosY         potential in 10 bins                 Tyrosine phosphorylations predicted by NetPhos
psipred       helix, sheet, coil in 5 bins         Predicted secondary structure from PSI-Pred
psort         20 probabilities                     Subcellular location predictions by PSORT
seg           fraction in 10 bins                  Low-complexity regions identified by SEG
signalp       meanS, maxY, log(cleavage pos)       Signal peptide predictions made by SignalP
tmhmm         inside, outside, membrane in 5 bins  Transmembrane helix predictions made by TMHMM
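
As a rough illustration of what these encodings imply, the number of input values for a given combination of features can be tallied from the table. The sizes below are an interpretation and not figures stated by the authors, apart from psipred (15 values, as given in the example above): each "in N bins" encoding is assumed to contribute one value per bin and per listed quantity. The names `FEATURE_SIZES` and `input_size` are hypothetical.

```python
# Assumed number of input values per feature, read off the encoding table.
FEATURE_SIZES = {
    "ec": 1,
    "gravy": 1,
    "nneg": 1,
    "npos": 1,
    "nglyc": 5,
    "oglyc": 10,
    "pest": 10,
    "phosST": 10,
    "phosY": 10,
    "psipred": 3 * 5,  # helix, sheet, coil in 5 bins
    "psort": 20,
    "seg": 10,
    "signalp": 3,      # meanS, maxY, log(cleavage pos)
    "tmhmm": 3 * 5,    # inside, outside, membrane in 5 bins (assumed like psipred)
}


def input_size(features):
    """Total number of input values for a list of feature abbreviations."""
    return sum(FEATURE_SIZES[name] for name in features)


# Example feature combination: ec, psipred, psort and tmhmm.
n_inputs = input_size(["ec", "psipred", "psort", "tmhmm"])
```

Under these assumptions the example combination gives 1 + 15 + 20 + 15 = 51 input values.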


NEURAL NETWORK ARCHITECTURES

In ProtFun an ensemble of five neural networks is used for each predicted category. All are three-layer feed-forward networks, that is, each has a single hidden layer. The number of hidden units differs between networks and is listed in the table below, which also shows the features used as input for each network.

Category  Hidden units  Input features
Amino acid biosynthesis 30 ec psipred psort tmhmm
30 ec psipred tmhmm
30 ec oglyc psipred psort
30 gravy psipred psort
30 oglyc psipred psort
Biosynthesis of cofactors 50 gravy pest psort tmhmm
50 ec psipred psort seg
30 ec psipred psort tmhmm
30 gravy psipred psort seg
40 ec psipred psort seg
Cell envelope 40 signalp tmhmm
40 nglyc psort signalp tmhmm
30 nglyc psipred psort signalp tmhmm
30 psort signalp tmhmm
30 psipred psort signalp tmhmm
Cellular processes 30 gravy nglyc psort seg
30 gravy phosST psipred
30 phosST pest psort seg
30 pest psort seg
30 psort seg
Central intermediary metabolism 50 ec nneg npos psipred psort tmhmm
30 nneg npos psipred psort tmhmm
30 nneg npos psipred tmhmm
30 ec psipred psort
30 nneg npos psipred
Energy metabolism 10 ec phosST phosY pest psort signalp
40 ec phosST phosY psort signalp
50 ec pest psort signalp
30 phosY pest psort signalp
30 pest psort signalp
Fatty acid metabolism 30 gravy phosST pest signalp
30 gravy seg signalp
30 gravy pest seg signalp
30 gravy seg
30 pest psipred seg signalp
Purines and pyrimidines 10 gravy nneg npos psort
40 gravy nneg tmhmm
20 gravy nneg npos tmhmm
30 gravy nneg npos psort
30 ec gravy nneg tmhmm
Regulatory functions 30 phosST phosY pest psort
30 oglyc nglyc pest psort
30 oglyc phosST phosY pest psort
30 npos nglyc pest psort
40 pest psort
Replication and transcription 30 oglyc nglyc psort tmhmm
30 oglyc psort
30 oglyc psort tmhmm
30 nglyc psort tmhmm
30 gravy nglyc psort tmhmm
Translation 30 phosY nglyc pest signalp
30 oglyc pest signalp
30 oglyc phosY nglyc signalp
10 oglyc pest signalp tmhmm
10 phosY nglyc pest signalp
Transport and binding 40 ec nglyc psort signalp
30 npos psort
40 ec gravy nglyc psort signalp
30 gravy nglyc psort signalp
30 ec psort
Enzyme/nonenzyme 40 ec nneg npos psort tmhmm
40 ec npos phosY psipred psort tmhmm
10 ec nneg psort
10 ec nneg npos psort
40 ec psort tmhmm
Oxidoreductase 10 ec oglyc pest psort signalp
30 ec oglyc psort signalp
30 ec gravy phosY pest psort
50 ec phosY pest psort signalp
40 ec oglyc psort signalp
Transferase 50 nneg nglyc psort tmhmm
30 nneg phosY psort seg
30 nneg nglyc psort seg
20 nneg phosY psort
50 ec nneg phosY psort tmhmm
Hydrolase 50 ec gravy psipred psort signalp tmhmm
30 ec nneg psipred tmhmm
20 ec gravy psipred signalp
30 ec gravy psipred tmhmm
50 ec psort signalp tmhmm
Lyase 50 gravy nneg phosST nglyc tmhmm
30 nglyc psort tmhmm
50 nneg npos phosST psort tmhmm
30 gravy nneg phosST nglyc psort tmhmm
30 gravy nneg nglyc psort tmhmm
Isomerase 30 nneg npos psipred seg tmhmm
30 nneg pest psipred tmhmm
30 nneg pest psipred seg
30 ec nneg psipred
30 ec npos pest psipred tmhmm
Ligase 30 nneg psipred psort
30 pest psipred psort tmhmm
10 nneg psipred signalp tmhmm
30 ec gravy nneg psipred
30 gravy nneg psipred
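
The architecture described in this section, five feed-forward networks with one hidden layer each per category, can be sketched in plain Python. Several points are assumptions not stated on this page: the sigmoid activation, the single output unit, the random weights (stand-ins for trained parameters), the input sizes, and combining the five outputs by a simple average. Only the hidden-unit counts (40, 40, 30, 30, 30) are taken from the table, from the "Cell envelope" rows. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_inputs, n_hidden, rng):
    """Create a one-hidden-layer network with random placeholder weights."""
    return {
        "w_hidden": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def forward(net, x):
    """Input layer -> hidden layer -> single output unit."""
    hidden = [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])


def ensemble_score(members):
    """Average the outputs of (network, input vector) pairs (assumed rule)."""
    outputs = [forward(net, x) for net, x in members]
    return sum(outputs) / len(outputs)


rng = random.Random(0)
hidden_units = [40, 40, 30, 30, 30]  # "Cell envelope" rows of the table
input_sizes = [18, 43, 58, 38, 53]   # hypothetical sizes, for illustration only

members = []
for n_in, n_hid in zip(input_sizes, hidden_units):
    net = make_network(n_in, n_hid, rng)
    x = [rng.random() for _ in range(n_in)]  # placeholder feature vector
    members.append((net, x))

score = ensemble_score(members)
```

Each member receives its own input vector because the networks in an ensemble use different feature combinations.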



CORRESPONDENCE

Lars Juhl Jensen,