Analysis and prediction of gene splice sites in four Aspergillus genomes.
Kai Wang, David Wayne Ussery, Søren Brunak
Fungal Genetics and Biology. Volume 46, Issue 1, pp. 14-18, March 2009
Center for Biological Sequence Analysis, Dept. of Systems Biology,
Technical University of Denmark, DK-2800 Lyngby, Denmark
Several Aspergillus fungal genomic sequences have been published, with many more in progress. Meaningful comparisons between these genomes require high-quality, consistently annotated sets of proteins from each of them. We have developed NetAspGene, a dedicated, publicly available splice site prediction program for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Because Aspergillus introns are smaller than those of many animals and plants, we trained single local networks with a larger window size, so that each window covers both donor and acceptor site information. We have applied NetAspGene to other Aspergilli, including A. nidulans, A. oryzae, and A. niger. Evaluation with independent data sets reveals that NetAspGene predicts splice sites substantially better than other available tools. NetAspGene will be helpful for the study of Aspergillus splice sites, and especially of alternative splicing. A webpage for NetAspGene is publicly available at http://www.cbs.dtu.dk/services/NetAspGene.
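
To make the windowing idea concrete, the following is a minimal sketch of how a fixed-width sequence window around a candidate donor site might be turned into network input. It is not NetAspGene's implementation: the function names, the one-hot encoding, the zero padding, and the flank of 30 nucleotides are all assumptions chosen for illustration. The only point it illustrates is that, when introns are short, a sufficiently wide window around a candidate GT dinucleotide can also span the region where the matching acceptor site would lie.

```python
# Illustrative sketch only. The encoding, padding, and flank size are
# assumptions and do not reflect NetAspGene's actual settings.
from typing import Iterator, List, Tuple

# One-hot code per nucleotide; any other symbol is encoded as all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq: str, dinucleotide: str) -> Iterator[int]:
    """Yield the start index of every (possibly overlapping) occurrence."""
    seq = seq.upper()
    start = seq.find(dinucleotide)
    while start != -1:
        yield start
        start = seq.find(dinucleotide, start + 1)


def encode_window(seq: str, site: int, flank: int) -> List[int]:
    """One-hot encode `flank` bases on each side of the dinucleotide at `site`.

    The window covers 2 * flank + 2 positions. Positions that fall outside
    the sequence are padded with zeros.
    """
    seq = seq.upper()
    features: List[int] = []
    for pos in range(site - flank, site + flank + 2):
        base = seq[pos] if 0 <= pos < len(seq) else "N"
        features.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return features


def donor_windows(seq: str, flank: int = 30) -> List[Tuple[int, List[int]]]:
    """Return (position, feature vector) for every candidate GT donor site."""
    return [(i, encode_window(seq, i, flank)) for i in candidate_sites(seq, "GT")]
```

Each feature vector has 4 * (2 * flank + 2) entries and could be passed to any classifier. Acceptor candidates could be handled the same way by scanning for AG instead of GT.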