EasyGene 1.2                INSTALLATION INSTRUCTIONS


DESCRIPTION

   EasyGene v. 1.2 predicts genes in prokaryotic DNA sequences. The method is
   described in detail in the following articles:

   EasyGene 1.2 (this version):
   Large-scale prokaryotic gene prediction and comparison to genome annotation.
   Pernille Nielsen and Anders Krogh.
   Bioinformatics 21(24):4322-4329, 2005.

   EasyGene 1.1 (original version):
   EasyGene - a prokaryotic gene finder that ranks ORFs by statistical
   significance.
   Thomas Schou Larsen and Anders Krogh.
   BMC Bioinformatics 4:21, 2003.

   More information on the method is also available at:

      http://www.cbs.dtu.dk/services/EasyGene/
      http://www.binf.ku.dk/cgi-bin/easygene/search/


DOWNLOAD

   By special agreement only; contact software@cbs.dtu.dk for details.


PRE-INSTALLATION

   Version 1.2 requires Linux, Perl 5.6 or later and NCBI BLAST 2.2.11 or
   later. The software package consists of two files:

      easygene-1.2.readme         this file
      easygene-1.2.Linux.tar.Z    compressed TAR archive

   After installation the software will occupy less than 300 MB of disk space.


INSTALLATION

   1. Uncompress and untar the package:

         cat easygene-1.2.Linux.tar.Z | uncompress | tar xvf -

      This will produce a directory 'easygene-1.2'.

   2. In the 'easygene-1.2' directory, edit the main script 'easygene'. In the
      part of the script marked "GENERAL SETTINGS: CUSTOMIZE ...", set the
      following variables:

         EG         full path to the 'easygene-1.2' directory
         AWK        full path to 'awk', e.g. /usr/bin/gawk
         PERL       full path to 'perl'
         BLASTALL   full path to 'blastall' (NCBI BLAST)
         FORMATDB   full path to 'formatdb' (NCBI BLAST)

   3. In the 'easygene-1.2/lib/sprot' directory, remove the files:

         reldate.txt
         sprot.dat

      Create them again as symbolic links to the files 'reldate.txt' and
      'uniprot_sprot.dat' in the flat file UniProt distribution available on
      your system. If you do not have UniProt installed in flat file form, you
      can download these two files from e.g. ftp.expasy.ch, in
      '/databases/uniprot/current_release/knowledgebase/complete/'.
      In that case they may be placed in 'easygene-1.2/lib/sprot' directly.
      Remember to uncompress the 'uniprot_sprot.dat.gz' file. Note the
      difference in file name: 'uniprot_sprot.dat' must be installed or linked
      as 'sprot.dat'.

      In the 'easygene-1.2' directory, verify the status of the database:

         > ./easygene -db stat

      The presence of the database should be reported; you then need to
      format it for EasyGene:

         > ./easygene -db update

      Verify the status again:

         > ./easygene -db stat

      The presence of both databases should now be reported.

   4. Make sure that the 'easygene-1.2/new_trainruns' directory has the right
      permissions: the listing should read "drwxrwxrwt ..." (sticky bit set).
      If the sticky bit is not set, set it:

         > chmod 1777 new_trainruns

      The directory contains an example directory 'BA02' with two input files
      for 'Buchnera aphidicola':

         NC_004061.fna    genome sequence in FASTA format (mandatory)
         NC_004061.gbk    GenBank entry (optional)

      The complete results of a training run for that organism can be
      downloaded for comparison with your own future runs:

         http://www.cbs.dtu.dk/services/EasyGene/BA01.tar.gz

   5. a. Copy the 'easygene' script to a directory in the users' path.
      b. Copy the 'discotope.1' file to a location in your manual system.

   6. Enjoy ...


PROBLEMS

   Contact support@cbs.dtu.dk in case of problems.

   Questions on the scientific aspects of the method should be sent to
   Thomas Schou Larsen, thomas@biopeople.dk.


2010-12-13
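   The file layout and permissions that installation steps 3 and 4 leave
   behind can be checked with a small shell function. The sketch below is
   hypothetical and not part of the EasyGene distribution: the function name
   'check_easygene_install' is illustrative only, and it assumes GNU
   coreutils 'stat' (standard on Linux) to read the octal directory mode.

```shell
#!/bin/sh
# check_easygene_install: HYPOTHETICAL helper, not shipped with EasyGene.
# Checks the state left by installation steps 3 and 4.
# Usage: check_easygene_install /path/to/easygene-1.2
# Prints one line per problem (or "OK: <dir>") and returns 0 on success.
# Note: POSIX sh has no 'local', so the variables below are global.
check_easygene_install() {
    eg_dir=$1
    status=0

    # Step 3: sprot.dat and reldate.txt must be readable in lib/sprot.
    # [ -r ] follows symbolic links, so a dangling link counts as missing.
    for f in sprot.dat reldate.txt; do
        if [ ! -r "$eg_dir/lib/sprot/$f" ]; then
            echo "MISSING: $eg_dir/lib/sprot/$f"
            status=1
        fi
    done

    # Step 4: new_trainruns must be a directory with mode 1777 (drwxrwxrwt).
    runs="$eg_dir/new_trainruns"
    if [ ! -d "$runs" ]; then
        echo "MISSING: $runs"
        status=1
    else
        # GNU stat prints the octal mode including special bits, e.g. 1777.
        mode=$(stat -c '%a' "$runs")
        if [ "$mode" != "1777" ]; then
            echo "BAD MODE: $runs is $mode, expected 1777 (chmod 1777 $runs)"
            status=1
        fi
    fi

    [ "$status" -eq 0 ] && echo "OK: $eg_dir"
    return "$status"
}
```

   For example, after sourcing the function, run
   'check_easygene_install /path/to/easygene-1.2' (path is a placeholder).
   It does not inspect the BLAST-formatted database; use
   './easygene -db stat' for that.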