Exercise M3


OBJECTIVES

The purpose of this exercise is to make you familiar with the Linux system. We will start simple by navigating the shell and downloading some files. Then we will edit files using emacs, write a small program, and generate some simple figures.

Key tools used in this exercise are:
Perl - a platform-independent scripting language commonly used in bioinformatics. Perl stands for Practical Extraction and Report Language.
Emacs - a feature-rich text editor, although it has a somewhat steep learning curve.
R - a statistics package based on S. It is open source and very extensive.

Extra scripts to download

fix_keyboard - This is a small script that will let you use a Brazilian keyboard.
fixR - This is a script that will fix your version of R, and set up emacs integration.

The actual exercise

  1. Your first task is simple: Download this script
  2. Open a terminal by right-clicking on the desktop and selecting rxvt. The terminal window is active whenever the pointer is inside it. Type
    ls
    
    to see the contents of the current directory. You should see the script you just downloaded, as well as two sub-directories, bin and data. The bin directory is used to hold programs, and any executable program placed here will be easily accessible from anywhere. Take a look at the program by typing
    less script.pl
    
    The comments (starting with #) describe what the program does. Now we want to make the program executable and move it to the bin directory. Making a program executable is done by typing
    chmod a+x script.pl
    
    and it is moved to the bin directory by typing
    mv script.pl bin
    
    Now verify that the script is no longer in the current directory, for example by typing ls again.
    To see that the script is still accessible, type
    script.pl
    
    You will see that it prints its default message, telling you that it needs data.
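
    The chmod and mv steps above can be tried safely in a scratch directory. The sketch below is only an illustration: hello.sh is a made-up stand-in for script.pl, and it assumes that programs in bin are found because that directory is listed in the PATH variable, which the exercise does not state explicitly.

    ```shell
    # Work in a throwaway directory so nothing real is touched.
    demo=$(mktemp -d)
    mkdir "$demo/bin"

    # hello.sh is a hypothetical stand-in for script.pl.
    printf '#!/bin/sh\necho "I need data"\n' > "$demo/hello.sh"

    chmod a+x "$demo/hello.sh"      # make it executable for all users
    mv "$demo/hello.sh" "$demo/bin" # move it into the bin directory

    # Assumption: the shell finds programs in bin because bin is on PATH.
    PATH="$demo/bin:$PATH"

    cd "$demo"
    listing=$(ls)       # hello.sh is gone from here; only bin remains
    message=$(hello.sh) # it still runs, because the shell looks it up via PATH
    echo "$listing"
    echo "$message"
    ```

    On your own machine, typing echo $PATH shows which directories the shell searches for programs.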
  3. Now we will try to use the script for something slightly more useful. First go to the data directory by typing
    cd data
    
    Then execute the script on BA000021.Glimmer3 like this:
    script.pl BA000021.Glimmer3
    
    You get a lot of output: the start and end positions of the many genes predicted with Glimmer. We will now redirect the output into the file positions.dat:
    script.pl BA000021.Glimmer3 > positions.dat
    
    Run the program 'less' on positions.dat and check that the file contains the same output you saw on the screen before.
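
    Redirection with > can be tried on its own before using it with real data. In the sketch below, fake_script is a hypothetical stand-in for "script.pl BA000021.Glimmer3", and the numbers it prints are invented for illustration.

    ```shell
    # Work in a throwaway directory so no real file is overwritten.
    demo=$(mktemp -d)
    cd "$demo"

    # fake_script is a hypothetical stand-in for "script.pl BA000021.Glimmer3".
    fake_script() {
        printf '%s\n' '100 400' '950 500' '1200 1800'
    }

    fake_script                 # output goes to the terminal
    fake_script > positions.dat # same output, written to a file instead

    # Compare what the command prints with what ended up in the file.
    direct=$(fake_script)
    saved=$(cat positions.dat)
    [ "$direct" = "$saved" ] && echo "file matches terminal output"
    ```

    Note that > replaces any existing file with the same name, while >> appends to it instead.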
  4. Now you will edit the program a bit. Open the script in emacs by typing
    cd ../bin
    emacs script.pl &
    
    As it is now, the script prints the beginning and end of each predicted gene, but we would like the end printed first if the gene is on the negative strand.
    Before you begin, however, try typing something random into the script. Then save it by pressing Ctrl-x followed by Ctrl-s. Now try to run the script in the terminal window. It will probably fail. Then make emacs the active window again, undo your changes by typing Ctrl-_, and save the file again so that the working version is back on disk. You can quit emacs by typing Ctrl-x followed by Ctrl-c.

    Now spend five or ten minutes playing around with the script, and see if you can make it always print the lower of the two numbers first.
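
    If you get stuck, it may help to see the same logic outside Perl. The sketch below uses awk rather than the Perl script itself, and assumes each line holds two whitespace-separated numbers; the sample values are invented, and the real output of script.pl may be laid out differently.

    ```shell
    # Hypothetical sample: start and end positions, one gene per line.
    sample=$(printf '%s\n' '100 400' '950 500' '1200 1800')

    # Swap the two columns whenever the first is larger than the second.
    ordered=$(printf '%s\n' "$sample" |
        awk '{ if ($1 > $2) print $2, $1; else print $1, $2 }')
    echo "$ordered"
    ```

    The second sample line, where the first number is the larger one, comes out swapped; the other two lines are unchanged. The same comparison and swap is what you need to express in the Perl script.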
  5. Now we will start playing around with R. First download this simple R-script. I recommend placing it in the data directory for now.
    Try opening the script in emacs. As you can see, it is pretty simple: it reads the data you just created and makes a density plot of it.
    The length distribution of predicted genes turns out to be very interesting, as you will see later on.
    Now let us try to actually run it.

    Note that you have to run the fixR script before proceeding.

    In the terminal, type
    R
    
    After a while you will get a prompt simply saying '>'. Now you can type
    > source("script.R")
    
    The '>' above should not be typed, but simply indicates the prompt.
    A figure should have popped up.
    Notice, however, that you cannot edit the script easily when you run it in this way.
    Quit R by typing:
    > q()
    
    Let us instead try to run it in emacs.
    Start emacs again. Now open the script.R file by typing Ctrl-x Ctrl-f. You should see a new tool bar appear, where one of the buttons is a capital R. Before you press the big R button, split the emacs window by typing Ctrl-x 3.
    You can change between the panes by pressing Ctrl-x o. Do this once, and then press the R button. You get some questions at the bottom of the emacs screen, which you can just bypass by pressing Enter. Then you should see R starting in the right pane.
    Change back to the left pane.
    Now you can send single lines of code to R by pressing Ctrl-c Ctrl-n, or whole blocks of R code by pressing Ctrl-c Ctrl-c.
  6. Now that you have done this, open the plot in gv.

    This concludes the first exercise.