Events News Research CBS CBS Publications Bioinformatics
Staff Contact About Internal CBS CBS Other

Hard R Commands

Written by: Carsten Friis


WARNING: Some of these exercises can be very hard and go beyond the scope of this lecture as well as the course itself. They are included for those of you who feel like trying your skills with R, and maybe know some R to begin with. So don't feel bad if you become stuck on your way through this exercise. A strong grasp of R allows you to customize your work and opens new possibilities for your projects, but you can get through this course with much less that that. Unless you have read and memorized it, you will likely need to have the "Introduction to R" handy to solve these exercises.


Devices in R. How to print and save plots

Before we begin, we will need some values to plot like the vectors x and y from the last exercise.

  1. In case you have restarted R or tampered with them since completing the previous exercise, you'll need to type: x <- rnorm(10000) and y <- rnorm(10000)

In a nutshell, a device in R is a place where your plots go. It could be your screen, it could be a printer or it could be a file. In a multiuser environment this is a little complicated, for if your devices are incorrectly set, you could be sending your plots to the screen of the guys sitting next to you, or even to a printer located in a dark locked room two kilometers away.

  1. Use the function dev.list() without any parameters to get a list of the devices you currently have open
  2. If no devices are open, dev.list() returns 'NULL', but you may have some plot windows open from the previous exercises.

  3. Close any such windows down (but do NOT close the R window itself!) until dev.list() returns 'NULL'

    The default in R is to send your plot to your screen.

  4. Without any devices open type plot(x,y), and a new window pops up to display your plot

  5. Keep the device (i.e. the window with your plot) open, and type plot(density(x))

    Because an active device already exists, no new window is created, instead your old plot is simply overwritten by the new plot. Keep the new plot open.

  6. Write help(Devices) to get a list of common devices. One of them should say something like '... graphical driver for ...'

    This is the device which opens a window on your system. For Linux/UNIX it is 'X11', on Windows it is 'windows'.

  7. Use the x11() function to open an 'X11' device (on Windows you should use windows() instead, although x11() will sometimes work)


  8. Now type plot(x,y) again to draw a plot in the new window

    You can also add elements to an existing plot without overwriting the plot

  9. Use the points() function to add some new dots to the active plot

    The points() function is not as smart as the plot() function, however. You cannot change the indices on the axises for example, so take care that your new dots are within the viewable range. Printing a new dot on top of an existing one may also be hard to spot, so you may need to experiment with colors. See help(points) for details.

  10. Use the dev.set() and points() functions to draw a density plot of y in the same window as the density plot of x

    The dev.cur() function lets you control which of the existing devices is active (i.e. which one you plot to). If you have followed this text strictly, you should have an open but inactive device with a density plot of x

    You cannot print on our system (or actually, you can, but the printer is in a dark room far away...), but you can save a plot to a file and copy it to your local machine using the dev.copy2eps() function. Other functions for this exist in R, which give you greater control and more options, like creating jpeg files, etc. but strange things can happen if you use them and botch it. So stick with dev.copy2eps() and be safe :-)

  11. Use the dev.copy2eps() function to copy the contents of the active device to a postscript file

    You cannot directly view a postscript file in R, but this little trick lets you open another program in UNIX/Linux which can. This does not work on a PC. Note that you need the filename of your postscript file. If you didn't specify a name for it when you ran dev.copy2eps() you may have to either track down where it put your file, or create it again, this time specifying a filename :-).

  12. Verify that the file is ok using this command: (don't worry about how this one works!)
    system("gv [name_of_your_file] &")

    To actually print this file you must first copy it to your local machine using a secure shell-compatible program. On the Windows PCs 'SSH Secure File Transfer' is such a program, on UNIX/Linux use 'scp' on the UNIX command prompt (NOT the R command prompt!). If you have a Windows machine, feel free to try this now, but be adviced that using 'scp' on a UNIX/Linux machine is tricky! Anyway, it is not critical to do this now, you can just continue with the exercise with good conscience.

  13. Use the graphics.off() function to clean up and close all open devices, active as well as inactive

A closer look at the p-value

In the previous exercise concerning p-values, You probably noticed the parentheses saying that it was only an estimate of the probability. Although we have a lot of time dedicated to statistics later in the course, it doesn't hurt to skip a little ahead and examine some of the pitfalls of the p-values in closer detail.

When you ran the t.test() before, you may have noticed among the output a line saying something about the 'alternative hypothesis'. This is the opposite of the hypothesis which the test actually evaluates: namely that the mean of x and y are equal. This is called the null hypothesis, and it is the hypothesis we wish to reject.

  1. Generate yet another vector y, this time with a mean of 0.2 and use the t.test() to statistically test if the x and y vectors are similar. Look to the p-value, is the difference between x and y statistically significant?

  2. Now draw a density plot of x and y. Start by plotting x like you did in the previous exercise, then use the points() function to add y to the plot. You will need to study the help pages on points and set some parameters manually to make the plot look good. Do the two density functions look significantly different to you?

  3. Re-create the vector y with several different means. What is the closest mean y can have to the mean of x before the t-test begins to consider them equal?

  4. Try re-creating both x and y but this time sampling only 5 values from the normal distribution (e.g. use rnorm(5)). Repeat the exercise above finding the closest mean of y to the mean of x where the t-test starts to considers the two means different (p-values around 0.05-0.01)

Statistical tests like the t-test are only as good as their null hypothesis, and that usually tests whether two means are precicely equal, not whether they are approximately equal. That is why the test becomes more strict as the sizes of x and y increases. A good p-value tells you something about the reliability of your measurements, it does not tell you anything about the scientific impact of your observation. Thus, the measurement with the lowest p-value from your microarray experiment is likely to be a good candidate for further investigation, but it is not necessarily your best candidate. Other factors like prior observation, sound judgement, etc. should weigh in the decision.


Writing custom functions

Throughout this course you will encounter custom functions usually written by your teachers to help your analyses run more smoothly. This part of the exercise illustrates how these custom functions were made. You will not, however, be called upon to make your own functions within the scope of this course, although doing so can speed up your data analysis in R, partcularly as we approach the end of the course. Using functions is a thing which slowly creeps up on you, as you begin to repeat the same analytical steps and inevitably start to tire of writing the same chunks of code time after time.

  1. Create a function cube: cube(x) = x3

    Functions in R are created via the function statement. Try to cut'n'paste this code into R:

    f <- function (x) {y = x + 2; return(y)}

    Now write:

    f(2)

    ...and see what happens. Does that inspire you? Notice also, that the x object in the function is not the same as the vector x you created above.

  2. Create a function factorial: fact(y) = y!

    This function is hard to write because it requires you to use a loop structure such as a for-loop or similar construct. See the "Introduction to R" guide for more on for loops.

  3. Create a function func: func(x; y) = cube(x) - fact(y)