
Working with R

Written by: Carsten Friis and Marcin Krzystanek


As with the other R exercises, you will likely need to have the "Introduction to R" handy to solve some of these exercises.


Working with dimensional objects and indices

We will be storing our microarray data in objects in R. These objects tend to get very big, given the ability of microarrays to generate lots of numbers in parallel. To find the ups and downs of our data, you will find it helpful to be able to navigate through indexed objects in R.
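
As a warm-up, the basic indexing forms can be sketched as below. This is a minimal illustration, not a solution to the exercises: the names v and mat and their values are made up for this sketch.

```r
# Illustrative values only; the exercises below use rnorm() instead
v <- c(2.5, -1.0, 0.3, -4.2, 7.1)

v[2]            # a single element, selected by position
v[c(1, 3)]      # several elements, selected by position
v[-1]           # everything except the first element
v[v > 0]        # logical indexing: keep the elements that pass a test

# A matrix is indexed by [row, column]
mat <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column
mat[2, 3]       # row 2, column 3
mat[, 2]        # the whole second column
mat[1, ]        # the whole first row
```

Leaving the row or column position empty selects everything along that dimension, which is often the quickest way to pull one sample or one gene out of a large table.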

  1. Generate a vector x of 30 random numbers from the normal distribution using the rnorm() function.

  2. Create a vector y that only contains the positive elements of x.

  3. Create a vector z that censors the negative elements of x with zeros.

  4. Using data from vector x and the matrix() function create matrix m with five columns and six rows.

    Once you've created m, use the str() function on both m and x. See the difference...

  5. Convert m to a data.frame called d.f using the as.data.frame() function.

    Try using str() again on your new data.frame. A data.frame looks like a matrix, but it is different. Notice for example how columns now have names rather than just numbers. Columns in an R matrix can have names (see help(matrix) for how, look for the 'dimnames' property), but in a data.frame columns must have names.
    R contains numerous as.[object]() functions. For example, as.vector(m) would give you a vector identical to your original x (for the data.frame, convert to a matrix first: as.vector(as.matrix(d.f))). These functions are very useful for converting between different object types, but conversion is not always this seamless.

  6. You can illustrate the difference between a matrix and a data.frame by setting one single element to a character in both m and d.f (e.g. m[2,2] = "a"; d.f[2,2] = "a").

    Use str() again. The whole matrix m has been changed from a numeric object to a character object. In the data.frame, however, only the affected column has been changed. In R, a matrix is a matrix, but you should think of a data.frame more like a collection of named vectors. Matrices are sometimes easier to work with but, from a microarray perspective, data.frames allow us to keep annotation columns with gene-specific annotations within our data sets.
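
The coercion behaviour described in step 6 can be sketched on a small made-up example. The names mm and dd are hypothetical and chosen so that they do not overwrite your own m and d.f.

```r
# Small made-up example; 'mm' and 'dd' are hypothetical names
mm <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2)
dd <- as.data.frame(mm)   # columns get the default names V1 and V2

mm[2, 2] <- "a"           # put one character value into the matrix ...
dd[2, 2] <- "a"           # ... and into the data.frame

str(mm)                   # every element is now a character string
str(dd)                   # only column V2 became character

is.character(mm)          # tests the matrix as a whole
sapply(dd, class)         # reports the class of each column separately
```

The matrix can hold only one data type, so a single character value forces all the numbers to become strings. The data.frame stores each column as its own vector, so the other columns stay numeric.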


Basic I/O and file handling

In R, you can always answer 'y' to the question posed when you leave R using the q() function. This will cause your entire workspace to be saved to the file '.RData'. Note that this is a hidden file, so you may not always be able to see it, but it is there. It will automatically be loaded next time you start R in that directory. There are other ways, however, to save your progress, for example: you can save any object in a binary file using the save() function.
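
A minimal sketch of a save() and load() round trip is shown here, assuming a throwaway object called obj and a temporary file created with tempfile(). Both are placeholders for this illustration; in the exercises you choose your own object and filename.

```r
obj <- c(a = 1, b = 2, c = 3)          # any R object will do
f <- tempfile(fileext = ".RData")      # hypothetical file location

save(obj, file = f)                    # write the object to a binary file
rm(obj)                                # remove it from the workspace
exists("obj")                          # check whether the object is still there

load(f)                                # restore the object under its original name
obj
```

Note that load() does not return the data for you to assign. It recreates the object under the name it had when it was saved, overwriting any object of the same name.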

  1. Try to save the matrix m you created before to a file using save() (make sure to note the filename)

  2. Now erase the m object using the rm() function

  3. Verify that it is gone with the ls() function

  4. Load the file you just created with the load() function

  5. Use ls() again to confirm that m is back

    You can also save the entire workspace (i.e. all your objects) to one file using the function save.image(). In fact, this function is exactly what R uses if you answer "yes" to the question R poses when you quit R using q(). save() and save.image() are two quick and simple ways to save your progress, but because the data are stored in a binary file, you cannot use such files as input for other programs like Excel. Nor can the load() function load data from sources other than R.
    If you want to export your data to another program, or import data from any source other than a file created with save(), you should use a text file rather than a binary file. Of all object types in R, data.frames and matrices are the most easily exported or imported. You can save a data.frame or matrix to disk using the write.table() function. For better compatibility with MS Excel, you can use the write.csv() function instead. The resulting csv file can be opened directly in Excel, which is a great advantage if you want to share your results with collaborators who do not use R.

  6. Use the write.table() function to save the matrix m to a text-based file. Next, create a csv file using the write.csv() function (please remember to set row.names=FALSE as one of the arguments).

  7. You can confirm that the files indeed exist and are intact by opening them in any relevant editor. You will see the difference between the two text file types.

    You may use anything that handles raw text files, even MS Word. If you have it, MS Excel usually works best. Just, please, please, make sure that you do not change the file type. R will only be able to read the file again if it is still a plain-text delimited file, so no saving as *.docx or *.xlsx.
    When you open the csv file in Excel, you can see that all the data.frame items are placed in the correct cells.

  8. Once again, erase m and confirm that it is gone with ls().

  9. Now read the data back in using read.table() and save it as m. Do the same with read.csv() and save it as m2.

    You will notice that read.table() and read.csv() have a quite different (and more complicated!) syntax than save(), but the advantage is that you can share data with a larger number of people.

  10. And now the million dollar question: are m and m2 the same as m was before we saved it?

    Text files are simpler than binaries and can be edited with programs such as nedit or Excel, but they do not contain information about how the data should be represented in R. We as users must provide some of that, or accept that R just makes a few quick guesses.
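
One way to look into that question is sketched below on a small stand-in matrix. The names orig, f.txt, f.csv, back.txt and back.csv are hypothetical, and the temporary files are only placeholders for the files you created yourself.

```r
set.seed(1)                                   # only to make the sketch reproducible
orig <- matrix(rnorm(6), nrow = 3)            # small stand-in for m
f.txt <- tempfile(fileext = ".txt")           # hypothetical file names
f.csv <- tempfile(fileext = ".csv")

write.table(orig, file = f.txt)               # space-separated by default, with row names
write.csv(orig, file = f.csv, row.names = FALSE)

back.txt <- read.table(f.txt)                 # the header line is detected from the file layout
back.csv <- read.csv(f.csv)                   # read.csv() expects a header by default

class(orig)                                   # what we started with
class(back.txt)                               # what came back: a data.frame
identical(orig, back.txt)                     # FALSE, because the object type changed

# Convert back to a matrix and compare the numbers with a tolerance
all.equal(orig, as.matrix(back.txt), check.attributes = FALSE)
```

The comparison uses all.equal() rather than identical() because numbers written to a text file are rounded to a limited number of significant digits, so the values read back may differ from the originals in the last decimal places.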