Notes on UNIX from the teacher
Basic file handling in UNIX.
Necessary files to complete this exercise.
To download the files to your system, just press the Shift key while
you left click on the blue link. Follow the instructions.
The files are all excerpts of real data.
You can play around with these files as much as you like. If you change or
destroy them, just download them again.
- Use nedit to create a file mycommands.txt where you
record all commands you use and observations you make in the following
exercises. Use copy/paste to copy the commands.
Note: nedit is not the only text editor. More standard ones include emacs, xemacs, vi, vim, and pico.
- First list the files in the directory.
- Copy ex1.acc to myfile.acc.
- Look at the content of both files to ensure they are identical.
- Copy ex1.dat to myfile.acc.
- Check that the content of myfile.acc changed.
- Delete myfile.acc.
- Make a directory test and move the three files to it.
- Make a directory data and move the three files to that instead.
- Remove test directory.
- Change directory to data and confirm that you succeeded. Go back to the home directory afterwards.
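The basic file commands used so far can be tried safely in a scratch directory first. The following is only a sketch on hypothetical files (sample.txt, copy.txt, and box are made-up names, not the exercise files), assuming mktemp is available on your system.

```shell
# Work in a scratch directory so nothing real is touched (all names are hypothetical).
scratch=$(mktemp -d)
cd "$scratch"
printf 'alpha\nbeta\n' > sample.txt           # stand-in for a downloaded file

ls                                            # list the files in the directory
cp sample.txt copy.txt                        # copy a file
cmp sample.txt copy.txt && echo "identical"   # compare the two files byte by byte
mkdir box                                     # make a directory
mv sample.txt copy.txt box/                   # move both files into it
cd box && pwd                                 # enter it and confirm where you are
cd ..                                         # go back up one level
rm box/copy.txt                               # delete a file
moved=$(ls box)                               # what is left in box
echo "$moved"
```

A plain cd with no argument takes you to your home directory, and rmdir removes a directory once it is empty.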
- Make three new directories named newtest, one inside the other,
like a Russian doll.
- Move the data directory to the innermost newtest directory.
- Confirm that the three files are moved along with the data directory.
- Copy the three files to your home (your top directory).
- Remove all newtest directories and the data directory inside them with a single command.
You may be asked for a lot of confirmations. These are not considered part of
the command. They are annoyances.
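Nested directories and recursive removal can be sketched as follows. The directory and file names (outer, middle, inner, stuff, a.txt) are hypothetical, chosen so the sketch does not collide with the exercise files.

```shell
# Scratch area with made-up names; nothing here touches the exercise files.
scratch=$(mktemp -d)
cd "$scratch"
mkdir stuff && printf 'x\n' > stuff/a.txt

mkdir -p outer/middle/inner            # -p creates the whole chain at once
mv stuff outer/middle/inner/           # a directory moves together with its contents
ls outer/middle/inner/stuff            # a.txt is still there
cp outer/middle/inner/stuff/a.txt .    # copy a file back to the current directory
rm -r outer                            # remove the whole tree recursively
left=$(ls)
echo "$left"
```

If rm asks for confirmation on every file, it is probably set up to run as rm -i on your system. Adding -f (rm -rf) typically suppresses the questions, so double-check the path before pressing Enter.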
- Count the lines in ex1.acc and ex1.dat.
- Concatenate ex1.acc and ex1.dat in the file ex1.tot,
i.e. copy the content of two files into one new file.
Verify that all gene IDs come first, followed by the numerical data.
- Paste ex1.acc and ex1.dat together in ex1.tot,
thus destroying the old file. Verify that corresponding gene IDs and
numerical data are put on the same line.
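The difference between concatenating and pasting is easiest to see on two tiny files. This is a sketch with hypothetical files ids.txt and nums.txt, not the exercise data.

```shell
# Two small made-up files with the same number of lines.
scratch=$(mktemp -d)
cd "$scratch"
printf 'ID1\nID2\n' > ids.txt
printf '1.5\n-2.0\n' > nums.txt

wc -l ids.txt nums.txt                   # count the lines in each file
cat ids.txt nums.txt > stacked.txt       # one file after the other (4 lines)
paste ids.txt nums.txt > side.txt        # line by line, joined by a tab (2 lines)

stacked_lines=$(wc -l < stacked.txt)
side_first=$(head -n 1 side.txt)
cat stacked.txt
cat side.txt
```

Note that > overwrites an existing file without asking, which is why the second redirection into the same name destroys the old content.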
- Extract (cut) the SwissProt ID and the 3rd numerical data column (columns 1 and 5)
from ex1.tot. Put the results into a file ex1.res.
- Find the 3 SwissProt IDs in ex1.res which have the largest numbers in
column 2, i.e. the top 3 entries.
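Selecting columns and ranking by a numeric column can be sketched like this. The file table.txt and its contents are made up, and the sketch assumes tab-separated columns, which is what cut expects by default.

```shell
# A made-up tab-separated table with four columns.
scratch=$(mktemp -d)
cd "$scratch"
printf 'A\t10\tx\t0.5\nB\t20\ty\t-1.2\nC\t30\tz\t3.4\nD\t40\tw\t2.0\n' > table.txt

cut -f 1,4 table.txt > picked.txt                    # keep columns 1 and 4 only
sort -k 2,2 -n -r picked.txt | head -n 2 > top.txt   # numeric, descending, first 2 lines
top=$(cut -f 1 top.txt | tr '\n' ' ')                # names of the top entries on one line
cat top.txt
```

Here -k 2,2 restricts the sort key to column 2, -n compares numbers rather than text, and -r reverses the order so the largest values come first.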
- Find the lines (using grep) in orphans.sp which contain a
GenBank accession number.
There are 85, verify this. Note: An accession number starts with one or two capital letters
followed by digits and looks like this: 'AB000114.CDS.1'. The .CDS. part is optional.
- How many human genes with SwissProt IDs exist in
orphans.sp? How many of those are hypothetical? (11)
How many genes belong to the rat, and how many of those are precursors?
(9) Note: A SwissProt ID looks like 'PARG_HUMAN' or 'TF1A_MOUSE', with the gene
before the underscore and the organism after the underscore.
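Counting matching lines with grep, and narrowing a match with a second grep, can be sketched as below. The file entries.txt and every line in it are invented for illustration (only the two ID examples are taken from the note above), and the pattern assumes that an accession number sits at the start of the line and consists of capital letters followed by digits.

```shell
# Invented sample lines; the real orphans.sp may be laid out differently.
scratch=$(mktemp -d)
cd "$scratch"
cat > entries.txt <<'EOF'
AB000114.CDS.1  PARG_HUMAN  hypothetical protein
X12345          TF1A_MOUSE  transcription factor precursor
none            ABCD_HUMAN  kinase
ZZ999999.CDS.2  WXYZ_RAT    hypothetical protein precursor
EOF

# -E enables extended regular expressions, -c prints the number of matching lines.
acc=$(grep -c -E '^[[:upper:]]{1,2}[[:digit:]]+' entries.txt)
human=$(grep -c '_HUMAN' entries.txt)
# Pipe one grep into another to count lines that satisfy both conditions.
human_hyp=$(grep '_HUMAN' entries.txt | grep -c 'hypothetical')
echo "$acc $human $human_hyp"
```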
- This little exercise will require that you use man for help on grep.
From the file ex1.res find the lines with positive
numbers and put them into ex1.pos. The lines with negative
numbers go into ex1.neg.
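One option that man grep describes is -v, which selects the lines that do not match. A minimal sketch on a made-up file values.txt, assuming a negative number is recognisable as a minus sign right after whitespace:

```shell
# Made-up two-column file: a name, a tab, and a number.
scratch=$(mktemp -d)
cd "$scratch"
printf 'A\t0.5\nB\t-1.2\nC\t3.4\n' > values.txt

grep '[[:space:]]-' values.txt > minus.txt      # lines with a minus sign after whitespace
grep -v '[[:space:]]-' values.txt > plus.txt    # -v: every line that does NOT match

minus_n=$(wc -l < minus.txt)
plus_n=$(wc -l < plus.txt)
cat minus.txt
```

With this approach a value of exactly zero ends up among the non-negative lines, so decide whether that matters for your data.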
- Write a shell script that solves exercises 19-24, with the exercises clearly
separated in both the script and the output. This should be
straightforward (but long), especially since you took notes (exercise 1).
- Write a shell script (which is simply a list of UNIX commands in a file)
that puts all the positive numbers in the file ex1.dat into a file ex1.pos2,
and all the negative numbers into a file ex1.neg2.
Column position does not matter. The script must clean up after itself, so if any
temporary files are used, they must be deleted as the last action. Remember
to put the date and a description of the files in the first lines of the
resulting output files.
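The housekeeping parts of such a script (a temporary file, a header with date and description, and cleanup at the end) might be sketched as follows. This is only a skeleton on made-up names (input.txt, output.txt) and does not sort the numbers by sign. It assumes the values are separated by spaces or tabs.

```shell
#!/bin/sh
# Skeleton only: made-up file names, run inside a scratch directory.
scratch=$(mktemp -d)
cd "$scratch"
printf '1.5 -2.0\n-0.3 4.1\n' > input.txt

tmp=$(mktemp)                                # temporary file for intermediate data
tr -s ' \t' '\n\n' < input.txt > "$tmp"      # one value per line, column position ignored

{
    date                                         # first line: when the file was made
    echo "Values from input.txt, one per line"   # second line: description
    cat "$tmp"                                   # then the data itself
} > output.txt

rm "$tmp"                                    # clean up as the last action
out_lines=$(wc -l < output.txt)
cat output.txt
```

Grouping the commands in braces lets a single redirection collect the header and the data into the same output file.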
- Mail your mycommands.txt file to the teacher for comments.