Basic commands
Where am I? - pwd
pwd      Returns the path to your current location (the current directory). This is also the command used to construct your prompt.

Copy the /home/people/immu00/exercise1 directory and its contents to here
cp -R /home/people/immu00/exercise1 .      -R means recursively, i.e. include everything in the directory; "." means to here.

What is in this directory? - ls
ls      short listing of current directory
ls ..      short listing of the directory above the current directory - ".." means one directory up, "../.." is two directories up
ls exercise1      short listing of the exercise1 directory (equivalent to ls ./exercise1 - "." means here)
ls -l exercise1      detailed listing of the exercise1 directory
ls -ltr exercise1      long listing sorted by time (t) and reversed (r): newest files last (essential for old bioinformaticians who cannot remember what they just did)
ls /usr/freeware/bin/      list programs in "/usr/freeware/bin/" directory - paths starting with "/" are absolute addresses

Make new directory - mkdir
mkdir testdir      Make test directory

I want to go to? - cd
The cd      command is used to move around in the file system. Examples:
cd testdir      go to the testdir directory (relative address to where you are)
cd ..      up one level
cd /usr/freeware/bin/      go to absolute (not relative) address
cd      go to my home directory
cd exercise1      go to exercise1 directory (verify you are there by the pwd command)

Make an empty file (or update time stamp on existing one)
touch myfile      make file called "myfile". (verify it has been created with ls -l)

Moving files - mv
mv myfile mynewfile      move myfile to mynewfile

Removing (deleting) files - rm
rm mynewfile      remove mynewfile
rmdir mydirectory      remove an empty directory
rm -rf mydirectory      remove mydirectory, including files and subdirectories, with no questions asked. Make sure this is what you want to do: there is no recycle bin on UNIX, so once it is gone it is gone

Copying files - cp
touch myfile      make file called "myfile"
cp myfile mynewfile      copy myfile to mynewfile
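
The file commands above (mkdir, touch, mv, cp, rm) can be tried together without risk. The following is a minimal sketch that works in a throwaway directory created with mktemp; the directory and file names are only illustrations.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir testdir            # make a new directory
touch myfile             # create an empty file
mv myfile mynewfile      # rename it; "myfile" no longer exists
cp mynewfile testdir/    # copy it into testdir, keeping the name
rm mynewfile             # delete the original
ls testdir               # lists the copy: mynewfile
```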

Viewing files - cat/more/less/head/tail
cat test.dat      write contents of file to screen
head test.dat      write top of file (default 10 lines)
head -30 test.dat      write top 30 lines of file
tail test.dat      print the last 10 lines of the file
more test.dat      page through test.dat; press "space" to go one page down, "q" to quit.
less test.dat      page through test.dat; press "space" to go one page down, "j" to go one line down, "k" to go one line up, "q" to quit.

Editing files - n/vi
The n      command is used to launch the nedit editor. Examples:
n test.dat      edit the file test.dat with nedit

vi is a nerdy editor. Type:
vi test.dat      to edit the file test.dat.
/RLM to search for "RLM"
x to delete a letter
dd to delete a line
5dd to delete 5 lines
:q! to get out without changing anything, or
"ZZ" to save changes and quit.
To insert text press "i" - to get into insert mode and press "Esc" to get out of insert mode (in all "normal" editors you are automatically in insert mode). You can use "R" and "Esc" to get in and out of replace (overwrite) mode
You may not want to use the vi editor unless you have to, e.g. if you cannot run X Windows, or if you edit via a noisy telephone line from Mars.

Moving data around
Redirecting: |><      
Use | to "pipe" data from one program to another. Example:
cat test.dat | wc      pipe the contents of test.dat into the program called wc (word count), which counts the number of lines, words and bytes in test.dat
Use > to direct data to a file (and overwrite it). Example:
head test.dat > tmp.dat     Put first ten lines of test.dat into tmp.dat
Use >> to direct data to a file and append the data to the contents of the file. Example:
head test.dat >> tmp.dat     Append the first ten lines of test.dat to tmp.dat (now it should contain 20 lines)
Use < to get data from a file to a program. Example:
head < test.dat      head reads test.dat from standard input and prints its first ten lines
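
To see the difference between >, >> and | in practice, here is a small sketch. It assumes a made-up test.dat of 25 numbered lines (generated with seq) in a temporary directory; the real test.dat will of course look different.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for test.dat: 25 numbered lines.
seq 1 25 > test.dat

head test.dat > tmp.dat        # tmp.dat now holds lines 1-10
head test.dat >> tmp.dat       # append: tmp.dat now holds 20 lines
wc -l < tmp.dat                # count the lines read from standard input
head -15 test.dat | tail -5    # lines 11-15 of test.dat
```

Combining head and tail in a pipe, as in the last line, is a common way to cut a range of lines out of a file.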

Getting help - man
The man      command gives help on most Unix commands. Examples:
man ls      get help on the ls command

Bioinformatics using Unix commands
Awksome programming languages (awk, nawk & gawk)
awk, nawk & gawk are different versions of the same programming language, and are very similar. It is recommended to use gawk or nawk, rather than the original version, awk, since they are more stable and have more features!
Basically gawk will read a file and do something with each line.
Examples of using gawk:
gawk '{print $1}' epitope2protein.HLA-D_m13.out      Print first field in file
gawk '{print $1, $3}' epitope2protein.HLA-D_m13.out      Print first and third field in file
cat epitope2protein.HLA-D_m13.out|gawk '{print $1}'      Print first field in file getting data from standard input
gawk '{print substr($7,2,5)}' epitope2protein.HLA-D_m13.out      Print five characters of the seventh column, starting with the second letter (in the seventh column). NB awk numbers strings from 1, but Perl numbers them from 0!
gawk '{print substr($7,length($7)-3,4)}' epitope2protein.HLA-D_m13.out      Print last four letters in seventh column
gawk -v name=Mary -v animal=lamb '{print name,$1,animal}' epitope2protein.HLA-D_m13.out      Passing variables to gawk
gawk -F "\t" '{print $1}' epitope2protein.HLA-D_m13.out      Split input only on tabs (rather than on any whitespace, as is the default)
head epitope2protein.HLA-D_m13.out | gawk 'BEGIN{print "Here comes the data"}{print $1}END{print "No more data"}'      statements in BEGIN{} and END{} are executed before and after the data are read, respectively
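
The examples above need the file epitope2protein.HLA-D_m13.out. To experiment without it, the following sketch uses a tiny made-up tab-separated file (demo.tab, three columns) and plain awk, which may be all that is installed; gawk should behave the same for these commands. $NF (the last field) is used here in place of $7.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up, tab-separated data: name, score, sequence.
printf 'pepA\t12\tACDEFGHIK\n' >  demo.tab
printf 'pep B\t7\tLMNPQRSTV\n' >> demo.tab

awk '{print $1, $3}' demo.tab               # default: split on any whitespace
awk -F '\t' '{print $1}' demo.tab           # split on tabs only
awk '{print substr($NF, 2, 5)}' demo.tab    # five characters starting at position 2
awk '{s = $NF; print substr(s, length(s) - 3, 4)}' demo.tab   # last four characters
awk -v label=seq '{print label, NR, $NF}' demo.tab            # variable passed with -v
```

Note how the name "pep B" is cut in two by the default whitespace splitting, but kept whole when -F '\t' is given.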

A more complex example: You have a file called epitope2protein.HLA-D_m13.out with a protein sequence in the 7th column and, in the third column, the residue number in the sequence where an epitope starts. You want to print out the sequence surrounding the start of the epitope (in this case the first five residues of the epitope and the four residues before the epitope) in a format that can be read by the sequence motif visualization program logo. The first line in the output must be "* Aligned protein sequences.", and each sequence motif must be followed by a ".". Furthermore, only motifs that are nine amino acids long must be printed. This is what the command can look like:
cat epitope2protein.HLA-D_m13.out | gawk 'BEGIN{print "* Aligned protein sequences."}{s=substr($7,$3-4,9);if (length(s)==9){print s"."}}' | /home/projects/projects/vaccine/bin/logo2 -p - | gawk 'BEGIN{pr=0}{if (/\%\!PS-Adobe-3.0 EPSF-3.0/){pr=1} if(pr==1){print $0}}' >
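
The first gawk stage of this pipeline can be tried on its own. The sketch below uses two made-up input lines (demo.out) with the same column layout, and writes to a hypothetical file motifs.txt instead of piping into logo2; plain awk is used in case gawk is not installed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up input: column 3 = epitope start, column 7 = protein sequence.
printf 'id1 x 6 x x x ACDEFGHIKLMNPQ\n'  >  demo.out
printf 'id2 x 12 x x x ACDEFGHIKLMNPQ\n' >> demo.out

awk 'BEGIN { print "* Aligned protein sequences." }
     {
         # four residues before the start plus five from the start = 9 residues
         s = substr($7, $3 - 4, 9)
         # windows that run past the end of the sequence are shorter and are dropped
         if (length(s) == 9) print s "."
     }' demo.out > motifs.txt
cat motifs.txt
```

Only the first line yields a motif (CDEFGHIKL.); in the second line the window runs off the end of the 14-residue sequence, so it is skipped.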

Search and replace - sed
The sed program is a command line program that corresponds to the search and replace function in, for example, Word. As the following examples show, it can do some more advanced replacements.
Example of using sed:
echo "Mary had a little lamb" | sed 's/little/big/g'      Replace (s=substitute) little by big (g=global, i.e. replace all)
echo "Mary had a little lamb" | sed 's/\(.*\)lamb/\1goat/g'      Print everything before lamb followed by goat. What is matched by \(.*\) is put in the variable \1. "." means any character, and "*" means repeated zero or more times
echo "Mary had a little lamb. John had a little goat" | sed 's/\([A-Za-z]*\) had a little \([A-Za-z]*\)/The \2 is owned by \1/g'      Try that with your old search and replace
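
A short sketch of what the g flag and the \( \) groups do, using the same sentence as above stored in a shell variable (the variable name is just for illustration):

```shell
line="Mary had a little lamb. John had a little goat"

echo "$line" | sed 's/little/big/'     # without g: only the first "little" changes
echo "$line" | sed 's/little/big/g'    # with g: every "little" changes

# \(...\) captures text; \1 and \2 reuse it in the replacement.
echo "$line" | sed 's/\([A-Za-z]*\) had a little \([A-Za-z]*\)/The \2 is owned by \1/g'
```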

Sort file - sort
Example of using sort:
Getting a test file:
cp /usr/opt/www/pub/CBS/researchgroups/immunology/intro/Unix/test.out .
sort myfile      Sort file
sort -n test.out      sort file numerically
sort -n -k3 test.out      sort file numerically (big numbers last) by 3rd column
sort -r -n -k3 test.out      sort file reverse numerically by 3rd column
sort -u pdb.mhc.spnam      Keep only one copy of each unique line
sort pdb.mhc.spnam|uniq -c      Count the number of each unique line
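
If test.out or pdb.mhc.spnam are not at hand, the sort options can be tried on a made-up file. The sketch below assumes a small three-column file (demo.out) with one duplicated line.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up three-column file; the third column is numeric.
printf 'b x 10\na x 9\nc x 100\na x 9\n' > demo.out

sort -n -k3 demo.out       # numeric order on column 3: 9, 9, 10, 100
sort -r -n -k3 demo.out    # numeric order, largest first
sort -u demo.out           # the duplicate "a x 9" line is kept once
sort demo.out | uniq -c    # each distinct line preceded by its count
```

uniq only merges adjacent identical lines, which is why the input is sorted first.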

Execute a string
Putting `` (backquotes) around a command makes the shell replace it with its output; if that output stands alone on the command line, it is then executed as a command:
echo pwd
      print the string pwd
`echo pwd`
      execute the command pwd
echo pwd|sh
      echo the string pwd to the shell - which will then execute it
This will be used in the next example.
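
Backquotes are more often used to capture the output of a command than to execute it. A minimal sketch in sh/bash syntax follows; the variable names are made up, and the $(...) form is an sh/bash feature that csh/tcsh does not have.

```shell
here=`pwd`                 # capture the output of pwd in a variable
echo "I am in $here"

also_here=$(pwd)           # $(...) does the same as backquotes in sh/bash

`echo pwd`                 # the output "pwd" becomes the command line, so pwd runs
echo pwd | sh              # same effect: a new shell reads "pwd" and runs it
```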

Do something with many variables - foreach
Example 1: print each entry in list to screen

foreach entry (a b c)
      echo $entry
end

Example 2: get each swissprot entry from list and print it

foreach entry (`gawk '{print $1}' test.out`)
      echo $entry |sed 's/.*|//'| xargs getsprot
end

NB: echo ENV_HV1H2 | xargs getsprot      is the same as getsprot ENV_HV1H2

Warning: the string within () is limited to a few thousand characters
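
The foreach loop is csh/tcsh syntax. For readers working in sh or bash, a rough equivalent is sketched below. It assumes a made-up input file (demo.out) whose first column has the form "db|ID", and it replaces the site-specific getsprot program with echo so that it runs anywhere.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up list: the first column holds entries of the form "db|ID".
printf 'sp|ENV_HV1H2 1\nsp|1A68_HUMAN 2\n' > demo.out

# sh/bash counterpart of: foreach entry (`gawk '{print $1}' demo.out`) ... end
for entry in $(awk '{print $1}' demo.out); do
    # strip everything up to the last "|", then hand the ID to a command via xargs
    echo "$entry" | sed 's/.*|//' | xargs echo "would fetch:"
done > fetched.txt
cat fetched.txt
```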

Concatenate side by side - paste
paste pdb.mhc.nam pdb.mhc.spnam     
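
What paste does is easiest to see on two tiny files. The sketch below uses made-up stand-ins (names.txt and spnames.txt) for the two files above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up stand-ins for pdb.mhc.nam and pdb.mhc.spnam, one entry per line.
printf '1abc\n2def\n' > names.txt
printf 'X_HUMAN\nY_MOUSE\n' > spnames.txt

paste names.txt spnames.txt          # line N of each file, joined by a tab
paste -d ',' names.txt spnames.txt   # same, joined by a comma instead
```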

Get lines matching a pattern - grep/egrep
grep 1A68_HUMAN pdb.mhc.spnam      Get lines with "1A68_HUMAN"
grep -v HUMAN pdb.mhc.spnam      Get lines that do not contain "HUMAN"
grep _HUMAN pdb.mhc.spnam      Get lines with "_HUMAN" (human Swiss-Prot sequences)
grep ^KA pdb.mhc.spnam      Get lines starting with "KA"
grep "^KA.*MOUSE" pdb.mhc.spnam      Get lines matching "KA" - something ("." is a wildcard; "*" means repeated zero or more times) - "MOUSE"
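
The grep patterns above can be tested on a made-up list of identifiers. The sketch below assumes a four-line file (demo.spnam) and also shows -c, which counts matching lines instead of printing them.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up list of identifiers, one per line.
printf '1A68_HUMAN\nKA01_MOUSE\nKA02_HUMAN\nHB21_MOUSE\n' > demo.spnam

grep -c _HUMAN demo.spnam        # count matching lines instead of printing them
grep -v HUMAN demo.spnam         # lines without "HUMAN"
grep '^KA' demo.spnam            # lines starting with "KA"
grep '^KA.*MOUSE' demo.spnam     # "KA" at the start, "MOUSE" somewhere later
```

Quoting the pattern keeps the shell from interpreting characters such as "*" before grep sees them.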

What did I do
history      The history command lists the commands you have typed earlier

Other useful commands
which cat      Find out where a program (the cat program in this case) is installed. Often when you edit a program and nothing happens it is because you are editing another program than the one you are running
compress test.dat      compress test.dat (the result is called test.dat.Z)
uncompress      Uncompress a compressed file (.Z files)
zcat test.dat      Print contents of a compressed file to the screen (without uncompressing it)
tar      Pack and unpack files
gunzip      Unzip a zipped file (.gz files)
command line options      most Unix programs take options in the form "program -option". For example, head -5 will print out the first 5 lines of a file
diff/gdiff      compare two files
chmod      Change permissions
chown      Change ownership
autocompletion      Press TAB to let the unix system complete a file/program name
"arrow up"      press arrow up to get old commands
Ctrl-a      Go to start of line
Ctrl-e      Go to end of line
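
The tar and gunzip entries in the list above are given without examples. The following is a minimal sketch of packing, compressing and unpacking a directory; it assumes gzip is installed, and the directory and file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir project
echo "some data" > project/test.dat

tar -cf project.tar project      # c = create, f = name of the archive file
gzip project.tar                 # replaces it with project.tar.gz
rm -r project

gunzip project.tar.gz            # back to project.tar
tar -tf project.tar              # t = list the contents
tar -xf project.tar              # x = extract
cat project/test.dat
```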