
Introduction to UNIX

In this exercise you will work through a brief introduction to the UNIX/Linux system. You are all expected to have some prior knowledge of programming, so the introduction is very short. In the first part of the exercise you will set up your account and copy some essential files needed to make the C programs during the course. Next, you will go through some small exercises that give you a more detailed introduction to Unix/Linux.

Linux. Part 1

Connect to the CBS server

Using Windows
Double-click the SSH Secure Shell Client and choose File>Quick Connect. Fill in Host: and your username. You will then be prompted for your password.

Using Mac/Linux
Type ssh -Y, where userid is your CBS user ID. You will then be prompted for your password.

Basic commands

If you have experience in using Linux/UNIX and have a particular setup for your account, you can make a course directory (say algo) and do all of the course exercises in this directory.

Where am I? - pwd
pwd      This command returns the path to your current location (the current directory); this is also the command that is used to construct your prompt.

Make a new directory - mkdir

mkdir test
mkdir data
mkdir -p src
mkdir -p bin

Make directories test, data, src, bin. The -p option gives no error if the directory already exists.

Copy all files from the directory to the data directory

cp /home/projects/mniel/ALGO/data/Cprog/* ./data/

* means everything in the directory, "." means here, so ./data means the sub-directory data of where you are now.

What is in this directory? - ls


ls      short listing of current directory (a directory is often called a folder in Windows)
ls ..      short listing of the directory above the current directory - ".." means one directory up, "../.." is two directories up
ls data      short listing of the data directory (equivalent to ls ./data - "." means here)
ls -l data      detailed listing of the data directory
ls -ltr data      long listing sorted by time (t) and reversed (r): newest files last (essential for old bioinformaticians who cannot remember what they just did)
ls /usr/bin/      list programs in the "/usr/bin/" directory.

Paths starting with "/" are absolute addresses starting at the root folder (normally called C:\ in Windows) - as opposed to relative addresses (addresses relative to where you are in the folder hierarchy)
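
The difference between relative and absolute paths can be tried out safely in a scratch directory. The following is only a sketch: the directory names (demo, data) and the use of mktemp are assumptions made for illustration and are not part of the course setup.

```shell
# Work in a throwaway directory so nothing in your account is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/demo/data"
cd "$scratch/demo" || exit 1

here=$(pwd)                          # absolute path of the current directory
cd ./data || exit 1                  # relative path: "." is the current directory
in_data=$(pwd)
cd .. || exit 1                      # ".." is the parent directory (one level up)
back=$(pwd)
cd "$scratch/demo/data" || exit 1    # absolute path: works from anywhere
abs=$(pwd)

echo "start:          $here"
echo "after cd ./data: $in_data"
echo "after cd ..:     $back"
echo "absolute cd:     $abs"

# Clean up the scratch directory again.
cd / && rm -rf "$scratch"
```

After cd ./data followed by cd .. you are back where you started, and the absolute path leads to the same place as the relative one.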

I want to go to? - cd
The cd      command is used to move around in the file system. Examples:
cd ..      up one level
cd /usr/local/bin/      go to absolute (not relative) address
cd      go to my home directory

Now you can copy some essential files and directories that will be needed to make the C programs during the course.

Copying (more) files - cp

cd src
cp /home/projects/mniel/ALGO/code/cprog/* .
cp -R /home/projects/mniel/ALGO/code/utils .

These commands copy all files from /home/projects/mniel/ALGO/code/cprog and the complete /home/projects/mniel/ALGO/code/utils directory to where you are (in the src directory). Note, the -R option means recursively, and hence will copy the complete directory including all files and sub-directories.
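
To see what -R adds, the two kinds of copy can be compared on a small made-up directory tree. This is a sketch under assumptions: the file names a.c and b.c and the scratch directory are hypothetical stand-ins for the course files.

```shell
# Build a small source tree: utils/a.c and utils/sub/b.c (hypothetical names).
scratch=$(mktemp -d)
mkdir -p "$scratch/utils/sub" "$scratch/src"
touch "$scratch/utils/a.c" "$scratch/utils/sub/b.c"

cd "$scratch/src" || exit 1
cp "$scratch"/utils/*.c .     # plain cp: only the files matched by the wildcard
cp -R "$scratch/utils" .      # -R: the directory itself and everything below it

# List every file that ended up in src.
copied=$(find . -type f | LC_ALL=C sort)
echo "$copied"

cd / && rm -rf "$scratch"
```

The plain cp brings over a.c only, while cp -R recreates utils with its sub-directory and both files.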

Set up your Unix/Linux account

You can specify some of the Unix functionality by editing the .cshrc file in your home directory.

Go to your home directory (type cd). Use your favorite editor (gedit, emacs, vi, ...) to edit the file.

After the line

# System independent actions (examples)

follows a list of aliases that are defined to make Linux/UNIX a bit more user-friendly. Examples:

alias   cp      'cp -i'
alias   mv      'mv -i'
alias   rm      'rm -i'
alias	make	'make -f Makefile.`uname -s`"_"`uname -m`'

setenv ALGOHOME /home/people/XXX
where XXX is your CBS login. Note, if you have made a specific course directory called, say, "algo", then you should give the path to this directory instead.

The first three of these aliases make the UNIX system prompt you before you overwrite or remove files when copying, moving or deleting. This helps you avoid removing an important file by accident. The last alias defines which system-dependent makefile to use when compiling C code in the course.

The last line in the file

set path = ( . $ALGOHOME/bin $path )

tells the system to search for executables (binaries) first in the working directory (.), next in the $ALGOHOME/bin directory, and then in the directories defined in the previous $path definition.

Now save the file, and type

source .cshrc

Now the UNIX behavior is updated with your changes.

You can see that the path environment variable has been updated by typing

echo $PATH

Then you will see that the path starts with "." followed by "/home/people/XXX/bin" (where XXX is your login name), and then a large set of other directories.
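
The effect of putting a directory at the front of the search path can be checked with a small experiment. Note the assumptions: the sketch is written in Bourne-shell syntax (sh/bash), not the csh syntax used in .cshrc, and the program name hello_algo and the scratch directory are made up for illustration.

```shell
# Create a private bin directory holding one tiny executable script.
scratch=$(mktemp -d)
mkdir -p "$scratch/bin"
printf '#!/bin/sh\necho "hello from my own bin"\n' > "$scratch/bin/hello_algo"
chmod +x "$scratch/bin/hello_algo"

# sh counterpart of the csh line: set path = ( $ALGOHOME/bin $path )
PATH="$scratch/bin:$PATH"
export PATH

found=$(command -v hello_algo)   # where the shell finds the program
output=$(hello_algo)             # run it by name only, without a path
echo "$found"
echo "$output"

rm -rf "$scratch"
```

Because the new directory comes first in the path, the shell finds hello_algo there before looking anywhere else.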

UNIX. Part 2

The man command gives help on most UNIX commands.

man ls      gets help on the ls command

Go to the test directory

cd test
cp /home/projects/mniel/ALGO/exercises/ex_unix/test.dat .

Remember the "." at the end of the command!

Make a new directory called mydirectory

mkdir mydirectory

Make an empty file (or update time stamp on existing one)

touch myfile

Makes a file called "myfile". (verify it has been created with ls -l)

Moving files - mv

mv myfile mynewfile      move myfile to mynewfile

Removing (deleting) files - rm

rm mynewfile      remove mynewfile
rmdir mydirectory      remove an empty directory
rm -rf mydirectory      remove mydirectory, including files and subdirectories - no questions asked - make sure this is what you want to do, there is no recycle bin on UNIX; once it is gone it is gone
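
The difference between rmdir and rm can be tried out in a scratch directory. This is a minimal sketch; the scratch directory created with mktemp is an assumption, and the file names follow the examples above.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir mydirectory
touch mydirectory/myfile                          # create an empty file
mv mydirectory/myfile mydirectory/mynewfile       # rename it

# rmdir only removes EMPTY directories, so this first attempt is refused.
if rmdir mydirectory 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused (directory not empty)"
fi
echo "first rmdir: $rmdir_result"

rm mydirectory/mynewfile                          # empty the directory first
rmdir mydirectory                                 # now rmdir succeeds
if [ -e mydirectory ]; then still_there=yes; else still_there=no; fi
echo "mydirectory still there: $still_there"

cd / && rm -rf "$scratch"
```

rmdir is the cautious choice: it refuses to delete anything that still has content, whereas rm -rf does not.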

Viewing files - cat/more/less/head/tail

cat test.dat      write contents of file to screen
head test.dat      write top of file (default 10 lines)
head -30 test.dat      write top 30 lines of file
tail test.dat      write end of file (default last 10 lines)
more test.dat      page through test.dat; press "space" to go one page down, "q" to quit.
less test.dat      page through test.dat; press "space" to go one page down, "j" to go one line down, "k" to go one line up, "q" to quit.

Editing files - gedit/vi

The gedit      command is used to launch the gedit editor. Examples:
gedit test.dat      edit the file test.dat with gedit

vi is a nerdy editor. Type:
vi test.dat      to edit the file test.dat.
/RLM to search for "RLM"
x to delete a letter
dd to delete a line
5dd to delete 5 lines
:q! to get out without changing anything, or
ZZ to save changes and quit.
To insert text, press "i" to get into insert mode and press "Esc" to get out of insert mode (in all "normal" editors you are automatically in insert mode). You can use "R" and "Esc" to get in and out of replace (overwrite) mode.
You may not want to use the vi editor unless you have to, e.g. if you cannot run X Windows, or have to edit via a noisy telephone line from Mars.

Moving data around

Redirecting: |><      
Use | to "pipe" data from one program to another. Example:
cat test.dat | wc      pipe the contents of test.dat into the program called wc (word count), which counts the number of lines, words and bytes in test.dat
Use > to direct data to a file (and overwrite it). Example:
head test.dat > tmp.dat     Put first ten lines of test.dat into tmp.dat
Use >> to direct data to a file and append the data to the contents of the file. Example:
head test.dat >> tmp.dat     Append first ten lines of test.dat to tmp.dat (now it should contain 20 lines)
Use < to get data from a file to a program. Example:
head < test.dat      
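
The four redirection operators can be seen working together in one short session. This is a sketch under assumptions: numbers.dat is a made-up stand-in for test.dat, and the work is done in a scratch directory.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-in for test.dat: the numbers 1..15, one per line.
seq 1 15 > numbers.dat

head numbers.dat > tmp.dat        # ">" creates or overwrites tmp.dat (10 lines)
head numbers.dat >> tmp.dat       # ">>" appends another 10 lines

# "<" feeds the file to wc on standard input; tr strips padding spaces.
lines=$(wc -l < tmp.dat | tr -d ' ')
# "|" sends the output of cat into wc.
piped=$(cat numbers.dat | wc -l | tr -d ' ')

echo "tmp.dat lines:     $lines"
echo "numbers.dat lines: $piped"

cd / && rm -rf "$scratch"
```

Running head with > a second time instead of >> would leave tmp.dat with 10 lines, because > always starts from an empty file.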

Sort file - sort

Examples of using sort:
sort -n test.dat      sort file numerically
sort -n -k2 test.dat      sort file numerically (big numbers last) by 2nd column
sort -r -n -k2 test.dat      sort file reverse numerically by 2nd column
sort -u test.dat      Keep only one copy of each unique line
sort test.dat | uniq -c      Keep only one copy of each unique line and count the number of occurrences of each entry
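
Why the -n option matters is easiest to see on a small file. The data below are made up for illustration (scores.dat is a hypothetical two-column file, not the course test.dat).

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'pepB 10' 'pepA 9' 'pepC 100' 'pepA 9' > scores.dat

# Without -n the second column is compared as text, so "9" sorts after "100".
text_last=$(LC_ALL=C sort -k2 scores.dat | tail -1)
# With -n it is compared as a number, so 100 is the largest.
num_last=$(sort -n -k2 scores.dat | tail -1)
# -r reverses the order: largest number first.
rev_first=$(sort -r -n -k2 scores.dat | head -1)
# -u drops the duplicate "pepA 9" line.
unique_lines=$(sort -u scores.dat | wc -l | tr -d ' ')

echo "last line, text sort:     $text_last"
echo "last line, numeric sort:  $num_last"
echo "first line, reverse sort: $rev_first"
echo "unique lines:             $unique_lines"

# uniq -c needs sorted input; it prefixes each line with its count.
sort scores.dat | uniq -c

cd / && rm -rf "$scratch"
```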

Concatenate side by side - paste

paste test.dat test.dat     

Get lines matching a pattern - grep/egrep

grep AAA test.dat      Get lines with "AAA"
grep -v AAA test.dat      Get lines that do not contain "AAA"
grep ^AAA test.dat      Get lines starting with "AAA"
grep ".L......V" test.dat      Get lines matching any character ("." is a wildcard), "L", any six characters, and "V"
grep ".[LVI]......[VL]" test.dat      Get lines matching any character, "L, V or I", any six characters, and "V or L" (no commas inside the brackets: a comma would itself be matched)
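
The patterns can be tested on a handful of lines before using them on real data. The four 9-letter strings below are made up for illustration; grep -c prints the number of matching lines instead of the lines themselves.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' ALAKAAAAV AVAKAAAAL AAAKAAAAA GILGFVFTL > peptides.dat

with_aaa=$(grep -c 'AAA' peptides.dat)          # lines containing AAA
without_aaa=$(grep -c -v 'AAA' peptides.dat)    # lines NOT containing AAA
starts_aaa=$(grep -c '^AAA' peptides.dat)       # lines starting with AAA
# "." matches any single character: L in position 2, V in position 9.
l_and_v=$(grep -c '.L......V' peptides.dat)
# Inside [...] every character listed is an alternative, so no commas are needed.
anchors=$(grep -c '.[LVI]......[VL]' peptides.dat)

echo "contain AAA:        $with_aaa"
echo "do not contain AAA: $without_aaa"
echo "start with AAA:     $starts_aaa"
echo "L at 2, V at 9:     $l_and_v"
echo "LVI at 2, VL at 9:  $anchors"

cd / && rm -rf "$scratch"
```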

Awksome programming languages (awk, nawk & gawk)

awk, nawk & gawk are different versions of the same programming language, and are very similar. It is recommended to use gawk or nawk, rather than the original version: awk, since they are more stable and have more features!

Basically gawk will read a file and do something with each line.

Examples of using gawk:

gawk '{print $1}' test.dat       Print first field in file
gawk '{print $1, $2}' test.dat       Print first and second field in file
gawk '{print $0}' test.dat       Print entire line
gawk '{print substr($1,2,5),$0}' test.dat       Print characters 2-6 from first field and complete line in file
echo "Mary had a little lamb" |gawk '{line = $0; gsub (" ","",line);print line}'      Remove all spaces in all lines
gawk -v name=Mary -v animal=lamb '{print name,$1,animal}' test.dat      Passing variables to gawk
echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{print $1,$2,$3,$4,$5,$6,$7}'      Split input only on "+" (rather than on any whitespace, as is the default)

echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{for ( i=1;i<NF;i++ ) { printf( "%s ", $i)}printf( "%s\n", $NF)}'

A more elegant way of doing the same.
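
Since gawk runs its program once for every input line, it is also handy for filtering lines and summing columns. The sketch below rests on assumptions: the three-line table is made up, and the AWK variable falls back to plain awk on machines where gawk is not installed.

```shell
# Prefer gawk; fall back to awk if gawk is not available (assumption for portability).
AWK=$(command -v gawk || command -v awk)

# Made-up two-column table: a name and a number per line.
data='pepA 9
pepB 10
pepC 100'

# NF is the number of fields on the current line, NR the line number.
fields=$(printf '%s\n' "$data" | "$AWK" 'NR == 1 {print NF}')
# The END block runs once, after the last line has been read.
total=$(printf '%s\n' "$data" | "$AWK" '{sum += $2} END {print sum}')
# A condition in front of the braces selects which lines the action applies to.
big=$(printf '%s\n' "$data" | "$AWK" '$2 > 9 {printf "%s%s", sep, $1; sep = ","} END {print ""}')
# -v passes a shell value into the program as a variable.
tagged=$(printf '%s\n' "$data" | "$AWK" -v label=score 'NR == 1 {print label, $1, $2}')

echo "fields on line 1:     $fields"
echo "sum of column 2:      $total"
echo "names with score > 9: $big"
echo "tagged first line:    $tagged"
```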

What did I do

history      The history command gives the old commands

Getting help - man

The man      command gives help on most unix commands. Examples:
man ls      get help on the ls command

Other useful commands

which cat      Find out where a program (the cat program in this case) is installed. Often when you edit a program and nothing happens it is because you are editing another program than the one you are running
compress test.dat      compress test.dat
uncompress      Uncompress a compressed file (.Z files)
zcat test.dat      Print contents of a compressed file to the screen (without uncompressing it)
tar      Pack and unpack files
gunzip      Unzip a zipped file (.gz files)
diff/gdiff      compare two files
chmod      Change permissions
chown      Change ownership

command line options      Most unix programs take options in the form "program -option". For example, head -5 will print out the first 5 lines of a file
autocompletion      Press TAB to let the unix system complete a file/program name
"arrow up"      press arrow up to get old commands
CTRL a      Go to start of line
CTRL e      Go to end of line

Part 3

If you have more time, you can play a bit with GAWK. Most of the time doing research in bioinformatics is spent transforming data from one output format into another. For this, GAWK is a very powerful tool. Here is one example of such a task.

Copy the file 1A68_HUMAN.sprot to your test directory

cd ~/test
cp /home/projects/mniel/ALGO/exercises/ex_unix/1A68_HUMAN.sprot .

This file contains a protein sequence in the Swissprot format. You can see the content by typing

cat 1A68_HUMAN.sprot | gawk '{print $0}' | more

Note that the gawk command prints every line unchanged, i.e., it does nothing.

Your job is to rewrite the gawk command so that it writes out the SWISSPROT entry in fasta format, i.e. a format like


This is all for now