Introduction to UNIX
In this exercise you will work through a brief introduction to the UNIX/Linux system.
You are all expected to have some prior knowledge of programming, so the introduction
is very short. In the first part of the exercise you will set up your account and
copy some essential files needed to build the C programs used during the course. Next, you will go through some
small exercises that give you a more detailed introduction to UNIX/Linux.
Linux. Part 1
Connect to the CBS server
Double-click on the SSH Secure Shell
Client and choose File>Quick Connect. Fill in Host: login.cbs.dtu.dk
and your username. You will then be prompted for your password.
Basic commands
If you have experience in using Linux/UNIX and have a particular setup for your account, you can make
a course directory (say algo) and do all of the course exercises in this directory.
Where am I? - pwd
pwd      This command returns the path to your current location (the current directory)
(this is also the command that is used to construct your prompt)
Make a new directory - mkdir
Examples:
mkdir test
mkdir data
mkdir -p src
mkdir -p bin
Make directories test, data, src, bin. The -p option gives no error if the directory already exists.
Copy all files from the course data directory to your data directory
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/data/Cprog/* ./data/
"*" means everything in the directory and "." means the current directory, so ./data means the sub-directory data of the directory where you are
now.
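The mkdir and cp steps above can be tried safely in a scratch directory first. The following is a minimal sketch written for a Bourne-type shell (run it with sh or bash, not csh); the scratch directory and the file names a.txt and b.txt are invented for illustration and are not part of the course material.

```shell
# Work in a throw-away directory so nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical source directory with two small files.
mkdir -p source data
echo "first"  > source/a.txt
echo "second" > source/b.txt

# Running mkdir -p on a directory that already exists is not an error.
mkdir -p data

# "*" expands to every file in source; "./data/" is the sub-directory
# data of the current directory.
cp source/* ./data/

ls data
```

After the copy, ls data should list both files.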
What is in this directory? - ls
Examples:
ls      short listing of current directory (a directory is often called a folder in Windows)
ls ..      short listing of the directory above the current directory - ".." means one directory up, "../.." is two directories up
ls data      short listing of the data directory (equivalent to ls ./data - "." means here)
ls -l data      detailed listing of the data directory
ls -ltr data      long listing sorted by time (t) and reversed (r): newest files last (essential for old bioinformaticians who cannot remember what they just did)
ls /usr/bin/      list programs in "/usr/bin/" directory.
Paths starting with "/" are absolute addresses starting at the root folder (normally called C:\ in Windows) -
as opposed to relative addresses (addresses relative to where you are in the folder hierarchy)
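To see that a relative and an absolute address can point at the same directory, the sketch below builds a small hypothetical directory tree and lists it both ways. It assumes a Bourne-type shell (sh or bash); the names project, notes.txt and test.dat are placeholders.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
touch "$scratch/project/notes.txt" "$scratch/project/data/test.dat"

cd "$scratch/project/data"

ls                        # current directory: shows test.dat
ls ..                     # parent directory (one level up): data and notes.txt
ls "$scratch/project"     # the same directory, given as an absolute address

# Store both listings so they can be compared.
relative_listing=$(ls ..)
absolute_listing=$(ls "$scratch/project")
```

Both listings are identical because ".." and the absolute address refer to the same directory.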
I want to go to? - cd
The cd      command is used to move around in the file system.
Examples:
cd ..      up one level
cd /usr/local/bin/      go to absolute (not relative) address
cd      go to my home directory
Now you can copy some essential files and directories that will be needed to build the C programs during the course.
Copying (more) files - cp
cd src
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/cprog/* .
cp -R /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/utils .
These commands copy all files from /usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/cprog and the complete
/usr/opt/www/pub/CBS/courses/27625.algo/exercises/code/utils directory to where you are (in the src directory).
Note, the -R option means recursively, and hence will copy the complete directory including all files and
sub-directories.
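The effect of -R can be checked on a small hypothetical directory tree before copying the real course files. This is only a sketch for a Bourne-type shell (sh or bash); the directory course/utils and the files in it are made up and merely stand in for the course directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical tree standing in for the course utils directory.
mkdir -p course/utils/include src
echo "int main(void) { return 0; }" > course/utils/main.c
echo "/* empty header */"           > course/utils/include/util.h

cd src
# -R copies the directory itself and everything below it into "."
# (here: the src directory).
cp -R ../course/utils .

# List every file that now exists below src.
find . -type f | sort
```

The listing should show both files, including the one in the include sub-directory.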
Set up your Unix/Linux account
You can specify some of the Unix functionality by editing the .cshrc file in your
home directory.
Go to your home directory (type cd). Use your favorite editor (gedit, emacs, vi, ...) to edit the file.
After the line
# System independent actions (examples)
follows a list of aliases that are defined to make Linux/UNIX a bit more user friendly. Examples:
alias cp 'cp -i'
alias mv 'mv -i'
alias rm 'rm -i'
alias make 'make -f Makefile.`uname -s`"_"`uname -m`'
setenv ALGOHOME /home/people/XXX
where XXX is your CBS login. Note, if you have made a specific course directory called, say, "algo", then you should give the path to this directory instead.
The first three of these aliases make the UNIX system prompt you before files are overwritten or removed. This will help you not to remove important files by chance. The last alias defines which system-dependent makefile to use when compiling C code in the course.
The last line in the file
set path = ( . $ALGOHOME/bin $path )
tells the system to search for executables (binaries) first in the working directory (.), next in the $ALGOHOME/bin
directory and then in the directories defined in the previous $path definition.
Now save the file, and type
source .cshrc
Now the UNIX behavior is updated with your changes.
You can see that the path environment variable has been updated by typing
echo $PATH
Then you will see that the path starts with "." followed by "/home/people/XXX/bin" (where XXX is your login name),
and then a large set of other directories.
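The .cshrc lines above use csh syntax. The idea of a search order can also be illustrated in a Bourne-type shell (sh or bash), where the variable is called PATH and the directories are separated by colons. The sketch below is an illustration only: the scratch directory and the program name hello_algo are invented for the example.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/bin"

# A tiny hypothetical program placed in the new bin directory.
cat > "$scratch/bin/hello_algo" <<'EOF'
#!/bin/sh
echo "hello from my bin"
EOF
chmod +x "$scratch/bin/hello_algo"

# Put the directory first; the shell searches PATH from left to right.
PATH="$scratch/bin:$PATH"
export PATH

# command -v reports which file the shell would run for this name.
found=$(command -v hello_algo)
echo "$found"
```

Because the new directory comes first, a program placed there is found before a program of the same name further down the path.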
UNIX. Part 2
The man command gives help on most UNIX commands.
Examples:
man ls      get help on the ls command
Go to the test directory and copy the file test.dat to it
cd
cd test
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/ex_unix/test.dat .
Remember the "." at the end of the command!
Make a new directory called mydirectory
mkdir mydirectory
Make an empty file (or update time stamp on existing one)
touch myfile
Makes a file called "myfile" (verify that it has been created with ls -l).
Moving files - mv
Examples:
mv myfile mynewfile      move myfile to mynewfile
Removing (deleting) files - rm
rm mynewfile      remove mynewfile
rmdir mydirectory      remove an empty directory
rm -rf mydirectory      remove mydirectory, including files and subdirectories - no questions asked - make sure this is what you want to do. There is no recycle bin on UNIX; once it is gone it is gone
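The difference between rmdir and rm -rf is easiest to see on throw-away files. The following sketch assumes a Bourne-type shell (sh or bash) and works entirely inside a scratch directory; the file inside.txt is a placeholder added for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

touch myfile                  # create an empty file
mv myfile mynewfile           # rename it; myfile no longer exists

mkdir mydirectory
touch mydirectory/inside.txt  # the directory is now not empty

# rmdir only removes empty directories, so this attempt is refused.
if rmdir mydirectory 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir on a non-empty directory: $rmdir_result"

rm mynewfile                  # remove a single file
rm -rf mydirectory            # remove the directory and all of its contents
ls -la
```

The final listing should show an empty scratch directory.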
Viewing files - cat/more/less/head/tail
Examples:
cat test.dat      write contents of file to screen
head test.dat      write top of file (default 10 lines)
head -30 test.dat      write top 30 lines of file
tail test.dat      write the last 10 lines of the file
more test.dat      page through test.dat; press "space" to go one page down, "q" to quit.
less test.dat      page through test.dat; press "space" to go one page down, "j" to go one line down, "k" to go one line up, "q" to quit.
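If test.dat is not at hand, head and tail can be tried on a generated file. This sketch assumes a Bourne-type shell (sh or bash); the file numbers.txt is made up, and head -n 3 is the POSIX spelling of head -3.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical 25-line file: "line 1" ... "line 25".
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > numbers.txt

# head without options writes the first 10 lines; count them.
first_default=$(head numbers.txt | wc -l | tr -d ' ')

top_three=$(head -n 3 numbers.txt)   # first 3 lines
last_line=$(tail -n 1 numbers.txt)   # last line only

echo "$first_default"
echo "$top_three"
echo "$last_line"
```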
Editing files - gedit/vi
The gedit      command is used to launch the gedit editor.
Examples:
gedit test.dat      edit the file test.dat with gedit
vi is a nerdy editor.
Type:
vi test.dat      to edit the file test.dat.
/RLM to search for "RLM"
x to delete a letter
dd to delete a line
5dd to delete 5 lines
:q! to get out without changing anything, or
"ZZ" to save changes and quit.
To insert text press "i" to get into insert mode and press "Esc" to get out of insert mode
(in all "normal" editors you are automatically in insert mode). You can use "R" and "Esc" to get in and out of replace (overwrite) mode.
You may not want to use the vi editor unless you have to, e.g. if you cannot run X Windows, or have to edit via a noisy telephone line from Mars.
Moving data around
Redirecting: |><      
Use | to "pipe" data from one program to another. Example:
cat test.dat | wc      pipe the contents of test.dat into the program called wc (word count), which counts the number of lines, words and bytes in test.dat
Use > to direct data to a file (and overwrite it). Example:
head test.dat > tmp.dat     Put first ten lines of test.dat into tmp.dat
Use >> to direct data to a file and append the data to the contents of the file. Example:
head test.dat >> tmp.dat     Append the first ten lines of test.dat to tmp.dat (now it should contain 20 lines)
Use < to get data from a file to a program. Example:
head < test.dat      Let head read test.dat from standard input (same output as head test.dat)
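The four redirection symbols can be compared on a tiny file. The sketch below assumes a Bourne-type shell (sh or bash); letters.txt and tmp.txt are hypothetical file names created in a scratch directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"

printf 'a\nb\nc\n' > letters.txt       # ">" creates or overwrites the file

head -n 2 letters.txt >  tmp.txt       # tmp.txt holds 2 lines
head -n 2 letters.txt >> tmp.txt       # ">>" appends, so now 4 lines

# "<" feeds the file to wc on standard input.
lines_after_append=$(wc -l < tmp.txt | tr -d ' ')

# "|" sends the output of cat to wc.
piped_count=$(cat letters.txt | wc -l | tr -d ' ')

echo "$lines_after_append $piped_count"
```

Repeating the first head command with ">" instead of ">>" would bring tmp.txt back to 2 lines, since ">" overwrites.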
Sort file - sort
Example of using sort:
sort -n test.dat      sort file numerically
sort -n -k2 test.dat      sort file numerically (big numbers last) by the 2nd column
sort -r -n -k2 test.dat      sort file reverse numerically by the 2nd column
sort -u test.dat      Keep only one copy of each unique line
sort test.dat | uniq -c      Keep only one copy of each unique line and count number of duplicates for each entry
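The sort options above can be checked on a small file where the expected order is easy to see. This is a sketch for a Bourne-type shell (sh or bash); the file scores.txt and its contents are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical two-column file; the line "pepA 4" occurs twice.
printf 'pepC 30\npepA 4\npepB 100\npepA 4\n' > scores.txt

# Numeric sort on column 2: smallest number first.
# Without -n the keys are compared as text, and 100 would come before 30.
smallest_first=$(sort -n -k2 scores.txt | head -n 1)

# Reverse numeric sort on column 2: largest number first.
largest_first=$(sort -r -n -k2 scores.txt | head -n 1)

# Number of distinct lines.
unique_lines=$(sort -u scores.txt | wc -l | tr -d ' ')

# uniq -c needs sorted input; pick out the count reported for pepA.
pepa_count=$(sort scores.txt | uniq -c | awk '$2 == "pepA" {print $1}')

echo "$smallest_first / $largest_first / $unique_lines / $pepa_count"
```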
Concatenate side by side - paste
Example:
paste test.dat test.dat     Print the lines of the two files side by side (here test.dat next to itself)
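With two different files the effect of paste is easier to see. A minimal sketch for a Bourne-type shell (sh or bash), using the hypothetical files left.txt and right.txt:

```shell
scratch=$(mktemp -d)
cd "$scratch"

printf '1\n2\n' > left.txt
printf 'a\nb\n' > right.txt

# Line n of the output is line n of left.txt, a TAB, and line n of right.txt.
side_by_side=$(paste left.txt right.txt)
echo "$side_by_side"
```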
Get lines matching a pattern - grep/egrep
Example:
grep AAA test.dat      Get lines with "AAA"
grep -v AAA test.dat      Get lines that do not contain "AAA"
grep "^AAA" test.dat      Get lines starting with "AAA"
grep ".L......V" test.dat      Get lines matching any character ("." is a wildcard), then "L", then any six characters, then "V"
grep ".[LVI]......[VL]" test.dat      Get lines matching any character, then "L", "V" or "I", then any six characters, then "V" or "L" (no commas inside the brackets: a comma would itself be matched)
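The patterns above can be tested on a few made-up peptide lines, using grep -c to count the matching lines. This sketch assumes a Bourne-type shell (sh or bash); the file peptides.txt and its four sequences are invented for the example. Inside [...] every character is an alternative, so the letters are written without commas.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Four hypothetical sequences, one per line.
printf 'AAALKDEFGHV\nKLAAAGHIKLV\nGIKDEFGHLAA\nGGGGGGGGGGG\n' > peptides.txt

with_aaa=$(grep -c 'AAA' peptides.txt)            # lines containing AAA
without_aaa=$(grep -c -v 'AAA' peptides.txt)      # lines not containing AAA
starts_aaa=$(grep -c '^AAA' peptides.txt)         # lines starting with AAA

# Any character, L, six arbitrary characters, V.
l_then_v=$(grep -c '.L......V' peptides.txt)

# Any character, one of L/V/I, six arbitrary characters, one of V/L.
anchor_pairs=$(grep -c '.[LVI]......[VL]' peptides.txt)

echo "$with_aaa $without_aaa $starts_aaa $l_then_v $anchor_pairs"
```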
Awksome programming languages (awk, nawk & gawk)
awk, nawk & gawk are different versions of the same programming language,
and are very similar. It is
recommended to use gawk or nawk, rather than the original version: awk, since
they are more stable and have more features!
Basically gawk will read a file and do something with each line.
Examples of using gawk:
gawk '{print $1}' test.dat       Print first field in file
gawk '{print $1, $2}' test.dat       Print first and second field in file
gawk '{print $0}' test.dat       Print entire line
gawk '{print substr($1,2,5),$0}' test.dat       Print characters 2-6 from first field and complete line in file
echo "Mary had a little lamb" |gawk '{line = $0; gsub (" ","",line);print line}'      Remove all spaces in all lines
gawk -v name=Mary -v animal=lamb '{print name,$1,animal}' test.dat      Passing variables to gawk
echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{print $1,$2,$3,$4,$5,$6,$7}'      Split input only on "+" (rather than on any whitespace as is the default)
echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{for ( i=1;i<NF;i++ ) { printf( "%s ", $i)}printf( "%s\n", $NF)}'
A more elegant way of doing the same.
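The gawk examples above can be tried on a small made-up file. The sketch assumes a Bourne-type shell (sh or bash) and falls back to plain awk when gawk is not installed, which is an assumption about your machine; the file peptides.txt and the variable name label are invented for the example.

```shell
# Use gawk when available, otherwise the system awk.
AWK=$(command -v gawk || command -v awk)

scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical two-column file: a peptide and a score.
printf 'ALAKAAAAM 0.85\nGILGFVFTL 0.92\n' > peptides.txt

# First field of every line.
first_fields=$("$AWK" '{print $1}' peptides.txt)

# Characters 2-6 of the first field (5 characters starting at position 2).
middle=$("$AWK" '{print substr($1,2,5)}' peptides.txt | head -n 1)

# Pass a variable into the program with -v.
labelled=$("$AWK" -v label=peptide '{print label, $1}' peptides.txt | head -n 1)

# Split on "+" and print all fields separated by single spaces.
rejoined=$(echo "THIS+IS+SPLIT" | "$AWK" -F "+" '{for (i = 1; i < NF; i++) printf("%s ", $i); printf("%s\n", $NF)}')

echo "$first_fields"
echo "$middle"
echo "$labelled"
echo "$rejoined"
```

The loop over NF works for any number of fields, which is why it is preferable to listing $1 to $7 by hand.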
What did I do
history      The history command lists the commands you have typed earlier
Getting help - man
The man      command gives help on most UNIX commands.
Examples:
man ls      get help on the ls command
Other useful commands
which cat      Find out where a program (the cat program in this case) is installed. Often when you edit a program and nothing happens, it is because you are editing a different program from the one you are running
compress test.dat      compress test.dat (creates test.dat.Z)
uncompress      Uncompress a compressed file (.Z files)
zcat test.dat.Z      Print contents of a compressed file to the screen (without uncompressing it)
tar      Pack and unpack files
gunzip      Unzip a zipped file (.gz files)
diff/gdiff      compare two files
chmod      Change permissions
chown      Change ownership
command line options      most UNIX programs take options in the form "program -option". For example, head -5 will print out the first 5 lines of a file
autocompletion      Press TAB to let the UNIX system complete a file/program name
"arrow up"      press arrow up to get old commands
CTRL a      Go to start of line
CTRL e      Go to end of line
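As an illustration of tar and gunzip from the list above, the following sketch packs a directory, compresses the archive and restores it again. It assumes a Bourne-type shell (sh or bash) and that gzip, the counterpart of gunzip that is not mentioned in the list, is installed; the directory name project is made up.

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir project
echo "some data" > project/test.dat

tar -cf project.tar project   # pack the directory into one archive file
gzip project.tar              # compress it: project.tar becomes project.tar.gz
rm -rf project                # remove the original directory

gunzip project.tar.gz         # back to project.tar
tar -xf project.tar           # unpack: the project directory is restored

restored=$(cat project/test.dat)
echo "$restored"
```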
Part 3
If you have more time, you can play a bit with GAWK. Most of the time doing research in bioinformatics is
spent transforming data from one output format into another. For this, GAWK is a very powerful tool. Here is one
example of such a task.
Copy the file 1A68_HUMAN.sprot to your test directory
cd ~/test
cp /usr/opt/www/pub/CBS/courses/27625.algo/exercises/ex_unix/1A68_HUMAN.sprot .
This file contains a protein sequence in the Swissprot format. You can see the content by typing
cat 1A68_HUMAN.sprot | gawk '{print $0}' | more
Note that the gawk command prints every line unchanged, i.e., it does nothing to the data.
Your job is to rewrite the gawk command so that it writes out the SWISSPROT entry in FASTA format, i.e. a format
like
>1A68_HUMAN
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRF
DSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQ
MMYGCDVGSDGRFLRGYRQDAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQW
RAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLT
WQRDGEDQTQDTELVETRPAGDGTFQKWVAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEP
SSQPTIPIVGIIAGLVLFGAVITGAVVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSL
TACKV
This is all for now