UNIX
Connect to the unix machine at CBS
On Windows
Double-click the SSH Secure Shell
Client and choose File > Quick Connect. Fill in Host: "login.cbs.dtu.dk"
and your username at CBS. You will then be prompted for your password.
On Mac
Open X11
In the terminal window type:
ssh -X login.cbs.dtu.dk
--------------------------------------------------------------------------------
Basic commands
Where am I? - pwd
pwd This command returns the path to your current location (the current directory, which you can also see in your prompt)
Copy the immu00 directory and its contents to here:
cp -R /usr/opt/www/pub/CBS/courses/27685.imm/exercise_unix/immu00 .
-R means recursively, i.e., include everything in the directory. "." means "to here", so remember the period at the end of this command.
What is in this directory? - ls
Examples:
ls short listing of the content of the current directory (a directory is called a folder in Windows or Mac OS)
ls .. short listing of the content of the directory above the current directory [".." means one directory up, "../.." is two directories up]
ls immu00 short listing of the immu00 directory (equivalent to ls ./immu00 ["." means here])
ls -l immu00 detailed listing of the immu00 directory
ls -ltr immu00 long listing sorted by time (t) and reversed (r): newest files last
(essential for old bioinformaticians who can not remember what they just did)
Paths starting with "/" are absolute addresses that start at the root directory (roughly what C:\ is in Windows),
as opposed to relative addresses (addresses relative to where you are in the folder hierarchy).
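The difference between absolute and relative addresses can be tried out safely in a scratch directory. This is a minimal sketch in sh syntax; the directory names a and b and the variables top, rel and abs are made up for the illustration, and $(...) simply captures the output of a command.

```shell
# Work in a throwaway directory so none of your own files are touched.
top=$(mktemp -d)
mkdir -p "$top/a/b"

cd "$top/a/b"     # absolute address: written out from the root "/"
cd ../..          # relative address: two directories up from a/b
rel=$(pwd -P)

cd "$top"         # the same place, this time given as an absolute address
abs=$(pwd -P)

echo "$rel"
echo "$abs"       # both lines show the same directory
```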
Make new directory - mkdir
Examples:
mkdir testdir Make a new directory (folder) with the name testdir in the directory (folder) where you are now.
mkdir mynewdir Make a new directory (folder) with the name mynewdir in the directory where you are now
I want to go to? - cd
The cd command is used to move around in the file system.
Examples:
cd testdir go to the testdir directory (relative address to where you are)
cd .. up one level
cd go to my home directory
cd immu00 go to immu00 directory (verify you are there by the pwd command)
Moving or renaming files - mv
Examples:
touch myfile Makes a new empty file
mv myfile mynewfile Rename myfile to mynewfile
mv mynewfile testdir Moves the file mynewfile into the directory named testdir (How can you check that this has actually happened?)
Removing (deleting) files - rm, and empty directories (folders) - rmdir
Examples:
rm mynewfile removes (deletes) mynewfile
rmdir mydirectory remove an empty directory
rmdir testdir remove an empty directory (testdir is not empty, so this does not succeed)
rm -rf testdir remove a directory, including files and subdirectories - no questions asked. Make sure this is what you want to do:
there is no recycle bin on UNIX; once it is gone it is gone!
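To see rmdir refuse a non-empty directory without risking real files, the steps can be replayed in a scratch directory. A small sketch in sh syntax; the variable work is hypothetical and the file names follow the examples above.

```shell
work=$(mktemp -d)        # scratch directory, created only for this test
cd "$work"

mkdir testdir
touch testdir/mynewfile

# rmdir refuses to delete a directory that still contains files.
if rmdir testdir 2>/dev/null; then
    echo "testdir removed"
else
    echo "rmdir failed: testdir is not empty"
fi

rm testdir/mynewfile     # empty the directory first ...
rmdir testdir            # ... and now rmdir succeeds

[ -d testdir ] || echo "testdir is gone"
```

Emptying the directory by hand and then using rmdir is slower than rm -rf, but it cannot delete more than you intended.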
Copying files - cp
Examples:
touch myfile make file called "myfile"
cp myfile mynewfile copy myfile to mynewfile
Viewing text files - cat/more/less/head/tail
Examples:
cat test.dat write contents of file to screen
head test.dat write top of file (default 10 lines)
head -30 test.dat write top 30 lines of file
tail test.dat write the last 10 lines of the file
tail -25 test.dat write the last 25 lines of the file
more test.dat show test.dat pagewise; press "space" to go one page down, "q" to quit.
less test.dat show test.dat pagewise; press "space" to go one page down, "j" to go one line down, "k" to go one line up, "q" to quit.
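If you have no test.dat at hand, head and tail can be tried on a generated file. The sketch below is in sh syntax and builds a 40-line file with awk; the variable names are made up. It uses the spelling head -n 3, which is the standard form of the older head -3 used above.

```shell
f=$(mktemp)                       # temporary file standing in for test.dat
awk 'BEGIN { for (i = 1; i <= 40; i++) print "line " i }' > "$f"

first=$(head -n 3 "$f")           # same as: head -3
last=$(tail -n 2 "$f")            # same as: tail -2

echo "$first"                     # line 1, line 2, line 3
echo "$last"                      # line 39, line 40
```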
Editing files - n/nedit
The n, or nedit (the first is a shortcut alias for the latter) command is used to launch the nedit editor.
Examples:
n test.dat edit the file test.dat with nedit
Executing Programs
Examples:
Moving data around
Redirecting: |, >, >> and <
Use | to "pipe" (or send) data from one program to another.
Example:
cat test.dat | wc pipe the contents of test.dat into the program called wc (word count) count number of lines, words and bytes in test.dat
Use > to direct data to a file (and overwrite it). Example:
head test.dat > tmp.dat first ten lines of test.dat into tmp.dat
Use >> to direct data to a file and append the data to the existing contents of the file.
Example:
head test.dat >> tmp.dat first ten lines of test.dat into tmp.dat (now it should contain 20 lines)
Use < to get data from a file to a program.
Example:
head < test.dat
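The four redirections can be checked together on a generated file. A minimal sketch in sh syntax; test.dat is created here with made-up contents so that the line counts are known in advance.

```shell
cd "$(mktemp -d)"                 # scratch directory
awk 'BEGIN { for (i = 1; i <= 40; i++) print "row", i }' > test.dat

head test.dat >  tmp.dat          # overwrite: tmp.dat now has 10 lines
head test.dat >> tmp.dat          # append: tmp.dat now has 20 lines

n_piped=$(cat tmp.dat | wc -l)    # wc gets its data through a pipe
n_redir=$(wc -l < tmp.dat)        # wc gets its data through "<"

echo $n_piped $n_redir            # 20 20
```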
Awksome programming languages (awk, nawk, gawk)
awk, nawk, and gawk are different versions of the same programming language,
and are very similar. It is recommended to use gawk or nawk rather than the original awk, since they are more stable and have more features.
Basically gawk will read a file and do something with each line.
Examples of using gawk:
gawk '{print $1}' epitope2protein.HLA-D_m13.out Print first field in file
gawk '{print $1, $3}' epitope2protein.HLA-D_m13.out Print first and third field in file
cat epitope2protein.HLA-D_m13.out|gawk '{print $1}' Print first field in file getting data from standard input
cat epitope2protein.HLA-D_m13.out|gawk '{if (/NP/) {print $1}}' Print first field in lines containing "NP"
cat epitope2protein.HLA-D_m13.out|gawk '{if (/^NP/) {print $1}}' Print first field in lines starting with "NP"
gawk '{print substr($7,2,5)}' epitope2protein.HLA-D_m13.out Print five characters of the seventh column,
starting with the second letter (in the seventh column).
NB! awk numbers the characters in a string starting with 1, whereas many other programming and
scripting languages start numbering from 0!
gawk '{print substr($7,length($7)-3,4)}' epitope2protein.HLA-D_m13.out Print last four letters in seventh column
echo "Mary had a little lamb" |gawk '{line = $0; gsub (" ","",line);print line}'
Remove all spaces in all lines
gawk -v name=Mary -v animal=lamb '{print name,$1,animal}' epitope2protein.HLA-D_m13.out
Passing variables to gawk
gawk -F "\t" '{print $1}' epitope2protein.HLA-D_m13.out Split input only on tabs (rather than on any whitespace, which is the default)
head epitope2protein.HLA-D_m13.out | gawk 'BEGIN{print "Here comes the data"}{print $1}END{print "No more data"}'
statements in BEGIN{} and END{} are executed before and after
the data lines are read, respectively
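The examples above can be tried without the course data on a small made-up file. The sketch below is in sh syntax and calls plain awk so that it also runs where gawk is not installed; gawk accepts the same programs. The file demo.tab and its contents are invented for the illustration. Note that /^NP/ {print $1} is a shorter way of writing {if (/^NP/) {print $1}}.

```shell
cd "$(mktemp -d)"
# Made-up, tab-separated file; the second field contains a space.
printf 'NP_001\tcapsid protein\tMKVLAAGIV\n'  > demo.tab
printf 'XP_002\tenvelope\tGSHSMRYFY\n'       >> demo.tab

ids=$(awk '{ print $1 }' demo.tab)                       # first field of every line
np=$(awk '/^NP/ { print $1 }' demo.tab)                  # only lines starting with "NP"
ws=$(awk '{ print $2 }' demo.tab | head -n 1)            # default split: "capsid"
tabs=$(awk -F '\t' '{ print $2 }' demo.tab | head -n 1)  # tab split: "capsid protein"
sub=$(awk -F '\t' '{ print substr($3, 2, 5) }' demo.tab | head -n 1)   # "KVLAA"
n=$(awk 'END { print NR }' demo.tab)                     # END runs after the last line

echo "$ids"; echo "$np"; echo "$ws"; echo "$tabs"; echo "$sub"; echo "$n"
```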
A more complex example: You have a file called epitope2protein.HLA-D_m13.out
with a protein sequence in the 7th column
and the residue number in the sequence where an epitope starts in the third column.
You want to print out the sequence surrounding the start of the epitope
(in this case the first five residues of the epitope and the four residues before the epitope) in a format that
can be read by the sequence motif visualization program logo. The first line in the output must be
"* Aligned protein sequences.", and each sequence motif must be followed by a ".". Furthermore only
motifs that are nine amino acids long must be printed. This is what the command can look like:
cat epitope2protein.HLA-D_m13.out | gawk 'BEGIN{print "* Aligned protein sequences."}{s=substr($7,$3-4,9);if (length(s)==9){print s"."}}' | /home/projects/projects/vaccine/bin/logo2 -p - | gawk 'BEGIN{pr=0}{if (/\%\!PS-Adobe-3.0 EPSF-3.0/){pr=1} if(pr==1){print $0}}' > logo.ps
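The first gawk step of this pipeline can be tested on its own, without logo2. The sketch below is in sh syntax and uses plain awk (gawk works the same way); the file demo.out and its two lines are made up, with the epitope start in column 3 and a 16-letter stand-in sequence in column 7.

```shell
cd "$(mktemp -d)"
# Made-up input: column 3 = epitope start, column 7 = protein sequence.
printf 'ep1 a 6 b c d ABCDEFGHIJKLMNOP\n'   > demo.out
printf 'ep2 a 14 b c d ABCDEFGHIJKLMNOP\n' >> demo.out

motifs=$(awk 'BEGIN { print "* Aligned protein sequences." }
              { s = substr($7, $3 - 4, 9)   # 4 residues before + 5 of the epitope
                if (length(s) == 9) print s "." }' demo.out)

echo "$motifs"
# * Aligned protein sequences.
# BCDEFGHIJ.
```

The second line starts too close to the end of the sequence, so substr returns only seven letters and the length test drops it.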
Sort file - sort
Example of using sort:
Getting a test file:
cp /usr/opt/www/pub/CBS/researchgroups/immunology/intro/Unix/test.out .
sort -n test.out sort file numerically
sort -n -k3 test.out sort file numerically (big numbers last) by 3rd column
sort -r -n -k3 test.out sort file reverse numerically (big numbers first) by 3rd column
sort -u pdb.mhc.spnam Keep only one copy of each unique line
sort pdb.mhc.spnam | uniq -c Count the number of each unique line
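The effect of -n, -u and uniq -c is easiest to see on a few made-up lines. A small sketch in sh syntax; the files nums.txt and names.txt are invented for the illustration.

```shell
cd "$(mktemp -d)"
printf 'a x 10\nb y 9\nc z 100\n' > nums.txt
printf 'HUMAN\nMOUSE\nHUMAN\n'    > names.txt

# Without -n the third column is compared as text, so 100 sorts before 9.
plain=$(sort -k3 nums.txt | awk '{ print $3 }')
# With -n it is compared as numbers.
numeric=$(sort -n -k3 nums.txt | awk '{ print $3 }')

unique=$(sort -u names.txt)
# uniq -c only merges neighbouring lines, which is why sort comes first.
counts=$(sort names.txt | uniq -c | awk '{ print $2 "=" $1 }')

echo $numeric      # 9 10 100
echo $unique       # HUMAN MOUSE
echo $counts       # HUMAN=2 MOUSE=1
```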
Execute a string
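Putting backquotes (``) around a command makes the shell run that command and put its output in its place.
If the backquoted command stands alone on the line, that output is then executed as a command itself: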
echo pwd       print the string pwd
`echo pwd`       execute the command pwd
echo pwd|sh       echo the string pwd to the shell - which will then execute it
This will be used in the next example.
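The three variants can be compared by storing what each one produces. A minimal sketch in sh syntax; the variable names are made up.

```shell
cd "$(mktemp -d)"

as_text=$(echo pwd)           # just the string "pwd"; nothing is executed
by_backquotes=`pwd`           # backquotes: run pwd and capture its output
by_shell=$(echo pwd | sh)     # send the string to a new shell, which runs it

echo "$as_text"
echo "$by_backquotes"
echo "$by_shell"              # same directory as the line above

`echo pwd`                    # runs echo, then runs its output (pwd) as a command
```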
Do something with many variables - foreach
Example 1: print each entry in list to screen
foreach entry (a b c)
      echo $entry
end
Example 2: get each swissprot entry from list and print it
foreach entry (`gawk '{print $1}' test.out`)
      echo $entry |sed 's/.*|//'| xargs getsprot
end
NB: echo ENV_HV1H2 | xargs getsprot      is the same as getsprot ENV_HV1H2
Warning: the string within () is limited to a few thousand characters
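foreach is csh/tcsh syntax. If your shell is sh or bash, the same loop is written with for ... do ... done. The sketch below shows that spelling together with the sed and xargs steps from Example 2; getsprot is replaced by echo so that it runs anywhere, and the identifier "sp|ENV_HV1H2" is made up for the illustration.

```shell
# sh/bash spelling of:  foreach entry (a b c) ... end
out=""
for entry in a b c; do
    out="$out$entry "
done
echo "$out"                        # a b c

# The sed command in Example 2 removes everything up to the last "|".
name=$(echo "sp|ENV_HV1H2" | sed 's/.*|//')
echo "$name"                       # ENV_HV1H2

# xargs appends its input to the command line: this runs "echo got ENV_HV1H2".
via_xargs=$(echo "$name" | xargs echo got)
echo "$via_xargs"                  # got ENV_HV1H2
```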
Concatenate side by side - paste
Example:
paste pdb.mhc.nam pdb.mhc.spnam     
Get lines matching a pattern - grep/egrep
Example:
grep 1A68_HUMAN pdb.mhc.spnam Get lines with "1A68_HUMAN"
grep -v HUMAN pdb.mhc.spnam Get lines that do not contain "HUMAN"
grep _HUMAN pdb.mhc.spnam Get lines with "_HUMAN" (Human swiss prot sequences)
grep ^KA pdb.mhc.spnam Get lines starting with "KA"
grep "^KA.*MOUSE" pdb.mhc.spnam Get lines matching "KA" - something ("." is a wildcard; "*" means repeated zero or more times) - "MOUSE"
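The patterns above can be checked on a handful of made-up entry names. A small sketch in sh syntax; names.txt and its contents are invented, and the -c option (count the matching lines instead of printing them) is an addition that is not used in the examples above.

```shell
cd "$(mktemp -d)"
# Made-up entry names in the style of pdb.mhc.spnam.
printf '1A68_HUMAN\nKA12_MOUSE\nKA_HUMAN\nHA1B_MOUSE\n' > names.txt

exact=$(grep 1A68_HUMAN names.txt)         # 1A68_HUMAN
not_human=$(grep -c -v HUMAN names.txt)    # 2 lines do not contain HUMAN
starts_ka=$(grep -c '^KA' names.txt)       # 2 lines start with KA
ka_mouse=$(grep '^KA.*MOUSE' names.txt)    # KA12_MOUSE

echo "$exact" $not_human $starts_ka "$ka_mouse"
```

Quoting the pattern, as in '^KA.*MOUSE', keeps the shell from interpreting characters such as * before grep sees them.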
What did I do
history
The history command lists the most recently executed commands (typically the last 100 to 500, depending on the shell settings)
Other useful commands
which cat
Find out where a program (the cat program in this case) is installed. Often when you edit a program and nothing happens it is because you are editing another program than the one you are running
gunzip Unzip a zipped file (.gz files)
command line options
Most unix programs take options in the form "program -option". For example, head -5 prints the first 5 lines of a file (-5 is the option), and ls -l uses the -l option to ls.
diff/gdiff compare two files
chmod Change permissions (who can read, write to, or execute a file or script)
chown Change ownership of a file
autocompletion Press TAB to let the unix system complete a file/program name
"arrow up" press arrow up to get old commands
<ctrl> a Go to the start of the commandline
<ctrl> e Go to the end of the commandline
Can I use it?
Now the training is over and you have to solve a small problem using some of the commands above.
By the command
getsprot 1A68_HUMAN     
you can get the SWISSPROT entry 1A68_HUMAN, and print it out to the screen
The command
getsprot 1A68_HUMAN|gawk '{print $0}'     
takes the output from the previous command and runs it through a gawk command that prints everything out, i.e., it changes nothing.
Your job is to rewrite the gawk command so that it writes out the SWISSPROT entry in fasta format:
>1A68_HUMAN
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRF
DSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQ
MMYGCDVGSDGRFLRGYRQDAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQW
RAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLT
WQRDGEDQTQDTELVETRPAGDGTFQKWVAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEP
SSQPTIPIVGIIAGLVLFGAVITGAVVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSL
TACKV
Q1: Which lines contain the information you need to make the fasta file?
Q2: How can you recognise these lines?
Q3: Explain which fields in these lines should be printed out to get the correct output,
and/or alternatively how they should be edited to make correct output.
Q4: Explain what a program that converts the SWISSPROT entry to a fasta entry should do.
Q5: Give the code of a gawk program that does this (or use any other programming language(s)).
By the command
getsprot -f pdb.mhc.spnam     
you get a lot of SWISSPROT entries
Q6: Give the code of a gawk program that converts all these to fasta format (maybe the code you developed above is general
enough to be used again).
Q7: Write a program that counts how many fasta entries there are in a file.
Q8: There are several copies of some of the fasta entries.
Can you think of a way to only print each of the fasta entries out once?