Protein structure commands
Basic commands
ls /usr/cbs/databases/pdb/allpdb/      See all pdb files at CBS
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | less      Read pdb entry 1hge (press space to go one page down)
rasmol /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z      See structure
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | grep "^ATOM.........CA......A" > tmp.pdb      Get C-alpha atoms in A chain
rasmol tmp.pdb      See selected atoms
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){print $0}}'|head      Print atom records using gawk
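The grep pattern used above depends on fixed column positions: the nine dots skip columns 5-13 so that CA must sit in columns 14-15, and the six dots skip columns 16-21 so that A is matched as the chain identifier in column 22. The following is a minimal sketch of that behaviour on two records of 1hge written to a scratch file (sample.pdb is a name assumed here; it is not part of the CBS setup):

```shell
# Two ATOM records from 1hge, laid out in fixed PDB columns
# (only columns 1-66 are kept; sample.pdb is a scratch file for this sketch).
cat > sample.pdb <<'EOF'
ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69
ATOM      3  C   GLN A   1      77.991  -9.916  43.032  1.00148.41
EOF

# 9 dots = columns 5-13, "CA" = columns 14-15,
# 6 dots = columns 16-21, "A" = chain identifier in column 22.
ca_lines=$(grep "^ATOM.........CA......A" sample.pdb)
echo "$ca_lines"
```

Only the first record is printed: the carbonyl carbon has "C " in columns 14-15, so it does not match.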
Example of ATOM lines
ATOM      1  N   GLN A   1      79.647  -9.863  41.673  1.00148.74   1  1HGE 458
ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69   1  1HGE 459
ATOM      3  C   GLN A   1      77.991  -9.916  43.032  1.00148.41   1  1HGE 460
ATOM      4  O   GLN A   1      78.781  -9.726  43.965  1.00148.59   1  1HGE 461
ATOM      5  CB  GLN A   1      77.503  -9.741  40.499  1.00148.97   1  1HGE 462
ATOM      6  CG  GLN A   1      76.513  -8.726  39.888  1.00149.76   1  1HGE 463
ATOM      7  CD  GLN A   1      75.435  -8.146  40.813  1.00150.33   1  1HGE 464
ATOM      8  OE1 GLN A   1      75.584  -8.142  42.032  1.00150.63   1  1HGE 465
ATOM      9  NE2 GLN A   1      74.321  -7.645  40.310  1.00150.73   1  1HGE 466
ATOM     10 1H   GLN A   1      80.024  -9.640  42.628  1.00  0.00   1  1HGE 467
ATOM     11 2H   GLN A   1      79.526 -10.898  41.695  1.00  0.00   1  1HGE 468
ATOM     12 3H   GLN A   1      80.276  -9.535  40.926  1.00  0.00   1  1HGE 469
ATOM     13 1HE2 GLN A   1      73.645  -7.304  40.934  1.00  0.00   1  1HGE 470
ATOM     14 2HE2 GLN A   1      74.181  -7.642  39.340  1.00  0.00   1  1HGE 471
ATOM     15  N   ASP A   2      76.904 -10.686  43.115  1.00147.82   1  1HGE 472
ATOM     16  CA  ASP A   2      76.637 -11.610  44.215  1.00146.80   1  1HGE 473
ATOM     17  C   ASP A   2      77.557 -12.824  43.939  1.00145.99   1  1HGE 474
ATOM     18  O   ASP A   2      78.750 -12.654  43.617  1.00146.02   1  1HGE 475
ATOM     19  CB  ASP A   2      75.114 -11.872  44.111  1.00146.96   1  1HGE 476
ATOM     20  CG  ASP A   2      74.508 -12.974  44.963  1.00147.11   1  1HGE 477
ATOM     21  OD1 ASP A   2      74.477 -12.845  46.180  1.00147.52   1  1HGE 478
ATOM     22  OD2 ASP A   2      74.119 -13.994  44.392  1.00147.01   1  1HGE 479
ATOM     23  H   ASP A   2      76.173 -10.542  42.482  1.00  0.00   1  1HGE 480
Format of ATOM lines
from http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
COLUMNS   DATA TYPE      FIELD        DEFINITION
---------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "
 7 - 11   Integer        serial       Atom serial number.
13 - 16   Atom           name         Atom name.
17        Character      altLoc       Alternate location indicator.
18 - 20   Residue name   resName      Residue name.
22        Character      chainID      Chain identifier.
23 - 26   Integer        resSeq       Residue sequence number.
27        AChar          iCode        Code for insertion of residues.
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
55 - 60   Real(6.2)      occupancy    Occupancy.
61 - 66   Real(6.2)      tempFactor   Temperature factor.
73 - 76   LString(4)     segID        Segment identifier, left-justified.
77 - 78   LString(2)     element      Element symbol, right-justified.
79 - 80   LString(2)     charge       Charge on the atom.
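Because the fields are defined by column position and not by whitespace, neighbouring values can touch. In the first ATOM record of 1hge the occupancy (1.00) and temperature factor (148.74) run together, so whitespace splitting returns them as a single field, while substr with the column numbers from the table separates them. A minimal sketch, assuming any POSIX awk (gawk should behave the same way here):

```shell
# First ATOM record of 1hge, columns 1-66, in fixed PDB layout.
line='ATOM      1  N   GLN A   1      79.647  -9.863  41.673  1.00148.74'

# Whitespace splitting: field 10 holds occupancy and temperature factor fused.
fused=$(echo "$line" | awk '{print $10}')

# Column-based extraction: columns 55-60 and 61-66 from the format table.
occupancy=$(echo "$line" | awk '{printf "%.2f", substr($0,55,6)}')
tempFactor=$(echo "$line" | awk '{printf "%.2f", substr($0,61,6)}')

echo "field 10:   $fused"
echo "occupancy:  $occupancy"
echo "tempFactor: $tempFactor"
```

This is why the commands in this section read each field with substr instead of $1, $2, and so on.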
Read a PDB file in fixed-column format and write it out again
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2); printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, z, occupancy, tempFactor, segID, element, charge)}}' > tmp.pdb
The one-liner above has the following core:
gawk '{if (/^ATOM/){READ_A_LINE; PRINT_A_LINE}}' > tmp.pdb     
where the read and write statements are defined by:
READ_A_LINE: ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2)
PRINT_A_LINE: printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, z, occupancy, tempFactor, segID, element, charge)
A version of the one-liner with more newlines can be found in
pdb2pdb.gawk
You can run it using the command:
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | /usr/opt/www/pub/CBS/researchgroups/immunology/intro/Unix/pdb2pdb.gawk
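The contents of pdb2pdb.gawk are not reproduced here. As a rough illustration of what a multi-line layout of the read and write steps could look like, here is a sketch under stated assumptions: the file name pdb2pdb_sketch.awk and the variable name record are made up for this example, the serial number is read from all five columns 7-11 as the format table specifies, and the blank columns 28-30 and 67-72 are written explicitly as spaces.

```shell
# pdb2pdb_sketch.awk is a hypothetical file name used only for this sketch.
cat > pdb2pdb_sketch.awk <<'EOF'
/^ATOM/ {
    # READ_A_LINE: pick every field by its column position.
    record     = substr($0,  1, 6)
    serial     = substr($0,  7, 5)
    name       = substr($0, 13, 4)
    altLoc     = substr($0, 17, 1)
    resName    = substr($0, 18, 3)
    chainID    = substr($0, 22, 1)
    resSeq     = substr($0, 23, 4)
    iCode      = substr($0, 27, 1)
    x          = substr($0, 31, 8) * 1.0
    y          = substr($0, 39, 8) * 1.0
    z          = substr($0, 47, 8) * 1.0
    occupancy  = substr($0, 55, 6) * 1.0
    tempFactor = substr($0, 61, 6) * 1.0
    segID      = substr($0, 73, 4)
    element    = substr($0, 77, 2)
    charge     = substr($0, 79, 2)

    # PRINT_A_LINE: write the fields back into the same columns.
    printf("%6s%5d %4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",
           record, serial, name, altLoc, resName, chainID, resSeq, iCode,
           x, y, z, occupancy, tempFactor, segID, element, charge)
}
EOF

# One ATOM record of 1hge (columns 1-66) as test input.
record_in='ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69'
record_out=$(echo "$record_in" | awk -f pdb2pdb_sketch.awk)
echo "$record_out"
```

The first 66 columns of the output should be identical to the input; anything stored in columns 67-72 of the original record is not carried over, because the table assigns no field to those columns.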
Read a PDB file in fixed-column format and print the mirror image of the C-alpha atoms in chain B. How can you see that it is mirrored?
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2); if (name == " CA "&&chainID=="B") {printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, -z, occupancy, tempFactor, segID, element, charge)}}}' > tmp.pdb
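Apart from looking at the result in rasmol, a quick numerical sanity check is possible: the mirroring negates z and leaves x and y alone, so each mirrored z coordinate plus the original one should give zero. The sketch below uses two made-up C-alpha records in chain B (the coordinates are illustrative only, not taken from 1hge) and scratch file names assumed for this example:

```shell
# Two made-up C-alpha records in chain B; coordinates are illustrative only.
cat > chainB_sample.pdb <<'EOF'
ATOM      1  CA  ALA B   1       1.000   2.000   3.000  1.00 10.00
ATOM      2  CA  GLY B   2       4.000   5.000  -6.000  1.00 10.00
EOF

# Mirror in the xy-plane: keep columns 1-46, rewrite z (47-54) with its sign
# flipped, and copy the rest of the record unchanged.
awk '/^ATOM/ && substr($0,13,4) == " CA " && substr($0,22,1) == "B" {
    z = substr($0,47,8) * 1.0
    printf("%s%8.3f%s\n", substr($0,1,46), -z, substr($0,55))
}' chainB_sample.pdb > mirrored_sample.pdb

# Compare the files record by record: original z + mirrored z must be 0.
mismatches=$(awk 'NR == FNR { z[FNR] = substr($0,47,8); next }
                  (substr($0,47,8) + z[FNR]) != 0 { n++ }
                  END { print n + 0 }' chainB_sample.pdb mirrored_sample.pdb)
echo "records whose z was not negated: $mismatches"
```

A count of 0 only confirms that the coordinates were transformed as intended; it does not replace inspecting the mirrored structure itself.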
     
     
     
     
     
     
     
     
     
     