
Protein structure commands



Basic commands
ls /usr/cbs/databases/pdb/allpdb/      See all pdb files at CBS
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | less      Read pdb entry 1hge (press space to go one page down)
rasmol /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z      See structure
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | grep "^ATOM.........CA......A" > tmp.pdb      Get C-alpha atoms in A chain
rasmol tmp.pdb      See selected atoms
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){print $0}}'|head      Print atom records using gawk
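
The grep pattern in the C-alpha command works by counting columns: each dot matches exactly one character. The sketch below runs the same pattern on four ATOM records copied from the 1HGE example, held in a shell variable so that it runs without access to the CBS database. The variable names sample and ca_lines are only illustrative.

```shell
# Four ATOM records from 1HGE: N, CA and CB of GLN 1, and CA of ASP 2.
sample='ATOM      1  N   GLN A   1      79.647  -9.863  41.673  1.00148.74   1  1HGE 458
ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69   1  1HGE 459
ATOM      5  CB  GLN A   1      77.503  -9.741  40.499  1.00148.97   1  1HGE 462
ATOM     16  CA  ASP A   2      76.637 -11.610  44.215  1.00146.80   1  1HGE 473'

# "^ATOM" fills columns 1-4, nine dots skip columns 5-13,
# "CA" must sit in columns 14-15, six dots skip columns 16-21,
# and "A" is the chain identifier in column 22.
ca_lines=$(printf '%s\n' "$sample" | grep "^ATOM.........CA......A")

# Only the two CA records remain; the N and CB records are dropped.
printf '%s\n' "$ca_lines"
```

Because the match is tied to column positions, CB (also in columns 14-15) is rejected, which a plain grep for "A" or "C" would not guarantee.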


Example of ATOM lines
ATOM      1  N   GLN A   1      79.647  -9.863  41.673  1.00148.74   1  1HGE 458
ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69   1  1HGE 459
ATOM      3  C   GLN A   1      77.991  -9.916  43.032  1.00148.41   1  1HGE 460
ATOM      4  O   GLN A   1      78.781  -9.726  43.965  1.00148.59   1  1HGE 461
ATOM      5  CB  GLN A   1      77.503  -9.741  40.499  1.00148.97   1  1HGE 462
ATOM      6  CG  GLN A   1      76.513  -8.726  39.888  1.00149.76   1  1HGE 463
ATOM      7  CD  GLN A   1      75.435  -8.146  40.813  1.00150.33   1  1HGE 464
ATOM      8  OE1 GLN A   1      75.584  -8.142  42.032  1.00150.63   1  1HGE 465
ATOM      9  NE2 GLN A   1      74.321  -7.645  40.310  1.00150.73   1  1HGE 466
ATOM     10 1H   GLN A   1      80.024  -9.640  42.628  1.00  0.00   1  1HGE 467
ATOM     11 2H   GLN A   1      79.526 -10.898  41.695  1.00  0.00   1  1HGE 468
ATOM     12 3H   GLN A   1      80.276  -9.535  40.926  1.00  0.00   1  1HGE 469
ATOM     13 1HE2 GLN A   1      73.645  -7.304  40.934  1.00  0.00   1  1HGE 470
ATOM     14 2HE2 GLN A   1      74.181  -7.642  39.340  1.00  0.00   1  1HGE 471
ATOM     15  N   ASP A   2      76.904 -10.686  43.115  1.00147.82   1  1HGE 472
ATOM     16  CA  ASP A   2      76.637 -11.610  44.215  1.00146.80   1  1HGE 473
ATOM     17  C   ASP A   2      77.557 -12.824  43.939  1.00145.99   1  1HGE 474
ATOM     18  O   ASP A   2      78.750 -12.654  43.617  1.00146.02   1  1HGE 475
ATOM     19  CB  ASP A   2      75.114 -11.872  44.111  1.00146.96   1  1HGE 476
ATOM     20  CG  ASP A   2      74.508 -12.974  44.963  1.00147.11   1  1HGE 477
ATOM     21  OD1 ASP A   2      74.477 -12.845  46.180  1.00147.52   1  1HGE 478
ATOM     22  OD2 ASP A   2      74.119 -13.994  44.392  1.00147.01   1  1HGE 479
ATOM     23  H   ASP A   2      76.173 -10.542  42.482  1.00  0.00   1  1HGE 480


Format of ATOM lines
 from http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
COLUMNS        DATA TYPE       FIELD         DEFINITION
---------------------------------------------------------------------------------
 1 -  6        Record name     "ATOM  "
 7 - 11        Integer         serial        Atom serial number.
13 - 16        Atom            name          Atom name.
17             Character       altLoc        Alternate location indicator.
18 - 20        Residue name    resName       Residue name.
22             Character       chainID       Chain identifier.
23 - 26        Integer         resSeq        Residue sequence number.
27             AChar           iCode         Code for insertion of residues.
31 - 38        Real(8.3)       x             Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)       y             Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)       z             Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)       occupancy     Occupancy.
61 - 66        Real(6.2)       tempFactor    Temperature factor.
73 - 76        LString(4)      segID         Segment identifier, left-justified.
77 - 78        LString(2)      element       Element symbol, right-justified.
79 - 80        LString(2)      charge        Charge on the atom.


Read a PDB file column by column and write it out again
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2); printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, z, occupancy, tempFactor, segID, element, charge)}}' > tmp.pdb

The one-liner above has the following core:
gawk '{if (/^ATOM/){READ_A_LINE; PRINT_A_LINE}}' > tmp.pdb
where the read and write statements are defined as:
READ_A_LINE: ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2)

PRINT_A_LINE: printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, z, occupancy, tempFactor, segID, element, charge)


A version of the one-liner with more newlines can be found in pdb2pdb.gawk
You can run it using the command:
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | /usr/opt/www/pub/CBS/researchgroups/immunology/intro/Unix/pdb2pdb.gawk
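
The contents of pdb2pdb.gawk are not reproduced here, so the following is only a sketch of how a multi-line layout of the read and write steps could look, not the actual script. Field positions and widths follow the column table (for example, serial is columns 7-11, so five characters). It uses plain awk and a single record from the 1HGE example so that it runs anywhere; the variables line and out are illustrative.

```shell
line='ATOM      2  CA  GLN A   1      78.339  -9.240  41.704  1.00148.69   1  1HGE 459'

out=$(printf '%s\n' "$line" | awk '
/^ATOM/ {
    # READ_A_LINE: cut each field out of its fixed column range
    ATOM       = substr($0,  1, 6)
    serial     = substr($0,  7, 5)
    name       = substr($0, 13, 4)
    altLoc     = substr($0, 17, 1)
    resName    = substr($0, 18, 3)
    chainID    = substr($0, 22, 1)
    resSeq     = substr($0, 23, 4)
    iCode      = substr($0, 27, 1)
    x          = substr($0, 31, 8) * 1.0
    y          = substr($0, 39, 8) * 1.0
    z          = substr($0, 47, 8) * 1.0
    occupancy  = substr($0, 55, 6) * 1.0
    tempFactor = substr($0, 61, 6) * 1.0
    segID      = substr($0, 73, 4)
    element    = substr($0, 77, 2)
    charge     = substr($0, 79, 2)

    # PRINT_A_LINE: write the fields back into the same columns.
    # Columns 12, 21, 28-30 and 67-72 are not in the table and are left blank.
    printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",
           ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode,
           x, y, z, occupancy, tempFactor, segID, element, charge)
}')

printf '%s\n' "$out"
```

For this record, columns 1-66 of the output match the input. Columns 67-72 are written as blanks, so anything an older entry stores there (here the "1" in column 70) is dropped.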

Read a PDB file column by column and print the mirror image of the C-alpha atoms in chain B. How can you see that it is mirrored?
zcat /usr/cbs/databases/pdb/allpdb/pdb1hge.ent.Z | gawk '{if (/^ATOM/){ATOM=substr($0,1,6);serial=substr($0,7,5);name=substr($0,13,4);altLoc=substr($0,17,1);resName=substr($0,18,3);chainID=substr($0,22,1);resSeq=substr($0,23,4);iCode=substr($0,27,1);x=substr($0,31,8)*1.0;y=substr($0,39,8)*1.0;z=substr($0,47,8)*1.0;occupancy=substr($0,55,6)*1.0;tempFactor=substr($0,61,6);segID=substr($0,73,4);element=substr($0,77,2);charge=substr($0,79,2); if (name == " CA "&&chainID=="B") {printf("%6s%5d %4s%1s%3s %s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n",ATOM, serial, name, altLoc, resName, chainID, resSeq, iCode, x, y, -z, occupancy, tempFactor, segID, element, charge)}}}' > tmp.pdb