Lesson 13: Perl One-Liners and Trial Exam Set

Required reading
Here is an excerpt from 'man perlrun' describing the most important command-line switches for Perl one-liners.
     -a   turns on autosplit mode when used with a -n or -p.  An implicit
          split command to the @F array is done as the first thing inside the
          implicit while loop produced by the -n or -p.
               perl -ane 'print pop(@F), "\n";'
          is equivalent to
              while (<>) {
                  @F = split(' ');
                  print pop(@F), "\n";
              }
          An alternate delimiter may be specified using -F.

     -e commandline
          may be used to enter one line of script.  If -e is given, Perl will
          not look for a script filename in the argument list.  Multiple -e
          commands may be given to build up a multi-line script.  Make sure to
          use semicolons where you would in a normal program.

     -n   causes Perl to assume the following loop around your script, which
          makes it iterate over filename arguments somewhat like sed -n or
          awk:
              while (<>) {
                  ...             # your script goes here
              }
          Note that the lines are not printed by default.  See -p to have
          lines printed.  If a file named by an argument cannot be opened for
          some reason, Perl warns you about it, and moves on to the next file.

     -p   causes Perl to assume the following loop around your script, which
          makes it iterate over filename arguments somewhat like sed:
              while (<>) {
                  ...             # your script goes here
              } continue {
                  print or die "-p destination: $!\n";
              }
          If a file named by an argument cannot be opened for some reason,
          Perl warns you about it, and moves on to the next file.  Note that
          the lines are printed automatically.  An error occurring during
          printing is treated as fatal.  To suppress printing use the -n
          switch.  A -p overrides a -n switch.

Examples
      perl -pe 'tr/ATCG/TAGC/' dna7.fsa
          # This complements every base in the file dna7.fsa.

      perl -ane 'print $F[0] + $F[3], "\n"' datafile
          # Add the first and fourth columns

      perl -ne 'print if 15 .. 17' *.pod
          # Just lines 15 to 17 (counted across all files, since $. is not reset between them)

      perl -ne '$counter++; END { print "$counter lines\n"; }' datafile
          # Count lines

      perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_ }' fastafile
          # Split a multi-sequence FASTA file into individual files

Subjects covered
How to call Perl from the Unix command line to perform a simple task, typically a text conversion. You will also have the opportunity to see the assignments from last year's exam.

Necessary files to complete this exercise
To download a file to your system, hold down the Shift key while left-clicking its blue link, then follow the instructions.
We have reused some of the files from earlier exercises.
dna7.fsa
proteins.netphos


All of the following exercises must be solved with Perl one-liners.

  1. Make a Perl one-liner that complements all the bases in dna7.fsa but leaves the '>' lines untouched, which the first example above does not do.
  2. The file proteins.netphos contains serine phosphorylation predictions from NetPhos 2.0 on a number of proteins. The file has the following format:
    Nedd4              18  z   0.100  (  0.097   0.120   0.067   0.083   0.131  )  .        DEENSRIVR
    Every row in the file corresponds to one serine. The most important columns are the first (protein name), the second (sequence position) and the last but one ('.' means a negative prediction, '*S*' a positive prediction). Study the file and make sure you understand it.
  3. Print all rows in proteins.netphos that correspond to serines in the protein P00520.
  4. Print lines 1 to 250 of proteins.netphos.
  5. Going back to dna7.fsa: rename all FASTA names to numbers, starting from '>1'. Ignore the comment; you can remove it or leave it. This is done quite often, as some programs insist that FASTA names be unique within the first 8 characters.
  6. Print only the first and the last-but-one columns of proteins.netphos, separated by a tab.
  7. In proteins.netphos, count the total number of serines on which the prediction was run, and also the number of positive predictions. The output should be a single line containing both counts.

  8. Here is the exam from spring 2003, and here is the solution.


This page was last updated         by Peter Wad Sackett, pws@cbs.dtu.dk