
Data Basics


This is a small exercise where we will try to identify the quality encoding of some reads.


Read quality encoding table

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
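
The offsets in the legend can be applied directly: a quality character is turned into a score by subtracting the offset (33 or 64) from its ASCII code. A minimal sketch in Python, where the function name `quality_scores` is only illustrative and not part of any library:

```python
def quality_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer scores.

    offset is 33 for Sanger / Illumina 1.8+ and 64 for Solexa /
    Illumina 1.3+ / Illumina 1.5+ (see the legend above).
    With the Solexa encoding the result is a Solexa score, which is
    defined differently from a Phred score and can be negative.
    """
    return [ord(ch) - offset for ch in quality]


# '!' has ASCII code 33 and 'I' has code 73
print(quality_scores("!I", offset=33))   # [0, 40]
# '@' has ASCII code 64 and 'h' has code 104
print(quality_scores("@h", offset=64))   # [0, 40]
```

Decoding a read with the wrong offset typically gives scores outside the expected range, which is itself a hint that the assumed encoding is wrong.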

Identify quality encoding

Use the table above (taken from the FASTQ Wikipedia page) to identify the quality encoding of these three reads. You only have to differentiate between Sanger (S), Solexa (X) and Illumina (I, J).
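
The same lookup can be sketched in code: take the lowest and highest ASCII code in a quality string and list every encoding whose typical range contains both. The ranges below are read off the table and are only the "typical" ones, so treat the result as a hint rather than a proof; the names `TYPICAL_RANGES` and `candidate_encodings` are made up for this sketch.

```python
# Typical ASCII code ranges per encoding, read off the table above.
# For J the lower bound 66 ('B') is the quality control indicator.
TYPICAL_RANGES = {
    "S": (33, 73),    # Sanger, Phred+33
    "X": (59, 104),   # Solexa, Solexa+64
    "I": (64, 104),   # Illumina 1.3+, Phred+64
    "J": (66, 104),   # Illumina 1.5+, Phred+64
    "L": (33, 74),    # Illumina 1.8+, Phred+33
}


def candidate_encodings(quality):
    """Return the encodings whose typical range covers the whole string."""
    codes = [ord(ch) for ch in quality]
    lowest, highest = min(codes), max(codes)
    return [
        name
        for name, (first, last) in TYPICAL_RANGES.items()
        if first <= lowest and highest <= last
    ]


# Hypothetical quality strings (not the exercise reads):
print(candidate_encodings("!5I"))   # ['S', 'L']
print(candidate_encodings(";Ah"))   # ['X']
```

A single short read often leaves more than one candidate, so in practice the check would be run over many reads from the same file.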

@HWUSI-EAS656_0037_FC:3:1:16637:1035#NNNNNN/1
CATATTTTGTGGCTCATCCCAAGGGAGAGGTTTTTCTATACTCAGGAGAAGTTACTCACGATAAAGAGAA
+
41?8FFF@@DAGGGEDF@FGECGGGBG@GE.EEBGBDADBBEEBEEC>ACE>CD?EEC?CAB>EB:BC##

@FC42RW0AAXX:3:1:2:1038#NNNNNN/1
GTGTTCTCTGCGACCCGTAATTCAGCTTTTTCCGGTTGCTTTGCCCTTTGCACCTTATCCTGCACCATCTCGC
+
a]baaaa`aaaV`a_aa^Y^`_`_aa___`a]U__\\`][Z_^^R]YWWW[SWZ[QFY[VVWZWBBBBBBBBB

@I330_1_FC30JM6AAXX:4:1:13:1602/1
ATGTAGAAGTGTTTGATACGGCGATTTCAAACATTGCAGGGCTT
+I330_1_FC30JM6AAXX:4:1:13:1602/1
hhhhhhhhhhhhhhhhhhYh^hhhH[I>B^AABGDK;KBP??FN

Q1. What encoding is read 1?
Q2. What encoding is read 2?
Q3. What encoding is read 3?
Q4. Can you think of situations where it is not possible to differentiate between Solexa and Illumina quality encoding?

Congratulations, you finished the exercise!