BIOMI609-Assignment 1 Solved

1) You will write a program in Python/R/C (just pick a language you like; I'm listing the ones I prefer) that takes a FASTQ file as input and prints the distribution of quality scores across all reads. You can summarize the distribution of Q scores at each base position with a statistic of your choice (e.g. mean, mode, median, quantiles). If you'd like, you can also plot the distribution of Q scores as a box plot, much like the one generated by FastQC. You will then run your program on the provided FASTQ file and obtain its output.

 

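Note that a FASTQ file stores each read as four lines:

Line 1: "@" followed by the read identifier
Line 2: the nucleotide sequence
Line 3: "+" (optionally followed by the identifier again)
Line 4: the quality score line, one character per base in line 2

This format is repeated for each read.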
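As a rough illustration of reading such a file, here is a minimal Python sketch. It assumes the standard four-line layout (a header starting with "@", the sequence, a separator starting with "+", then the quality string) with no line wrapping. The function name read_fastq and the two example reads are made up for this sketch and are not part of the assignment.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each four-line record."""
    while True:
        # Pull the next four lines; fewer than four means we hit the end.
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record or not record[0]:
            return  # end of file (or trailing blank line)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# Made-up two-read input standing in for a real file opened with open(path).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n")
records = list(read_fastq(example))
print(records)
```

For a real file, the same generator can be fed an open file object, which avoids loading every read into memory at once.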


The idea is really simple: for each character in the quality score line, Q = (ASCII value of that character) - 33. From there, Q = -10 * log10(Pe), where Pe is the probability of error in calling that nucleotide base. For example, Q = 20 corresponds to Pe = 0.01.
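The conversion can be sketched in a few lines of Python. The helper names phred_scores and error_probability and the quality string "!5?I" are invented for illustration. The second function simply inverts Q = -10 * log10(Pe) to get Pe = 10^(-Q/10).

```python
def phred_scores(quality, offset=33):
    """Convert a quality string to a list of Q scores (ASCII value - offset)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert Q = -10 * log10(Pe) to recover Pe."""
    return 10 ** (-q / 10)


scores = phred_scores("!5?I")
print(scores)  # [0, 20, 30, 40]
for q in scores:
    print(q, f"{error_probability(q):.4f}")
```

The offset is kept as a parameter only because the text fixes it at 33; nothing else in the sketch depends on that value.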

 

Here are functions in various languages that convert a character to its ASCII code:

Python: ord()

R: utf8ToInt() (iconv() with toRaw = TRUE also returns the underlying byte values)

C: a char read with scanf() and %c is already stored as its integer ASCII code, so no conversion is needed
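Putting the pieces together, one possible way to summarize Q scores at each base position is sketched below in Python, using ord() as listed above. This is only one of many reasonable designs: the function name per_position_summary, the choice of mean and median, and the three example quality strings are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def per_position_summary(quality_strings, offset=33):
    """Return {position: {"n", "mean", "median"}} of Q scores, 1-based."""
    by_position = defaultdict(list)
    for quality in quality_strings:
        for position, ch in enumerate(quality, start=1):
            by_position[position].append(ord(ch) - offset)
    return {
        position: {
            "n": len(scores),
            "mean": mean(scores),
            "median": median(scores),
        }
        for position, scores in sorted(by_position.items())
    }


# Made-up quality lines; reads may differ in length.
qualities = ["IIII", "!5?I", "I5?"]
summary = per_position_summary(qualities)
for position, stats in summary.items():
    print(position, stats["n"], f"{stats['mean']:.1f}", stats["median"])
```

This version keeps every score in memory, which is fine for small inputs. For a large FASTQ file, a per-position count of each Q value would likely be a better fit, and the same counts could feed a box plot.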
