BIOMI609-Assignment 1 Solved

1) Write a program in Python, R, or C (pick whichever language you like; these are the ones I prefer) that takes a FASTQ file as input and prints the distribution of quality scores across all reads. Summarize the distribution of Q scores at each base position with a statistic of your choice (e.g. mean, mode, median, or quantiles). If you'd like, you can also plot the distribution of Q scores as a box plot, much like the one generated by FastQC. Then run your program on the provided FASTQ file and report its output.

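Note that a FASTQ file has the following format, with four lines per read:

Line 1: a header that starts with @, followed by the read identifier

Line 2: the nucleotide sequence

Line 3: a separator line that starts with +

Line 4: the quality score line, with one character per base of line 2

This format is repeated for each read.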
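As a rough illustration of reading that repeated layout, here is a minimal Python sketch. It assumes the standard four-line record (an @ header, the sequence, a + separator, the quality string) with no line wrapping; the function name read_fastq and the two example reads are made up for illustration and are not part of the assignment.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"expected a header starting with '@', got {header!r}")
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip()
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        yield header[1:].rstrip(), sequence, quality


# Two made-up reads, held in memory so the sketch runs without a file.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!5\n")
records = list(read_fastq(example))
```

For a real file, the same function can be given the handle returned by open() instead of the in-memory example.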

The idea is simple: for each character in the quality score line, Q = (ASCII value of that character) - 33. In turn, Q = -10 * log10(Pe), where Pe is the probability that the nucleotide base was called incorrectly.
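The two conversions can be sketched in Python as follows. The function names phred_scores and error_probability are hypothetical, and the offset of 33 is taken from the description above; inverting Q = -10 * log10(Pe) gives Pe = 10^(-Q/10).

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Q scores: ASCII value minus the offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(Pe) to recover Pe = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# 'I' has ASCII value 73, '5' has 53, and '!' has 33.
scores = phred_scores("I5!")  # [40, 20, 0]
probabilities = [error_probability(q) for q in scores]
```

So a Q score of 40 corresponds to an error probability of about 1 in 10,000, a score of 20 to about 1 in 100, and a score of 0 to certain error.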

Here are functions in various languages that convert a character to its ASCII code:

Python: ord()

R: utf8ToInt() (iconv() with toRaw = TRUE also works, but it returns raw bytes rather than integers)

C: Read the character with scanf() and %c. A char is already stored as its integer ASCII code, so no further conversion is needed.
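Putting the pieces together, one possible shape for the program is sketched below in Python. It is only a sketch under stated assumptions: four lines per read with no wrapping, an offset of 33, and mean and median as the chosen statistics. The function name per_position_summary and the in-memory example reads are made up, and the file path is taken from the command line rather than from anything in the assignment.

```python
import io
import sys
from collections import defaultdict
from statistics import mean, median


def per_position_summary(handle, offset=33):
    """Return (position, mean Q, median Q) for each base position, 1-based."""
    scores_by_position = defaultdict(list)
    for line_number, line in enumerate(handle):
        if line_number % 4 == 3:  # every fourth line is a quality string
            for position, ch in enumerate(line.rstrip()):
                scores_by_position[position].append(ord(ch) - offset)
    return [
        (position + 1, mean(scores), median(scores))
        for position, scores in sorted(scores_by_position.items())
    ]


# Two made-up reads of different lengths, to show that later positions
# are summarized over fewer reads.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!5\n")
summary = per_position_summary(example)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py <path to a FASTQ file>
    with open(sys.argv[1]) as fastq:
        for position, q_mean, q_median in per_position_summary(fastq):
            print(f"{position}\t{q_mean:.2f}\t{q_median}")
```

Storing every score per position keeps the sketch short and makes quantiles or a box plot easy to add, at the cost of memory on large files; counting how often each Q value occurs at each position would be a leaner alternative.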
