Starting from:

$30

CSCI570-Analysis of Algorithms Solved

II.   Project Description
A. Algorithm Description symbolsConsider the sets๐‘ฅ1, ๐‘ฅ2 , . , ๐‘ฅ๐‘š and ๐‘Œ consistsand๐‘‹ of the sequence of symbols๐‘Œ as representing the different๐‘‹ ๐‘ฆ1, ๐‘ฆ2 , . , ๐‘ฆ๐‘›.
 Suppose we are given two strings     and   , where     consists of the sequence of
positions in the strings{1, 2,๐‘‹ and. , ๐‘š๐‘Œ}, and consider{1, 2, . ,a matching of these sets; Recall that๐‘›}

a matching is a set of ordered pairs with the property that each item occurs in at are no “crossing” pairs: if ๐‘€ ฯต ๐‘€ and ๐‘– < ๐‘–' , then ๐‘— < ๐‘—'. Intuitively, most one pair. We say that a matching  of these two sets is an alignment if there an alignment gives a way of lining up the two strings, by telling us which pairs of(๐‘–, ๐‘—) (๐‘–', ๐‘—') positions will be lined up with one another.

Our definition of similarity will be based on finding the optimal alignment
between and , according to the following criteria. Suppose is a given alignment between and :

1.  First,position๐‘‹       thereof๐‘Œ ๐‘‹isora parameterthat๐‘Œ is not matchedδ๐‘’ > 0 thatin    defines— it isaa gapgap —penalty.๐‘€we incurForaeachcost
                                                of δ.           ๐‘‹    ๐‘Œ                                     ๐‘€
2. Second, for each pair of letters     in our alphabet, there is a mismatch
costappropriateof α๐‘๐‘ž formismatchlining upcost๐‘ with๐‘๐‘žfor,.๐‘žThus,liningforupeachwith(๐‘–, ๐‘—) ฯต. ๐‘€One, wegenerallypay the
assumes that = 0 for eachα๐‘ฅletter๐‘–๐‘ฆ๐‘— —there is no๐‘ฅ๐‘– mismatch๐‘ฆ๐‘— cost to line up a letter with anotherα๐‘๐‘ copy of itself—although๐‘ this will not be necessary in anything that follows. alignment of minimum cost.๐‘€

3.The cost of   is the sum of its gap and mismatch costs, and we seek an

B. Input string Generator
The input to the program would be a text file containing the following information:

1.2.  First base string (Next lines consist of indices after which the๐‘ 1)  copy of the previous string needs to be inserted in the cumulative string. (eg given below)๐‘— 3.4.  Second base string (Next  lines consist of indices after which the๐‘ 2) copy of the previous string needs to be inserted in the cumulative string. (eg given below)๐‘˜

This information would help generate 2 strings from the original 2 base strings. This file could be used as an input to your program and your program could use the base strings and the rules to generate the actual strings. Also note that the
numbers and      correspond to the first and the second string respectively. Make length and similarly,2๐‘˜ * ๐‘™๐‘’๐‘›(๐‘ 2)        need not be equal to    . ๐‘—         ๐‘˜
    sure you validate the length of the first and the second string to beand           ๐‘—           ๐‘˜ . Please note that the base strings need not have to be of equal2๐‘— * ๐‘™๐‘’๐‘›(๐‘ 1)

ACTG

3

6

1

TACG

1

2

9

Using the above numbers, the generated strings would be

ACACTGACTACTGACTGGTGACTACTGACTGG and

TATTATACGCTATTATACGCGACGCGGACGCG

Following is the step by step process on how the above strings are generated.

ACTG

ACTGACTG

ACTGACTACTGACTGG

ACACTGACTACTGACTGGTGACTACTGACTGG

TACG

TATACGCG

TATTATACGCGACGCG

TATTATACGCTATTATACGCGACGCGGACGCG

C. Values for Delta and Alphas
Values for α’s are as follows. δ๐‘’ is equal to 30.
A
C
G
T
A
0
110
48
94
C
110
0
118
48
G
48
118
0
110
T
94
48
110
0
D. Programming/Scripting Languages
Following are the list of languages which could be used:

1.      C

2.      C++

3.      Java

4.      Python

5.      Python3

E. Bounds
1.      Basic Algorithm

0   <= ๐‘—, ๐‘˜ <= 10

2.      1Memory Efficient Algorithm<= ๐‘™๐‘’๐‘›๐‘— *(๐‘  ๐‘™๐‘’๐‘›1), (๐‘™๐‘’๐‘›๐‘ 1)(,๐‘  22)๐‘˜ *<=๐‘™๐‘’๐‘›2000(๐‘ 2) <= 2000

1 <= 2

0   <= ๐‘—, ๐‘˜ <= 20

1   <= ๐‘™๐‘’๐‘›๐‘— *(๐‘  ๐‘™๐‘’๐‘›1), (๐‘™๐‘’๐‘›๐‘ 1)(,๐‘  22)๐‘˜ *<=๐‘™๐‘’๐‘›20000(๐‘ 2) <= 20000 1 <= 2

III.     Goals
Following are the goals to achieve for your project

A.    Your program should take 2 arguments

1.      input file path

2.      output file path (If path is valid and file not found, your program should create it)

`python2 basic_2.py input.txt output.txt` `java Basic input.txt output.txt`

`python3 basic_3.py input.txt output.txt`
Note: As mentioned in Part II-B input file will have data to generate strings. Since Gap penaltythem in your program.(δ๐‘’) and Mismatch penalty (α๐‘๐‘ž) are FIXED, you have to hardcode You are not allowed to use any libraries.

B.    Implement the Dynamic Programming algorithm. Your program should print the following information at the respective lines in output file:

1.      Cost of the alignment (Integer)

2.      First string alignment ( Consists of A, C, T, G, _ (gap) characters)

3.      Second string alignment ( Consists of A, C, T, G, _ (gap) characters )

4.      Time in Milliseconds (Float)

5.      Memory in Kilobytes (Float)

Note: There can be multiple alignments which have the same cost. You can print ANY alignment generated by your program. The only condition is it should have a minimum cost.

e.g. For strings ๐‘ 1: A and ๐‘ 2: C, alignments A_, _C and _A, C_ both have alignment cost 60 which is minimum. You can print any one of them.

C.    Implement the memory efficient version of this solution and repeat the tests in Part B.

D.    Plot the results of Part B and Part C using a line gr

 

More products