$25
Problem 1: Pattern-matching: The brute-force
Problem 1.1: The brute-force pattern-matching algorithm [10 pt.]
Describe a text D and a pattern P such that the brute-force pattern-matching algorithm runs in Ω(dp) time.The lengths of D and P are d and p, respectively.
Problem 1.2: Python’s str class and pattern-matching [20 pt]
In this part, you are asked to modify three pattern matching programs given to you (See appendix). Run your modified programs for varying-length patterns and show your results.
The count method in Python’s str class takes a text D and a pattern P and returns the maximum number of non-overlapping occurrences of a P within D. As an example ‘cdcdcdcdc’.count(‘cdc’) returns 2.
1. Modify the brute-force pattern-matching to return non-overlapping occurrences of a P within D.
2. Similar to the previous question (Problem 1.2.1), do the same on the Boyer-Moore program.
3. Similar to problem 1.2.1, modify the KMP program.
Problem 2: Experimental Analysis of Pattern-Matching Algorithms [20 pt.]
Perform an experimental analysis of pattern matching algorithms in terms of:
1. Number of character comparison: Perform an experimental analysis of the efficiency of the brute-force, the KMP and Boyer-Moore pattern matching algorithms for varying-length patterns.
2. Relative speed comparison: Perform an experimental comparison of the brute-force, KMP, and Boyer-Moore pattern-matching algorithms. Run each algorithm against large text documents using varying-length patterns and report the relative running times.
Problem 3: Matrix-chain Multiplication
The matrix-chain multiplication problem: Given a chain of < D1,D2,...,Dn > of n matrices fully parenthesize the product < D1·D2 ···Dn > in a way so that the number of scalar multiplications is minimized. Each Di has a pi−1 × pi dimension and i = 1,2,...,n.
1. The Brute-Force: [10 pt.]: Implement a Python program to solve the matrix-chain multiplication problem by the brute force algorithm.
2. Bottom-up Dynamic Programming [20 pt.]: Implement a Python program to solve the matrix-chain multiplication problem using bottom-up dynamic programming approach.
3. Dynamic Programming with Memoization [Extra Credit, 10 pt.]: Implement a Python program to solve the matrix-chain multiplication problem using dynamic programming with memoization.
Problem 4: Longest Common Sub-sequence (LCS) Problem [20 pt.]
Implement a Python program to solve LCS problem using dynamic programming. Run your program to find the best sequence alignment between DNA strings. Show your results.
Longest Common Sub-sequence (LCS) problem: Given two character strings over some alphabet, find a longest string that is a sub-sequence of given two strings.
Data source: https://www.ncbi.nlm.nih.gov/genbank/
Directions
Please follow the syllabus guidelines in turning in your homework. While testing your programs, run them with a variety of inputs and discuss your findings. This homework is due Sunday, Nov 14, 2021 10:00pm. OBSERVE THE TIME. Absolutely no homework will be accepted after that time. All the work should be your own.
Appendix
Python program for the Brute-Force pattern-matching algorithm
1 # Brute force
2 def find_brute(T, P):
3 n, m = len(T), len(P)
4 # every starting position 5 for i in range(n-m+1):
6 k = 0
7 # conduct O(k) comparisons
8 while k < m and T[i+k] == P[k]:
9 k += 1
10 if k == m:
11 return i 12 return -1
Python program for the Boyer-Moore pattern-matching algorithm
1 # Boyer-Moore
2 def find_boyer_moore(T, P):
3 n, m = len(T), len(P)
4 if m == 0:
5 return 0
6 last = {}
7 for k in range(m):
8 last[P[k]] = k
9 i = m-1
10 k = m-1
11 while i < n: 12 # If match, decrease i,k 13 if T[i] == P[k]:
14 if k == 0: 15 return i 16 else:
17 i -= 1
18 k -= 1
19 # Not match, reset the positions
20 else:
21 j = last.get(T[i], -1)
22 i += m - min(k, j+1)
23 k = m-1
24 return -1
Python program for the Knuth-Morris-Pratt pattern-matching algorithm
1 # KMP failure function
2 def compute_kmp_fail(P):
3 m = len(P)
4 fail = [0] * m
5 j = 1
6 k = 0
7 while j < m:
8 if P[j] == P[k]:
9 fail[j] = k+1
10 j += 1
11 k += 1
12 elif k > 0: 13 k = fail[k-1] 14 else:
15 j += 1
16 return fail
1 # KMP
2 def find_kmp(T, P): 3 n, m = len(T), len(P)
4 if m == 0:
5 return 0
6 fail = compute_kmp_fail(P)
7 # print(fail)
8 j = 0
9 k = 0
10 while j < n:
11 if T[j] == P[k]:
12 if k == m-1:
13 return j-m+1
14 j += 1
15 k += 1
16 elif k > 0: 17 k = fail[k-1] 18 else:
19 j += 1
20 return -1