$25
Problem 1: [30 pts]
Let A be an array of n elements. A heavy element of A is any element that appears more than 3/20 times, i.e., is more than 15% of the items.
For example, if n = 14 then a heavy element is one that appears at least times.
As examples, consider the arrays A, B and C below:
i
1
2
3
4
5
6
7
8
9
10
11
12
13
14
A[i]
2
6
3
5
6
3
8
8
2
7
4
9
9
4
i
1
2
3
4
5
6
7
8
9
10
11
12
13
14
B[i]
3
3
3
5
6
3
8
3
6
7
4
9
9
4
i
1
2
3
4
5
6
7
8
9
10
11
12
13
14
C[i]
3
3
3
6
6
3
8
3
6
6
6
9
9
9
A contains no heavy items. B contains the unique heavy item 3. C contains the three heavy items 3, 6 and 9.
Design an O(nlogn) time divide-and-conquer algorithm for finding all the heavy items (if any) in an array.
Your solution should be split into the following three parts. Write the answer to each part separately.
(a) (i) Write documented pseudocode for a procedure Heavy(i,j) that returns the set of heavy items in subarray A[i...j].
(ii) Below the pseudocode, provide a description in words (as opposed to code) of what your algorithm is doing.
(b) Explain (prove) why your algorithm is correct.
(c) Let T(n) be the worst case number of total operations of all types performed by your algorithm. Derive a recurrence relation for T(n) (explain how you derived the recurrence relation).
Show that T(n) = O(nlogn).
Note: If the recurrence relation is in the form
and T(1) = c2 (1)
for some c1,c2 ≥ 0 you may immediately conclude that T(n) = O(nlogn) without providing any further explanation other than saying you are calling the Master Theorem (and explaining which case of the master theorem it is and why).
Otherwise, you must prove from first principles that whatever recurrence relation you derived implies T(n) = O(nlogn) for all values of n.
See below for requirements, hints and general comments.
Requirements, Hints and Comments:
• Hint: The solution to Tutorial problem DC12 will help you get started.
• Hint: Let S = Heavy(i,j). Prove that subarray A[i...j] cannot contain more than 6 heavy items. You will need this fact in your analysis
• Sorting is not permitted.
The only comparisons allowed to items in the array are equalities. Inequalities are not permitted. More specifically:
– You MAY write “If A[i] = A[j]” or “If A[i] = x”.
– You MAY NOT write “If A[i] < A[j]” or “If A[i] < x”. Note that this immediately implies that you can NOT sort the array (since sorting requires inequality comparisons).
– In particular you MAY NOT solve the problem by sorting the array and then counting all of the items with the same value.
• Part (a) requires two different types of explanations as to what your algorithm is doing. The first is the documentation IN the pseudocode. The second is the explanation AFTER the pseudo-code. BOTH must be provided.
• Part (b) is a proof of correctness. Write it as you would a mathematical proof. Explicitly state any facts upon which the proof depends. Incomplete proofs will have points deducted. See the Practice Homework on the Tutorial page for pointers as to how this should be structured.
• Answer Parts (a) and (b) separately. Do not worry if your explanation in Part (a) and your proof of correctness in Part (b) overlap somewhat.
• We are specifically asking for a divide-and-conquer algorithm here, e.g., a generalization of DC12. We are aware that faster non divideand-conquer algorithms do exist and you might be able to find one by Googling. Such algorithms will not be accepted as answers. The purpose of this exercise is to use the divide-and-conquer approach to solve the problem.
• The call Heavy(1,n) should run in O(nlogn) time and return a set. You can do this by writing Return(S), where S is a set.
• Set Notation: Your pseudocode will need to use a set data structure with associated set operations. You should use standard mathematical notation for writing set operations (do not make up your own set notation or use a notation from a programming language you know).
See the next page for more information on how to write set operations in pseudocode along with associated running times.
Set notation in pseudocode: In what follows, S,S1,S2 are all sets.
Recall that sets do not contain repeated items.
So, if S1 = {a,b}, S2 = {b,c} and S = S1 ∪ S2, then S = {a,b,c}.
|S| denotes the size of S, e.g, |S1| = |S2| = 2, |S1 ∪ S2| = 3, |S1 ∩ S2| = 1. If S = ∅ (the empty set), then |S| = 0.
“∪” denotes the union of sets. ‘∩” denotes the intersection of sets.
S1 \S2 (equivalently, S1 −S2) is the set S1 with all items in S1 ∩S2 removed.
• Create sets using the notation “Create Set S”. This creates empty set S in O(1) time.
• Evaluating |S| requires O(1) time.
• Setting S1 = S2 will require O(|S2| + 1) time.
• You can add element x to set S by writing S = S ∪ {x}.
• Evaluating S = S1 ∪ S2, S = S1 ∩ S2 and S = S1 \ S2 all require
O((|S1| + 1)(|S2| + 1)) time.
• Evaluating the statement “x ∈ S” requires O(|S|+1) time and returns TRUE if x is in S and FALSE otherwise.
• You may write pseudocode in the form
∀x ∈ S do
Work(x)
where Work(x) is some code dependent upon x.
If S = {x1 ...,xk}, the time required to run this code will be O(1) plus the total of the work required to run all of Work(x1), Work(x2),...,Work(xk).
Example 1: The following code finds the unique items in array A:
Create Set S
For i = 1 to n do
S = S ∪ {A[i]}
It uses O(n · m) time, where m is the number of unique items in A[1...n].
Example 2: The following code constructs S1 ∩ S2 :
Create Set S ∀x ∈ S1 do
If x ∈ S2 then
S = S ∪ {x}
It runs in O((|S1| + 1)(|S2| + 1)) time.
Problem 2 [28 pts] Indicator Random Variables
A[1...n] is an array. In addition to A you are given 2 other structures that are built from A: a binary tree T and a m × m 2D matrix M. In each of these structure we also define N(v), the set of neighbors of item v. Each of these structures and their neighbor definitions are defined below. See the diagram on the next page for examples.
(1) Array A: This is just the standard array.
Neighbors are the items to the left and right of i.
{i − 1,i + 1} if 1 < i < n
N(i) = {2}
{n − 1}
if i = 1 if i = n
(2) Binary Tree T :
In this case we assume that n = 2k − 1 for some integer k ≥ 0.
The correspondence between A and T is given below (and is also in the Heapsort slides page 9 onwards. Please see those for more explanation).
Each node in the tree corresponds to some element of array A. Node i will have value T[i] = A[i].
The root of the tree is node 1, and has no parent.
The parent of node i 6= 1 is node bi/2c. The left and right children of node i are nodes 2i and 2i + 1, if they exist
if v = 1 if 1 < i < (n + 1)/2 if (n + 1)/2 ≤ i ≤ n
(3) m × m 2D matrix M.
In this case we assume that n = m2.
M(i,j) = A[(i − 1)m + j]. An element’s neighbors are its (at most 4) vertical and horizontal neighbors. That is,
NM((i,j)) = {(i + 1,j), (i − 1,j), (i,j + 1), (i,j − 1)}\{(x,y) : 1 ≤ x ≤ m, 1 ≤ y ≤ m}.
Now solve the six subproblems below. See the end of this problem for an explanation of what “Derive the expected number” means and how your solutions should be structured.
T
A has 5 local minima and 6 saddle points;
For all of the six problems assume that A is a uniform random permutation of < 1,2,...,n > . This means that every one of the n! possible permutations is equally likely to occur.
The corresponding T and M are built from A as previously described.
The solutions to all six of the problems must use the indicator random variable technique to derive the result.
(a) Counting local minima.
(i) A[i] is a local minimum in A if it is smaller than all of its neighbors,
i.e, A[i] < minv∈N(i) A[v].
Derive the expected number of local minima in A.
Hint: Let Xi be the indicator random variable for the event that A[i] is a local minimum. What is E(Xi) (this might depend upon i)? How does knowing the E(Xi) let you answer the problem?
(ii) T[i] is a local minimum in T if it is smaller than all of its neighbors,
i.e, T[i] < minv∈NT(i) T[v].
Derive the expected number of local minima in T.
(iii) M(i,j) is a local minimum in M if it is smaller than all of its neighbors, i.e, M(i,j) < min(x,y)∈NM(i,j) M(x,y).
Derive the expected number of local minima in M.
(b) Counting Saddle points.
(i) A[i] is a saddle point in A if i 6= 1, i 6= n, and A[i] is larger than one of its neighbors and smaller than the other one of its neighbors. Derive the expected number of saddle points in A.
(ii) T[i] is a saddle point in T if 1 < i < (n + 1)/2 and T[i] is smaller than its parent but larger than both of its children. Derive the expected number of saddle points in T.
(iii) M(i,j) is a saddle point in M if M(i,j) is smaller than all of its neighbors in the same row and larger than all of its neighbors in the same column.
Derive the expected number of saddle points in M.
Structure of the Solution: The solution to each of the six subproblems must be written separately, with space between the solution to each subproblem
For each subproblem, you should derive the expected number of local minima or saddle points. as follows, using a three part solution:
(A) First clearly define the indicator random variable(s) that you will beusing to solve this problem.
(B) Next, derive the expectation of each such indicator random variable.
(C) Finally, solve the full subproblem, by deriving a closed formula in n (no summations “P” allowed) for the requested value.
Problem 3 [24 pts] (Using Selection)
Note: Recall the Selection problem of calculating Select(A,p,r,i) that we learned in class. We actually learned a randomized linear, i.e., O(r − p + 1), time algorithm for solving Selection. We also stated that a deterministic linear time algorithm for solving this problem exists. For the sake of this problem, you may assume that you have been given this deterministic linear time algorithm and can use it as a subroutine. (calling it exactly as Select(A,p,r,i)). You may not use any other algorithm as a subroutine without explicitly providing the code and proving correctness from scratch.
The Manhattan Distance between two 2-dimensional points p = (x,y) and p0 = (x0,y0) is d1(p,p0) = |x − x0| + |y − y0|.
In what follows you are given real numbers x1,...,xn and y1,...,yn. They are NOT necessarily sorted.
Define the n two-dimensional points pi = (xi,yi). The Manhattan Warehouse problem is to find a point p = (x,y) that minimizes the average distance to all the pi. That is, to find p such that
.
(Dividing both sides of the equation by gives the average.)
The goal of this problem is to design a O(n) worst case time algorithm for solving the Manhattan Warehouse problem. We split this into two parts.
(A) Solve the one-dimensional problem.
Given (unsorted) x1,...xn, define the function
.
Call ¯x a center if it minimizes f(x), i.e.,
∀x ∈ <, f(¯x) ≤ f(x).
(a) Prove that there exist z1,z2 such that z1 ≤ z2 and (i) f(x) is monotonically decreasing for x ∈ [−∞,z1].
(ii) f(z1) = f(x) = f(z2) for all x ∈ [z1,z2].
(iii) f(x) is monotonically increasing for x ∈ [z2,∞].
Note that it is possible that z1 = z2.
(b) Give an O(n) time algorithm for finding a center of n one-dimensional points x1,...,xn.
First give documented pseudocode for your algorithm. In addition, below your algorithm, explain in words/symbols what your algorithm is doing
(c) Prove correctness of your algorithm. Your proof must be mathematically formal and understandable.
(d) Explain why your algorithm runs in O(n) time.
(B) Solve the Manhattan Warehouse Problem.
(a) Give an O(n) time algorithm for solving the Manhattan Warehouse Problem given n points p1,p2,...,pn where pi = (xi,yi).
First give documented pseudocode for your algorithm. In addition, below your algorithm, explain in words/symbols what your algorithm is doing
(b) Prove correctness of your algorithm. Your proof must be mathematically formal and understandable.
(c) Explain why your algorithm runs in O(n) time.
Notes:
• For simplicity, you may assume that none of the xi or yi repeat, i.e., there is no pair i,j such that xi = xj or yi = yj.
• Part A(a) is only meant as a hint to get you started. Note that if you know z1,z2, then (you must prove this) any x ∈ [z1,z2] will be a solution to the one-dimensional center problem.
A more detailed understanding of the behavior of f(x), that lets you find z1,z2, therefore permits solving the remainder of A. You may, if you like, fully analyze the behavior of f(x) in A(a), write this as a mathematical lemma and then quote that lemma in A(c).
• We strongly recommend that before trying to solve part A you choose 5 or 6 points xi and then graph the resultant function f(x). Seeing the diagram should help you solve the problem.
• The algorithm for part (B) can use the result of part (A) as a subroutine.
• The main work in solving this problem is in understanding how to solve it using selection. Once you understand that, the actual pseudocode might be quite short.
P4: [18 pts] Recurrence Relations
Give asymptotic upper bounds for T(n) satisfying the following recurrences. Make your bounds as tight as possible. For example, if T(n) = Θ(n2) then T(n) = O(n2) is a tight upper bound but T(n) = O(n2 logn) is not. Your upper bound should be written in the form T(n) = O(nα(logn)β) where α,β are appropriate constants.
A correct answer will gain full credits. It is not necessary to show yourwork BUT, if your answer was wrong, showing your work steps may gain you partial credits.
If showing your work, you may quote theorems shown in class. For (a), (b) and (c) you may assume that n is a power of 2; for (d) you may assume that n is a power of 4; for (e) that n is a power of 7; for (f) that n is a power of 3