CSE469-Assignment 2 Association Analysis Solved

Starting from:

$25

Two datasets (Market, Gene) can be found on Piazza. For each dataset, we provide the transaction-item representation discussed in class—Each row denotes a transaction, and each transaction consists of a set of items.

In this assignment, you are asked to implement Apriori algorithm that discovers a collection of frequent itemsets from a transaction database.

Template is for Python 3. You are asked to fill in two functions: apriori_gen and get_freq. apriori_gen generates new candidate itemsets based on the frequent itemsets found in the previous iteration, and prunes the itemsets containing any infrequent itemset of size k-1. get_freq returns the candidate itemsets that meet a minimum support threshold. Do not change the input and output of the two functions in the template.

Please take the following steps:

1. Implement Apriori algorithm. The pseudo code can be found below. Apriori (T, minSupport){

// T is the database and minSupport is the minimum support 𝐿1 = {the set of single item whose support is not less than minSupport} for (k = 2; 𝐿𝑘−1 ≠ ∅; k++ ){

//generate and prune candidate set 𝐶𝑘

𝐶𝑘 is a list of itemsets in which each itemset is formed by merging two itemsets in 𝐿𝑘−1 if their first k-2 items are identical

Remove an itemset from 𝐶𝑘 if any (k-1)-subset of this candidate itemset is not in the frequent itemset list 𝐿𝑘−1

//Count the support of each candidate itemset for each transaction t in database do{

for each candidate 𝑐 in 𝐶𝑘

// increment the support count of all candidate itemsets that are contained in transaction t

if c is a subset of t then count[c] ← count[c] + 1

}

for each candidate 𝑐 in 𝐶𝑘

// Judge if a candidate itemset is frequent or not

if the support of c is not less than minSupport

then include c in 𝐿𝑘

} return {𝐿1, 𝐿2, … , 𝐿𝑘}

}

You cannot directly call a function or package that implements Apriori. You need to implement the algorithm by yourself. If you are not sure about whether it is OK to use a certain function, please post your question on Piazza.

2. Apply your implemented Apriori on the Market dataset with minimum support threshold=50%. You should get the following candidate itemsets (Ck) and frequent itemsets (Lk):

Candidate itemsets:

C2: {Eggs,Key-chain}, {Eggs,Mango}, {Eggs,Onion}, {Eggs,Yo-yo}, {Key-chain, Mango}, {Key-chain, Onion}, {Key-chain,Yo-yo}, {Mango,Onion}, {Mango,Yo-yo},

{Onion,Yo-yo}

C3: {Eggs,Key-chain,Onion}, {Key-chain,Mango,Onion}

Frequent itemsets:

L1: {Eggs}, {Key-chain}, {Mango}, {Onion}, {Yo-yo}

L2: {Eggs,Key-chain}, {Eggs,Onion}, {Key-chain,Mango}, {Key-chain,Onion}, {Keychain,Yo-yo}, {Mango, Onion}

L3: {Eggs, Key-chain,Onion}

3. If you get the same collection of itemsets at Step 2, you can proceed to apply your implemented Apriori algorithm on the Gene dataset with minimum support threshold=50%. You should be able to get 51 length-1 frequent itemsets (L1), 1275 length-2 candidate itemsets (C2), 29 length-2 frequent itemsets (L2), 20 length-3 candidate itemsets (C3) and 2 length-3 frequent itemsets (L3).

4. Prepare your submission. Your final submission should be a zip file named as Assignment2.zip. In the zip file, you should include:

• A folder “Code”, which contains all the codes used in this assignment. You are required to fill in the template. Do not change other functions or add other scripts.

• Report: A doc or pdf file named as Assignment2.pdf. The report should consist of the following parts: 1) The frequent itemsets you obtain on Gene dataset (L1, L2, L3). 2) The length-3 candidate itemsets generated during Apriori (C3) on Gene dataset. 3) The codes of your Apriori algorithm implementation.

More products

CS290 Project 2 - Tomasulo Solved

$30

Add to cart

CS580L- Homework 0 Probability Solved

$25

Add to cart

CS290 Project 1 -Multi-level Cache Solved

$30

Add to cart