Starting from:

$25

CSE469-Assignment 2 Association Analysis Solved

Two datasets (Market, Gene) can be found on Piazza. For each dataset, we provide the transaction-item representation discussed in class—Each row denotes a transaction, and each transaction consists of a set of items.  

 

In this assignment, you are asked to implement Apriori algorithm that discovers a collection of frequent itemsets from a transaction database.

 

Template is for Python 3. You are asked to fill in two functions: apriori_gen and get_freq. apriori_gen generates new candidate itemsets based on the frequent itemsets found in the previous iteration, and prunes the itemsets containing any infrequent itemset of size k-1. get_freq returns the candidate itemsets that meet a minimum support threshold. Do not change the input and output of the two functions in the template.

 

Please take the following steps:

 

1. Implement Apriori algorithm. The pseudo code can be found below.  Apriori (T, minSupport){  

        // T is the database and minSupport is the minimum support     𝐿1 = {the set of single item whose support is not less than minSupport}     for (k = 2; 𝐿𝑘−1 ≠ ∅; k++ ){

            //generate and prune candidate set 𝐶𝑘

 𝐶𝑘 is a list of itemsets in which each itemset is formed by merging two itemsets in 𝐿𝑘−1 if their first k-2 items are identical

Remove an itemset from 𝐶𝑘 if any (k-1)-subset of this candidate itemset is not in the frequent itemset list 𝐿𝑘−1

//Count the support of each candidate itemset              for each transaction t in database do{

                                       for each candidate 𝑐 in 𝐶𝑘  

    // increment the support count of all candidate itemsets that are contained in transaction t 

                                          if  c is a subset of t then count[c] ← count[c] + 1   

            }

                           for each candidate 𝑐 in 𝐶𝑘  

    // Judge if a candidate itemset is frequent or not 

                                          if  the support of c is not less than minSupport  

then include c in  𝐿𝑘      

}  return {𝐿1, 𝐿2, … , 𝐿𝑘}

      }

You cannot directly call a function or package that implements Apriori. You need to implement the algorithm by yourself. If you are not sure about whether it is OK to use a certain function, please post your question on Piazza.

 

2.               Apply your implemented Apriori on the Market dataset with minimum support threshold=50%. You should get the following candidate itemsets (Ck) and frequent itemsets (Lk):

 

Candidate itemsets:

C2: {Eggs,Key-chain}, {Eggs,Mango}, {Eggs,Onion}, {Eggs,Yo-yo}, {Key-chain, Mango}, {Key-chain, Onion}, {Key-chain,Yo-yo}, {Mango,Onion}, {Mango,Yo-yo},

{Onion,Yo-yo}

C3: {Eggs,Key-chain,Onion}, {Key-chain,Mango,Onion}

 

Frequent itemsets:

L1: {Eggs}, {Key-chain}, {Mango}, {Onion}, {Yo-yo}

L2: {Eggs,Key-chain}, {Eggs,Onion},  {Key-chain,Mango}, {Key-chain,Onion}, {Keychain,Yo-yo}, {Mango, Onion}

L3: {Eggs, Key-chain,Onion}

 

3.               If you get the same collection of itemsets at Step 2, you can proceed to apply your implemented Apriori algorithm on the Gene dataset with minimum support threshold=50%. You should be able to get 51 length-1 frequent itemsets (L1), 1275 length-2 candidate itemsets (C2), 29 length-2 frequent itemsets (L2), 20 length-3 candidate itemsets (C3) and 2 length-3 frequent itemsets (L3).  

 

4.               Prepare your submission. Your final submission should be a zip file named as Assignment2.zip.  In the zip file, you should include:

•        A folder “Code”, which contains all the codes used in this assignment. You are required to fill in the template. Do not change other functions or add other scripts.

•        Report: A doc or pdf file named as Assignment2.pdf. The report should consist of the following parts:  1) The frequent itemsets you obtain on Gene dataset (L1, L2, L3). 2) The length-3 candidate itemsets generated during Apriori (C3) on Gene dataset.  3) The codes of your Apriori algorithm implementation.  

 

More products