CS145 - Homework 5

Note
•   You are expected to submit both a report and code. The submission format is specified on CCLE under the HW5 description.

•   Copying and sharing homework is NOT allowed, but you may discuss general challenges and ideas with others. Suspicious cases will be reported to the Office of the Dean of Students.

•   The marker “# =================== YOUR CODE HERE ===================” indicates where your input is needed in the code file.

1        Frequent Pattern Mining for Set Data
Given the transaction database shown in Table 1, answer the following questions. Note that min_support is set to 2.

(a)   Find all the frequent patterns using the Apriori algorithm. Show the details of the procedure.

(b)   Construct and draw the FP-tree of the transaction database.

(c)   For item d, show its conditional pattern base (projected database) and its conditional FP-tree.

(d)   Find frequent patterns based on d’s conditional FP-tree.

Table 1: The transaction database for question 1.

TID   Items
1     b, c, j
2     a, b, d
3     a, c
4     b, d
5     a, b, c, e
6     b, c, k
7     a, c
8     a, b, e, i
9     b, d
10    a, b, c, d
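Below is a minimal Python 3 sketch of the level-wise Apriori search over Table 1. It can be used to check a hand derivation for part (a). It treats min_support = 2 as an absolute transaction count. The names `apriori`, `support`, `TRANSACTIONS` and `MIN_SUPPORT` are illustrative and do not come from the course code. The join step is one simplification: it unions any two frequent (k-1)-itemsets whose union has k items, rather than using the sorted-prefix join. After the prune step, both joins leave the same candidates.

```python
from itertools import combinations

# Transactions transcribed from Table 1 (TID -> set of items).
TRANSACTIONS = {
    1: {"b", "c", "j"}, 2: {"a", "b", "d"}, 3: {"a", "c"}, 4: {"b", "d"},
    5: {"a", "b", "c", "e"}, 6: {"b", "c", "k"}, 7: {"a", "c"},
    8: {"a", "b", "e", "i"}, 9: {"b", "d"}, 10: {"a", "b", "c", "d"},
}
MIN_SUPPORT = 2  # absolute support count, as stated in question 1


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions.values() if itemset <= t)


def apriori(transactions, min_support):
    """Level-wise Apriori (hypothetical helper): {frozenset: support count}."""
    items = sorted(set().union(*transactions.values()))
    # L1: frequent single items.
    level = {}
    for i in items:
        s = support(frozenset([i]), transactions)
        if s >= min_support:
            level[frozenset([i])] = s
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join: union pairs of frequent (k-1)-itemsets that yield k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Scan the database once to count the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


result = apriori(TRANSACTIONS, MIN_SUPPORT)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

Printing the candidates before and after pruning at each level reproduces the intermediate C_k and L_k tables that part (a) asks for.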
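For parts (b) to (d), here is a sketch of FP-tree construction. It reads off d's conditional pattern base by following the header-table links from each d node up to the root. It assumes that frequent items are ordered by descending support, with ties (a and c both have support 6) broken alphabetically. A different tie-break changes the tree's shape but not the mined patterns. The class and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Transactions transcribed from Table 1.
TRANSACTIONS = [
    ["b", "c", "j"], ["a", "b", "d"], ["a", "c"], ["b", "d"],
    ["a", "b", "c", "e"], ["b", "c", "k"], ["a", "c"],
    ["a", "b", "e", "i"], ["b", "d"], ["a", "b", "c", "d"],
]
MIN_SUPPORT = 2


class Node:
    def __init__(self, item, parent):
        self.item = item        # None marks the root
        self.parent = parent
        self.count = 0
        self.children = {}      # item -> Node


def build_fp_tree(transactions, min_support):
    counts = Counter(i for t in transactions for i in t)
    # Descending support; alphabetical tie-break is an assumption.
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> nodes carrying it
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


def show_tree(node, depth=0):
    """Print one 'item:count' per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + "%s:%d" % (child.item, child.count))
        show_tree(child, depth + 1)


def conditional_pattern_base(header, item):
    """Prefix path above each `item` node, paired with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base


def conditional_fp_tree_items(base, min_support):
    """Items that remain frequent within a conditional pattern base."""
    counts = Counter()
    for path, count in base:
        for item in path:
            counts[item] += count
    return {i: c for i, c in counts.items() if c >= min_support}


root, header, order = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
print("item order:", order)
show_tree(root)
base = conditional_pattern_base(header, "d")
print("conditional pattern base of d:", base)
print("frequent items in d's conditional FP-tree:",
      conditional_fp_tree_items(base, MIN_SUPPORT))
```

Combining d with each frequent combination of the items that remain in its conditional FP-tree gives the patterns for part (d). These can be cross-checked against the Apriori result.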

Introduction to Data Mining (UCLA CS 145)    Homework #5



2        Apriori for Yelp
In apriori.py, fill in the missing lines. The following parameters are already set in the code: min_support=50, min_conf=0.25, and ignore_one_item_set=True. Output the frequent patterns and rules for the Yelp data (the same data as in the project), which is stored in yelp.csv and id_name.csv. Do NOT modify the print_items_rules() function, and copy the entire output of the following command into your report in plain text (do NOT take a screenshot):

python2.7 apriori.py

What patterns and rules do you see? Where are these businesses located? What do these results mean? Do a quick Google search and briefly interpret the patterns and rules mined from Yelp in 50 words or fewer.
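Here is a sketch of the rule-generation step that follows frequent-pattern mining. It assumes frequent itemsets are stored as a dict from `frozenset` to absolute support count, including all their subsets. The handout does not say how apriori.py actually stores them, or exactly what `ignore_one_item_set` controls. Treat `generate_rules` and its input format as hypothetical. The toy input reuses three support counts computed from Table 1 (a: 6, b: 8, {a, b}: 4), not Yelp data.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Hypothetical helper: return (X, Y, conf) for each rule X -> Y where
    X | Y is frequent, X and Y are non-empty and disjoint, and
    conf = sup(X | Y) / sup(X) >= min_conf.

    `frequent` maps frozenset itemsets to absolute support counts and is
    assumed to contain every subset of each stored itemset.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a 1-itemset cannot be split into X and Y
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                conf = float(count) / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Toy input: support counts computed from Table 1, not from the Yelp data.
EXAMPLE_FREQUENT = {
    frozenset(["a"]): 6,
    frozenset(["b"]): 8,
    frozenset(["a", "b"]): 4,
}

for x, y, conf in generate_rules(EXAMPLE_FREQUENT, min_conf=0.25):
    print("%s -> %s (conf %.3f)" % (sorted(x), sorted(y), conf))
```

The `float(...)` conversion keeps the division correct under Python 2.7, which the assignment's command uses, as well as under Python 3.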

3        Correlation Analysis
Table 2 shows how many of 10,000 transactions contain beer and/or nuts. Answer the following questions based on Table 2.

(a)   Calculate the confidence, lift, and all_confidence between buying beer and buying nuts.

(b)   Based on these measures, what do you conclude about the relationship between buying beer and buying nuts?

Table 2: Contingency table for question 3.

            Beer    No Beer    Total
Nuts         150        700      850
No Nuts      350       8800     9150
Total        500       9500    10000
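The following computes the three measures for part (a), using the standard textbook definitions and the counts read from Table 2. The question does not fix a rule direction, so confidence is computed both ways. The function names are illustrative.

```python
# Counts read from Table 2.
N = 10000            # total transactions
BEER = 500           # transactions containing beer
NUTS = 850           # transactions containing nuts
BEER_AND_NUTS = 150  # transactions containing both


def confidence(both, antecedent):
    """conf(X -> Y) = sup(X and Y) / sup(X)."""
    return float(both) / antecedent


def lift(both, x, y, n):
    """lift(X, Y) = P(X and Y) / (P(X) * P(Y)); a value of 1 means independence."""
    return float(both) * n / (x * y)


def all_confidence(both, x, y):
    """all_conf(X, Y) = sup(X and Y) / max(sup(X), sup(Y))."""
    return float(both) / max(x, y)


print("conf(beer -> nuts)   = %.3f" % confidence(BEER_AND_NUTS, BEER))
print("conf(nuts -> beer)   = %.3f" % confidence(BEER_AND_NUTS, NUTS))
print("lift(beer, nuts)     = %.3f" % lift(BEER_AND_NUTS, BEER, NUTS, N))
print("all_conf(beer, nuts) = %.3f" % all_confidence(BEER_AND_NUTS, BEER, NUTS))
```

Lift uses the total N, so its value changes with the number of transactions that contain neither item (8,800 here). All_confidence depends only on the beer, nuts, and beer-and-nuts counts.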
4        Sequential Pattern Mining (GSP Algorithm)
(a)   For a sequence s = ⟨ab(cd)(ef)⟩, how many events or elements does it contain? What is the length of s? How many non-empty subsequences does s contain?

(b)   Suppose L3 = {⟨(ac)e⟩, ⟨b(cd)⟩, ⟨bce⟩, ⟨a(cd)⟩, ⟨(ab)d⟩, ⟨(ab)c⟩} is the set of frequent 3-sequences. Write down all the candidate 4-sequences C4, with the details of the join and pruning steps.
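For part (a), here is a brute-force check. A subsequence keeps some subset of each element's items, in the original order, and drops any element that ends up empty. Encoding s as a list of tuples is an assumption of this sketch. The set removes duplicate subsequences, which can only arise when items repeat.

```python
from itertools import combinations, product

# s = <ab(cd)(ef)>, encoded as one tuple of items per element.
S = [("a",), ("b",), ("c", "d"), ("e", "f")]


def subsets(element):
    """All subsets of one element's items, including the empty one."""
    return [c for r in range(len(element) + 1) for c in combinations(element, r)]


def nonempty_subsequences(seq):
    """Distinct non-empty subsequences: keep a subset of each element in order."""
    result = set()
    for choice in product(*(subsets(e) for e in seq)):
        sub = tuple(e for e in choice if e)  # drop elements that became empty
        if sub:
            result.add(sub)
    return result


num_elements = len(S)
length = sum(len(e) for e in S)  # length = number of item occurrences
subs = nonempty_subsequences(S)
print("elements:", num_elements)
print("length:", length)
print("non-empty subsequences:", len(subs))
```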

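Here is a sketch of GSP candidate generation for part (b). Two sequences s1 and s2 join when s1 without its first item equals s2 without its last item. The candidate is s1 extended with s2's last item. That item goes in as a new element if it formed its own element in s2, and into s1's last element otherwise. The prune step drops a candidate if any sequence obtained by deleting one item is missing from L3. This corresponds to GSP's pruning when no max-gap constraint is given; with such a constraint, only contiguous subsequences are checked. The tuple-of-sorted-tuples encoding and the helper names are assumptions of this sketch.

```python
# L3 transcribed from the question; each element is a sorted tuple of items.
L3 = {
    (("a", "c"), ("e",)),
    (("b",), ("c", "d")),
    (("b",), ("c",), ("e",)),
    (("a",), ("c", "d")),
    (("a", "b"), ("d",)),
    (("a", "b"), ("c",)),
}


def delete_item(seq, i, j):
    """Remove item j of element i, dropping the element if it becomes empty."""
    element = seq[i][:j] + seq[i][j + 1:]
    return seq[:i] + ((element,) if element else ()) + seq[i + 1:]


def gsp_join(s1, s2):
    """Return the joined candidate, or None if s1 and s2 do not join."""
    if delete_item(s1, 0, 0) != delete_item(s2, len(s2) - 1, len(s2[-1]) - 1):
        return None
    last = s2[-1][-1]
    if len(s2[-1]) == 1:  # last item was a separate element in s2
        return s1 + ((last,),)
    return s1[:-1] + (tuple(sorted(s1[-1] + (last,))),)


def one_item_deletions(seq):
    """All (k-1)-subsequences obtained by deleting exactly one item."""
    return {delete_item(seq, i, j)
            for i in range(len(seq)) for j in range(len(seq[i]))}


def show(seq):
    """Render a sequence in <a(bc)d> notation."""
    return "<" + "".join(e[0] if len(e) == 1 else "(" + "".join(e) + ")"
                         for e in seq) + ">"


joined = set()
for s1 in L3:
    for s2 in L3:
        candidate = gsp_join(s1, s2)
        if candidate is not None:
            joined.add(candidate)

C4 = {c for c in joined if one_item_deletions(c) <= L3}
print("after join:", sorted(show(c) for c in joined))
print("after prune (C4):", sorted(show(c) for c in C4))
```

Printing `one_item_deletions(c) - L3` for each joined candidate shows which infrequent subsequence causes it to be pruned.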
