Note
• You are expected to submit both a report and code. The submission format is specified on CCLE under the HW5 description.
• Copying and sharing homework is NOT allowed, but you may discuss general challenges and ideas with others. Suspicious cases will be reported to the Office of the Dean of Students.
• “# =================== YOUR CODE HERE ===================” is used where input from you is needed in the code file.
1 Frequent Pattern Mining for Set Data
Given the transaction database shown in Table 1, answer the following questions. Note that the parameter min_support is set to 2.
(a) Find all the frequent patterns using the Apriori algorithm. Show the details of the procedure.
(b) Construct and draw the FP-tree of the transaction database.
(c) For the item d, show its conditional pattern base (projected database) and conditional FP-tree.
(d) Find frequent patterns based on d’s conditional FP-tree.
Table 1: The transaction database for question 1.
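As a reminder of the level-wise procedure that part (a) asks for, here is a minimal sketch of Apriori. It is not the apriori.py file used later in this homework; the function name `apriori` and the toy transactions are assumptions made only for illustration, and they are not the contents of Table 1.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the database once and keep candidates that meet min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})  # frequent 1-itemsets
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets whose union has k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy database (NOT Table 1), absolute min_support = 2.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
for itemset, support in sorted(apriori(toy, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In the toy run, the candidate {a, b, c} is produced by the join step but removed by the prune step because {b, c} is not frequent, which is the kind of detail a written Apriori trace should show.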
Introduction to Data Mining (UCLA CS 145) Homework #5
2 Apriori for Yelp
In apriori.py, fill in the missing lines, using the following parameters (already set in the code): min_support=50, min_conf=0.25, and ignore_one_item_set=True. Output the frequent patterns and rules associated with the Yelp data (the same data as in the project), which we have stored in yelp.csv and id_name.csv. Do NOT modify the print_items_rules() function. Copy the entire output of the following command directly into your report in plain text format (do NOT take a screenshot):
python2.7 apriori.py
What patterns and rules do you see? Where are these businesses located? What do these results mean? Do a quick Google search and briefly interpret the patterns and rules mined from Yelp in 50 words or less.
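To make the role of min_conf concrete, the following is a minimal sketch of how association rules can be derived from frequent itemsets. It is independent of apriori.py: the function name `generate_rules`, the input layout, and the toy support counts are assumptions for illustration and do not reflect the provided code or the Yelp data.

```python
from itertools import combinations


def generate_rules(support, min_conf):
    """Derive rules lhs -> rhs with confidence >= min_conf.

    `support` maps each frequent itemset (frozenset) to its support count and
    is assumed to contain every non-empty subset of each itemset it lists.
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                conf = float(count) / support[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical support counts (not mined from yelp.csv).
toy_support = {frozenset("a"): 4, frozenset("b"): 2, frozenset("ab"): 2}
for lhs, rhs, conf in generate_rules(toy_support, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With these toy counts, a -> b has confidence 2/4 and is dropped, while b -> a has confidence 2/2 and is kept, so a rule and its reverse can differ sharply in confidence.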
3 Correlation Analysis
Table 2 shows how many of 10000 transactions contain beer and/or nuts. Answer the following questions based on Table 2.
(a) Calculate confidence, lift, and all confidence between buying beer and buying nuts.
(b) What do you conclude about the relationship between buying beer and buying nuts, based on the above measures?
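For reference, the three measures named in part (a) can be computed from the counts in a 2x2 contingency table. The sketch below assumes the usual definitions: confidence(A -> B) = sup(A and B) / sup(A), lift = P(A and B) / (P(A) P(B)), and all confidence = sup(A and B) / max(sup(A), sup(B)). The function name and the example counts are hypothetical and are deliberately not the values in Table 2.

```python
def correlation_measures(n_ab, n_a, n_b, n):
    """Compute confidence, lift, and all confidence from raw counts.

    n_ab: transactions containing both A and B
    n_a:  transactions containing A
    n_b:  transactions containing B
    n:    total number of transactions
    """
    return {
        "conf_a_to_b": float(n_ab) / n_a,
        "conf_b_to_a": float(n_ab) / n_b,
        # lift = P(A and B) / (P(A) * P(B)), rewritten in terms of counts
        "lift": float(n_ab) * n / (n_a * n_b),
        # all confidence is the smaller of the two confidences
        "all_confidence": float(n_ab) / max(n_a, n_b),
    }


# Hypothetical counts (NOT Table 2): 1000 transactions in total.
for name, value in sorted(correlation_measures(100, 200, 400, 1000).items()):
    print(name, value)
```

A lift above 1 suggests positive correlation and a lift below 1 suggests negative correlation. Lift depends on the total n, and therefore on how many transactions contain neither item, whereas confidence and all confidence do not.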
Table 2: Contingency table for question 3.
            Beer    No Beer    Total
Nuts         150        700      850
No Nuts      350       8800     9150
Total        500       9500    10000

4 Sequential Pattern Mining (GSP Algorithm)
(a) For a sequence s = ⟨ab(cd)(ef)⟩, how many events or elements does it contain? What is the length of s? How many non-empty subsequences does s contain?
(b) Suppose the frequent 3-sequences are L3 = {⟨(ac)e⟩, ⟨b(cd)⟩, ⟨bce⟩, ⟨a(cd)⟩, ⟨(ab)d⟩, ⟨(ab)c⟩}. Write down all the candidate 4-sequences C4, showing the details of the join and pruning steps.
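The join and pruning steps in part (b) can be sketched as follows. This is a minimal illustration of GSP candidate generation without time or gap constraints, for input sequences of length 2 or more (the special case of joining 1-sequences is not handled). A sequence is represented as a tuple of elements, and each element as a sorted tuple of items. All function names and the toy L3 below are assumptions for illustration; the toy set is not the L3 given in the question.

```python
def drop_first(seq):
    """Remove the first item of the first element."""
    head, rest = seq[0], seq[1:]
    return rest if len(head) == 1 else (head[1:],) + rest


def drop_last(seq):
    """Remove the last item of the last element."""
    init, tail = seq[:-1], seq[-1]
    return init if len(tail) == 1 else init + (tail[:-1],)


def join(frequent):
    """Join s1 with s2 when s1 minus its first item equals s2 minus its last item."""
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:
            if drop_first(s1) != drop_last(s2):
                continue
            last = s2[-1][-1]
            if len(s2[-1]) == 1:
                # The last item was its own element in s2: append a new element.
                candidates.add(s1 + ((last,),))
            else:
                # Otherwise it joins the last element of s1.
                candidates.add(s1[:-1] + (tuple(sorted(s1[-1] + (last,))),))
    return candidates


def one_item_removed(seq):
    """Yield every subsequence obtained by deleting exactly one item."""
    for i, element in enumerate(seq):
        for j in range(len(element)):
            shorter = element[:j] + element[j + 1:]
            middle = (shorter,) if shorter else ()
            yield seq[:i] + middle + seq[i + 1:]


def prune(candidates, frequent):
    """Keep candidates whose (k-1)-subsequences are all frequent."""
    return {c for c in candidates if all(s in frequent for s in one_item_removed(c))}


# Hypothetical frequent 3-sequences over integer items (NOT the L3 of part (b)).
toy_l3 = {
    ((1, 2), (3,)), ((1, 2), (4,)), ((1,), (3, 4)),
    ((1, 3), (5,)), ((2,), (3, 4)), ((2,), (3,), (5,)),
}
joined = join(toy_l3)
print(sorted(joined))
print(sorted(prune(joined, toy_l3)))
```

In the toy run, the join produces two candidates, and pruning removes ⟨(1 2)(3)(5)⟩ because its subsequence ⟨(1)(3)(5)⟩ is not frequent. A written answer to part (b) should show the same two stages explicitly.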