Starting from:

$30.99

CS218-Data Structures: Assignment 1 Solved

Finding the Frequent itemset in Grocery Store

Assignment Description:

This assignment will give you basic insight into using Apriori algorithm. Apriori is use for finding the frequent item set in transaction. For example, in Supermarket store where customers can buy different categories of items. There is always a pattern for what a customer buy. This pattern changes according to the buyer.  For example, if a buyer is a Player, and he buys products such as Bat, Ball then, its most probable that he will purchase a tape too. But if the buyer is a mother having babies, she will buy baby products such as milk and diapers. In short, every buyer’s transaction involves a pattern. Our goal is to find those patterns in these transactions. Profit is automatically generated if the relationship is found between the items purchased in different transactions.

Apriori algorithm was the first algorithm that was proposed for frequent itemset mining. Let’s understand Apriori algorithm using an example step by step:

You are given the grocery store dataset. Text file contains:

The first row of the file contains Support thresholdg. 0.5.
The second row contains total number of transactions.
Followed by transactions. Each transaction contains different set of items separated by comma. E.g. Bread, Cheese, Egg, Juice etc
Example:

0.5 



Bread,Cheese,Egg,Juice 

Bread,Chesse,Juice 

Bread,Milk,Yougrt 

Bread,Juice,Milk 

Cheese,Juice,Milk 

Bread,Egg,Cheese,Juice 

Support threshold is 0.5 and there are total no of 6 transaction and the first transaction is Bread,Cheese,Egg,Juice in the above example.

Support threshold determines, is the item popular or not.

Step 01:

Read the file and store all the transactions in 2D char Array. And evaluate Support threshold=0.5*No of transaction. So, you get threshold 0.5 * 6 = 3. Note: File should be read once.

Step 02: 

Now you have to find the frequency of all the items using 2D char Array.

Step 03: 

The set of 1st – itemset whose occurrence is satisfying the Support threshold are determined. Only those items which count more than or equal to Support threshold are taken ahead for the next iteration and the others are pruned (deleted). E.g. Egg and Yogurt have frequencies less than 3.Remove Items that does not meet the minimum support threshold and sort them. The output should look like this.

 

Items 
Frequency 
Bread 

Juice 

Cheese  

Milk 

                                                       Table 02: 1st - ItemSet                                                                                             

Step 04: 

Next, 2-item pair candidate frequencies are discovered. The 2-item pairs are generated by forming a group of 2 items using 1st – ItemSet (Table 02) and 2D char Array.

Step 05: 

The 2-itemset candidates are pruned (deleted) using Support threshold value and then sort them.

Now the table will have 2 –item sets with Support threshold.

 

Item 
Frequency 
Bread,Juice 

Bread, Cheese 

Cheese,Juice 

                                              Table 04: 2nd – ItemSet 

Step 06: 

The 3-item pair candidates with frequencies are generated by grouping pair of 3 items using 1st – ItemSet (Table 02) and 2D char Array.

Step 07: 

The 3-itemset candidates are pruned (deleted) using Support threshold value.

 

Item                                    Frequency 

Bread, Cheese,Juice         3 

                                            Table 06: 3nd – ItemSet 

 

You are only required to generate 1st , 2nd & 3rd ItemSets. 

Output files & Marks distribution: 

Generate 1st – ItemSet (as showed in Table 02) and write on 1-ItemSet.txt (
Generate 2nd – ItemSet (as showed in Table 04) and write on 2-ItemSet.txt ( Generate 3rd – ItemSet (as showed in Table 06) and write on 3-ItemSet.txt (
Note: Path of the file must be same as the Project path.

More products