### Problem 4

Consider the problem of mining association rules as presented in the papers "Mining Association Rules between Sets of Items in Large Databases" (by R. Agrawal, T. Imielinski, and A. Swami) and "Fast Algorithms for Mining Association Rules" (by R. Agrawal and R. Srikant).

Follow the Apriori algorithm to obtain ALL the association rules with at least 50% support and 100% confidence that are derivable from the set of transactions below (example taken from M.J. Zaki's KDD lecture notes). Show the results obtained in each step of the algorithm.

```
Items:

A = Jane Austen
C = Agatha Christie
D = Sir Arthur Conan Doyle
T = Mark Twain
W = P.G. Wodehouse

Transaction Database:

Transaction ID   Items

1           A   C       T   W
2               C   D       W
3           A   C       T   W
4           A   C   D       W
5           A   C   D   T   W
6               C   D   T
```

#### Solution by Weiyang Lin:

The problem of discovering all association rules can be divided into two parts:

1. Find all frequent itemsets.

Let Fk denote the set of frequent itemsets of size k, and Ck the set of candidate itemsets of size k. Since the database contains 6 transactions, the 50% minimum support corresponds to an absolute count of 3. Counting the support of every single item in the transactions gives F1:

F1:

| Itemset | Support |
|---------|---------|
| {A}     | 4       |
| {C}     | 6       |
| {D}     | 4       |
| {T}     | 4       |
| {W}     | 5       |

According to the candidate generation process, C2 is generated from F1. After the join step, C2 is {{A C}, {A D}, {A T}, {A W}, {C D}, {C T}, {C W}, {D T}, {D W}, {T W}}. Since all 1-subsets of the itemsets in C2 are contained in F1, the prune step deletes nothing. Counting the supports of the itemsets in C2 (and discarding {A D} and {D T}, whose supports of 2 fall below the minimum of 3) gives F2:

F2:

| Itemset | Support |
|---------|---------|
| {A C}   | 4       |
| {A T}   | 3       |
| {A W}   | 4       |
| {C D}   | 4       |
| {C T}   | 4       |
| {C W}   | 5       |
| {D W}   | 3       |
| {T W}   | 3       |
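The join and prune steps can be sketched in Python as follows (a minimal sketch; the function name `apriori_gen` follows the Agrawal–Srikant paper, but this compact join unites any two k-itemsets whose union has k+1 items rather than joining on a shared (k-1)-prefix as the paper does — after pruning, the surviving candidates are identical):

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Generate size-(k+1) candidates from the size-k frequent itemsets.

    Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
    Prune step: discard candidates that have an infrequent k-subset.
    """
    k = len(next(iter(prev_frequent)))
    # Join: union of two k-itemsets that together contain k+1 items.
    candidates = {a | b for a in prev_frequent for b in prev_frequent
                  if len(a | b) == k + 1}
    # Prune: every k-subset of a candidate must itself be frequent.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k))}

# C2 from F1 = {A, C, D, T, W}: all ten 2-itemsets.
F1 = {frozenset(i) for i in "ACDTW"}
C2 = apriori_gen(F1)
```

Applied to F2 above, the same function yields the five candidates of C3 that survive pruning; {C D T} is rejected because its subset {D T} is not in F2.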

Similarly, C3 is generated from F2. After the join step, C3 is {{A C T}, {A C W}, {A T W}, {C D T}, {C D W}, {C T W}}. The prune step deletes {C D T}, since its subset {D T} is not in F2. Counting the supports of the remaining candidates gives F3:

F3:

| Itemset   | Support |
|-----------|---------|
| {A C T}   | 3       |
| {A C W}   | 4       |
| {A T W}   | 3       |
| {C D W}   | 3       |
| {C T W}   | 3       |

After the join step, C4 contains only the itemset {A C T W}; since all of its 3-subsets are in F3, it survives the prune step.

F4:

| Itemset     | Support |
|-------------|---------|
| {A C T W}   | 3       |

C5 is empty, so the whole set of frequent itemsets is the union of F1, F2, F3 and F4.
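The whole level-wise search can be reproduced with a short loop (a minimal sketch under the conventions above; variable names are ours, and the 50% minimum support over 6 transactions is the absolute count 3):

```python
from itertools import combinations

# The six transactions from the example database.
db = [frozenset(t) for t in ("ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT")]
minsup = 3  # 50% of 6 transactions

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in db)

# F1: frequent single items.
items = sorted(set().union(*db))
frequent = [{frozenset([i]) for i in items
             if support(frozenset([i])) >= minsup}]

# Level-wise search: join, prune, count, until no candidates survive.
while frequent[-1]:
    prev = frequent[-1]
    k = len(next(iter(prev)))
    # Join step: unite pairs of frequent k-itemsets.
    cands = {a | b for a in prev for b in prev if len(a | b) == k + 1}
    # Prune step: every k-subset of a candidate must be frequent.
    cands = {c for c in cands
             if all(frozenset(s) in prev for s in combinations(c, k))}
    frequent.append({c for c in cands if support(c) >= minsup})
frequent.pop()  # drop the final empty level (C5 is empty)
```

The resulting levels have sizes 5, 8, 5, and 1, matching F1 through F4 above.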

2. Generate association rules using the frequent itemsets.

There is a straightforward approach for this task. For every frequent itemset l whose size is greater than 1, find all non-empty proper subsets of l. For every such subset a, output the rule a -> (l - a) if the ratio of support(l) to support(a) is at least the minimum confidence.
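This straightforward approach can be sketched directly (a minimal sketch; variable names are ours, supports are recounted from the six transactions, and the frequent itemsets of size at least 2 are enumerated by brute force for brevity):

```python
from itertools import combinations

# The six transactions from the example database.
db = [frozenset(t) for t in ("ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT")]
minsup, minconf = 3, 1.0  # 50% support, 100% confidence

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in db)

# All frequent itemsets of size >= 2 (brute force, for brevity).
items = sorted(set().union(*db))
frequent = [frozenset(c) for k in range(2, len(items) + 1)
            for c in combinations(items, k)
            if support(frozenset(c)) >= minsup]

rules = []
for l in frequent:
    for k in range(1, len(l)):  # every non-empty proper subset a of l
        for a in map(frozenset, combinations(l, k)):
            if support(l) / support(a) >= minconf:
                rules.append((a, l - a))  # rule a -> (l - a)
```

With these thresholds the sketch yields exactly the 18 rules derived below.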

In our example, the frequent itemsets whose sizes are greater than 1 are the itemsets in F2, F3 and F4, i.e., {A C}, {A T}, {A W}, {C D}, {C T}, {C W}, {D W}, {T W}, {A C T}, {A C W}, {A T W}, {C D W}, {C T W}, {A C T W}.

Let's start with the first one. For {A C}, the two possible rules are {A} -> {C} and {C} -> {A}. Because

support({A C}) / support({A}) = 4 / 4 = 100%,
support({A C}) / support({C}) = 4 / 6 = 66.7%,

only {A} -> {C} is a desired rule.

Similarly, for {A C T}, the possible rules are {A C} -> {T}, {A T} -> {C}, {C T} -> {A}, {A} -> {C T}, {C} -> {A T}, and {T} -> {A C}. Computing their confidences respectively,
support({A C T}) / support({A C}) = 3 / 4,
support({A C T}) / support({A T}) = 3 / 3 = 100%,
support({A C T}) / support({C T}) = 3 / 4,
support({A C T}) / support({A}) = 3 / 4,
support({A C T}) / support({C}) = 3 / 6,
support({A C T}) / support({T}) = 3 / 4,
only {A T} -> {C} is a desired rule.

After going through all the frequent itemsets, we obtain every rule satisfying the minimum support and minimum confidence. They are listed below.

1. {A} -> {C}
2. {A} -> {W}
3. {D} -> {C}
4. {T} -> {C}
5. {W} -> {C}
6. {A} -> {C W}
7. {A T} -> {C}
8. {A T} -> {W}
9. {A C} -> {W}
10. {A W} -> {C}
11. {D W} -> {C}
12. {T W} -> {A}
13. {T W} -> {C}
14. {A T} -> {C W}
15. {T W} -> {A C}
16. {A C T} -> {W}
17. {A T W} -> {C}
18. {C T W} -> {A}