Follow the Apriori algorithm to obtain ALL the association rules with at least 50% support and 100% confidence that are derivable from the set of transactions below (example taken from M.J. Zaki's KDD lecture notes). Show the results obtained in each step of the algorithm.
Items
A = Jane Austen
C = Agatha Christie
D = Sir Arthur Conan Doyle
T = Mark Twain
W = P.G. Wodehouse
Transaction Database:
Transaction ID   Items
     1           A   C       T   W
     2               C   D       W
     3           A   C       T   W
     4           A   C   D       W
     5           A   C   D   T   W
     6               C   D   T
The problem of discovering all association rules can be divided into two parts:
1. Find all frequent itemsets.
F1
| Itemset | Support | 
| {C} {D} {T} {W} | 6 4 4 5 | 
According to the candidate generation process, we could generate C2 from F1. After the join step, C2 is {{A C}, {A D}, {A T}, {A W}, {C D}, {C T}, {C W}, {D T}, {D W}, {T W}}. Since all 1-subsets of the itemsets in C2 are contained in F1, the prune step deletes nothing. By counting the supports of the itemsets in C2, we get F2.
F2
| Itemset | Support | 
| {A T} {A W} {C D} {C T} {C W} {D W} {T W} | 3 4 4 4 5 3 3 | 
Similarly, C3 could be generated from F2. After the join step, C3 is {{A C T}, {A C W}, {A T W}, {C D T}, {C D W}, {C T W}}. The prune step deletes {C D T}, since {D T} is not in F2. Then we could get F3 by counting the supports.
F3
|  |  | 
| {A C W} {A T W} {C D W} {C T W} | 4 3 3 3 | 
After the join step, C4 only contains one itemset {A C T W}, and it is left after the prune step.
F4
|  |  | 
|  |  | 
C5 is empty, so the whole set of frequent itemsets is the union
of F1, F2, F3 and F4.
 
In our example, the frequent itemsets whose sizes are greater than 1 are the itemsets in F2, F3 and F4, i.e., {A C}, {A T}, {A W}, {C D}, {C T}, {C W}, {D W}, {T W}, {A C T}, {A C W}, {A T W}, {C D W}, {C T W}, {A C T W}.
Let's start with the first one. For {A C}, the two possible rules are
{A} ->{C} and {C} ->
{A}. Because
    support({A C}) / support({A}) = 4 / 4 = 100%,
    support({A C}) / support({C}) = 4 / 6 = 66.7%,
{A} ->{C} is one desired rule.
Similarly, for {A C T}, the possible rules are {A C} ->
{T}, {A T} ->{C}, {C T} ->
{A}, {A} ->{C T}, {C} ->
{A T} and {T} ->{A C}. By computing their
confidences respectively,
    support({A C T}) / support({A C}) = 3 / 4,
    support({A C T}) / support({A T}) = 3 / 3 = 100%,
    support({A C T}) / support({C T}) = 3 / 4,
    support({A C T}) / support({A}) = 3 / 4,
    support({A C T}) / support({C}) = 3 / 6,
    support({A C T}) / support({T}) = 3 / 4,
only {A T} ->{C} is a desired rule.