Problem 4

Consider the problem of mining association rules as presented in the papers "Mining Association Rules between Sets of Items in Large Databases" (by R. Agrawal, T. Imielinski, and A. Swami) and "Fast Algorithms for Mining Association Rules" (by R. Agrawal and R. Srikant).

Follow the Apriori algorithm to obtain ALL the association rules with at least 50% support and 100% confidence that are derivable from the set of transactions below (example taken from M.J. Zaki's KDD lecture notes). Show the results obtained in each step of the algorithm.

A = Jane Austen
C = Agatha Christie
D = Sir Arthur Conan Doyle
T = Mark Twain
W = P.G. Wodehouse

Transaction Database:

Transaction ID   Items

     1           A   C       T   W
     2               C   D       W
     3           A   C       T   W
     4           A   C   D       W
     5           A   C   D   T   W
     6               C   D   T
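
As a concrete reading of the two thresholds, the following minimal Python sketch encodes the transaction database above and counts support and confidence directly. The names `transactions`, `support_count`, and `confidence` are illustrative choices made here, not taken from the papers. With 6 transactions, 50% support means an itemset must occur in at least 3 of them.

```python
# Transaction database from the problem statement (IDs 1-6).
transactions = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)

MIN_COUNT = 3  # 50% of 6 transactions

# Support counts of the single items.
for item in "ACDTW":
    print(item, support_count(item))
```

For example, `support_count("AD")` is 2, so {A D} misses the 50% threshold, while `confidence("A", "C")` is 1.0 because every transaction that contains A also contains C.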

Solution by Weiyang Lin:

The problem of discovering all association rules can be divided into two parts:

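1. Find all frequent itemsets. With 6 transactions, 50% support means an itemset must occur in at least 3 transactions. Level by level, Apriori finds the following frequent itemsets (support counts in parentheses):

     1-itemsets: A (4), C (6), D (4), T (4), W (5)
     2-itemsets: AC (4), AT (3), AW (4), CD (4), CT (4), CW (5), DW (3), TW (3)
     3-itemsets: ACT (3), ACW (4), ATW (3), CDW (3), CTW (3)
     4-itemsets: ACTW (3)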

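2. Generate association rules from the frequent itemsets. For each frequent itemset, every way of splitting it into an antecedent and a consequent is checked, and the rules that meet the minimum confidence are kept. Going through all the frequent itemsets gives the following rules, each with at least 50% support and 100% confidence: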

  1. {A} -> {C},
  2. {A} -> {W},
  3. {D} -> {C},
  4. {T} -> {C},
  5. {W} -> {C},
  6. {A} -> {C W},
  7. {A T} -> {C},
  8. {A T} -> {W},
  9. {A C} -> {W},
  10. {A W} -> {C},
  11. {D W} -> {C},
  12. {T W} -> {A},
  13. {T W} -> {C},
  14. {A T} -> {C W},
  15. {T W} -> {A C},
  16. {A C T} -> {W},
  17. {A T W} -> {C},
  18. {C T W} -> {A}.
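
The two parts of the solution can be checked with a short Python sketch. This is a simplified illustration of Apriori rather than the exact procedure from the papers: the join step unions any two frequent k-itemsets that differ in one item instead of using the lexicographic prefix join, and the function names `count`, `apriori`, and `rules` are assumptions made for this example.

```python
from itertools import combinations

transactions = [
    {"A", "C", "T", "W"},
    {"C", "D", "W"},
    {"A", "C", "T", "W"},
    {"A", "C", "D", "W"},
    {"A", "C", "D", "T", "W"},
    {"C", "D", "T"},
]
MIN_COUNT = 3   # 50% support of 6 transactions
MIN_CONF = 1.0  # 100% confidence

def count(itemset):
    """Number of transactions containing the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori():
    """Return {frozenset: support count} for every frequent itemset."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= MIN_COUNT}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = count(s)
        # Join: union pairs of frequent k-itemsets that form a (k+1)-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Count: keep candidates that reach the minimum support.
        level = {c for c in candidates if count(c) >= MIN_COUNT}
        k += 1
    return frequent

def rules(frequent):
    """Return (antecedent, consequent) pairs meeting MIN_CONF."""
    out = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                if n >= MIN_CONF * frequent[ante]:
                    out.append((ante, itemset - ante))
    return out

frequent = apriori()
found = rules(frequent)
for ante, cons in sorted(found, key=lambda r: (len(r[0] | r[1]), sorted(r[0]), sorted(r[1]))):
    print("{%s} -> {%s}" % (" ".join(sorted(ante)), " ".join(sorted(cons))))
```

On this database the sketch finds 19 frequent itemsets, the largest being {A C T W} with support count 3, and prints the same 18 rules as the list above, though in a different order.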