### CS534 Artificial Intelligence Homework 6 - Spring 2013

#### PROF. CAROLINA RUIZ

Due Date: May 28, 2013 at the beginning of class

• Instructions:
• Read Sections 18.1-18.4, 18.7, 18.9, 20.2 of the textbook.
• See multiple examples of Decision Trees, Naive Bayes, and Bayesian Net problems in the solutions to homework and exam problems on the webpages of my offerings of CS4341 (ugrad AI).
• This is an individual homework. Please state in your homework submission any sources you used in constructing your solutions, as well as whom you discussed these homework problems with.

• Problems: Turn in written solutions to each of the problems below at the beginning of class when the homework is due.

1. Problem 1: Decision Trees. (35 points)

Consider the following subset of the Contact Lenses Dataset:

```ATTRIBUTES:	POSSIBLE VALUES:
age             {young,pre-presbyopic,presbyopic}
astigmatism     {no,yes}
tear-prod-rate  {reduced,normal}
contact-lenses  {soft,hard,none} <- classification target
```
```
age             astigmatism  tear-prod-rate  contact-lenses
young           no           normal          soft
young           yes          reduced         none
young           yes          normal          hard
pre-presbyopic  no           reduced         none
pre-presbyopic  no           normal          soft
pre-presbyopic  yes          normal          hard
pre-presbyopic  yes          normal          none
pre-presbyopic  yes          normal          none
presbyopic      no           reduced         none
presbyopic      no           normal          none
presbyopic      yes          reduced         none
presbyopic      yes          normal          hard
```

• (30 points) Construct the full decision tree for the above Contact Lenses dataset, where contact-lenses is the (classification) target attribute. Show all the steps of the calculations and of the construction of your decision tree. For your convenience, base-2 logarithms of selected values are provided below.

```
x        1/2  1/3   1/4  3/4   1/5   2/5   3/5   1/6   5/6   1/7   2/7   3/7   4/7   1
log2(x)  -1   -1.5  -2   -0.4  -2.3  -1.3  -0.7  -2.5  -0.2  -2.8  -1.8  -1.2  -0.8  0
```
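If you want to sanity-check your hand calculations, the entropy and information-gain computations for the first split can be sketched in a few lines of Python (this is only a check on the arithmetic, not a substitute for showing your work):

```python
from math import log2
from collections import Counter

# The 12 training instances from Problem 1:
# (age, astigmatism, tear-prod-rate, contact-lenses)
data = [
    ("young", "no", "normal", "soft"),
    ("young", "yes", "reduced", "none"),
    ("young", "yes", "normal", "hard"),
    ("pre-presbyopic", "no", "reduced", "none"),
    ("pre-presbyopic", "no", "normal", "soft"),
    ("pre-presbyopic", "yes", "normal", "hard"),
    ("pre-presbyopic", "yes", "normal", "none"),
    ("pre-presbyopic", "yes", "normal", "none"),
    ("presbyopic", "no", "reduced", "none"),
    ("presbyopic", "no", "normal", "none"),
    ("presbyopic", "yes", "reduced", "none"),
    ("presbyopic", "yes", "normal", "hard"),
]
ATTRS = ["age", "astigmatism", "tear-prod-rate"]

def entropy(rows):
    """Entropy of the contact-lenses label (last column) over a set of rows."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, i):
    """Information gain of splitting rows on attribute index i."""
    n = len(rows)
    remainder = sum(
        cnt / n * entropy([r for r in rows if r[i] == v])
        for v, cnt in Counter(r[i] for r in rows).items()
    )
    return entropy(rows) - remainder

print(f"H(root) = {entropy(data):.3f}")
for i, name in enumerate(ATTRS):
    print(f"gain({name}) = {info_gain(data, i):.3f}")
```

Note that the gains printed here use exact logarithms; with the truncated log table above your hand-computed values will differ slightly, but the ranking of the attributes should match.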

• (5 points) Use the decision tree you constructed to predict the classification (contact-lenses) value of the following test instances. Compute the accuracy of the model on these test data instances. Show your work.
```young,      no,  reduced, none     your decision tree predicts: ______
pre-pre,    yes, reduced, none     your decision tree predicts: ______
presbyopic, no,  normal,  soft     your decision tree predicts: ______
presbyopic, yes, normal,  hard     your decision tree predicts: ______

```
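Once your tree produces a label for each test instance, the accuracy is just the fraction of predictions that match the true labels. A minimal helper (the `example_preds` list is a hypothetical placeholder, not the answer):

```python
# Test instances from part (b): (age, astigmatism, tear-prod-rate), true label.
test_set = [
    (("young", "no", "reduced"), "none"),
    (("pre-presbyopic", "yes", "reduced"), "none"),
    (("presbyopic", "no", "normal"), "soft"),
    (("presbyopic", "yes", "normal"), "hard"),
]

def accuracy(predictions, truths):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)

# Replace with the labels your decision tree actually outputs (hypothetical).
example_preds = ["none", "none", "none", "hard"]
print(accuracy(example_preds, [t for _, t in test_set]))
```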

2. Problem 2: Naive Bayes Models. (35 points)

Consider the full Contact Lenses Dataset (24 instances):

```ATTRIBUTES:	         POSSIBLE VALUES:
age                      {young,pre-presbyopic,presbyopic}
spectacle-prescription:  {myope,hypermetrope}
astigmatism              {no,yes}
tear-prod-rate           {reduced,normal}
contact-lenses           {soft,hard,none} <- classification target
```
```
age             spectacle-prescription  astigmatism  tear-prod-rate  contact-lenses
young           myope                   no           reduced         none
young           myope                   no           normal          soft
young           myope                   yes          reduced         none
young           myope                   yes          normal          hard
young           hypermetrope            no           reduced         none
young           hypermetrope            no           normal          soft
young           hypermetrope            yes          reduced         none
young           hypermetrope            yes          normal          hard
pre-presbyopic  myope                   no           reduced         none
pre-presbyopic  myope                   no           normal          soft
pre-presbyopic  myope                   yes          reduced         none
pre-presbyopic  myope                   yes          normal          hard
pre-presbyopic  hypermetrope            no           reduced         none
pre-presbyopic  hypermetrope            no           normal          soft
pre-presbyopic  hypermetrope            yes          reduced         none
pre-presbyopic  hypermetrope            yes          normal          none
presbyopic      myope                   no           reduced         none
presbyopic      myope                   no           normal          none
presbyopic      myope                   yes          reduced         none
presbyopic      myope                   yes          normal          hard
presbyopic      hypermetrope            no           reduced         none
presbyopic      hypermetrope            no           normal          soft
presbyopic      hypermetrope            yes          reduced         none
presbyopic      hypermetrope            yes          normal          none
```

• (30 points) Construct the Naive Bayes classifier over this full dataset. That is, compute all P(A=a | contact-lenses=v) for all predicting attributes A, all attribute A's values a, and all contact-lenses values v. Show all the steps of the calculations.

Important: Remember to add 1 to all the counts to avoid probabilities that are equal to 0. For example, the number of instances with astigmatism=yes among the instances with contact-lenses=soft is 0; adding 1 to all the counts means this count [i.e., count(astigmatism=yes | contact-lenses=soft)] becomes 1. Similarly, count(astigmatism=no | contact-lenses=soft) becomes 5 + 1 = 6. So that each smoothed conditional probability table still sums to 1, the denominator of P(A=a | contact-lenses=v) becomes count(contact-lenses=v) plus the number of possible values of A; e.g., P(astigmatism=yes | contact-lenses=soft) = 1/7.

In other words, you need to construct all the conditional probability tables for the Naive Bayes net below based on the 24 data instances:

• (5 points) Use the Naive Bayes model you constructed to predict the classification (contact-lenses) value of the following test instance. Show your work.
```AGE        SPECTACLE-PRESC ASTIGMATISM TEAR-PROD-RATE
presbyopic hypermetrope    yes         normal            your Naive Bayes model predicts: ______

```
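A compact way to cross-check all of these numbers is to code the smoothed Naive Bayes directly. The sketch below applies the add-one smoothing described above to the conditional probabilities; it uses the unsmoothed class prior, which is an assumption — adjust if your calculation handles the prior differently:

```python
from collections import Counter

ATTRS = ["age", "spectacle-prescription", "astigmatism", "tear-prod-rate"]
VALUES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle-prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear-prod-rate": ["reduced", "normal"],
}
CLASSES = ["soft", "hard", "none"]

# The 24 instances of the full Contact Lenses dataset; last field is the class.
data = [
    ("young", "myope", "no", "reduced", "none"),
    ("young", "myope", "no", "normal", "soft"),
    ("young", "myope", "yes", "reduced", "none"),
    ("young", "myope", "yes", "normal", "hard"),
    ("young", "hypermetrope", "no", "reduced", "none"),
    ("young", "hypermetrope", "no", "normal", "soft"),
    ("young", "hypermetrope", "yes", "reduced", "none"),
    ("young", "hypermetrope", "yes", "normal", "hard"),
    ("pre-presbyopic", "myope", "no", "reduced", "none"),
    ("pre-presbyopic", "myope", "no", "normal", "soft"),
    ("pre-presbyopic", "myope", "yes", "reduced", "none"),
    ("pre-presbyopic", "myope", "yes", "normal", "hard"),
    ("pre-presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("pre-presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "yes", "normal", "none"),
    ("presbyopic", "myope", "no", "reduced", "none"),
    ("presbyopic", "myope", "no", "normal", "none"),
    ("presbyopic", "myope", "yes", "reduced", "none"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
    ("presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("presbyopic", "hypermetrope", "yes", "normal", "none"),
]

class_count = Counter(r[-1] for r in data)

def cond_prob(i, a, v):
    """Add-one-smoothed P(attribute_i = a | contact-lenses = v)."""
    num = sum(1 for r in data if r[i] == a and r[-1] == v) + 1
    den = class_count[v] + len(VALUES[ATTRS[i]])
    return num / den

def predict(instance):
    """argmax over classes v of P(v) * prod_i P(a_i | v)."""
    scores = {}
    for v in CLASSES:
        p = class_count[v] / len(data)  # unsmoothed prior -- an assumption
        for i, a in enumerate(instance):
            p *= cond_prob(i, a, v)
        scores[v] = p
    return max(scores, key=scores.get), scores

label, scores = predict(("presbyopic", "hypermetrope", "yes", "normal"))
print(label, scores)
```

Note that `cond_prob(2, "yes", "soft")` reproduces the 1/7 worked out in the smoothing example above.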

3. Problem 3: Bayesian Net Models. (35 points)

Consider the full Contact Lenses Dataset above (24 instances).

• (30 points) Construct all the conditional probability tables for the Bayesian net below based on the 24 data instances:

• (5 points) Use the Bayesian model you constructed to predict the classification (contact-lenses) value of the following test instance. Show your work.
```ASTIGMATISM	TEAR-PROD-RATE
yes  		normal   	         	your Bayes net predicts: ______

```
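The Bayes net figure is not reproduced in this text version. Assuming (as the query variables suggest) a net in which astigmatism and tear-prod-rate are the parents of contact-lenses, the relevant CPT row can be estimated by relative frequencies over the 24 instances — without smoothing, since Problem 3 does not ask for it. If your net has a different topology, the counting pattern still applies, only with different parents:

```python
from collections import Counter

# (astigmatism, tear-prod-rate, contact-lenses) projected from the 24 instances,
# written compactly by grouping identical projected rows.
data = (
    [("no", "reduced", "none")] * 6
    + [("no", "normal", "soft")] * 5
    + [("no", "normal", "none")] * 1
    + [("yes", "reduced", "none")] * 6
    + [("yes", "normal", "hard")] * 4
    + [("yes", "normal", "none")] * 2
)

def cpt_row(astig, tear):
    """P(contact-lenses = v | astigmatism = astig, tear-prod-rate = tear)."""
    matching = [c for a, t, c in data if a == astig and t == tear]
    counts = Counter(matching)
    return {v: counts[v] / len(matching) for v in ("soft", "hard", "none")}

dist = cpt_row("yes", "normal")
print(max(dist, key=dist.get), dist)
```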

4. Optional Problem. (20 points)

1. Download the Weka System. Weka is a machine-learning/data-mining environment. It provides a large collection of Java-based mining algorithms, data-preprocessing filters, and experimentation capabilities. Weka is open-source software issued under the GNU General Public License. For more information on the Weka system, to download it, and to get its documentation, see Weka's webpage (http://www.cs.waikato.ac.nz/ml/weka/). You should download and use the latest Developer Version (currently weka-3-7-9) of the system.

Dataset. Use the cfs_spambase.arff dataset. This dataset is a reduced version (in Weka's "arff" input format) of the spambase dataset available at the Univ. of California Irvine (UCI) Machine Learning Repository. Read the description of this dataset available at the UCI's spambase webpage. [Note: I generated the reduced dataset by using Weka's Correlation-based Feature Selection (CFS) to select 15 attributes from the dataset's 58 original attributes.]

2. Experiments. Load the cfs_spambase.arff dataset described above into Weka. For each of the following machine learning methods,
• run the machine learning method in Weka over the given dataset, using 10-fold cross-validation.
• describe in your report any interesting observations you can make about the results: accuracy of the resulting model, confusion matrix, topology of the model, relationships among attributes in the model, time taken to construct the model, ...
• describe in your report any interesting comparisons you can make among the results from the different machine learning methods used.

1. Naive Bayes. Available from the "Classify" tab, "Choose" bayes -> NaiveBayes.

2. Bayesian Networks. Available from the "Classify" tab, "Choose" bayes -> BayesNet.
• Use default parameters first.
• Then, change parameters as follows: Right-click on "K2 -P 1 -S BAYES" and select "Show properties". Change "initAsNaiveBayes" to False and "maxNrOfParents" to 3.
In each of the above cases, after Weka has constructed the model, right-click on the model under "Result list" (bottom left section of the Weka Explorer window), and select "Visualize Graph" to see the Bayesian Net constructed. Click on any node in the graph to see the conditional probability table associated with that node.

3. Artificial Neural Networks (ANN). Available from the "Classify" tab, "Choose" functions -> MultilayerPerceptron.
Right-click on "MultilayerPerceptron -L 0.3 -M 0.2 ..." and select "Show properties". Change "GUI" to True.
• Observe the ANN topology and the results obtained.
• Run a second experiment, changing the number of nodes in the hidden layer to 4. For this, right-click on "MultilayerPerceptron -L 0.3 -M 0.2 ..." again, select "Show properties", and change "hiddenLayers" from "a" to 4.

4. Decision Trees. Available from the "Classify" tab, "Choose" trees -> J48.