CS 539 Spring 2003

Computer Science Department

CS539 Machine Learning - Spring 2003
Project 3 - Neural Networks

PROF. CAROLINA RUIZ

Due Date:

Part 1: Monday, Feb. 03 2003 at 8 am.
Part 2: Monday, Feb. 10 2003 at 8 am.

Project Description
Project Assignment
Report Submission and Due Date

PROJECT DESCRIPTION

Part 1: Construct the most accurate neural network you can for predicting the class attribute of each of the following datasets (available with the Weka system):

CPU dataset
Iris dataset

Part 2: Construct the most accurate neural network you can for predicting the class attribute (CARAVAN: number of mobile home policies) in The Insurance Company Benchmark (COIL 2000) dataset.

PROJECT ASSIGNMENT

Read Chapter 4 of the textbook, on neural networks, in great detail.
Read the neural network code in the Weka system in great detail.
The following are guidelines for the construction of your neural networks:
- Code: Use the neural networks methods implemented in the Weka system.
- Topology of your Neural Net: Use a 2-layer, feedforward architecture. More specifically, your net should consist of (1 input layer,) 1 hidden layer, and 1 output layer. Each node in a layer should be connected to every node in the next layer, and no two nodes in the same layer should be connected.
  In the case of non-numeric target attributes, decide on a convention that you'll use to match output node values and target attribute values.
- Training and Testing Instances: For The Insurance Company Benchmark, use the ticdata2000.txt data for training and the ticeval2000.txt data for testing. You may restrict your experiments to a subset of the instances IF Weka cannot handle your whole dataset (this is unlikely). But remember that the more accurate your neural network is, the better.
- Preprocessing of the Data: A main part of this project is the preprocessing of your dataset. You should apply relevant filters to your dataset before doing the mining and/or using the results of previous mining tasks. For instance, you may decide to remove apparently irrelevant attributes, replace missing values if any, discretize attributes in a different way, etc. Your report should contain a detailed description of the preprocessing of your dataset and justifications of the steps you followed. If Weka does not provide the functionality you need to preprocess your data so as to obtain useful patterns, preprocess the data yourself by writing the necessary filters (you can incorporate them in Weka if you wish).
- Evaluation and Testing: Experiment with different testing methods:
  1. Supply separate training (ticdata2000.txt) and testing (ticeval2000.txt) data to Weka.
  2. Supply training (ticdata2000.txt or ticdata2000.txt + ticeval2000.txt) data to Weka and experiment with several split ratios.
  3. Supply training (ticdata2000.txt or ticdata2000.txt + ticeval2000.txt) data to Weka and use n-fold cross-validation to test your results. Experiment with different values for the number of folds.
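
The topology guideline and the output-node convention above can be illustrated with a small sketch. It is written in plain Python and is independent of Weka's own code; the hidden-layer size, the random initial weights, the sigmoid units, and all function names are assumptions made for illustration. It encodes a nominal class with one output node per class value and decodes a prediction by picking the largest output. The weights are untrained, so the prediction itself is meaningless; only the structure matters here.

```python
import math
import random

def one_hot(label, classes):
    """Encode a nominal class label as one target value per output node."""
    return [1.0 if c == label else 0.0 for c in classes]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """One weight row per node; the last entry of each row is the bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_output(weights, inputs):
    """Fully connected layer: every node sees every input plus a constant bias of 1."""
    extended = inputs + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]

def forward(hidden_w, output_w, inputs):
    """Input layer -> 1 hidden layer -> 1 output layer, no same-layer connections."""
    return layer_output(output_w, layer_output(hidden_w, inputs))

def decode(outputs, classes):
    """Convention assumed here: predict the class whose output node is largest."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best]

classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
rng = random.Random(0)
hidden_w = init_layer(4, 3, rng)             # 4 inputs, 3 hidden nodes (arbitrary choice)
output_w = init_layer(3, len(classes), rng)  # one output node per class value

outputs = forward(hidden_w, output_w, [5.1, 3.5, 1.4, 0.2])
prediction = decode(outputs, classes)        # untrained weights, so not meaningful
```

Other conventions are possible, for example a single output node with thresholds for a two-valued class; whichever one you choose should be stated in your report.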

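The testing methods listed under Evaluation and Testing differ only in how instances are divided between training and testing. The sketch below shows that division for a percentage split and for n-fold cross-validation, using instance indices only. It is plain Python, not Weka's implementation; the function names, the seed, and the 150-instance example size are assumptions for illustration, and the folds are not stratified by class.

```python
import random

def percentage_split(n_instances, train_ratio, seed=0):
    """Shuffle instance indices, then cut them into a training and a testing part."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_ratio)
    return indices[:cut], indices[cut:]

def cross_validation_folds(n_instances, n_folds, seed=0):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Example: 150 instances, an 80/20 split, and 10-fold cross-validation.
train_idx, test_idx = percentage_split(150, 0.8)
fold_sizes = [(len(train), len(test)) for train, test in cross_validation_folds(150, 10)]
```

With cross-validation the reported accuracy is usually the fraction of correct predictions pooled over all test folds, so varying the number of folds changes how much data each network is trained on.
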
REPORT AND DUE DATE

Written Report.
Your report should contain the following sections with the corresponding discussions:
1. Code Description: Describe the neural network code that you used from Weka. Explain the algorithm underlying the code in terms of the input it receives, the output it produces, and the main steps it follows to produce this output.
2. Data: Describe the dataset that you selected in terms of the attributes present in the data, the number of instances, missing values, and other relevant characteristics.
  Provide a detailed description of the preprocessing of your data. Justify the preprocessing you apply and explain why the resulting data is appropriate for constructing neural networks from it.
3. Experiments: For each experiment you ran describe:
  - Data: What data did you use to construct and test your neural networks?
  - Any additional pre- or post-processing done to the data or the NN output in order to improve the accuracy of your net.
  - Accuracy of the resulting neural networks.
  - Discuss how this accuracy compares with that of your most accurate ZeroR experiment and decision trees from the previous assignments.
4. Summary of Results
  - For each dataset, what was the accuracy of the most accurate neural network constructed in your project?
  - Discuss the strengths and weaknesses of your project.
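
For the ZeroR comparison requested in the Experiments section, recall that ZeroR ignores all input attributes and, for a nominal class, always predicts the most frequent class value in the training data. The sketch below computes that baseline accuracy in plain Python. The function name is hypothetical and the label counts are made up for illustration; they are not taken from any of the project datasets.

```python
from collections import Counter

def zero_r_accuracy(train_labels, test_labels):
    """Predict the training majority class for every test instance."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)

# Made-up, heavily skewed two-class labels (illustration only).
train_labels = ["0"] * 94 + ["1"] * 6
test_labels = ["0"] * 47 + ["1"] * 3

majority, baseline = zero_r_accuracy(train_labels, test_labels)
```

On a skewed class distribution like this one the baseline is already high, so a network's accuracy is only informative when read against it.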
Oral Report. We will discuss the results from the individual projects during the class on February 03. Your oral report should summarize the different sections of your written report as described above. Each of you will have 5 minutes to explain your results and to discuss your project in class. Be prepared!
Submission and Due Date.
Please submit the following files by email to ruiz@cs.wpi.edu by the deadlines specified below. Submissions received on Mondays between 8:01 am and 10:00 am will be penalized with 30% off the grade, and submissions after 10:00 am won't be accepted.
1. [your-lastname]_proj3_slides_part1.[ext] containing your slides for your oral report of Part 1. This file should be either a PDF file (ext=pdf) or a PowerPoint file (ext=ppt). Please use only lower case letters in the file name. For instance, my file would be named ruiz_proj3_slides_part1.ppt.
  Deadline for submission: 8:00 am on Monday, February 03 2003.
2. [your-lastname]_proj3_slides_part2.[ext] containing your slides for your oral report of Part 2. This file should be either a PDF file (ext=pdf) or a PowerPoint file (ext=ppt).
  Deadline for submission: 8:00 am on Monday, February 10 2003.
3. [your-lastname]_proj3_report.pdf containing your written report for Parts 1 and 2 in PDF.
  Deadline for submission: 8:00 am on Monday, February 10 2003.
