WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS4341 Introduction to Artificial Intelligence 
SOLUTIONS Project 4 - C Term 2002

By Wendy Kogel, Greg Milette, Yakov Kronrod, and Carolina Ruiz 
------------------------------------------

  1. Neural Networks
    Sketch of the solution:

    [The solution sketches for questions 4, 7, 8, 11, 13, and the Report
    Questions were provided as figures and are not reproduced here.]


  2. Neural Networks + Genetic Algorithms
    1. The neural network contains 13 connections/links:
      • 4*2 links connecting the 4 inputs to the 2 hidden nodes
      • plus 2*1 links connecting the 2 hidden nodes to the output node
      • plus 3 links, one for each of the hidden & output nodes, representing the node's activation threshold.

      Each weight can be any real number. To represent a real number using just bits, we'll use the IEEE 754 floating-point standard (see e.g. A.S. Tanenbaum, Structured Computer Organization, 4th Edition, Prentice Hall, 2000). This standard defines three precisions: single, double, and extended. For simplicity, we employ single precision here, which uses 32 bits (1 for the sign, 8 for the exponent, and 23 for the fraction) to represent a real number:

      
      bits  1 2             9 10                                          32
            - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
           | |   exponent    |   fraction                                  |
            - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
            ^
            | sign
      
      Each individual will be represented as an array of 13 positions, one for each connection in the net. Each position in the array will be a single-precision float as described above. Hence, each individual can be seen as a bit string of length 13*32 = 416, where the first 32 bits represent the first weight, the next 32 bits represent the second weight, and so on.

      Note: The important point here is that your representation of an individual accommodates the 13 weights and allows each weight to be any real number. Whether or not you use double-precision floats is not important.
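
      As an illustration, here is a minimal Python sketch of this encoding, using the standard struct module to obtain the IEEE 754 single-precision bit pattern of each weight (the helper names encode/decode are our own):

        import random
        import struct

        NUM_WEIGHTS = 13        # 8 input-to-hidden + 2 hidden-to-output + 3 thresholds
        BITS_PER_WEIGHT = 32    # IEEE 754 single precision

        def encode(weights):
            """Pack 13 floats into a 416-character bit string."""
            bits = ""
            for w in weights:
                (as_int,) = struct.unpack(">I", struct.pack(">f", w))
                bits += format(as_int, "032b")
            return bits

        def decode(bits):
            """Unpack a 416-character bit string back into 13 floats."""
            weights = []
            for i in range(0, NUM_WEIGHTS * BITS_PER_WEIGHT, BITS_PER_WEIGHT):
                as_int = int(bits[i:i + BITS_PER_WEIGHT], 2)
                (w,) = struct.unpack(">f", struct.pack(">I", as_int))
                weights.append(w)
            return weights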

    2. We are given that the initial population consists of 100 individuals. The most important consideration is how well these individuals cover the space of possible individuals. Primarily, we want the individuals to be chosen at random. One way to do this is to run through the 100 bit strings and assign each bit a random boolean value (randomly choosing 1 or 0). Repeated individuals are also worth considering: we would like 100 distinct individuals, to ensure a variety of data and to avoid testing the neural net twice on an identical bit string. Another aspect to consider is the distribution of the individuals over the sample space. A somewhat uniform distribution would make it more likely that at least one of the individuals is 'close' to the optimal one. However, this is not strictly necessary, since through crossovers and mutations a wide variety of individuals will eventually be considered anyway.
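
      A minimal sketch of such an initialization, reusing the constants from the encoding sketch above; checking distinctness with a set is one simple way to avoid repeated individuals:

        def random_individual():
            """One individual: a uniformly random 416-bit string."""
            return "".join(random.choice("01")
                           for _ in range(NUM_WEIGHTS * BITS_PER_WEIGHT))

        def initial_population(size=100):
            """Generate `size` distinct random individuals."""
            population = set()
            while len(population) < size:
                population.add(random_individual())  # the set rejects duplicates
            return list(population)

      Note that a uniformly random 32-bit pattern can decode to NaN or infinity under IEEE 754; an alternative is to sample each weight from a bounded interval, say [-1, 1], and encode it.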

    3. The fitness of an individual must represent how close the output computed by the net is to the desired output over all the training examples. The higher the fitness of an individual, the better the corresponding neural net computes the target function. An initial thought is to use the inverse of the neural net's error function as the fitness of an individual. Recall that the error function measures how far a neural net is from computing the correct target answer. If we run every training instance through the net encoded by a given individual and add up the errors, we obtain a total error. Note that a high error is bad, while a high fitness is good; so we divide 1 by the error, and higher errors give lower fitness while lower errors give higher fitness. There is a problem with this approach, however: if the error happens to be 0 (a perfect individual), dividing 1 by it would lead to a division-by-zero error. A better approach is to use a function of the form:
         fitness = N - total error,  
      
      where (total error) is the sum of the errors made by the neural net on the training instances, and N is the total number of training instances. More precisely, given an individual I, let NN_I be the neural network obtained by assigning to each connection in the net the corresponding weight encoded in I. Hence

           fitness of I = N - total error of NN_I, where

           total error of NN_I = sum over all instances of |desired output - net's output|

      (A slightly modified approach is to make the total error the sum of the squares of the individual errors, i.e.

           total error of NN_I = 1/2 * sum over all instances of (desired output - net's output)^2
      )
      
      There is one underlying assumption (which obviously holds in our case): both the desired and the actual output are values between 0 and 1. This guarantees that the sum of the errors over the N instances is no larger than N. In this manner, the highest possible error is N, giving a fitness of 0 (the worst fitness), and the lowest error is 0 (perfect), which yields a fitness of N (the best fitness).
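
      A sketch of this fitness computation in Python; feed_forward(weights, inputs) is a hypothetical routine, not given here, that runs the network NN_I with the decoded weights and returns its output in [0, 1]:

        def fitness(individual, training_examples, feed_forward):
            """fitness = N - total error, where the total error sums
            |desired output - net's output| over all N training examples."""
            weights = decode(individual)
            total_error = sum(abs(desired - feed_forward(weights, inputs))
                              for inputs, desired in training_examples)
            return len(training_examples) - total_error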

    4. Mutation: We randomly flip one bit to mutate an individual. To improve efficiency, we could start by mutating high-order bits and then, as the network gets trained more and more, switch to lower-order bits. That way we start by changing the weights by large amounts and then gradually zero in on the correct weights. Another important point is that we do not want to mutate the bit corresponding to the sign of a weight: a simple mutation should not cause such a drastic change in the individual. Likewise, we do not want to flip the high-order bits of a weight's exponent, which would be even more extreme than changing the sign. For instance, flipping the exponent bit corresponding to 128, or 2^7, from 0 to 1 could move a value from roughly .423 * 2^3 to .423 * 2^131. This is probably not a good idea.
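
      A minimal sketch of such a guarded bit-flip mutation; treating the first 3 bits of each weight (the sign bit and the two highest exponent bits) as protected is our own choice of parameter:

        def mutate(individual, protected_bits=3):
            """Flip one random bit, skipping the sign bit and the highest
            exponent bits (positions 0..protected_bits-1) of the chosen weight."""
            weight_index = random.randrange(NUM_WEIGHTS)
            offset = random.randrange(protected_bits, BITS_PER_WEIGHT)
            pos = weight_index * BITS_PER_WEIGHT + offset
            flipped = "1" if individual[pos] == "0" else "0"
            return individual[:pos] + flipped + individual[pos + 1:]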

      Selection: The selection algorithm is quite simple: we select 100 individuals from the current population to be included in the next generation. We first define a probability distribution over the current population (i.e., we assign a probability value to each individual in such a way that the probabilities over all individuals sum to 1). Then we perform random selection with replacement using that distribution. We need a sensible way to assign each individual a probability corresponding to its chances of being selected. We choose here the standard fitness-proportionate selection method, but any other method described in Winston's book and covered in class (e.g. the rank method, diversity, etc.) would be fine. We already have the fitness function, which tells us how well an individual performed on the training data; so, if we divide an individual's fitness by the sum of the fitness values of all individuals in the current population, we obtain a selection probability between 0 and 1. This also directly guarantees that the probabilities add up to 1. Then we simply use this probability distribution to make our selection choices.
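
      A sketch of this fitness-proportionate selection using Python's random.choices, which samples with replacement according to the given weights:

        def select(population, fitnesses, n=100):
            """Sample n individuals with replacement, each with probability
            fitness / (sum of all fitnesses); assumes the sum is positive."""
            total = sum(fitnesses)
            probabilities = [f / total for f in fitnesses]
            return random.choices(population, weights=probabilities, k=n)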

      Cross-over operators: We must determine how to combine two individuals to obtain two new ones. Using single-point crossover, we split the individuals at a random point i: all array positions up to i come from the first individual, and the rest come from the second. A second offspring is generated by doing the opposite, taking the positions up to i from the second individual and the rest from the first. In our case, we must make sure that the crossover points lie only at multiples of 32 bits from the beginning of the bit string, so that they fall between individual weight representations and never in the middle of one. Any crossover operation then yields an individual that takes some of its weight encodings from one parent and the remaining ones from the other parent.
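
      A sketch of single-point crossover restricted to these weight boundaries:

        def crossover(parent1, parent2):
            """Split both parents at a random 32-bit weight boundary
            and swap the tails, producing two offspring."""
            point = random.randrange(1, NUM_WEIGHTS) * BITS_PER_WEIGHT
            child1 = parent1[:point] + parent2[point:]
            child2 = parent2[:point] + parent1[point:]
            return child1, child2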

    5. First, before starting the algorithm, you must decide on its termination condition. There are several choices here: a maximum number of iterations, a 'good-enough' individual, or something else, such as the first time k mutations and crossovers in a row give no improvement in the performance of the NN. For the following pseudocode, let:
       a. Fitness_threshold represent the fitness you want to achieve
       b. MAX represent the maximum number of iterations
    
        Generate the initial population of 100 individuals P = {I1, I2, ..., I100}
        Evaluate each I in P and compute Fitness(I)
        BestInd = the individual with highest fitness
        Iterations = 0
        While ( Fitness(BestInd) < Fitness_threshold  &&  Iterations < MAX )
           1. Select 50 pairs of individuals for crossover
           2. For each pair, produce two offspring and replace the pair in P
           3. Mutate some percentage of the population
           4. Evaluate the fitness of each new member of P
           5. BestCurrent = the fittest individual in the new P
           6. If ( Fitness(BestCurrent) > Fitness(BestInd) )
              then BestInd = BestCurrent
           7. Iterations++
    
    The termination condition is determined by your criteria; most likely the loop ends when the fitness of the best individual reaches the threshold or when the maximum number of iterations is exceeded.
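
      Tying the sketches above together, a condensed Python version of this loop; the 5% mutation rate is an arbitrary choice:

        def genetic_algorithm(training_examples, feed_forward,
                              fitness_threshold, max_iterations):
            population = initial_population(100)
            scores = [fitness(ind, training_examples, feed_forward)
                      for ind in population]
            best = max(zip(scores, population))   # (fitness, individual) pair
            iterations = 0
            while best[0] < fitness_threshold and iterations < max_iterations:
                parents = select(population, scores, n=100)
                population = []
                for i in range(0, 100, 2):        # 50 pairs -> 100 offspring
                    population.extend(crossover(parents[i], parents[i + 1]))
                population = [mutate(ind) if random.random() < 0.05 else ind
                              for ind in population]
                scores = [fitness(ind, training_examples, feed_forward)
                          for ind in population]
                best = max(best, max(zip(scores, population)))
                iterations += 1
            return best[1]                        # the fittest individual found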

    6. The output of the algorithm will be the best individual produced, namely BestInd in the algorithm above.


  3. Decision Trees
    This is the resulting tree:
    tear-prod-rate = reduced: none
    tear-prod-rate = normal
    |  astigmatism = no
    |  |  age = young: soft
    |  |  age = pre-presbyopic: soft
    |  |  age = presbyopic
    |  |  |  spectacle-prescrip = myope: none
    |  |  |  spectacle-prescrip = hypermetrope: soft
    |  astigmatism = yes
    |  |  spectacle-prescrip = myope: hard
    |  |  spectacle-prescrip = hypermetrope
    |  |  |  age = young: hard
    |  |  |  age = pre-presbyopic: none
    |  |  |  age = presbyopic: none
    
    
    Below is the procedure followed to construct the tree:
    
    

    Looking for the root node (24 tuples). In each table below, an entry
    c/n -> t under a class means that c of the n tuples with that attribute
    value belong to the class, contributing the entropy term
    t = -(c/n) log2(c/n); 'weighted' is (n/N) times the sum of the three
    terms; and the expected information of an attribute is the sum of its
    weighted entries. The attribute with the smallest expected information
    (i.e., the largest information gain) is selected.

    age                        none         soft         hard         weighted
      young           8/24    4/8 -> .5    2/8 -> .5    2/8 -> .5      .5
      pre-presbyopic  8/24    5/8 -> .42   2/8 -> .5    1/8 -> .375    .43
      presbyopic      8/24    6/8 -> .31   1/8 -> .375  1/8 -> .375    .35
                                             expected information   = 1.28

    spectacle-prescription     none         soft         hard         weighted
      hypermetrope    12/24   8/12 -> .39  3/12 -> .5   1/12 -> .3     .59
      myope           12/24   7/12 -> .45  2/12 -> .43  3/12 -> .5     .69
                                             expected information   = 1.28

    astigmatism                none         soft         hard         weighted
      no              12/24   7/12 -> .45  5/12 -> .53  0/12 -> 0      .49
      yes             12/24   8/12 -> .39  0/12 -> 0    4/12 -> .53    .46
                                             expected information   =  .95

    tear-prod-rate             none         soft         hard         weighted
      normal          12/24   3/12 -> .5   5/12 -> .53  4/12 -> .53    .78
      reduced         12/24  12/12 -> 0    0/12 -> 0    0/12 -> 0      0
                                             expected information   =  .78

    Use tear-prod-rate as the root.

    Tear-prod-rate = reduced: 12 tuples, all none (a leaf).

    Deciding on the next attribute for tear-prod-rate = normal (12 tuples
    left):

    age                        none         soft         hard         weighted
      young           4/12    0/4 -> 0     2/4 -> .5    2/4 -> .5      .33
      pre-presbyopic  4/12    1/4 -> .5    2/4 -> .5    1/4 -> .5      .5
      presbyopic      4/12    2/4 -> .5    1/4 -> .5    1/4 -> .5      .5
                                             expected information   = 1.33

    spectacle-prescription     none         soft         hard         weighted
      hypermetrope    6/12    2/6 -> .53   3/6 -> .5    1/6 -> .43     .73
      myope           6/12    1/6 -> .43   2/6 -> .53   3/6 -> .5      .73
                                             expected information   = 1.46

    astigmatism                none         soft         hard         weighted
      no              6/12    1/6 -> .43   5/6 -> .22   0/6 -> 0       .33
      yes             6/12    2/6 -> .53   0/6 -> 0     4/6 -> .39     .46
                                             expected information   =  .78

    Use astigmatism.

    Astigmatism = no (6 tuples left):

    age                        none         soft         hard         weighted
      young           2/6     0/2 -> 0     2/2 -> 0     0/2 -> 0       0
      pre-presbyopic  2/6     0/2 -> 0     2/2 -> 0     0/2 -> 0       0
      presbyopic      2/6     1/2 -> .5    1/2 -> .5    0/2 -> 0       .33
                                             expected information   =  .33

    spectacle-prescription     none         soft         hard         weighted
      hypermetrope    3/6     0/3 -> 0     3/3 -> 0     0/3 -> 0       0
      myope           3/6     1/3 -> .53   2/3 -> .39   0/3 -> 0       .46
                                             expected information   =  .46

    Use age.

    Age = young: 2 tuples, all soft. Age = pre-presbyopic: 2 tuples, all
    soft. Age = presbyopic: 2 tuples (1 none, 1 soft); spectacle-prescription
    is the only attribute left, so it is used to split this branch,
    obtaining: spectacle-prescription = hypermetrope: 1 tuple, soft;
    spectacle-prescription = myope: 1 tuple, none.

    Backtracking up to astigmatism = yes (6 tuples left):

    age                        none         soft         hard         weighted
      young           2/6     0/2 -> 0     0/2 -> 0     2/2 -> 0       0
      pre-presbyopic  2/6     1/2 -> .5    0/2 -> 0     1/2 -> .5      .33
      presbyopic      2/6     1/2 -> .5    0/2 -> 0     1/2 -> .5      .33
                                             expected information   =  .66

    spectacle-prescription     none         soft         hard         weighted
      hypermetrope    3/6     2/3 -> .39   0/3 -> 0     1/3 -> .53     .46
      myope           3/6     0/3 -> 0     0/3 -> 0     3/3 -> 0       0
                                             expected information   =  .46

    Use spectacle-prescription.

    Spectacle-prescription = myope: 3 tuples, all hard.
    Spectacle-prescription = hypermetrope: 3 tuples (2 none, 1 hard); age is
    the only attribute left, so it is used to split this branch, obtaining:
    age = young: 1 tuple, hard; age = pre-presbyopic: 1 tuple, none;
    age = presbyopic: 1 tuple, none.
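
    For reference, a small Python sketch of the expected-information
    computation used in these tables; the dataset encoding (a list of dicts
    with a 'class' key) is our own assumption:

      from collections import Counter
      from math import log2

      def expected_information(examples, attribute):
          """Expected information of splitting `examples` on `attribute`;
          the attribute with the smallest result is chosen for the split."""
          n = len(examples)
          result = 0.0
          for value in set(e[attribute] for e in examples):
              subset = [e for e in examples if e[attribute] == value]
              counts = Counter(e["class"] for e in subset)
              entropy = -sum((c / len(subset)) * log2(c / len(subset))
                             for c in counts.values())
              result += len(subset) / n * entropy
          return result

    On the full 24-tuple dataset this yields 1.28 for age, 1.28 for
    spectacle-prescription, .95 for astigmatism, and .78 for tear-prod-rate,
    matching the root-level tables above.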