CS4341 Introduction to Artificial Intelligence. B-2003
SOLUTIONS - Exam 2 - December 18, 2003

Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute


Problem I. Decision Trees (25 points)

Consider the following dataset that specifies the type of contact lenses that is prescribed to a patient based on the patient's age, astigmatism, and tear production rate. The purpose of this problem is to use information theory, more precisely entropy (or disorder), to construct a decision tree that predicts the type of contact lenses that will be prescribed to a patient based on the patient's attributes.

ID   AGE             ASTIGMATISM   TEAR-PRODUCTION-RATE   CONTACT-LENSES
 1   young           no            normal                 soft
 2   young           yes           reduced                none
 3   young           yes           normal                 hard
 4   pre-presbyopic  no            reduced                none
 5   pre-presbyopic  no            normal                 soft
 6   pre-presbyopic  yes           normal                 hard
 7   pre-presbyopic  yes           normal                 none
 8   pre-presbyopic  yes           normal                 none
 9   presbyopic      no            reduced                none
10   presbyopic      no            normal                 none
11   presbyopic      yes           reduced                none
12   presbyopic      yes           normal                 hard

The partial decision tree below shows part of the decision tree constructed by using the entropy (or disorder) based algorithm.


                                  ASTIGMATISM
                               /               \
                         yes  /                 \ no
                             /                   \
                TEAR-PROD-RATE                  ???
                   /      \
          normal  /        \ reduced
                 /          \
               AGE          none
              / | \
       young /PP|  \ P
            /   |   \
         hard  none  hard

In this problem, you are asked to finish the construction of this decision tree following the entropy-based algorithm. In other words, you are asked to use entropy to select the attributes that will appear in the right subtree marked with a ??? above.
  1. (3 points) List the ID numbers of the data instances (i.e. rows in the dataset) that lie in the tree branch marked with a ??? above. Note that those data instances have ASTIGMATISM=no.
           
    • List of ID numbers: ___ 1, 4, 5, 9, 10 ______________________
  2. (15 points) Use the entropy formula to rank the attributes that can be used in the subtree marked with a ??? to predict the target/classification attribute (contact lenses). Show all the steps of the calculations. For your convenience, the base-2 logarithms of selected values are provided below.

    x        1       1/2     1/3     2/3     1/4     3/4     1/5
    log2(x)  0       -1      -1.6    -0.6    -2      -0.4    -2.3
    For the branch ASTIGMATISM=no we measure the weighted entropy of the two remaining
    predictive attributes, AGE and TEAR-PROD-RATE:
    
    AGE:
       contact lenses:          hard         none                soft
    young            (1/5) * [    0   -  0                -  1*log2(1)        ]  =  0
    pre-presbyopic   (2/5) * [    0   -  (1/2)*log2(1/2)  -  (1/2)*log2(1/2)  ]  =  0.4
    presbyopic       (2/5) * [    0   -  (2/2)*log2(2/2)  -  0                ]  =  0
                                                                                 --------
                                                                                  =  0.4
    
    TEAR-PROD-RATE:
       contact lenses:          hard         none                soft
    normal           (3/5) * [    0   -  (1/3)*log2(1/3)  -  (2/3)*log2(2/3)  ]  =  0.56
    reduced          (2/5) * [    0   -  (2/2)*log2(2/2)  -  0                ]  =  0
                                                                                 --------
                                                                                  =  0.56
    
    Since the attribute AGE has the smaller entropy (and therefore the larger information
    gain), we add it to the tree (a Python sketch that cross-checks these entropy
    calculations appears at the end of this problem):
    
                          ASTIGMATISM
                         /           \
                   yes  /             \ no
                       /               \
                TP-RATE                AGE
               /       \             /  |  \
       normal /         \ reduced   /   |   \
             /           \   young /   P|    \ PP
           AGE           none
          / | \
    young/PP|  \ P
        /   |   \
      hard none  hard
    
  3. (7 points) Remember to continue constructing the subtree until all the branches end in nodes with a homogeneous classification of the attribute contact lenses. That is, each branch ends in a node that contains dataset instances (rows) that have exactly the same value for contact lenses.
    For the branch (ASTIGMATISM=no and AGE=young) all examples have class soft, so for
    this branch the tree will have the leaf soft.

    For the branch (ASTIGMATISM=no and AGE=presbyopic) all examples have class none, so for
    this branch the tree will have the leaf none.

    The branch (ASTIGMATISM=no and AGE=pre-presbyopic) does not yield a single class, so we
    have to add the last available attribute, TP-RATE, to this branch. We obtain the final tree:
    	                            
                          ASTIGMATISM
                         /           \
                   yes  /             \ no
                       /               \
                TP-RATE                AGE
               /       \             /  |  \
       normal /         \ reduced   /   |   \
             /           \   young /   P|    \ PP
           AGE           none     /     |     \
          / | \                soft    none    TP-RATE
    young/PP|  \ P                            /      \
        /   |   \                    reduced /        \ normal
      hard none  hard                       /          \
                                         none          soft
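
The following Python sketch (not part of the original solution) cross-checks the
weighted-entropy calculations of part 2 on the ASTIGMATISM=no subset (instances
1, 4, 5, 9, 10). The function and variable names are illustrative.

    from math import log2

    # (AGE, TEAR-PROD-RATE, CONTACT-LENSES) for the ASTIGMATISM=no instances
    subset = [
        ("young",          "normal",  "soft"),   # 1
        ("pre-presbyopic", "reduced", "none"),   # 4
        ("pre-presbyopic", "normal",  "soft"),   # 5
        ("presbyopic",     "reduced", "none"),   # 9
        ("presbyopic",     "normal",  "none"),   # 10
    ]

    def weighted_entropy(rows, attr_index, class_index=2):
        """Average entropy of the class attribute after splitting on attr_index."""
        total = len(rows)
        result = 0.0
        for value in {r[attr_index] for r in rows}:
            branch = [r[class_index] for r in rows if r[attr_index] == value]
            probs = [branch.count(c) / len(branch) for c in set(branch)]
            entropy = -sum(p * log2(p) for p in probs)
            result += (len(branch) / total) * entropy
        return result

    print(weighted_entropy(subset, 0))  # AGE            -> 0.4
    print(weighted_entropy(subset, 1))  # TEAR-PROD-RATE -> approx. 0.55 (0.56 with the rounded logs above)

Since AGE gives the smaller weighted entropy, it is the attribute chosen for the
??? subtree, in agreement with the hand calculation.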
    
    

Problem II. Artificial Neural Networks + Genetic Algorithms (25 points)

Consider the following approach to training a neural net that uses genetic algorithms instead of the error backpropagation algorithm. Assume that we are training a 2-layer, feedforward neural net with 4 inputs, 1 hidden layer with 2 hidden nodes, and one output node. The output node produces an output that is a real number between 0 and 1. Also, assume that we have a collection of n training examples to train the net, each one with a desired output, which is also a real number between 0 and 1.

  1. Individuals in the population: Each individual in the population encodes a tuple of 13 real numbers, each one corresponding to the weight of a connection in the neural net. Note that the neural network contains 13 connections/links (8 input-to-hidden weights, 2 hidden-to-output weights, and 3 bias weights).

    An example of such an individual is:

    [w1=1.2, w2=-321.5, ..., w12=0.277, w13=-10.2]

  2. The fitness of an individual must represent how close the output computed by the corresponding net is to the desired output over all the training examples. The higher the fitness of an individual, the better the corresponding neural net computes the target function. The net corresponding to an individual is the neural net in which the 13 weights of the connections are as specified by the values in the individual.

  3. Describe in detail the genetic algorithm (or sequence of steps) that you would employ to search for the right values for the connections' weights of the neural net using this evolutionary approach.
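
One possible way to organize these steps is sketched below in Python (a minimal
sketch, not the official solution). It assumes the 13 weights are split as 8
input-to-hidden weights, 2 hidden-to-output weights, and 3 bias weights, that the
units are sigmoid units, and it makes illustrative GA choices: fitness-proportionate
selection, one-point crossover, Gaussian mutation, and fitness = 1/(1 + SSE) over
the n training examples.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def net_output(weights, x):
        """Forward pass of the 4-2-1 net for one input vector x (length 4)."""
        w_ih = weights[0:8].reshape(2, 4)   # input -> hidden weights
        b_h  = weights[8:10]                # hidden biases
        w_ho = weights[10:12]               # hidden -> output weights
        b_o  = weights[12]                  # output bias
        h = sigmoid(w_ih @ x + b_h)
        return sigmoid(w_ho @ h + b_o)      # real number in (0, 1)

    def fitness(weights, X, y):
        """Higher is better: net outputs closer to the desired outputs over all examples."""
        sse = sum((net_output(weights, x) - t) ** 2 for x, t in zip(X, y))
        return 1.0 / (1.0 + sse)

    def evolve(X, y, pop_size=50, generations=200, mutation_std=0.1):
        """X: list of length-4 input vectors; y: desired outputs in [0, 1]."""
        pop = rng.normal(0.0, 1.0, size=(pop_size, 13))          # random initial individuals
        for _ in range(generations):
            fits = np.array([fitness(ind, X, y) for ind in pop])
            probs = fits / fits.sum()                            # fitness-proportionate selection
            children = []
            while len(children) < pop_size:
                i, j = rng.choice(pop_size, size=2, p=probs)     # pick two parents
                cut = rng.integers(1, 13)                        # one-point crossover
                child = np.concatenate([pop[i][:cut], pop[j][cut:]])
                child += rng.normal(0.0, mutation_std, size=13)  # Gaussian mutation
                children.append(child)
            pop = np.array(children)
        fits = np.array([fitness(ind, X, y) for ind in pop])
        return pop[np.argmax(fits)]                              # best weight vector found

The returned 13-number vector is the fittest individual found, i.e. the weight
setting whose net best reproduces the desired outputs on the training examples.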

Problem III. Machine Vision (25 points)

Consider the problem of automatically understanding which objects are present in a 3-dimensional scene from (a subset of) its 2-dimensional digital images. A 2-dimensional digital image is just a matrix of pixels. For each pixel, the matrix contains the pixel's grayscale value.

For each of the three main stages of the process, describe the stage in terms of its input/output behavior. That is, specify what the inputs and what the outputs of each of the following stages are:

  1. (9 points) Segmentation.
    Brief Description (NOT REQUIRED)
    Segmentation is the grouping of the pixels in the matrix into
    meaningful units.  When segmenting a matrix of gray-scale values, the
    edges of these groupings usually reflect the possible boundaries of
    objects.  Two common methods (among many others) are:
    
    1. Threshold:
       Under the threshold method, the range of possible gray values is
       split into n equally sized slots, and each pixel is assigned to the
       slot into which its value falls.  After the entire matrix is
       processed, the groups are the sets of pixels in each slot.  A
       drawback of this method is that pixels completely unrelated in
       position can be grouped together purely on the basis of gray level.
    
    2. Assimilation:
       This is a similar strategy, but rather than belonging to a global
       group, each pixel is grouped only with the neighboring pixels whose
       values are within a certain threshold of its own.  In this manner,
       groups are still formed from pixels with similar values, but they
       remain spatially separated from similarly colored groups elsewhere
       in the image, leading to a better representation of the scene.
       (A small code sketch of both methods appears after item 3 below.)
    
    
  2. (9 points) Determination of shape.
    Brief Description (NOT REQUIRED)
    The shape of an object can be inferred from shading, texture, and motion. 
    
    shading:
       -shadows provide an understanding of the curvature of an object
       -sudden shading changes represent corners or boundaries
       -uniform shading in a lit scene corresponds to flat surfaces
    
    texture:
       -texture provides perspective cues about the object, conveying depth
       -a good example is the dimples on a golf ball
    
    motion:
       -by monitoring progress as motion occurs, certain points can be 
        traced and a shape model developed
       
      also:
       -as an object moves, the change in the shading on the surface provides 
        information about the shape as well
    
    
  3. (9 points) Object Identification/Recognition (for instance, determining that an unknown object in the image is, say, an obelisk).
    Brief Description (NOT REQUIRED)
    The identification of the unknown object is done by matching the object
    against templates in a database.  This matching process requires that
    one identify corresponding points in the unknown object and in the
    templates.  When only rotations around the y-axis are allowed, the
    number of templates needed is two and the number of corresponding
    points needed is two.  Notice that for each point we do not need a z
    value, since it is undetectable in 2-D.  We also do not need the y
    coordinate, since it always stays the same.  So, we only need to derive
    a function that predicts the x-coordinate of the unknown view as an
    interpolation between the two templates we are given.  The function is
    of the form x_need = Alpha * x1 + Beta * x2.  We need the two
    corresponding points so that we can solve a system of two equations in
    two unknowns.  The matching is done by finding the Alpha and Beta
    values from two points and then using those values of Alpha and Beta to
    predict the x coordinates of the other control points in the unknown
    object based on the x coordinates of the control points in the
    templates.  Given a small threshold for error (or mismatch), this
    procedure will predict which template in the database the object fits,
    if any at all.
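
A small numpy sketch (hypothetical data and names, not part of the exam solution)
of the matching procedure in item 3: solve the 2x2 system x_need = Alpha*x1 + Beta*x2
from two corresponding points, then check how well that Alpha and Beta predict the
x-coordinates of the remaining control points.

    import numpy as np

    def fit_alpha_beta(x1_pair, x2_pair, x_unknown_pair):
        """Solve x_unknown = Alpha*x1 + Beta*x2 from two corresponding points."""
        A = np.column_stack([x1_pair, x2_pair])      # 2x2 system of equations
        return np.linalg.solve(A, x_unknown_pair)    # (Alpha, Beta)

    def matches(x1, x2, x_unknown, threshold=2.0):
        """x1, x2, x_unknown: x-coordinates of corresponding control points in the
        two template views and in the unknown object.  The first two points are
        used to solve for Alpha and Beta; the rest are used to test the match."""
        alpha, beta = fit_alpha_beta(x1[:2], x2[:2], x_unknown[:2])
        predicted = alpha * x1[2:] + beta * x2[2:]
        return np.all(np.abs(predicted - x_unknown[2:]) <= threshold)

The unknown object is reported as an instance of whichever object's template pair
(if any) passes this test within the chosen error threshold.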
    
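A minimal Python sketch (hypothetical function names and parameters) of the two
segmentation ideas in item 1: global thresholding into n equally sized gray-level
slots, and an assimilation-style pass that grows a region only through spatially
adjacent pixels of similar gray value.

    import numpy as np

    def threshold_segment(image, n_slots=4):
        """Assign each pixel (gray values 0-255) to one of n equally sized slots."""
        image = np.asarray(image, dtype=float)
        slots = np.floor(image / 256.0 * n_slots).astype(int)
        return np.clip(slots, 0, n_slots - 1)         # label matrix, values 0..n_slots-1

    def region_grow(image, seed, tol=10):
        """Grow a region from `seed`, adding 4-neighbors whose gray value is within `tol`."""
        image = np.asarray(image, dtype=float)
        h, w = image.shape
        in_region = np.zeros((h, w), dtype=bool)
        stack = [seed]
        in_region[seed] = True
        while stack:
            r, c = stack.pop()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w and not in_region[nr, nc]
                        and abs(image[nr, nc] - image[r, c]) <= tol):
                    in_region[nr, nc] = True
                    stack.append((nr, nc))
        return in_region                              # boolean mask of one segment

The first function can group spatially unrelated pixels (the drawback noted above);
the second only merges pixels that are both similar in value and adjacent.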

Problem IV. Natural Language Processing (25 points)

For each of the following stages of the natural language understanding process, describe the inputs and the outputs of the stage. The more complete your lists of inputs and outputs, the higher your grade in this problem.
  1. (7 points) Speech Recognition
    Brief Description (NOT REQUIRED)
    During this stage, the analog signal is analyzed and split into
    different frequencies (or harmonics) using, for instance, the Fourier
    transform.  These frequencies are used to identify basic natural
    language sounds, called phonemes, that appear in the signal.
    Identifying these phonemes may be done using a trained neural network
    or by matching the sound (a piece of the input signal) against a
    library of known phonemes.  After the phonemes are identified,
    complete words are constructed from them.
    
    

    Some difficulties present during this stage (to mention just a few) are:

    • There is no simple mapping between sounds and words. Statistical information on words and phonemes is needed to help disambiguate.
    • There is a great variance in pronunciation from person to person due to gender, dialect spoken by the person, accent, etc.
    • Even the same person may pronounce words differently depending on the context.
    • The same sound may correspond to different words.
    • It is hard to split the continuous signal into words because people often phonetically connect contiguous words.
    • Background noise may interfere.
  2. (7 points) Syntactic Analysis
    Brief Description (NOT REQUIRED)
    During this stage, the grammar of the natural language is used to
    determine the structure of the sentence.  A grammar is a collection of
    rules of syntax of the language.  If the sentence is syntactically
    ambiguous, this stage may output more than one possible syntactic
    structure for the sentence.  A classical example of such a sentence is
    "Fruit flies like a banana", in which the subject can either be "Fruit"
    or "Fruit flies".
    
    Given a sentence and a grammar, a parsing procedure constructs
    one or more possible parse trees, each one representing a
    possible structure for the sentence (a small parsing sketch
    appears at the end of this problem).
    
    Possible complications are:
    
    • Syntactically ambiguous sentences
    • Having to parse syntactically incorrect sentences
    • Natural languages are rich in declensions, adjectives, adverbs, etc., which make the analysis difficult.
  3. (7 points) Semantic Analysis
    Brief Description (NOT REQUIRED)
    The essence of this stage is to translate the sentence into the
    internal knowledge representation of the computer program, so that the
    program can reason and use the information/knowledge encoded in the
    sentence.
    
    During this stage, a partial translation is made based on the parse
    tree(s) obtained in the previous syntactic analysis stage.
    
    As before, ambiguity is a possible complication for this stage. That
    is, sentences that have a unique syntactic structure but more than one
    possible meaning.
    
    
  4. (7 points) Pragmatic Analysis
    Brief Description (NOT REQUIRED)
    The purpose of this stage is to complete, as much as possible, the
    translation of the original sentence into the internal representation
    of the computer program. For this, the context of the sentence is used
    to disambiguate the meaning of the sentence as much as possible and to
    resolve references (e.g. "Mary bought bananas. She said they are
    inexpensive". In the second sentence, She="Mary" and they="bananas").
    
    Possible complications in this stage are difficulties in understanding
    the intentions of the speaker's message, metaphors, and semantic
    ambiguity.
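
As an illustration of the syntactic analysis stage (item 2), the following Python
sketch (hypothetical toy grammar, not part of the exam solution) uses a CKY chart
parser to count the parses of the ambiguous example sentence "Fruit flies like a
banana"; it finds the two readings mentioned above.

    from collections import defaultdict

    # Toy grammar in Chomsky normal form: parent -> (left child, right child)
    binary_rules = [
        ("S",  ("NP", "VP")),
        ("NP", ("N",  "N")),     # "fruit flies" as a compound subject
        ("NP", ("Det", "N")),    # "a banana"
        ("VP", ("V",  "NP")),    # "like a banana"  (like as the verb)
        ("VP", ("V",  "PP")),    # "flies like a banana"  (flies as the verb)
        ("PP", ("P",  "NP")),
    ]

    # Lexical rules: category -> word
    lexical_rules = [
        ("N", "fruit"), ("N", "flies"), ("N", "banana"),
        ("NP", "fruit"),             # bare-noun subject for the second reading
        ("V", "flies"), ("V", "like"),
        ("P", "like"),
        ("Det", "a"),
    ]

    def cky_parse_counts(words):
        n = len(words)
        # chart[i][j][X] = number of ways nonterminal X derives words[i:j]
        chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            for lhs, word in lexical_rules:
                if word == w:
                    chart[i][i + 1][lhs] += 1
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for lhs, (left, right) in binary_rules:
                        if chart[i][k][left] and chart[k][j][right]:
                            chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]
        return chart[0][n]

    print(cky_parse_counts("fruit flies like a banana".split())["S"])   # prints 2

Each of the two parses corresponds to one of the syntactic readings: in one,
"fruit flies" is the subject noun phrase; in the other, "fruit" alone is the
subject and "flies" is the verb.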