CS4341 Introduction to Artificial Intelligence. C-2002
SOLUTIONS - Exam 2 - February 28, 2002

Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute

By Carolina Ruiz and Yakov Kronrod

Problem I. Decision Trees (20 points)

Consider the following dataset, which helps us predict the RISK of a loan application based on the applicant's CREDIT HISTORY, DEBT, and INCOME.

 #   CREDIT HISTORY   DEBT   INCOME   RISK
 1   bad              low    0to15    high
 2   bad              high   15to35   high
 3   bad              low    0to15    high
 4   unknown          high   15to35   high
 5   unknown          high   0to15    high
 6   good             high   0to15    high
 7   bad              low    over35   moderate
 8   unknown          low    15to35   moderate
 9   good             high   15to35   moderate
10   unknown          low    over35   low
11   unknown          low    over35   low
12   good             low    over35   low
13   good             high   over35   low
14   good             high   over35   low

We consider here the problem of choosing one of the predicting attributes as the root of the decision tree.
Note: You DON'T have to construct the whole decision tree, just the root of the tree.

  1. (17 points) Compute the entropy of each of the predicting attributes (CREDIT HISTORY, DEBT, and INCOME) with respect to RISK, based on the 14 instances above. Show all the steps of the calculations. For your convenience, the logarithms in base 2 of selected values are provided.

    x:        1/2   1/4   3/4   1/5   2/5   3/5   1/6   5/6   1/7   2/7   3/7   4/7    1
    log2(x):   -1    -2  -0.4  -2.3  -1.3  -0.7  -2.5  -0.2  -2.8  -1.8  -1.2  -0.8    0

  2. (3 points) Which one of those predicting attributes (CREDIT HISTORY, DEBT, and INCOME) would be selected as the root of a decision tree? Explain your answer.

Solutions


          RISK:  Low                 Moderate          High
          x/14*[ -y/x log2(y/x)      -z/x log2(z/x)    -w/x log2(w/x) ]
          (where x = # of instances with the given attribute value, and
           y, z, w = # of those instances whose RISK is low, moderate, high)
----------------------------------------------------------------------
CREDIT
HISTORY
Bad       4/14*[-0                 -1/4 log2(1/4)    -3/4 log2(3/4) ]
+Unknown  5/14*[-2/5 log2(2/5)     -1/5 log2(1/5)    -2/5 log2(2/5) ]
+Good     5/14*[-3/5 log2(3/5)     -1/5 log2(1/5)    -1/5 log2(1/5) ]
= 1.265
----------------------------------------------------------------------
DEBT
Low       7/14*[-3/7 log2(3/7)     -2/7 log2(2/7)    -2/7 log2(2/7) ]
+ High    7/14*[-2/7 log2(2/7)     -1/7 log2(1/7)    -4/7 log2(4/7) ]
=1.467
----------------------------------------------------------------------
INCOME
0to15     4/14*[-0                 -0                -4/4 log2(4/4) ]
+ 15to35  4/14*[-0                 -2/4 log2(2/4)    -2/4 log2(2/4) ]
+ over35  6/14*[-5/6 log2(5/6)     -1/6 log2(1/6)    -0             ]
=0.564
----------------------------------------------------------------------

Hence, INCOME should be selected as the root of the decision tree, since
it is the predicting attribute with the lowest value of entropy.
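
The same numbers can be recomputed with a short Python sketch (the rows below
are exactly the 14 instances from the table above; only the code itself is new):

    from collections import Counter
    from math import log2

    # Rows of the dataset above: (credit history, debt, income, risk).
    rows = [
        ("bad", "low", "0to15", "high"),        ("bad", "high", "15to35", "high"),
        ("bad", "low", "0to15", "high"),        ("unknown", "high", "15to35", "high"),
        ("unknown", "high", "0to15", "high"),   ("good", "high", "0to15", "high"),
        ("bad", "low", "over35", "moderate"),   ("unknown", "low", "15to35", "moderate"),
        ("good", "high", "15to35", "moderate"), ("unknown", "low", "over35", "low"),
        ("unknown", "low", "over35", "low"),    ("good", "low", "over35", "low"),
        ("good", "high", "over35", "low"),      ("good", "high", "over35", "low"),
    ]

    def entropy_after_split(rows, attr):
        """Weighted sum, over the attribute's values, of the entropy of RISK."""
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[-1])
        total = len(rows)
        result = 0.0
        for risks in groups.values():
            counts = Counter(risks)
            h = -sum((c / len(risks)) * log2(c / len(risks)) for c in counts.values())
            result += (len(risks) / total) * h
        return result

    for name, attr in [("CREDIT HISTORY", 0), ("DEBT", 1), ("INCOME", 2)]:
        print(name, round(entropy_after_split(rows, attr), 2))
    # prints roughly 1.26, 1.47, and 0.56, matching the table above:
    # INCOME has the lowest weighted entropy and is chosen as the root.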


Problem 2. Artificial Neural Networks (20 points)

Consider the process of training an artificial neural network so that it computes a previously chosen target function. Assume that we train the net using a given set of n (n > 0) instances, each of which consists of k
(k > 0) numeric inputs together with the desired numeric output.
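
As a rough illustration of this training setup, the following Python sketch
trains a single sigmoid unit with k numeric inputs by gradient descent on n
(input, desired output) instances. The architecture, sizes, learning rate, and
data below are made-up placeholders (and NumPy is assumed to be available):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 3                       # assumed sizes: n instances, k inputs each
    X = rng.random((n, k))            # numeric inputs
    d = rng.random(n)                 # desired numeric outputs

    w = np.zeros(k)                   # weights of a single sigmoid unit
    b = 0.0                           # bias
    rate = 0.5                        # learning rate (arbitrary choice)

    for epoch in range(1000):
        y = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # unit output for every instance
        delta = (d - y) * y * (1.0 - y)          # squared-error gradient through the sigmoid
        w += rate * (X.T @ delta)                # gradient-descent weight update
        b += rate * delta.sum()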

Mark the right answer for each of the following questions.


Problem 3. Genetic Algorithms (20 points)

Consider a genetic algorithm in which individuals are represented using a 5-bit string of the form b1b2b3b4b5. An example of an individual is 00101, for which b1=0, b2=0, b3=1, b4=0, b5=1.

The fitness function is defined over these individuals as follows:
f(b1b2b3b4b5) = b1 + b2 + b3 + b4 + b5 + AND(b1,b2,b3,b4,b5),

where AND(b1,b2,b3,b4,b5)=1 if b1=b2=b3=b4=b5=1; and AND(b1,b2,b3,b4,b5)=0 otherwise.
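
In Python, this fitness function can be sketched as follows:

    def fitness(individual):
        """f(b1b2b3b4b5) = b1 + b2 + b3 + b4 + b5 + AND(b1,...,b5)."""
        bits = [int(b) for b in individual]          # e.g. "00101" -> [0, 0, 1, 0, 1]
        return sum(bits) + (1 if all(bits) else 0)

    assert fitness("00101") == 2
    assert fitness("11111") == 6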

Answer the following questions. SHOW EACH STEP OF YOUR WORK.

  1. (10 points) Selection. Complete the following table showing the probabilities of selecting each of the individuals below according to the standard selection method.
    NOTE: Following Winston's notation, below I'm calling "fitness" (in quotes) the probability that an individual has of being selected under a selection method.
    
      Individual     Fitness               Standard "fitness"
                     or quality            or probability of  
                                           being selected
    
      00101          0+0+1+0+1+0 = 2       2/14 = .14

      11101          1+1+1+0+1+0 = 4       4/14 = .29

      00000          0+0+0+0+0+0 = 0       0/14 = .00

      10010          1+0+0+1+0+0 = 2       2/14 = .14

      11111          1+1+1+1+1+1 = 6       6/14 = .43
                                 -------
                     total     =    14
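
    A small Python sketch of this standard (roulette-wheel) selection,
    reproducing the probabilities in the table above:

      import random

      def fitness(individual):                       # same fitness as defined above
          bits = [int(b) for b in individual]
          return sum(bits) + (1 if all(bits) else 0)

      population = ["00101", "11101", "00000", "10010", "11111"]
      total = sum(fitness(p) for p in population)              # = 14
      probs = [fitness(p) / total for p in population]         # .14, .29, 0, .14, .43

      # Each draw picks an individual with probability proportional to its
      # fitness (the same individual may be drawn more than once).
      parents = random.choices(population, weights=probs, k=2)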
    
  2. (5 points) Crossover. Suppose that a single crossover point will be used for crossover. This point has been chosen as the point between the 2nd and the 3rd bits (i.e., between b2 and b3). Show the two offspring that will result from crossing over the following two individuals:

         PARENTS                     OFFSPRING:
      
         First parent:  00.101       First offspring:  00111
    
         Second parent: 10.111       Second offspring: 10101
       
    
  3. (5 points) Mutation. Explain how the standard mutation method is applied after selection and crossover to form the "next" generation of a population.

    For each individual produced by selection and crossover, a bit of the
    individual is selected at random and flipped with a given "mutation rate"
    probability; the resulting individuals form the next generation.
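
    A sketch of the crossover point from part 2 and of the mutation step just
    described (the default mutation rate below is an arbitrary example value):

      import random

      def crossover(p1, p2, point=2):
          # One-point crossover between b2 and b3 (as in part 2): swap the tails.
          return p1[:point] + p2[point:], p2[:point] + p1[point:]

      def mutate(individual, rate=0.01):
          # With probability `rate`, flip one randomly chosen bit of the individual.
          if random.random() < rate:
              i = random.randrange(len(individual))
              flipped = "1" if individual[i] == "0" else "0"
              individual = individual[:i] + flipped + individual[i + 1:]
          return individual

      assert crossover("00101", "10111") == ("00111", "10101")   # matches part 2
      child = mutate("00111", rate=1.0)                          # rate=1.0 forces a flip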

     


Problem 4. Machine Vision (20 points)

Consider the problem of automatically understanding which objects are present in a 3-dimensional scene from (a subset of) its 2-dimensional digital images. A 2-dimensional digital image is just a matrix of pixels.

Describe each of the following concepts in a few sentences.

  1. (5 points) Segmentation.
    Segmentation is the grouping of the pixels in the matrix into
    meaningful units.  When segmenting a matrix of gray-scale values, the
    edges of these groupings usually reflect the possible boundaries of
    objects.  Two common methods (among many) are:
    
    1. Threshold:
       under the threshold method, the range of possible pixel values is
       split into n equally sized slots, and each pixel is assigned to the
       slot its value falls into.  After processing the entire matrix, the
       members of each slot form one group.  A drawback of this method is
       that pixels completely unrelated in position can end up grouped
       together purely on the basis of color (see the sketch after the
       next item).
    
    2. Assimilation:
       this is a similar strategy, but rather than being assigned to a
       global group, each pixel is grouped only with the neighboring pixels
       whose values are within a certain threshold of its own.  In this
       manner, groups of similar pixels are formed that remain spatially
       separate from other similarly colored groups, leading to a better
       representation of the scene.
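
    A minimal sketch of the threshold method on a tiny, made-up gray-scale
    image (values 0-255), using n = 4 equally sized slots; NumPy is assumed:

       import numpy as np

       image = np.array([[ 10,  12, 200],
                         [ 11, 198, 205],
                         [ 90,  95, 210]])        # made-up 3x3 gray-scale matrix

       n_slots = 4
       slot_size = 256 // n_slots
       labels = np.minimum(image // slot_size, n_slots - 1)
       print(labels)
       # Pixels with similar values get the same label no matter where they are,
       # which is exactly the weakness noted above; assimilation instead groups
       # a pixel only with nearby pixels of similar value.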
    
    
  2. (9 points) Determination of shape. Describe and/or give examples of how the shape of an object can be inferred from shading, texture, and motion.
    shading:
       -shadows provide an understanding of the curvature of an object
       -sudden shading changes represent corners or boundaries
       -uniform shading in a lit scene corresponds to flat surfaces
    
    texture:
       -texture provides a perspective on the object, providing depth
       -a good example is the dimples on a golf ball
    
    motion:
       -by tracking feature points across frames as motion occurs, their
        trajectories can be traced and a shape model developed
       -as an object moves, the change in the shading on its surface also
        provides information about the shape
    
  3. (6 points) Determination of shape by matching the object against templates in a database. Explain how the matching is done. State how many templates and how many corresponding points are needed to identify an object when only rotations around the y-axis are allowed.
    The number of templates needed is two.
    The number of corresponding points needed is two.
    Notice that for each point we do not need a z coordinate, since depth
    cannot be observed in a 2-D image, and we do not need the y coordinate,
    since a rotation around the y-axis leaves it unchanged.  So we only
    need a function that predicts the x coordinate in the unknown view as
    an interpolation between the two templates.  The function is of the
    form x_need = Alpha * x1 + Beta * x2, so two corresponding points give
    a system of two equations in the two unknowns Alpha and Beta.  The
    matching is done by solving for Alpha and Beta from two points, and
    then using those values to predict the x coordinates of the other
    control points of the unknown object from the x coordinates of the
    corresponding control points in the templates.  Given a small
    threshold on the prediction error, this procedure determines which
    template model the object fits, if any at all.
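
    A sketch of this interpolation with made-up coordinates (two templates,
    three control points; the first two points determine Alpha and Beta, the
    third is used to check the match; NumPy is assumed):

      import numpy as np

      x1 = np.array([1.0, 3.0, 5.0])         # x coords of control points in template 1
      x2 = np.array([2.0, 1.0, 4.0])         # same control points in template 2
      x_unknown = np.array([1.5, 2.0, 4.5])  # same control points in the unknown image

      # Two equations  x_unknown[i] = Alpha * x1[i] + Beta * x2[i]   (i = 0, 1)
      A = np.array([[x1[0], x2[0]],
                    [x1[1], x2[1]]])
      alpha, beta = np.linalg.solve(A, x_unknown[:2])   # here Alpha = Beta = 0.5

      # Predict the remaining control point and compare against a small threshold.
      predicted = alpha * x1[2] + beta * x2[2]           # = 4.5
      matches = abs(predicted - x_unknown[2]) < 0.1      # True for this made-up data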
    

Problem 5. Natural Language Processing (20 points)

For each of the following stages of the natural language understanding process, describe in a few sentences what the stage does, what approaches/techniques to implement the stage exist, and complications that may occur in the stage. Illustrate your points. For your convenience, the input and the output of each stage are provided.
  1. (5 points) Speech Recognition
    Input: raw speech signal
    Output: sentence (i.e., sequence of words spoken)
    Description:
    During this stage, the analog signal is analyzed and split
    into different frequencies (or harmonics) using filters such
    as the Fourier transformation (see the sketch after the list
    of difficulties below).
    These frequencies are used to identify basic natural language
    sounds, called phonemes, that appear in the signal.
    Identifying these phonemes may be done using a trained
    neural network or by matching the sound (a piece of the input
    signal) against a library of known phonemes.
    After the phonemes are identified, complete words are
    constructed from them.
    
    

    Some difficulties present during this stage (just to mention a few) are:

    • There is no simple mapping between sounds and words. Statistical information on words and phonemes is needed to help disambiguate.
    • There is great variance in pronunciation from person to person due to gender, dialect, accent, etc.
    • Even the same person may pronounce words differently depending on the context.
    • The same sound may correspond to different words.
    • It is hard to split the continuous signal into words, since people often phonetically connect contiguous words.
    • Background noise may interfere.
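
    A minimal sketch of the frequency-analysis step (the signal below is a
    made-up sine wave standing in for recorded speech, and NumPy is assumed;
    a real recognizer would feed per-frame features like these spectra, or
    MFCCs derived from them, to a phoneme classifier):

      import numpy as np

      sample_rate = 8000                                  # samples per second (assumed)
      t = np.arange(0, 1.0, 1.0 / sample_rate)
      signal = np.sin(2 * np.pi * 440 * t)                # stand-in for an utterance

      frame_len = 256                                     # analyze short frames
      for start in range(0, len(signal) - frame_len, frame_len):
          frame = signal[start:start + frame_len]
          spectrum = np.abs(np.fft.rfft(frame))           # magnitude at each frequency bin
          dominant_hz = np.argmax(spectrum) * sample_rate / frame_len
          # dominant_hz is about 440 for every frame of this toy signal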
  2. (5 points) Syntactic Analysis
    Input: sentence (i.e., sequence of words spoken)
    Output: structure(s) of the sentence
    Description:
    During this stage, the grammar of the natural language is
    used to determine the structure of the sentence.
    A grammar is a collection of rules of syntax of the
    language.
    If the sentence is syntactically ambiguous, this stage may
    output more than one possible syntactic structure for the
    sentence.  A classical example of such a sentence is
    "Fruit flies like a banana", in which the subject can
    either be "Fruit" or "Fruit flies".
    
    Given a sentence and a grammar, a parsing
    procedure constructs one or more possible
    parse trees, each one representing a
    possible structure for the sentence (a toy
    example is sketched after the list of
    complications below).
    
    Possible complications are:
    
    • Syntactically ambiguous sentences
    • Having to parse syntactically incorrect sentences
    • Natural languages are rich in declensions, adjectives, adverbs, etc., which make the analysis difficult.
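
    As a toy illustration, assuming the nltk library is available and using a
    tiny made-up grammar, the ambiguous sentence mentioned above yields two
    parse trees:

      import nltk

      # Tiny made-up grammar in which "fruit flies like a banana" is ambiguous.
      grammar = nltk.CFG.fromstring("""
      S  -> NP VP
      NP -> N | N N | Det N
      VP -> V NP | V PP
      PP -> P NP
      Det -> 'a'
      N -> 'fruit' | 'flies' | 'banana'
      V -> 'flies' | 'like'
      P -> 'like'
      """)

      parser = nltk.ChartParser(grammar)
      for tree in parser.parse("fruit flies like a banana".split()):
          tree.pretty_print()      # prints one tree per possible structure (two here)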
  3. (5 points) Semantic Analysis
    Input: structure(s) of the sentence
    Output: partial representation of the meaning of the sentence
    Description:
    The essence of this stage is to translate the
    sentence into the internal knowledge representation
    of the computer program, so that the program can
    reason with and use the information/knowledge encoded
    in the sentence.
    
    During this stage, a partial translation is made
    based on the parse tree(s) obtained in the previous
    syntactic analysis stage.
    
    As before, ambiguity is a possible complication for 
    this stage. That is, sentences that have a unique
    syntactic structure but more than one possible 
    meaning.
    
  4. (5 points) Pragmatic Analysis
    Input: partial representation of the meaning of the sentence
    Output: final representation of the meaning of the sentence
    Description:
    
    The purpose of this stage is to complete as much
    as possible the translation of the original sentence
    into the internal representation of the computer
    program. For this, the context of the sentence
    is used to disambiguate the meaning of the sentence
    as much as possible and to resolve references
    (e.g., "Mary bought bananas. She said they are
    inexpensive." In the second sentence, She = "Mary"
    and they = "bananas").
    
    Possible complications in this stage are
    difficulties in understanding the intentions
    behind the speaker's message, metaphors, and
    semantic ambiguity.