## CS4341 Introduction to Artificial Intelligence. C-2002 SOLUTIONS - Exam 2 - February 28, 2002

### By Carolina Ruiz and Yakov Kronrod

#### Problem 1. Decision Trees (20 points)

Consider the following dataset, which helps us predict the RISK of a loan application based on the applicant's CREDIT HISTORY, DEBT, and INCOME.

| #  | CREDIT HISTORY | DEBT | INCOME | RISK     |
|----|----------------|------|--------|----------|
| 1  | bad            | low  | 0to15  | high     |
| 2  | bad            | high | 15to35 | high     |
| 3  | bad            | low  | 0to15  | high     |
| 4  | unknown        | high | 15to35 | high     |
| 5  | unknown        | high | 0to15  | high     |
| 6  | good           | high | 0to15  | high     |
| 7  | bad            | low  | over35 | moderate |
| 8  | unknown        | low  | 15to35 | moderate |
| 9  | good           | high | 15to35 | moderate |
| 10 | unknown        | low  | over35 | low      |
| 11 | unknown        | low  | over35 | low      |
| 12 | good           | low  | over35 | low      |
| 13 | good           | high | over35 | low      |
| 14 | good           | high | over35 | low      |

We consider here the problem of choosing one of the predicting attributes as the root of the decision tree.
Note: You DON'T have to construct the whole decision tree, just the root of the tree.

1. (17 points) Compute the entropy of each of the predicting attributes (CREDIT HISTORY, DEBT, and INCOME) with respect to RISK, based on the 14 instances above. Show all the steps of the calculations. For your convenience, the logarithms in base 2 of selected values are provided.

| x       | 1/2 | 1/4 | 3/4  | 1/5  | 2/5  | 3/5  | 1/6  | 5/6  | 1/7  | 2/7  | 3/7  | 4/7  | 1 |
|---------|-----|-----|------|------|------|------|------|------|------|------|------|------|---|
| log2(x) | -1  | -2  | -0.4 | -2.3 | -1.3 | -0.7 | -2.5 | -0.2 | -2.8 | -1.8 | -1.2 | -0.8 | 0 |

2. (3 points) Which one of those predicting attributes (CREDIT HISTORY, DEBT, and INCOME) would be selected as the root of a decision tree? Explain your answer.

#### Solutions

```
RISK: Low               Moderate          High
x/14*[-y/x log2(y/x)     -z/x log2(z/x)    -w/x log2(w/x) ]
----------------------------------------------------------------------
CREDIT
HISTORY
Bad       4/14*[-0                 -1/4 log2(1/4)    -3/4 log2(3/4) ]
+Unknown  5/14*[-2/5 log2(2/5)     -1/5 log2(1/5)    -2/5 log2(2/5) ]
+Good     5/14*[-3/5 log2(3/5)     -1/5 log2(1/5)    -1/5 log2(1/5) ]
= 1.265
----------------------------------------------------------------------
DEBT
Low       7/14*[-3/7 log2(3/7)     -2/7 log2(2/7)    -2/7 log2(2/7) ]
+ High    7/14*[-2/7 log2(2/7)     -1/7 log2(1/7)    -4/7 log2(4/7) ]
=1.467
----------------------------------------------------------------------
INCOME
0to15     4/14*[-0                 -0                -4/4 log2(4/4) ]
+ 15to35  4/14*[-0                 -2/4 log2(2/4)    -2/4 log2(2/4) ]
+ over35  6/14*[-5/6 log2(5/6)     -1/6 log2(1/6)    -0             ]
=0.564
----------------------------------------------------------------------

Hence, INCOME should be selected as the root of the decision tree,
since this is the predicting attribute with the lowest value of entropy.

```
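The weighted entropies above can also be checked mechanically. The following Python sketch (not part of the original solution) recomputes them from the 14-instance table; the attribute and value names mirror the dataset.

```python
from math import log2
from collections import Counter

# (CREDIT HISTORY, DEBT, INCOME, RISK) for the 14 instances above
data = [
    ("bad", "low", "0to15", "high"),
    ("bad", "high", "15to35", "high"),
    ("bad", "low", "0to15", "high"),
    ("unknown", "high", "15to35", "high"),
    ("unknown", "high", "0to15", "high"),
    ("good", "high", "0to15", "high"),
    ("bad", "low", "over35", "moderate"),
    ("unknown", "low", "15to35", "moderate"),
    ("good", "high", "15to35", "moderate"),
    ("unknown", "low", "over35", "low"),
    ("unknown", "low", "over35", "low"),
    ("good", "low", "over35", "low"),
    ("good", "high", "over35", "low"),
    ("good", "high", "over35", "low"),
]

def weighted_entropy(attr_index):
    """Average entropy of RISK after splitting on the given attribute."""
    groups = {}
    for row in data:
        groups.setdefault(row[attr_index], []).append(row[3])  # row[3] = RISK
    total = 0.0
    for risks in groups.values():
        counts = Counter(risks)
        h = -sum((c / len(risks)) * log2(c / len(risks)) for c in counts.values())
        total += len(risks) / len(data) * h  # weight by group size
    return total

for name, i in [("CREDIT HISTORY", 0), ("DEBT", 1), ("INCOME", 2)]:
    print(f"{name}: {weighted_entropy(i):.3f}")
```

Run with exact logarithms this prints approximately 1.265, 1.468, and 0.564, matching the hand calculation (DEBT differs in the last digit due to rounding of the table values).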

#### Problem 2. Artificial Neural Networks (20 points)

Consider the process of training an artificial neural network so that it computes a previously chosen target function. Assume that we train the net using a given set of n (n > 0) instances, each of which consists of k
(k > 0) numeric inputs together with the desired numeric output.

Mark the right answer for each of the following questions.

• (5 points) The activation levels of the network's output nodes depend only on (choose one):
1. The network inputs
2. RIGHT ANSWER: The network inputs and the connection weights
3. The network inputs, the connection weights, and the number of nodes
4. The number of hidden layers

• (5 points) The purpose of the error backpropagation procedure is to (choose one):
1. Search for appropriate inputs for the neural net
2. Search for an appropriate number of layers for the neural net
3. Search for appropriate outputs for the hidden nodes in the neural net
4. RIGHT ANSWER: Search for appropriate connection weights in the neural net

• (5 points) The search strategy followed by the error backpropagation procedure is (choose one):
1. Depth-first search
2. Greedy search
3. RIGHT ANSWER: Hill climbing (gradient descent)
4. Iterative deepening

• (5 points) Which one of the following is NOT a sound termination condition for the error backpropagation procedure (choose one):
1. stop when the sum (over all the training instances) of the difference between the desired output and the net's output is small enough
2. RIGHT ANSWER: stop when the output of all the hidden nodes is equal to 0
3. stop when you have trained the net for more than 15,000 epochs
4. stop when the changes in all the connection weights in the net are negligible from one epoch to the next

• (5 EXTRA-CREDIT points) For any numeric target function, the maximum number of hidden layers needed to approximate the function to arbitrary accuracy is (choose one):
1. one
2. RIGHT ANSWER: two
3. three
4. four

#### Problem 3. Genetic Algorithms (20 points)

Consider a genetic algorithm in which individuals are represented using a 5-bit string of the form b1b2b3b4b5. An example of an individual is 00101, for which b1=0, b2=0, b3=1, b4=0, b5=1.

The fitness function is defined over these individuals as follows:
f(b1b2b3b4b5) = b1 + b2 + b3 + b4 + b5 + AND(b1,b2,b3,b4,b5),

where AND(b1,b2,b3,b4,b5)=1 if b1=b2=b3=b4=b5=1; and AND(b1,b2,b3,b4,b5)=0 otherwise.

1. (10 points) Selection. Complete the following table showing the probabilities of selecting each of the individuals below according to the standard selection method.
NOTE: Following Winston's notation, below I'm calling "fitness" (in quotes) the probability that an individual has of being selected under a selection method.
```
Individual     Fitness               Standard "fitness"
               or quality            or probability of
                                     being selected

00101          0+0+1+0+1+0 = 2       2/14 = .14

11101          1+1+1+0+1+0 = 4       4/14 = .29

00000          0+0+0+0+0+0 = 0       0/14 = 0

10010          1+0+0+1+0+0 = 2       2/14 = .14

11111          1+1+1+1+1+1 = 6       6/14 = .43
               ---------------
               Total:      14
```
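The table can be reproduced with a short Python sketch (illustrative, not part of the original solution); `fitness` implements the function f defined above, including the AND bonus term.

```python
def fitness(bits):
    """f = sum of the bits, plus a bonus of 1 if all bits are 1 (the AND term)."""
    vals = [int(b) for b in bits]
    return sum(vals) + (1 if all(vals) else 0)

population = ["00101", "11101", "00000", "10010", "11111"]
total = sum(fitness(ind) for ind in population)  # 2 + 4 + 0 + 2 + 6 = 14

# Standard (roulette-wheel) selection probability: fitness / total fitness
for ind in population:
    print(ind, fitness(ind), round(fitness(ind) / total, 2))
```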
2. (5 points) Crossover. Suppose that a single crossover point will be used for crossover. This point has been chosen as the point between the 2nd and the 3rd bits (i.e., between b2 and b3). Show the two offspring that will result from crossing over the following two individuals:

```
PARENTS                     OFFSPRING

First parent:  00.101       First offspring:  00111

Second parent: 10.111       Second offspring: 10101

```
3. (5 points) Mutation. Explain how the standard mutation method is applied after selection and crossover to form the "next" generation of a population.

For each individual, one bit is randomly selected and flipped with a given probability, the "mutation rate".
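Both operators can be sketched in a few lines of Python (illustrative only; the 0.01 mutation rate is an assumed value, not one specified in the problem).

```python
import random

def crossover(p1, p2, point=2):
    """One-point crossover: swap the tails after the crossover point
    (here between b2 and b3, as in part 2)."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(ind, rate=0.01):
    """With probability `rate`, flip one randomly chosen bit."""
    if random.random() < rate:
        i = random.randrange(len(ind))
        ind = ind[:i] + ("1" if ind[i] == "0" else "0") + ind[i + 1:]
    return ind

print(crossover("00101", "10111"))  # the parents from part 2
```

With the parents from part 2, `crossover` returns the offspring `("00111", "10101")` shown above.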

#### Problem 4. Machine Vision (20 points)

Consider the problem of automatically understanding which objects are present in a 3-dimensional scene from (a subset of) its 2-dimensional digital images. A 2-dimensional digital image is just a matrix of pixels.

Describe each of the following concepts in a few sentences.

1. (5 points) Segmentation.
```
Segmentation is the grouping of the pixels in the matrix into
meaningful units.  When segmenting a matrix of gray-scale values (as
is most often the case), the edges of these groupings reflect the
possible boundaries of objects.  There are two common methods
(although many more exist):

1. Threshold:
Under the threshold method, the range of possible values is split
into n equally sized slots, and each pixel is assigned to the slot
that its value falls in.  After processing the entire matrix, the
groups are the members of the slots.  This method ignores position:
pixels are grouped solely on the basis of their value (color),
regardless of where they appear in the image.

2. Assimilation:
this is a similar strategy, but rather than belonging to a global
group, each pixel only belongs to the pixels around it that are
within a certain threshold function away from it. In this manner,
similar groups are formed with similar pixels, but they are
spatially isolated from similarly colored groups, leading to a
better representation of the scene.

```
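The threshold method can be sketched in a few lines of Python. The tiny gray-scale "image" below is made up for illustration; values in 0..255 are split into n equal slots, and position plays no role in the grouping.

```python
# A made-up 3x3 gray-scale image: dark pixels (~10) and bright pixels (~210)
image = [
    [ 10,  12, 200],
    [ 14, 220, 210],
    [  9,  11, 205],
]

def threshold_groups(img, n=4, max_val=256):
    """Assign each pixel to the slot its value falls in (position ignored)."""
    slot = max_val // n  # width of each slot, here 64
    groups = {}
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            groups.setdefault(min(v // slot, n - 1), []).append((r, c))
    return groups

# The dark pixels land in slot 0, the bright ones in slot 3
print(threshold_groups(image))
```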
2. (9 points) Determination of shape. Describe and/or give examples of how the shape of an object can be inferred from shading, texture, and motion.
```
shading:
-shadows provide an understanding of the curvature of an object
-sudden shading changes represent corners or boundaries
-uniform shading in a lit scene corresponds to flat surfaces

texture:
-texture provides a perspective on the object, providing depth
-a good example is the dimples on a golf ball

motion:
-by monitoring progress as motion occurs, certain points can be
traced and a shape model developed

also:
-as an object moves, the change in the shading on the surface provides
information about the shape as well
```
3. (6 points) Determination of shape by matching the object against templates in a database. Explain how the matching is done. State how many templates and how many corresponding points are needed to identify an object when only rotations around the y-axis are allowed.
```
The number of templates needed is two.
The number of points needed is two.
Notice that for each point we do not need a z value since in 2-D it is
undetectable.  Also we do not need the y coordinate since it always
stays the same.  So, we only need to derive a function that predicts
the interpolation of the x-coordinate between the two templates we are
given.  The function is of the form x_need = Alpha * x1 + Beta * x2.
So, we need the two points so that we can solve a system of two
equations in two unknowns.  The matching is done by finding the alpha
and beta values for two points and then use those values of Alpha and
Beta to predict the x coordinate of other control points in the unknown
object based on the x coordinates of the control points in the templates.
Given a small threshold, this procedure will predict which template the
object fits, if any at all.
```
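The two-equations-in-two-unknowns step can be sketched in Python (illustrative only; the coordinates below are made-up example values, and the system is solved with Cramer's rule).

```python
def solve_alpha_beta(p1, p2):
    """Each of p1, p2 is a triple (x1, x2, x_unknown): the x coordinates of one
    corresponding point in the two templates and in the unknown object.
    Solves x_unknown = alpha*x1 + beta*x2 for alpha, beta via Cramer's rule."""
    a1, b1, c1 = p1
    a2, b2, c2 = p2
    det = a1 * b2 - a2 * b1
    alpha = (c1 * b2 - c2 * b1) / det
    beta = (a1 * c2 - a2 * c1) / det
    return alpha, beta

def predict_x(alpha, beta, x1, x2):
    """Interpolate the x coordinate of another control point."""
    return alpha * x1 + beta * x2

# Two corresponding points with made-up coordinates
alpha, beta = solve_alpha_beta((1.0, 0.0, 0.6), (0.0, 1.0, 0.4))
print(alpha, beta, predict_x(alpha, beta, 2.0, 4.0))
```

Once alpha and beta are known, `predict_x` gives the predicted x coordinate of the remaining control points; if the predictions fall within a small threshold of the observed object, the template matches.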

#### Problem 5. Natural Language Processing (20 points)

For each of the following stages of the natural language understanding process, describe in a few sentences what the stage does, what approaches/techniques to implement the stage exist, and complications that may occur in the stage. Illustrate your points. For your convenience, the input and the output of each stage are provided.
1. (5 points) Speech Recognition
Input: raw speech signal
Output: sentence (i.e., sequence of words spoken)
Description:
```
During this stage, the analog signal is analyzed and split
into different frequencies (or harmonics) using filters,
for instance via the Fourier transform.
These frequencies are used to identify basic natural language
sounds, called phonemes, that appear in the signal.
Identifying these phonemes may be done using a trained
neural network or matching the sound (piece of input signal)
against a library of known phonemes.
After the phonemes are identified then complete words are constructed
from the phonemes.

Some difficulties present during this stage (just to mention a few):

- There is no simple mapping between sounds and words.  Statistical
  information on words and phonemes is needed to help disambiguate.
- There is great variance in pronunciation from person to person due
  to gender, dialect, accent, etc.  Even the same person may pronounce
  words differently depending on the context.
- The same sound may correspond to different words.
- It is hard to split the continuous signal into words, since people
  often phonetically connect contiguous words.
- Background noise may interfere.

```
2. (5 points) Syntactic Analysis
Input: sentence (i.e., sequence of words spoken)
Output: structure(s) of the sentence
Description:
```
During this stage, the grammar of the natural language is
used to determine the structure of the sentence.
A grammar is a collection of rules of syntax of the
language.
If the sentence is syntactically ambiguous, this stage may
output more than one possible syntactic structure for the
sentence.  A classical example of such a sentence is
"Fruit flies like a banana", in which the subject can
either be "Fruit" or "Fruit flies".

Given a sentence and a grammar, a parsing
procedure constructs one or more possible
parse trees, each one representing a
possible structure for the sentence.

Possible complications are:

- Syntactically ambiguous sentences.
- Having to parse syntactically incorrect sentences.
- Natural languages are rich in declensions, adjectives, adverbs,
  etc., which make the analysis difficult.

```
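The parsing step can be illustrated with a toy Python sketch (not part of the original solution): a small made-up context-free grammar in binary-rule form and a CYK-style chart that counts parse trees. For the classic ambiguous sentence it finds two structures, one with subject "fruit flies" and one with subject "fruit".

```python
# Toy lexicon and grammar, invented for illustration
lexical = {
    "fruit": {"N", "Adj"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "a": {"Det"},
    "banana": {"N"},
}
binary = {                       # parent -> list of (left, right) children
    "S": [("NP", "VP")],
    "NP": [("Adj", "N"), ("Det", "N")],
    "VP": [("V", "NP"), ("V", "PP")],
    "PP": [("P", "NP")],
}
unit = {"NP": ["N"]}             # a bare noun can be a noun phrase

def cyk_counts(words):
    """Fill a chart with the number of parse trees per category per span."""
    chart, n = {}, len(words)
    for i, w in enumerate(words):                    # width-1 spans from the lexicon
        cell = {cat: 1 for cat in lexical[w]}
        for parent, children in unit.items():
            for child in children:
                if child in cell:
                    cell[parent] = cell.get(parent, 0) + cell[child]
        chart[(i, i + 1)] = cell
    for width in range(2, n + 1):                    # combine smaller spans
        for i in range(n - width + 1):
            j, cell = i + width, {}
            for k in range(i + 1, j):
                for parent, rules in binary.items():
                    for left, right in rules:
                        c = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                        if c:
                            cell[parent] = cell.get(parent, 0) + c
            chart[(i, j)] = cell
    return chart

sentence = "fruit flies like a banana".split()
print(cyk_counts(sentence)[(0, 5)].get("S", 0))  # two distinct parses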
3. (5 points) Semantic Analysis
Input: structure(s) of the sentence
Output: partial representation of the meaning of the sentence
Description:
```
The essence of this stage is to translate the
sentence into the internal knowledge representation
of the computer program, so that the program can
reason and use the information/knowledge encoded
in the sentence.

During this stage, a partial translation is made
based on the parse tree(s) obtained in the previous
syntactic analysis stage.

As before, ambiguity is a possible complication for
this stage. That is, sentences that have a unique
syntactic structure but more than one possible
meaning.
```
4. (5 points) Pragmatic Analysis
Input: partial representation of the meaning of the sentence
Output: final representation of the meaning of the sentence
Description:
```
The purpose of this stage is to complete as much
as possible the translation of the original sentence
into the internal representation of the computer
program. For this, the context of the sentence
is used to disambiguate as much as possible the
meaning of the sentence, to resolve references
(e.g. "Mary bought bananas. She said they are
inexpensive". In the last sentence, She="Mary"
and they="bananas").

Possible complications in this stage are
difficulties in understanding the intentions of
the speaker's message, metaphors, and semantic
ambiguity.
```