## CS4341 Introduction to Artificial Intelligence. C2000 Solutions - Exam 2 - February 29, 2000

### By Prof. Carolina RuizDepartment of Computer ScienceWorcester Polytechnic Institute

#### Instructions

• Write your name on each page
• Ask in case of doubt

### NAME: Carolina Ruiz _______________________________________

#### Problem I. Constraint Satisfaction (20 points)

Suppose that a medical doctor needs your help constructing a constraint net to find prescriptions for his/her flu patients. The following table shows the symptoms exhibited by flu patients and the list of medications that can be prescribed for each symptom.
```Symptom                Medications

Runny nose             R1, R2
Fever                  F1, F2
Sore throat            S1, S2
```
Assume also that the FDA has found that each the following sets of medications presents dangerous drug interactions and so drugs in each set should not be prescribed together.
##### Drugs that must NOT be prescribed together
• {H1,S1}
• {H3,S2}
• {H2,R2}
• {F1,S1}
On the other hand, the following sets of drugs work in conjunction and so if one of them is prescribed then the other one must be prescribed as well.
##### Drugs that must be prescribed together
• {H2,S2}
• {S2,F1}
1. (10 points) Construct a constraint net that helps the medical doctor in finding appropriate prescriptions for his/her flu patients. Provide the constraint matrix associated to each link in the net explicitly.
```
----------         M2         ------------
| Headache |------------------| Runny Nose |
----------                    ------------
|
M1 |
|
|
-------------                    -------
| Sore Throat |------------------| Fever |
-------------        M3          -------

M1 |  S1  S2      M2 |  R1  R2     M3 |  F1  F2
-------------     -------------    -------------
H1 |  0   0       H1 |  1   1      S1 |  0   1
H2 |  0   1       H2 |  1   0      S2 |  1   0
H3 |  1   0       H3 |  1   1

```

2. (10 points) A patient may be allergic to some drugs. Hence, an appropriate prescription for a flu patient should consist of 4 medications, one for each symptom, satisfying the constraints above and to which the patient is not allergic. Find an appropriate prescription for Lucy, who is allergic to F1 and to R2. Describe in detail each of the steps of your problem solving strategy:

1. Arc Consistency Check
```During this stage of the process, arcs are revised to eliminate
values that will no longer be valid due to Lucy's allergies.
We revise below the domain of each of the variables:

Fever: Reduced domain = {F2}
F1 is eliminated because Lucy is allergic to it.

Runny Nose: Reduced domain = {R1}
R2 is eliminated because Lucy is allergic to it.

Sore Throat: Reduced domain = {S1}
S2 is eliminated because according to the constraint matrix M3
the only medication for sore throat that can be prescribed in
conjunction with F2 is S1.

H1 and H2 are eliminated because according to the constraint
matrix M1 the only medication for headache that can be
prescribed in conjunction with S1 is H3.
Note that H1 would be eliminated anyway because all entries
in M1 corresponding to H1 are 0.

```

2. Depth First Search
1. Selection of the 1st node
2. Selection of each consecutive node
```The search is quite simple since the reduced domain of each
variable contains just one value. The only possible solution
is {H3,R1,F2,S1}. Selecting the variables in any order will
lead to the same valid solution in the same amount of time.
```

### NAME: Carolina Ruiz _______________________________________

#### Problem II. Logic-Based Systems (20 points)

Consider the following set of axioms:
1. forall x [equal(x,x)]

2. forall y,z [equal(y,z) -> equal(z,y)]

3. forall w,s,t [equal(w,s) and equal (s,t) -> equal(w,t)]

4. equal(b,a)

5. equal(b,c)
And the conclusion:
• equal(c,a)
As usual, a, b, and c denote constants; and x, y, z, w, s, and t denote variables.

Prove the conclusion from the axioms by refutation using resolution:

• (10 points) Translate the axioms and the negation of the conclusion into clausal form:
```
equal(x,x)
!equal(y,z) or equal(z,y)
![equal(w,s) and equal(s,t)] or equal(w,t)
!equal(w,s) or !equal(s,t) or equal(w,t)
equal(b,a)
equal(b,c)
(negated conclusion:)  !equal(c,a)

```

• (10 points) Apply resolution (state explicitly which substitutions are made):
```
2. !equal(y,z) or equal(z,y)
5. equal(b,c)
---------------------------------------------
7.  equal(c,b)                 substituting y by b, and z by c.

7.  equal(c,b)
3. !equal(w,s) or !equal(s,t) or equal(w,t)
---------------------------------------------
8. !equal(b,t) or equal(c,t)   substituting w by c, and s by b.

8. !equal(b,t) or equal(c,t)
4. equal(b,a)
---------------------------------------------
9. equal(c,a)                  substituting t by a.

9. equal(c,a)
6. !equal(c,a)
----------------
EMPTY CLAUSE

```

### NAME: Carolina Ruiz _______________________________________

#### Problem III. Learning - Decision Trees (20 points)

The following table contains training examples that help a robot janitor predict whether or not an office contains a recycling bin.

 STATUS DEPT. OFFICE SIZE RECYCLING BIN? 1. faculty ee large no 2. staff ee small no 3. faculty cs medium yes 4. student ee large yes 5. staff cs medium no 6. faculty cs large yes 7. student ee small yes 8. staff cs medium no

1. (15 points) Use information theory to construct a minimal decision tree that predicts whether or not an office contains a recycling bin. Show each step of the computation.
```            x      1     2/3    1/3    1/2
---------------------------------------
log_2(x)     0    -0.6   -1.6    -1

Selecting the attribute for the root node:

YES                   NO

Status:

Faculty:    (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
Staff:    + (3/8)*[ -(0/3)*log_2(0/3) - (3/3)*log_2(3/3)]
Student:  + (2/8)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
TOTAL:    = 0.35 + 0 + 0 = 0.35

Size:

Large:      (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)]
Medium:   + (3/8)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)]
Small:    + (2/8)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)]
TOTAL:    = 0.35 + 0.35 + 2/8 = 0.95

Dept:

ee:         (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
cs:       + (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)]
TOTAL:    = 0.5 + 0.5 = 1

Since status is the attribute with the lowest entropy,
it is selected as the root node:

Status
/     |      \
Faculty  /      | staff \ student
/       |        \
1-,3+,6+   2-,5-,8-   4+,7+
BIN=?      BIN=NO   BIN=YES

Only the branch corresponding to Status=Faculty needs further
processing.

Selecting the attribute to split the branch Status=Faculty:

YES                   NO

Dept:
ee:         (1/3)*[ -(0/1)*log_2(0/1) - (1/1)*log_2(1/1)]
cs:       + (2/3)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)]
TOTAL:    = 0 + 0 = 0

Since the minimum possible entropy is 0 and dept has that
minimum possible entropy, it is selected as the best attribute
to split the branch Status=Faculty.  Note that we don't even have
to calculate the entropy of the attribute size since it
cannot possibly be better than 0.

Status
/     |      \
Faculty  /      | staff \ student
/       |        \
1-,3+,6+   2-,5-,8-   4+,7+
Dept      BIN=NO   BIN=YES
/    \
ee /      \ cs
/        \
1-       3+,6+
BIN=NO    BIN=YES

Each branch ends in a homogeneous set of examples so the
construction of the decision tree ends here.
```
2. (5 points) Translate your decision tree into a collection of decision rules.
```IF status=faculty AND dept=ee THEN recycling-bin=no

IF status=faculty AND dept=cs THEN recycling-bin=yes

IF status=staff THEN recycling-bin=no

IF status=student THEN recycling-bin=yes
```

### NAME: Carolina Ruiz _______________________________________

#### Problem IV. Learning - Neural Networks (20 points)

Consider the following function of two variables:
```        A  B  Desired Output
0  0    1
1  0    0
0  1    0
1  1    1
```
1. (5 points) Prove that this function cannot be learned by a single perceptron that uses the step function as its activation function.
```This function cannot be computed using a single perceptron since
the function is not linearly separable:

B
^
|
0    1
|
|
---1----0-------->
|              A
```
2. (15 points) Explain how you would train a neural network (in which each unit uses the sigmoid function as its activation function) to learn this function by describing in detail each of the steps below:
• (5 points) Topology of your neural nets:
1. How many hidden layers would you use?
```
Only one hidden layer is needed since this is a boolean
function and one hidden layer is sufficient to represent
any boolean function.
```
2. How many hidden units per layer?
```
I'll use just two hidden units since the function seems
simple enough.
```
3. How many connections would your net have?
```
4 connecting the input layer to the hidden layer
+ 2 connecting the hidden layer to the output layer

The 3 links connecting the fixed input -1 to each of
the units can also be taken into account here.

```
4. Draw the neural net that would result from your choices above.
```
-1
|
V
---                 -1
A ------>| C | ---------       |
\    /  ---           |      v
\  /                 |     ---
\/                   --> | E |
/\                   -->  ---
/  \                 |
/    \  ---           |
B ------>| D | ---------
---
^
|
-1

input      hidden               output
layer      layer                layer
```
• (10 points) Training your neural net:
1. How would you select the initial weights of the connections?
```I would select small (i.e. close to 0), random values.
```
2. Explain how the error back propagation algorithm would work when applied to your neural net.
```The algorithm will iterate the following set of steps:

For each of the training examples:

Propagate the inputs forward through the net.
Compute the output of each node (= the sigmoid
function applied to the weighted sum of the inputs
to the node) first for the hidden layer and then for
the output layer.
Compute the errors (i.e. the betas) associated to
each node, starting from the output layer and moving
back to the hidden layer.
Compute the weight changes (i.e. the deltas)
associated with the current training example for
each of the connections in the net.

After processing all examples, update the weight
of each connection by assigning to it the sum of that
weight's delta for each of the examples plus the current
value of the weight.

until we obtain "satisfactory weights". (See the answer
to the following question for more precise stopping
criteria).

```
3. When would you stop the iterations of the error back propagation algorithm?
```The following are possible stopping criteria:

When the updated weights after an iteration of the
error back propagation procedure are almost identical to
the weights before that iteration.
This is in general the best stopping criterion.
When the output of the net is very close to the
desired one. Yeahhhh!
When the number of iterations exceeds a pre-defined
threshold. Say the procedure has run for 20,000 iterations
and it doesn't seem to be converging.
When the output error seems to be increasing instead
of decreasing.

```
4. What would you do if the trained neural net does not generate the desired outputs?
```I would run the error back propagation again modifying
the parameters to correct what seems to be the problem
with the current application of the procedure:

If the updated weights after an iteration of the
error back propagation procedure are almost identical to
the weights before that iteration but the output is not
the desired one:
I would select new (small, random) initial weights and
run the process again.
If the number of iterations exceeds a pre-defined
threshold. Say the procedure has run for 20,000 iterations
and it doesn't seem to be converging:
I would increase the learning rate and/or
select new (small, random) initial weights and
run the process again.
If the output error seems to be increasing instead
of decreasing:
I would decrease the learning rate and/or
select new (small, random) initial weights and
run the process again.
```

### NAME: Carolina Ruiz _______________________________________

#### Problem V. Learning - Genetic Algorithms (20 points)

Consider a genetic algorithm in which the rank method is used to select candidates. Recall that, given a fixed value p between 0 and 1, the probability of picking the first-ranked candidate is p; the probability of picking the second-ranked candidate is p times the remaining probability (1-p), that is p*(1-p); the probability of picking the third-ranked candidate is p times the remaining probability (1-(p + p*(1-p))), that is p*(1-(p + p*(1-p))) = p*(1-p)(1-p); and so on, until only one candidate is left, in which case this candidate receives the remaining probability, so that the probabilities of all the individuals in the population sum up to 1.
1. (7 points) Suppose that you have five candidates. You decide to select one using the rank-space method with p=0.25. Compute the probabilities for each of the five as a function of rank.
```
Individuals     Rank      Rank Fitness
-----------------------------------------
C1           1st         0.25

C2           2nd         0.1875    = 0.25*(1-0.25)

C3           3rd         0.140625  = 0.25*(1-0.4375)

C4           4th         0.1054687 = 0.25*(1-0.578125)

C5           5th         0.3164063 = 1-0.6835937

Note that:

The sum of the rank fitnesses of C1 + C2 = 0.4375
The sum of the rank fitnesses of C1 + C2 + C3 = 0.578125
The sum of the rank fitnesses of C1 + C2 + C3 + C4 = 0.6835937
The sum of the rank fitnesses of C1 + C2 + C3 + C4 + C5 = 1

```
2. (7 points) What is peculiar about the rank fitnesses that you computed in the previous part?
```That the probability of selecting the least fit element (i.e. C5)
is greater than the probabilities of selecting any of the other
(fitter) individuals.
```
3. (6 points) Can you avoid the peculiarity you observed in part 2 by restricting the probability p to a particular range? Your answer should be independent of the number of individuals in the population.
```Pick the constant p strictly greater than 0.5. In that way,
it is guaranteed that the rank fitness is a monotonicly
decreasing function of the rank, since each individual will
consume, in turn, at least half of the remaining probability.

```