CREDIT HISTORY | DEBT | INCOME | RISK | |
1 | bad | low | 0to15 | high |
2 | bad | high | 15to35 | high |
3 | bad | low | 0to15 | high |
4 | unknown | high | 15to35 | high |
5 | unknown | high | 0to15 | high |
6 | good | high | 0to15 | high |
7 | bad | low | over35 | moderate |
8 | unknown | low | 15to35 | moderate |
9 | good | high | 15to35 | moderate |
10 | unknown | low | over35 | low |
11 | unknown | low | over35 | low |
12 | good | low | over35 | low |
13 | good | high | over35 | low |
14 | good | high | over35 | low |
We consider here the problem of choosing one of the predicting
attributes as the root of the decision tree.
Note: You DON'T have to construct the whole decision
tree, just the root of the tree.
x | 1/2 | 1/4 | 3/4 | 1/5 | 2/5 | 3/5 | 1/6 | 5/6 | 1/7 | 2/7 | 3/7 | 4/7 | 1 |
log2(x) | -1 | -2 | -0.4 | -2.3 | -1.3 | -0.7 | -2.5 | -0.2 | -2.8 | -1.8 | -1.2 | -0.8 | 0 |
RISK: Low Moderate High x/14*[-y/x log2(y/x) -z/x log2(z/x) -w/x log2(w/x) ] ---------------------------------------------------------------------- CREDIT HISTORY Bad 4/14*[-0 -1/4 log2(1/4) -3/4 log2(3/4) ] +Unknown 5/14*[-2/5 log2(2/5) -1/5 log2(1/5) -2/5 log2(2/5) ] +Good 5/14*[-3/5 log2(3/5) -1/5 log2(1/5) -1/5 log2(1/5) ] = 1.265 ---------------------------------------------------------------------- DEBT Low 7/14*[-3/7 log2(3/7) -2/7 log2(2/7) -2/7 log2(2/7) ] + High 7/14*[-2/7 log2(2/7) -1/7 log2(1/7) -4/7 log2(4/7) ] =1.467 ---------------------------------------------------------------------- INCOME 0to15 4/14*[-0 -0 -4/4 log2(4/4) ] + 15to35 4/14*[-0 -2/4 log2(2/4) -2/4 log2(2/4) ] + over35 6/14*[-5/6 log2(5/6) -1/6 log2(1/6) -0 ] =0.564 ---------------------------------------------------------------------- Hence, INCOME should be selected as the root of the decision tree. this is the predicting attribute with the lowest value of entropy.
Mark the right answer for each of the following questions.
The fitness function is defined over these individuals
as follows:
f(b1b2b3b4b5) = b1 + b2 + b3 + b4 + b5 + AND(b1,b2,b3,b4,b5),
where AND(b1,b2,b3,b4,b5)=1 if b1=b2=b3=b4=b5=1; and AND(b1,b2,b3,b4,b5)=0 otherwise.
Answer the following questions. SHOW EACH STEP OF YOUR WORK.
Individual Fitness Standard "fitness" or quality or probability of being selected 00101 0+0+1+0+1+0 = 2 2/14 = .14 11101 1+1+1+0+1+0 = 4 4/14 = .29 00000 0+0+0+0+0+0 = 0 0/14 = 0 10010 1+0+0+1+0+0 = 2 2/14 = .14 11111 1+1+1+1+1+1 = 6 6/14 = .43 ------- 14
(5 points) Crossover Suppose that a single crossover point will be used for crossover. This point has been chosen as the point between the 2nd and the 3rd bits (i.e., between b2 and b3). Show the two offspring that will result from crossing over the following two individuals:
PARENTS OFFSPRING: First parent: 00.101 First offspring: 00111 Second parent: 10.111 Second offspring: 10101
(5 points) Mutation Explain how the standard mutation method is applied after selection and crossover to form the "next" generation of a population.
For each individual, a bit of the individual will be randomly selected and it will be flipped with a given "mutation rate" probability.
Describe each of the following concepts in a few sentences.
Segmentation is the grouping of the pixels in the matrix into meaningful units. When segmenting a matrix of gray-scale values (most of the time) the edges of these groupings would reflect the possible boundaries of objects. There are two common methods (although many more exist): 1. Threshold: under the threshold method, the range of possible values would be split up into n equally sized slots. Each pixel would belong to the group to which its value corresponds. After processing the entire matrix, the groups are all the members of the slots. This method is sometimes bad because it can lead to pixels completely unrelated in position to be grouped just on the basis of color. 2. Assimilation: this is a similar strategy, but rather than belonging to a global group, each pixel only belongs to the pixels around it that are within a certain threshold function away from it. In this manner, similar groups are formed with similar pixels, but they are spatially isolated from similarly colored groups, leading to a better representation of the scene.
shading: -shadows provide an understanding of the curvature of an object -sudden shading changes represent corners or boundaries -uniform shading in a lit scene corresponds to flat surfaces texture: -texture provides a perspective on the object, providing depth -a good example is the dimples on a golf ball motion: -by monitoring progress as motion occurs, certain points can be traced and a shape model developed also: -as an object moves, the change in the shading on the surface provides information about the shape as well
The number of templates needed is two. The number of points needed is two. Notice that for each point we do not need a z value since in 2-D it is undetectable. Also we do not need the y coordinate since it always stays the same. So, we only need to derive a function that predicts the interpolation of the x-coordinate between the two templates we are given. The function is of the form x_need = Alpha * x1 + Beta * x2. So, we need the two points so that we can solve a system of two equations in two unknowns. The matching is done by finding the alpha and beta values for two points and then use those values of Alpha and Beta to predict the x coordinate of other control points in the unknown object based on the x coordinates of the control points in the templates. Given a small threshold, this procedure will predict which template the object fits, if any at all.
During this stage, the analog signal is analyzed and split into different frequencies (or harmonics) using filters like for instance the Fourier tranformation. These frequencies are used to identify basic natural language sounds, called phonemes, that appear in the signal. Identifying these phonemes may be done using a trained neural network or matching the sound (piece of input signal) against a library of known phonemes. After the phonemes are identified then complete words are constructed from the phonemes.Some difficulties present during this stage (just to mention a few are):
During this stage, the grammar of the natural language is used to determine the structure of the sentence. A grammar is a collection of rules of syntax of the language. If the sentence is syntactically ambiguous, this stage may output more than one possible syntactic structure for the sentence. A classical example of such a sentence is "Fruit flies like a banana", in which the subject can either be "Fruit" or "Fruit flies". Given a sentence and a grammar, a parsing procedure constructs one or more possible parse trees, each one representing a possible structure for the sentence. Possible complications are:
The essence of this stage is to tranlate the sentence into the internal knowledge representation of the computer program, so that the program can reason and use the information/knowledge encoded in the sentence. During this stage, a partial translation is made based on the parse tree(s) obtained in the previous syntactic analysis stage. As before, ambiguity is a possible complication for this stage. That is, sentences that have a unique syntactic structure but more than one possible meaning.
The purpose of this stage is to complete as much as possible the translation of the original sentence into the internal representation of the computer program. For this, the context of the sentence is used to disambiguite as much as possible the meaning of the sentence, to resolve references (e.g. "Mary bought bananas. She said they are inexpensive". In the last sentence, She="Mary" and they="bananas"). Possible complications in this stage are difficulties in understanding the intentions of the speaker's message, metaphors, and semantic ambiguity.