### Miscellaneous Notes on Parametric Machine Learning Methods in Matlab

#### Generating Normally Distributed Random Data:

Generate a set X of 100 random numbers following a normal distribution.
In this example:
the normal distribution for X has mean 90 and standard deviation 10.
>> X = random('Normal',90,10,1,100);

#### Maximum Likelihood Estimation (MLE):

Perform maximum likelihood estimation on the set X generated above, using Matlab's mle function:
>> T = mle(X);
>> T
T =
91.2309 11.5657

These estimates (mean = 91.2309, standard deviation = 11.5657) come from the maximum likelihood equations (4.8) on p. 68 of Alpaydin's textbook (3rd edition):
mean = sum_t(x^t) / N
sd^2 = sum_t(x^t - mean)^2 / N

where N = 100 in this example.
>> m = sum(X)/100
m =
91.2309
>> s = sqrt(sum((X - sum(X)/100).^2)/100)
s =
11.5657
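As a cross-check on these closed-form formulas, the same computation can be sketched in pure Python. The sample below is a small hypothetical one made up for illustration; it is not the X from the Matlab session:

```python
import math

# Hypothetical sample (stands in for X; values are invented for illustration)
x = [88.0, 95.0, 79.0, 102.0, 91.0]
N = len(x)

# MLE for a normal distribution (Alpaydin eq. 4.8):
# the sample mean, and the *biased* standard deviation (divide by N, not N-1)
m = sum(x) / N
s = math.sqrt(sum((xi - m) ** 2 for xi in x) / N)

print(m)  # 91.0
print(s)  # about 7.6158
```

Note the division by N rather than N-1: the MLE of the variance is the biased estimator, which is what Matlab's mle returns and what the manual check above reproduces.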

Note that [T,pci] = mle(___) also returns the 95% confidence intervals for the parameters:
>> [T,pci] = mle(X)
T =
91.2309 11.5657
pci =
88.9244 10.2059
93.5373 13.5033

Note also that as the sample size increases, the maximum likelihood estimates get closer to the true parameter values:
>> X2 = random('Normal',90,10,1,1000);
>> mle(X2)
ans =
89.6348 9.8753
>> X3 = random('Normal',90,10,1,10000);
>> mle(X3)
ans =
90.0433 9.8635
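This consistency behavior can be sketched in pure Python as well. The seed, sample sizes, and helper name below are my own choices for illustration; as N grows, the estimate of the mean concentrates around the true value 90:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def mle_mean(n, mu=90.0, sigma=10.0):
    """Draw n samples from Normal(mu, sigma) and return the MLE of the mean."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return sum(sample) / n

m_small = mle_mean(100)    # typically within about 2 of 90
m_large = mle_mean(10000)  # typically within about 0.2 of 90
print(m_small, m_large)
```

The standard error of the sample mean is sigma/sqrt(N), so going from N = 100 to N = 10000 shrinks the typical estimation error by a factor of 10.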

We can see the histogram plots for each of these random sets:
>> figure; hist(X)
>> figure; hist(X2)
>> figure; hist(X3)

**Note:** Matlab's mle function can also compute maximum likelihood estimates for many non-Gaussian distributions.

#### Generating Normally Distributed Random Data to Illustrate Parametric Classification:

Generate two sets C1 and C2 of 100 random numbers each, where each set follows a normal distribution. In this example:
- the normal distribution for C1 has mean 60 and standard deviation 10.
- the normal distribution for C2 has mean 90 and standard deviation 5.

Below is the Matlab program
two_random_normal_sets.m that I used to achieve this.
C1 = random('Normal',60,10,1,100);
C2 = random('Normal',90,5,1,100);
D(1:100,1) = C1;
D(1:100,2) = 1;
D(1:100,3) = 0;
D(101:200,1) = C2;
D(101:200,2) = 0;
D(101:200,3) = 1;

D now contains 200 data instances: the first column holds the randomly generated number, and the second and third columns indicate whether the number came from C1 or from C2 (a 1 in column 2 marks C1; a 1 in column 3 marks C2).
We can use histograms to plot C1 and C2:

>> figure; hist(C1)
>> figure; hist(C2)

#### Parametric Classification:

We can use equations (4.25) and (4.26) on p. 74 of Alpaydin's textbook (3rd edition):
(4.25) m_i = sum_t(x^t * r^t_i) / N_i
(4.26) s_i^2 = sum_t((x^t - m_i)^2 * r^t_i) / N_i
where N_i = sum_t(r^t_i)

>> N1= sum(D(:,2))
N1 =
100
>> N2= sum(D(:,3))
N2 =
100
>> m1 = sum(D(:,1).*D(:,2))/N1
m1 =
60.9839
>> m2 = sum(D(:,1).*D(:,3))/N2
m2 =
90.1958
>> s1 = sqrt(sum(D(:,2).*(D(:,1)- m1).^2)/N1)
s1 =
8.9451
>> s2 = sqrt(sum(D(:,3).*(D(:,1)- m2).^2)/N2)
s2 =
4.5562
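Equations (4.25) and (4.26) are just indicator-weighted means and variances. A pure-Python sketch, using a tiny hypothetical labeled sample in place of D (the values and the helper name are invented for illustration):

```python
import math

# Hypothetical labeled data: rows of (value, r1, r2), where r_i = 1 iff
# the instance belongs to class i -- mirroring the three columns of D
data = [(58.0, 1, 0), (62.0, 1, 0), (60.0, 1, 0),
        (89.0, 0, 1), (91.0, 0, 1)]

def class_estimates(data, i):
    """Eqs. (4.25)/(4.26): indicator-weighted MLE of the mean and std for class i."""
    Ni = sum(row[i] for row in data)                                   # N_i
    mi = sum(row[0] * row[i] for row in data) / Ni                     # (4.25)
    si = math.sqrt(sum(row[i] * (row[0] - mi) ** 2 for row in data) / Ni)  # (4.26)
    return Ni, mi, si

N1, m1, s1 = class_estimates(data, 1)
N2, m2, s2 = class_estimates(data, 2)
print(m1, m2)  # 60.0 90.0
```

Multiplying by the indicator r^t_i simply selects the instances of class i, exactly as the element-wise products D(:,2).*... and D(:,3).*... do in the Matlab session above.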

Now using equation (4.28) on p. 75, after disregarding its first term
(which is constant) and its fourth term (assuming the prior probabilities
P(C1) and P(C2) of the two classes are equal), we get:
g_i(x) = - log s_i - ((x - m_i)^2/(2 s_i^2))

g1 = @(x) - log(s1) - ((x - m1).^2 ./ (2*s1^2))
g2 = @(x) - log(s2) - ((x - m2).^2 ./ (2*s2^2))

In order to classify a data instance x, we calculate g1(x) and g2(x), and pick the class corresponding to the larger of these two values. That is,
ChooseCi(x): If g1(x) >= g2(x) then pick C1 else pick C2

For example, for x=50:
>> if g1(50) >= g2(50)
1
else
2
end
ans =
1
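The same discriminant comparison can be sketched in pure Python, plugging in the parameter estimates from the Matlab session above (the function names are my own):

```python
import math

# Parameter estimates copied from the Matlab transcript above
m1, s1 = 60.9839, 8.9451
m2, s2 = 90.1958, 4.5562

def g(x, m, s):
    """Discriminant from eq. (4.28), with the constant and equal-prior terms dropped."""
    return -math.log(s) - (x - m) ** 2 / (2 * s ** 2)

def choose_class(x):
    """Pick the class whose discriminant is larger (ties go to C1)."""
    return 1 if g(x, m1, s1) >= g(x, m2, s2) else 2

print(choose_class(50))  # 1, matching the Matlab example
print(choose_class(95))  # 2
```

For x = 50, the squared distance to m1 is far smaller than to m2, so g1(50) > g2(50) and class 1 wins, in agreement with the transcript.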

ruiz@cs.wpi.edu