>> X = random('Normal',90,10,1,100);
This mean=91.2309 and standard deviation=11.5657 come from estimating maximum likelihood using the equations (4.8) on p. 68 of Alpaydin's textbook (3rd edition):>> T = mle(X); >> T T = 91.2309 11.5657
m = sum_t(x^t) / N
s^2 = sum_t(x^t - m)^2 / N

where N = 100 in this example.
Computing these directly confirms the values:

>> m = sum(X)/100
m =
   91.2309
>> s = sqrt(sum((X - m).^2)/100)
s =
   11.5657

Note that [T,pci] = mle(___) also returns the 95% confidence intervals for the parameters:
>> [T,pci] = mle(X)
T =
   91.2309   11.5657
pci =
   88.9244   10.2059
   93.5373   13.5033

The first row of pci holds the lower bounds and the second row the upper bounds, one column per parameter. Note also that as the sample size increases, the maximum likelihood estimates get closer to the actual parameter values:
>> X2 = random('Normal',90,10,1,1000);
>> mle(X2)
ans =
   89.6348    9.8753
>> X3 = random('Normal',90,10,1,10000);
>> mle(X3)
ans =
   90.0433    9.8635

We can see the histogram plots for each of these random sets:
>> figure; hist(X)
>> figure; hist(X2)
>> figure; hist(X3)

Note: Matlab's mle function can also compute maximum likelihood estimates for non-Gaussian distributions.
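As a quick sketch of that capability (the exponential example and the variable names Y and phat below are mine, not from the text), mle accepts a 'distribution' name-value argument:

```matlab
% Draw 1000 samples from an exponential distribution with mean 5,
% then recover the parameter by maximum likelihood.
Y = random('Exponential',5,1,1000);
phat = mle(Y,'distribution','Exponential')   % should be close to 5
```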
C1 = random('Normal',60,10,1,100);
C2 = random('Normal',90,5,1,100);
D(1:100,1) = C1;   D(1:100,2) = 1;   D(1:100,3) = 0;
D(101:200,1) = C2; D(101:200,2) = 0; D(101:200,3) = 1;

D now contains 200 data instances: the first column is a randomly generated number, and the second and third columns indicate whether the number came from C1 or from C2.
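Since the prior probabilities P(C1) and P(C2) come up later, note that they can be estimated from the indicator columns; a minimal sketch (P1 and P2 are my names, not the text's):

```matlab
% Estimated priors: P(Ci) = Ni / N, using the class-indicator columns of D.
P1 = sum(D(:,2)) / size(D,1)   % 0.5 by construction
P2 = sum(D(:,3)) / size(D,1)   % 0.5 by construction
```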
We can use histograms to plot C1 and C2:
>> figure; hist(C1)
>> figure; hist(C2)
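To see how much the two classes overlap, both can also be drawn on one figure; a sketch assuming a MATLAB version with the newer histogram function (R2014b or later):

```matlab
% Overlay the two class histograms on shared axes.
figure; hold on
histogram(C1)
histogram(C2)
legend('C1','C2')
hold off
```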
The maximum likelihood estimates for each class are given by equations (4.25) and (4.26):

(4.25)  m_i = sum_t(r^t_i * x^t) / N_i
(4.26)  s^2_i = sum_t(r^t_i * (x^t - m_i)^2) / N_i

where N_i = sum_t(r^t_i)
>> N1 = sum(D(:,2))
N1 =
   100
>> N2 = sum(D(:,3))
N2 =
   100
>> m1 = sum(D(:,1).*D(:,2))/N1
m1 =
   60.9839
>> m2 = sum(D(:,1).*D(:,3))/N2
m2 =
   90.1958
>> s1 = sqrt(sum(D(:,2).*(D(:,1) - m1).^2)/N1)
s1 =
   8.9451
>> s2 = sqrt(sum(D(:,3).*(D(:,1) - m2).^2)/N2)
s2 =
   4.5562

Now using equation (4.28) on p. 75, after disregarding its first term (which is constant) and its fourth term (assuming the prior probabilities P(C1) and P(C2) are equal), we get:
g_i(x) = - log s_i - ((x - m_i)^2/(2 s_i^2))
In order to classify a data instance x, we calculate g1(x) and g2(x), and pick the class corresponding to the larger of these two values. That is (written with .^ so the functions also work elementwise on vectors):

g1 = @(x) -log(s1) - ((x - m1).^2/(2*s1^2))
g2 = @(x) -log(s2) - ((x - m2).^2/(2*s2^2))
ChooseCi(x): if g1(x) >= g2(x) then pick C1, else pick C2

For example, for x = 50:
>> if g1(50) >= g2(50)
     1
   else
     2
   end
ans =
     1
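The same rule can be applied to every instance in D at once to measure training accuracy; a sketch in which g1v, g2v, pred, truth, and accuracy are illustrative names not in the original text:

```matlab
% Elementwise versions of the discriminants (note .^), so the whole
% first column of D can be classified in one call.
g1v = @(x) -log(s1) - ((x - m1).^2/(2*s1^2));
g2v = @(x) -log(s2) - ((x - m2).^2/(2*s2^2));
pred = 2 - (g1v(D(:,1)) >= g2v(D(:,1)));   % 1 where C1 wins, 2 where C2 wins
truth = D(:,2) + 2*D(:,3);                 % true labels: 1 for C1, 2 for C2
accuracy = mean(pred == truth)
```

With the well-separated class means used here (60 vs. 90), the training accuracy should be close to 1.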