#### EM Clustering Example. Prof. Ruiz

1. Generate two sets X and Y of 100 random numbers each, where each set follows a normal distribution. In this example:
• the normal distribution for X has mean 90 and standard deviation 10.
• the normal distribution for Y has mean 60 and standard deviation 10.
Below is the Matlab program two_random_normal_sets.m that I used to achieve this.
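```
% Draw 100 samples from each normal distribution (1-by-100 row vectors).
X = random('Normal',90,10,1,100);   % mean 90, standard deviation 10
Y = random('Normal',60,10,1,100);   % mean 60, standard deviation 10

% Column 1 of D holds the value; column 2 records the source (1 = X, 2 = Y).
D(1:100,1) = X;
D(1:100,2) = 1;
D(101:200,1) = Y;
D(101:200,2) = 2;
```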
D now contains 200 data instances: the first column is the randomly generated number, and the second column indicates whether the number came from X (1) or from Y (2). See D contents.
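
As a quick sanity check on D, the sample mean and standard deviation of each source can be compared against the target values (90 and 60, both with standard deviation 10). The snippet below is only a sketch: it rebuilds D in a slightly more compact way so that it runs on its own, and the fixed seed is my assumption, added only for repeatability.

```matlab
% Rebuild D as in step 1 so this snippet runs on its own.
rng(0);                                   % assumed seed, for repeatability only
D = [random('Normal',90,10,100,1),   ones(100,1); ...
     random('Normal',60,10,100,1), 2*ones(100,1)];

% Per-source sample statistics (column 2 holds the source label).
for k = 1:2
    v = D(D(:,2) == k, 1);
    fprintf('source %d: n = %d, mean = %.2f, std = %.2f\n', ...
            k, numel(v), mean(v), std(v));
end
```

With only 100 samples per source, the sample statistics will typically differ from the targets by a unit or two.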

2. Translate D to an ARFF file: em_dataset_example.arff.
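
One possible way to do the translation directly from Matlab is sketched below. The relation name (em_example) and the attribute names (A and class) are taken from the Weka output in this example; declaring class as a nominal attribute with values {1,2} is my assumption, and the number format is arbitrary.

```matlab
% Rebuild D as in step 1 so this snippet runs on its own.
D = [random('Normal',90,10,100,1),   ones(100,1); ...
     random('Normal',60,10,100,1), 2*ones(100,1)];

fid = fopen('em_dataset_example.arff', 'w');
assert(fid ~= -1, 'could not open output file');

% ARFF header: relation name, one declaration per attribute.
fprintf(fid, '@relation em_example\n\n');
fprintf(fid, '@attribute A numeric\n');
fprintf(fid, '@attribute class {1,2}\n\n');   % nominal class: an assumption

% Data section: one "value,label" line per instance.
fprintf(fid, '@data\n');
fprintf(fid, '%.4f,%d\n', D');   % transpose so fprintf walks D row by row
fclose(fid);
```

The transpose matters because fprintf consumes matrix elements in column order; without it, values and labels would be paired incorrectly.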

3. Below are the clustering results from Weka. The parameters I used for EM were: weka.clusterers.EM -I 100 -N 2 -M 8.0 -S 100
```
=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 2 -M 8.0 -S 100
Relation:     em_example
Instances:    200
Attributes:   2
              A
Ignored:
              class
Test mode:    Classes to clusters evaluation on training data
=== Model and evaluation on training set ===

EM
==

Number of clusters: 2

            Cluster
Attribute         0       1
             (0.49)  (0.51)
============================
A
  mean       89.7246  60.419
  std. dev.   9.0504  8.8655

Clustered Instances

0      100 ( 50%)
1      100 ( 50%)

Log likelihood: -4.1775

Class attribute: class
Classes to Clusters:

  0  1  <-- assigned to cluster
 94  6 | 1
  6 94 | 2

Cluster 0 <-- 1
Cluster 1 <-- 2

Incorrectly clustered instances :	12.0	  6      %
```
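
The "Incorrectly clustered instances" line follows directly from the classes-to-clusters matrix. As a small worked check (the variable names are mine, not Weka's), the arithmetic can be reproduced like this:

```matlab
% Classes-to-clusters counts copied from the Weka output:
% rows are the true classes (1, 2), columns are the clusters (0, 1).
C = [94  6;
      6 94];

% Weka maps cluster 0 to class 1 and cluster 1 to class 2,
% so the correctly clustered instances are on the diagonal.
total     = sum(C(:));
correct   = sum(diag(C));
incorrect = total - correct;
errorRate = 100 * incorrect / total;

fprintf('incorrect: %d of %d (%.1f%%)\n', incorrect, total, errorRate);
% prints: incorrect: 12 of 200 (6.0%)
```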

4. We can also use Matlab to cluster D; see em_clustering_D_matlab.m. Here is the summary of the clustering result reported by Matlab:
```
obj =

Gaussian mixture distribution with 2 components in 1 dimensions
Component 1:
Mixing proportion: 0.482694
Mean:    91.0523

Component 2:
Mixing proportion: 0.517306
Mean:    60.0800
```
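
The script em_clustering_D_matlab.m is not reproduced here. As a rough sketch of how such a fit could be done, and not necessarily what that script does, the Statistics Toolbox function fitgmdist fits a Gaussian mixture by EM (older releases offer gmdistribution.fit for the same purpose). The seed, the number of replicates, and the variable names below are my assumptions.

```matlab
% Rebuild the data from step 1 so this snippet runs on its own.
rng(1);                                   % assumed seed, for repeatability only
x      = [random('Normal',90,10,100,1); random('Normal',60,10,100,1)];
labels = [ones(100,1); 2*ones(100,1)];    % 1 = came from X, 2 = came from Y

% Fit a 2-component Gaussian mixture with EM; only the values are used,
% the labels are kept aside for evaluation.
obj = fitgmdist(x, 2, 'Replicates', 5);   % 5 random restarts, keep the best fit
idx = cluster(obj, x);                    % most likely component per instance

disp(obj.ComponentProportion);            % mixing proportions
disp(obj.mu');                            % component means
disp(sqrt(squeeze(obj.Sigma))');          % component standard deviations

% Component numbering is arbitrary, so score both possible matchings
% of components to sources and keep the better one.
agree = mean(idx == labels);
err   = 100 * min(agree, 1 - agree);
fprintf('incorrectly clustered: %.1f%%\n', err);
```

Because the data are regenerated randomly, the fitted means and proportions will be close to, but not identical to, the values reported above.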