EM Clustering Example.
Prof. Ruiz

Generate two sets X and Y of 100 random numbers each, where each set follows a normal distribution. In this example:
- the normal distribution for X has mean 90 and standard deviation 10.
- the normal distribution for Y has mean 60 and standard deviation 10.
Below is the Matlab program two_random_normal_sets.m that I used to achieve this.
```
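% Draw 100 samples per set (1x100 row vectors): mean 90 or 60, std. dev. 10.
X = random('Normal',90,10,1,100);
Y = random('Normal',60,10,1,100);
% Stack both sets into D: column 1 is the value, column 2 is the source label.
D(1:100,1) = X;
D(1:100,2) = 1;      % 1 = came from X
D(101:200,1) = Y;
D(101:200,2) = 2;    % 2 = came from Y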
```
D now contains 200 data instances: the first column holds the randomly generated number, and the second column indicates whether the number came from X (label 1) or from Y (label 2). See D contents.
Translate D to an arff file: em_dataset_example.arff.
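
The translation step can be sketched in Matlab as follows. This is only one possible way to do it, not necessarily how em_dataset_example.arff was produced. The relation name em_example and the attribute names A and class are taken from the Weka output on this page; declaring class as the nominal set {1,2} is my assumption. D is regenerated at the top so the block runs on its own, which means the values will differ from the original dataset.

```matlab
% Sketch: write D (200x2: value, source label) to an ARFF file.
% D is regenerated here so the block is self-contained.
X = random('Normal', 90, 10, 1, 100);
Y = random('Normal', 60, 10, 1, 100);
D = [X(:), ones(100, 1); Y(:), 2 * ones(100, 1)];

fid = fopen('em_dataset_example.arff', 'w');
fprintf(fid, '@relation em_example\n\n');
fprintf(fid, '@attribute A numeric\n');
fprintf(fid, '@attribute class {1,2}\n\n');   % assumed nominal declaration
fprintf(fid, '@data\n');
% Passing D' makes fprintf consume one (value, label) pair per output line.
fprintf(fid, '%.4f,%d\n', D');
fclose(fid);
```

The class attribute is written to the file but ignored during clustering, so that Weka can use it only for the classes-to-clusters evaluation.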

The clustering results from Weka are included below. The parameters I used for EM were: weka.clusterers.EM -I 100 -N 2 -M 8.0 -S 100

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 2 -M 8.0 -S 100
Relation:     em_example
Instances:    200
Attributes:   2
              A
Ignored:
              class
Test mode:    Classes to clusters evaluation on training data
=== Model and evaluation on training set ===


EM
==

Number of clusters: 2


            Cluster
Attribute         0       1
             (0.49)  (0.51)
============================
A
  mean       89.7246  60.419
  std. dev.   9.0504  8.8655

Clustered Instances

0      100 ( 50%)
1      100 ( 50%)


Log likelihood: -4.1775


Class attribute: class
Classes to Clusters:

  0  1  <-- assigned to cluster
 94  6 | 1
  6 94 | 2

Cluster 0 <-- 1
Cluster 1 <-- 2

Incorrectly clustered instances :	12.0	  6      %
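
The last line of the Weka output can be checked by hand from the classes-to-clusters matrix. The short Matlab sketch below copies the four counts from the output above and recomputes the number and percentage of incorrectly clustered instances. As a rough sanity check, it also computes the error rate one would expect in theory if the two clusters were split at the midpoint 75 between the true means 60 and 90 (1.5 standard deviations from each); that comparison is my addition and assumes the generating parameters stated at the top of this page.

```matlab
% Classes-to-clusters counts copied from the Weka output above.
% Rows: true class (1 = X, 2 = Y); columns: assigned cluster (0, 1).
C = [94  6;
      6 94];

% With the mapping "Cluster 0 <-- 1" and "Cluster 1 <-- 2", the diagonal
% holds the correctly clustered instances.
n_total   = sum(C(:));                  % 200
n_correct = trace(C);                   % 188
n_wrong   = n_total - n_correct;        % 12
pct_wrong = 100 * n_wrong / n_total;    % 6

fprintf('Incorrectly clustered: %d of %d (%g%%)\n', n_wrong, n_total, pct_wrong);

% Theoretical overlap: probability that a normal value falls more than
% 1.5 standard deviations on the wrong side of the midpoint.
p_overlap = 0.5 * erfc(1.5 / sqrt(2));  % standard normal CDF at -1.5, about 0.0668
fprintf('Expected error at the midpoint split: %.2f%%\n', 100 * p_overlap);
```

The observed 6% is close to the roughly 6.7% overlap between the two generating distributions, so most of the errors come from values that genuinely lie on the "wrong" side of the boundary.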

We can also use Matlab to cluster D; see em_clustering_D_matlab.m. Here is the summary of the clustering result reported by Matlab:

obj = 

Gaussian mixture distribution with 2 components in 1 dimensions
Component 1:
Mixing proportion: 0.482694
Mean:    91.0523

Component 2:
Mixing proportion: 0.517306
Mean:    60.0800
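
The contents of em_clustering_D_matlab.m are not reproduced on this page. A summary of the form shown above could be obtained with a call along the following lines. This is a minimal sketch that assumes the Statistics and Machine Learning Toolbox function fitgmdist; the original script may well use a different call. D is regenerated so the block is self-contained, so the fitted proportions and means will differ slightly from those reported above, and the component numbering is arbitrary.

```matlab
% Sketch only: not the actual em_clustering_D_matlab.m.
% Assumes the Statistics and Machine Learning Toolbox (fitgmdist, cluster, crosstab).
X = random('Normal', 90, 10, 1, 100);
Y = random('Normal', 60, 10, 1, 100);
D = [X(:), ones(100, 1); Y(:), 2 * ones(100, 1)];

% Fit a 2-component Gaussian mixture with EM, using the first column only;
% the second column (the source label) is held out for evaluation.
obj = fitgmdist(D(:, 1), 2)       % no semicolon, so the summary is displayed

% Hard-assign each instance to its most probable component.
idx = cluster(obj, D(:, 1));

% Cross-tabulate true source (rows) against assigned component (columns),
% analogous to Weka's classes-to-clusters matrix.
confusion = crosstab(D(:, 2), idx)
```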
