DM: Multimix clustering program (free!)


From: Murray Jorgensen
Date: Thu, 13 Nov 1997 21:13:21 -0500 (EST)
Multimix was written by Lyn Hunt to fit mixture models to multivariate
data sets as an alternative to other approaches to cluster analysis
(unsupervised learning). Lyn developed this program as part of her
doctoral research under my supervision. She now has a faculty position
here at Waikato.

Lyn and I are pleased to announce that Multimix can now be downloaded from

                ftp://ftp.math.waikato.ac.nz/pub/maj/   .

Multimix generalises two common types of models that are finite
mixtures of distributions: mixtures of multivariate normals, and latent
class models. In the case of multivariate normals it is possible to
specify a block diagonal covariance structure to reduce the number of
parameters that need to be estimated. Details are given in the paper
talk.ps (or talk.dvi), which may also be downloaded from the above ftp
site.
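
To illustrate the saving from a block diagonal covariance structure,
here is a minimal Python sketch (not part of Multimix; the function
names and the example block sizes are my own assumptions). It counts
the free covariance parameters for one component, with and without
the block structure.

```python
def cov_params(p):
    """Free parameters in an unconstrained symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def block_diag_cov_params(block_sizes):
    """Free covariance parameters when the variables are split into
    mutually independent blocks of the given sizes."""
    return sum(cov_params(b) for b in block_sizes)


# Hypothetical example: six continuous variables, either with a full
# covariance matrix or split into one block of three and three singletons.
full = cov_params(6)
blocked = block_diag_cov_params([3, 1, 1, 1])
print(full, blocked)
```

The count applies to each component separately, so the saving grows
with the number of clusters fitted.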

We have decided to make the Fortran 77 source code available so that
you will be able to customise Multimix to your own data and platform.
The sizes of the multidimensioned arrays used in Multimix are governed
by parameter statements which may need to be changed from the supplied
values to suit your needs.

For those who are not accustomed to a statistical modelling approach, I
should make clear that in specifying the model it is important to keep
the number of estimated parameters as low as possible consistent with a
good fit to the data. Unlike some other approaches, Multimix does not
attempt to determine an optimal number of clusters. We recommend that
you first explore solutions with 2, 3, 4, ... clusters before
attempting to go any further. (I say this because, when I requested
information about array parameter settings, the replies showed that
several respondents were seeking what we would regard as quite a large
number of clusters.)

Before attempting to fit your own data we recommend that you try to
reproduce the output for the Cancer example data and model supplied.

The file README.TXT describes the files available in this distribution
and I will paste it into this email below as well. Read the paper
TALK.DVI/TALK.PS before getting started, then read NOTES.DVI or
NOTES.PS for some program documentation. Happy mixture modelling!

Multimix.for    contains the program code for fitting a finite
                mixture of K groups to the data.

[Missing.for]   contains a version of Multimix.for which can handle
                missing values in the variables. [Currently
                unavailable while minor changes are being made.]

Talk.dvi        Dvi and Postscript versions of a paper presented
Talk.ps         on 23 August 1996 to the conference ISIS96, 
                Information, Statistics and Induction in Science, 
                held in Melbourne, Australia. [Published in the
                proceedings of the Conference, edited by D. L. Dowe,
                K. B. Korb and J. J. Oliver, World Scientific: 
                Singapore]

Notes.ps        is a PostScript file giving information about the
                input required to run Multimix. Please read this
                file.

Read3.for       contains program code for setting up a 
                parameter input file for program Multimix. This is
                useful when setting up the first few runs with a data
                set. Later it is easier to modify existing files
                with a text editor.

Flexi           This subdirectory contains a Bayesian smoothing 
                program written by Martin Upsdell. It is not connected
                with Multimix in any way. Read about Flexi in
                Flexi/Info.txt. Martin's email address is
                upsdellm@agresearch.cri.nz.


EXAMPLE OF DATA FILE, INPUT FILE, AND OUTPUT FILES

Cancer11.dat    contains the cancer data file.

Cancerdesc.txt  A description of the data in Cancer11.dat.

2band.dat       contains a parameter input file for the cancer data. 
                A two-component mixture model is to be fitted. The
                variables are partitioned into blocks. Each block or
                'cell' is assumed independent of the others within
                each component. In the model fitted by 2band.dat
                the distributions of the variables in each block are
                1       Univariate Normal
                2       3-category Discrete
                3       2-category Discrete
                4       Trivariate Normal
                5       7-category Discrete
                6       Univariate Normal
                7       Univariate Normal
                8       Univariate Normal
                9       Univariate Normal
                10      2-category Discrete
                There is some re-ordering of variables to make the
                variables in each block contiguous. An initial
                grouping of the observations into two clusters is
                specified. Alternatively initial parameter values
                could have been given.
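
To give a feel for the size of the model described by 2band.dat, the
Python sketch below (not part of the distribution) counts its free
parameters. It assumes that every component has its own unconstrained
means, covariances and category probabilities for each block; that
assumption, and all the names in the code, are mine, so check talk.ps
for the exact parameterisation.

```python
def block_params(kind, size):
    """Free parameters for one block within one mixture component.

    kind "normal":   size is the dimension (means plus covariance entries).
    kind "discrete": size is the number of categories (probabilities sum to 1).
    """
    if kind == "normal":
        return size + size * (size + 1) // 2
    if kind == "discrete":
        return size - 1
    raise ValueError("unknown block kind: " + kind)


# The ten blocks listed above for 2band.dat, in order.
BLOCKS_2BAND = [
    ("normal", 1), ("discrete", 3), ("discrete", 2), ("normal", 3),
    ("discrete", 7), ("normal", 1), ("normal", 1), ("normal", 1),
    ("normal", 1), ("discrete", 2),
]


def mixture_params(blocks, k):
    """Total free parameters for k components plus k - 1 mixing proportions."""
    per_component = sum(block_params(kind, size) for kind, size in blocks)
    return k * per_component + (k - 1)


for k in (2, 3, 4):
    print(k, mixture_params(BLOCKS_2BAND, k))
```

Under these assumptions the two-component model has 59 free
parameters, and each extra cluster adds 30 more, which is one reason
to start with a small number of clusters.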

General.out     is the output file generated when using the parameter 
                file 2band.dat.

Groups.out      contains the group assignment and the posterior
                probabilities of assignment to the two groups when
                using the parameter file 2band.dat.
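
The posterior probabilities of assignment can be sketched in Python as
follows. This is an illustration of the general mixture-model
calculation only, not the Multimix code: it uses one normal variable
and one discrete variable, treated as independent blocks within each
component, and the parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_probs(x, category, components):
    """Posterior probability of each component for one observation.

    Blocks are independent within a component, so the component density
    is the product of the block densities, weighted by the mixing
    proportion and normalised over components.
    """
    joint = [
        c["weight"] * normal_pdf(x, c["mean"], c["var"]) * c["cat_probs"][category]
        for c in components
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameter values, chosen only for illustration.
components = [
    {"weight": 0.5, "mean": 0.0, "var": 1.0, "cat_probs": [0.8, 0.2]},
    {"weight": 0.5, "mean": 3.0, "var": 1.0, "cat_probs": [0.3, 0.7]},
]

probs = posterior_probs(0.5, 0, components)
group = probs.index(max(probs)) + 1  # assign to the most probable group
print(group, probs)
```

Each observation is assigned to the group with the largest posterior
probability; the probabilities themselves show how clear-cut that
assignment is.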


Queries to Murray Jorgensen <maj@waikato.ac.nz>.


Murray Jorgensen,  Department of Statistics,  U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
Doubt everything or believe everything: these are two equally
convenient strategies. With either we dispense with the need to think.
                                                       - Henri Poincaré