DM: Multimix clustering program (free!)
From: Murray Jorgensen
Date: Thu, 13 Nov 1997 21:13:21 -0500 (EST)
Multimix was written by Lyn Hunt to fit mixture models to multivariate
data sets as an alternative to other approaches to cluster analysis
(unsupervised learning). Lyn developed this program as part of her
doctoral research under my supervision. She now has a faculty position
here at Waikato.
Lyn and I are pleased to announce that Multimix can now be downloaded
from ftp://ftp.math.waikato.ac.nz/pub/maj/ .
Multimix generalises two common types of models that are finite
mixtures of distributions: mixtures of multivariate normals, and latent
class models. In the case of multivariate normals it is possible to
specify a block diagonal covariance structure to reduce the number of
parameters that need to be estimated. Details are given in the paper
talk.ps (or talk.dvi), which may also be downloaded from the above ftp
site.
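
As a rough illustration of the saving (a sketch only, not taken from the
Multimix source), the number of free covariance parameters can be counted
directly: a symmetric b x b block has b(b+1)/2 free entries, and a block
diagonal matrix has one such block per group of variables. The function
name and the six-variable partition below are hypothetical.

```python
def cov_params(block_sizes):
    """Count free covariance parameters for a block diagonal covariance.

    Each block of size b is a symmetric b x b matrix, which has
    b * (b + 1) / 2 free entries. Entries outside the blocks are fixed
    at zero and so are not estimated.
    """
    return sum(b * (b + 1) // 2 for b in block_sizes)


p = 6  # hypothetical number of continuous variables

full = cov_params([p])           # one block: unrestricted covariance
blocked = cov_params([3, 2, 1])  # hypothetical partition into three blocks
diagonal = cov_params([1] * p)   # every variable in its own block

print(full, blocked, diagonal)   # prints: 21 10 6
```

These counts are per mixture component, so the saving is multiplied by
the number of components when each component has its own covariance
matrix.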
We have decided to make the Fortran 77 source code available so that
you will be able to customise Multimix to your own data and platform.
The sizes of the multidimensioned arrays used in Multimix are governed
by parameter statements which may need to be changed from the supplied
values to suit your needs.
For those who are not accustomed to a statistical modelling approach, I
should make clear that in specifying the model it is important to keep
the number of estimated parameters as low as possible, consistent with
a good fit to the data. Unlike some other approaches, Multimix does not
attempt to determine an optimal number of clusters. We recommend that
you first explore solutions with 2, 3, 4, ... clusters before
attempting to go any further. (I say this because when I requested
information about array parameter settings, the replies showed that
several respondents were seeking what we would regard as quite a large
number of clusters.)
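
To see how quickly the parameter count grows with the number of
clusters, the following sketch counts parameters for the block structure
that the README below lists for 2band.dat. It assumes the usual finite
mixture bookkeeping: every component has its own parameters for every
block, a d-variate normal block has d means plus d(d+1)/2 covariance
entries, a c-category discrete block has c-1 free probabilities, and
there are K-1 free mixing proportions. These assumptions and all names
in the code are illustrative and are not taken from the Multimix source.

```python
def block_params(kind, size):
    """Free parameters for one block within one mixture component.

    kind "normal":   size is the dimension d -> d means + d(d+1)/2 covariances
    kind "discrete": size is the number of categories c -> c - 1 probabilities
    """
    if kind == "normal":
        return size + size * (size + 1) // 2
    if kind == "discrete":
        return size - 1
    raise ValueError(f"unknown block kind: {kind!r}")


# Block structure described for 2band.dat in the README (blocks 1 to 10).
BLOCKS_2BAND = [
    ("normal", 1), ("discrete", 3), ("discrete", 2), ("normal", 3),
    ("discrete", 7), ("normal", 1), ("normal", 1), ("normal", 1),
    ("normal", 1), ("discrete", 2),
]


def total_params(blocks, k):
    """Total free parameters for a k-component mixture (assumed counting)."""
    per_component = sum(block_params(kind, size) for kind, size in blocks)
    return k * per_component + (k - 1)  # plus k - 1 mixing proportions


for k in (2, 3, 4):
    print(k, total_params(BLOCKS_2BAND, k))
# prints:
# 2 59
# 3 89
# 4 119
```

Under these assumptions each extra cluster adds 30 parameters for this
model, which is why a modest number of clusters is a sensible starting
point.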
Before attempting to fit your own data we recommend that you try to
reproduce the output for the Cancer example data and model supplied.
The file README.TXT describes the files available in this distribution
and I will paste it into this email below as well. Read the paper
TALK.DVI/TALK.PS before getting started, then read NOTES.DVI or
NOTES.PS for some program documentation. Happy mixture modelling!
Multimix.for    contains the program code for fitting a finite
                mixture of K groups to the data.
[Missing.for]   contains a version of Multimix.for which can handle
                missing values in the variables. [Currently
                unavailable while minor changes are being made.]
Talk.dvi        Dvi and Postscript versions of a paper presented
Talk.ps         on 23 August 1996 to the conference ISIS96,
                Information, Statistics and Induction in Science,
                held in Melbourne, Australia. [Published in the
                proceedings of the Conference, edited by D. L. Dowe,
                K. B. Korb and J. J. Oliver, World Scientific:
                Singapore]
Notes.ps        is a postscript file giving information about the
                input required to run Multimix. Please read this
                file.
Read3.for       contains program code for setting up a
                parameter input file for program Multimix. This is
                useful when setting up the first few runs with a data
                set. Later it is easier to modify existing files
                with a text editor.
Flexi           This subdirectory contains a Bayesian smoothing
                program written by Martin Upsdell. It is not connected
                with Multimix in any way. Read about Flexi in
                Flexi/Info.txt. Martin's email address is
                upsdellm@agresearch.cri.nz.
EXAMPLE OF DATA FILE, INPUT FILE, AND OUTPUT FILES
Cancer11.dat    contains the cancer data file.
Cancerdesc.txt  A description of the data in Cancer11.dat.
2band.dat       contains a parameter input file for the cancer data.
                A two-component mixture model is to be fitted. The
                variables are partitioned into blocks. Each block or
                'cell' is assumed independent of the others within
                each component. In the model fitted by 2band.dat
                the distributions of the variables in each block are
                    1   Univariate Normal
                    2   3-category Discrete
                    3   2-category Discrete
                    4   Trivariate Normal
                    5   7-category Discrete
                    6   Univariate Normal
                    7   Univariate Normal
                    8   Univariate Normal
                    9   Univariate Normal
                   10   2-category Discrete
                There is some re-ordering of variables to make the
                variables in each block contiguous. An initial
                grouping of the observations into two clusters is
                specified. Alternatively initial parameter values
                could have been given.
General.out     is the output file generated when using the parameter
                file 2band.dat.
Groups.out      contains the group assignment and the posterior
                probabilities of assignment to the two groups when
                using the parameter file 2band.dat.
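
For readers new to mixture models, the posterior probabilities reported
in Groups.out are of the kind sketched below: by Bayes' rule, the
probability that an observation belongs to a group is that group's
mixing proportion times its density at the observation, divided by the
sum of these products over all groups. The sketch uses a single
univariate normal variable with made-up parameter values, and assigns
each observation to the group with the largest posterior probability,
which is the usual rule but is an assumption here. It is not the
Multimix code and does not reproduce the Cancer example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior probability of each group given x, by Bayes' rule."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign(post):
    """Group number (counting from 1) with the largest posterior probability."""
    return max(range(len(post)), key=post.__getitem__) + 1


# Hypothetical two-group model: equal mixing proportions, unit variances.
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

for x in (-2.0, 0.0, 2.0):
    post = posterior(x, weights, means, sds)
    print(x, assign(post), [round(p, 3) for p in post])
```

An observation midway between the two means (x = 0 here) gets posterior
probability 0.5 for each group, which is the sort of uncertain
assignment that a hard clustering would hide.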
Queries to Murray Jorgensen <maj@waikato.ac.nz>.
Murray Jorgensen, Department of Statistics, U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
Doubt everything or believe everything: these are two equally
convenient strategies. With either we dispense with the need to think.
                                                   - Henri Poincaré