Re: DM: Missing items in clustering

From: David L Dowe
Date: Wed, 8 Oct 1997 22:54:48 -0400 (EDT)

   Dear DM people,
        particularly those interested in mixture modelling and its synonyms
        (clustering, numerical taxonomy, intrinsic classification, etc.),

   There are a couple of recurring themes in this mailing list:

One theme is that people mail in from time to time asking whether anyone
knows of a good program for mixture modelling (or its synonyms).
Various people, usually including me, respond.

A related sub-theme is that people mail in from time to time wishing to do
mixture modelling with multinomial (multi-category) variables and/or
with missing data.

If these issues do not interest you, please read no further.
If these issues do interest you, perhaps the list administration could store
away the above two items as FAQs, as they certainly are asked frequently.

Re mixture modelling programs, see (e.g.)

-  Murray Jorgensen's stuff (as in his e-mail below)           and/or

-  my Snob program (dating back to 1968) with Chris Wallace
   http://www.cs.monash.edu.au/~dld/Snob.html                  and/or

-  my mixture modelling page, which is chock-a-block full of refs and links
   http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html

Re mixture modelling programs which deal (as in the request below from
Raj Kumaralingam) with missing data or with multinomial data,
two of the few such programs listed in my mixtures page
http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html
are indeed Snob (Chris Wallace and David Dowe) and
Lyn Hunt and Murray Jorgensen's MULTIMIX.
A third program for dealing with discrete data (but perhaps not for
missing data) is Marty Puterman's, given at
http://markov.commerce.ubc.ca/marty/ .

At this point, usually after Murray and I reply to the DM list, this topic
then goes quiet (till it is next raised).   Is anyone else out there aware
of other mixture modelling programs for multinomial data (other than
Marty Puterman's) or missing data?

Also, re Snob and MULTIMIX (and Marty Puterman's work),
does anyone out there want to do an empirical study and publish (and
report) it?

Regards, and earlier e-mail is appended below.           - David.

Dr. David Dowe, Dept of Computer Science, Monash University, Clayton,
Victoria 3168, Australia  dld@cs.monash.edu.au     Fax:+61 3 9905-5146
http://www.cs.monash.edu.au/~dld/
http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html 
http://www.cs.monash.edu.au/~dld/Snob.html

> From owner-datamine-l@nessie.crosslink.net Wed Oct  8 10:08:54 1997
> Date: Wed, 08 Oct 1997 12:14:25 +1300
> To: "'datamine-l@nautilus-sys.com'" <datamine-l@nautilus-sys.com>
> From: Murray Jorgensen <maj@waikato.ac.nz>
> Subject: Re: DM: Missing items in clustering!
> 
> My colleague Lyn Hunt has written a clustering program called MULTIMIX,
> based on ideas related to what is often called "Naive Bayes", using the EM
> algorithm for maximum likelihood estimation with missing information. The
> information that is always missing is the association of objects to
> clusters, but in addition values of the measured variables may be missing.
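
As a rough illustration of the idea described in the paragraph above (only a
sketch, and not the code of MULTIMIX or Snob): assuming continuous variables
that are independent Gaussians within each cluster, and values missing at
random, EM can simply leave a missing value out of both the membership
calculation and the parameter updates. The function names, the use of None
for a missing value, and the quantile-based starting means are all my own
assumptions. Each variable must be observed at least once.

```python
import math

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def em_mixture_missing(rows, k, n_iter=50, min_var=1e-6):
    """EM for a k-cluster mixture with independent Gaussian variables
    within each cluster. None marks a missing value (an assumption)."""
    n, d = len(rows), len(rows[0])
    # Sorted observed values of each variable, for a deterministic start.
    obs = [sorted(r[j] for r in rows if r[j] is not None) for j in range(d)]
    means = [[o[(2 * c + 1) * len(o) // (2 * k)] for o in obs] for c in range(k)]
    pooled = []
    for o in obs:
        m = sum(o) / len(o)
        pooled.append(max(sum((x - m) ** 2 for x in o) / len(o), min_var))
    variances = [list(pooled) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: membership probabilities from the observed variables only.
        for i, row in enumerate(rows):
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j, x in enumerate(row):
                    if x is not None:
                        lp += log_normal(x, means[c][j], variances[c][j])
                logs.append(lp)
            top = max(logs)
            unnorm = [math.exp(lp - top) for lp in logs]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]
        # M-step: weighted estimates; each variable uses only the rows
        # in which it was actually observed.
        for c in range(k):
            weights[c] = max(sum(r[c] for r in resp) / n, 1e-12)
            for j in range(d):
                pairs = [(resp[i][c], rows[i][j])
                         for i in range(n) if rows[i][j] is not None]
                den = sum(w for w, _ in pairs)
                if den > 0.0:
                    mu = sum(w * x for w, x in pairs) / den
                    var = sum(w * (x - mu) ** 2 for w, x in pairs) / den
                    means[c][j] = mu
                    variances[c][j] = max(var, min_var)
    return weights, means, variances, resp
```

A record with a missing value still receives membership probabilities; they
are just based on fewer variables.
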
> 
> Distance matrix based clustering has an inherent problem with missing
> observations because there is no underlying statistical model. One ad hoc
> approach might be to regress individual variables against others and use
> fitted values to impute the values of missing data.
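
The ad hoc regression approach just mentioned might be sketched as follows.
This is a minimal illustration under my own assumptions: one target variable
is regressed on a single predictor by ordinary least squares, the fit uses
only the records where both are observed, and None marks a missing value.
The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("predictor is constant on the complete cases")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_by_regression(rows, target, predictor):
    """Return a copy of rows in which a missing rows[i][target] is replaced
    by the fitted value from regressing target on predictor."""
    complete = [(r[predictor], r[target]) for r in rows
                if r[predictor] is not None and r[target] is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases to fit a line")
    slope, intercept = fit_line([x for x, _ in complete],
                                [y for _, y in complete])
    filled = []
    for r in rows:
        r = list(r)  # copy, so the input is left untouched
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled
```

Records missing the predictor as well are left unfilled. The imputed values
lie exactly on the fitted line, so they understate the real variability of
the data.
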
> 
> At 14:32 7/10/97 -0500, you wrote:
> >Hi all,
> >Does anyone have any pointers to how to handle missing 
> >items in a clustering context (I'm currently using Euclidean metric
> >based clustering).
> >
> >Thanks in advance
> >Raj
> >
> >
> Dr Murray Jorgensen        maj@waikato.ac.nz       Phone +64-7 838 4773
> Department of Statistics         home phone 856 6705;      Fax 838 4666
> University of Waikato  http://www.cs.waikato.ac.nz/stats/Staff/maj.html
> Hamilton, New Zealand        **** Editor: New Zealand Statistician ****
