
Re: DM: equal-size clustering


From: Warren Sarle
Date: Thu, 4 Sep 1997 13:17:21 -0400 (EDT)
David Dowe writes:
> > From owner-datamine-l@nessie.crosslink.net Thu Sep  4 13:51:21 1997
> > Date: Thu, 04 Sep 1997 11:05:09 +0800
> > From: Hukan <hukan@cs.hku.hk>
> > To: datamine-l@nautilus-sys.com
> > Subject: DM: equal-size clustering
> > ...
> >    I have a special clustering problem. Given a set of points in a
> > multidimensional space, we want to cluster these points under the
> > limitation that the sizes of clusters are (almost) equal. Could anyone
> > give me some suggestions?
> ...
> I would do this by MML (Minimum Message Length), and would use Snob
> http://www.cs.monash.edu.au/~dld/Snob.html
> modified so that the relative class abundances had to be (almost) equal,
> and I would try to quantify "(almost) equal" with the best Bayesian priors
> I could.
> 
> No doubt, others will come up with alternative suggestions.

It is impossible to say what the best method of analysis is without
knowing what the purpose of the analysis is.

You can have a Bayesian prior that says that the population mixing
probabilities are exactly equal, but that will not force the sample
mixing proportions to be approximately equal. If the distribution is
more or less uniform, the sample mixing proportions will be nearly
equal, but if the population contains well-separated clusters with
radically different mixing probabilities, the prior will have little
effect.
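A small simulation illustrates the point. This is a minimal sketch under stated assumptions, not anyone's published method: as a stand-in for a prior that puts all its mass on equal mixing probabilities, it fixes the weights of a two-component, unit-variance, one-dimensional Gaussian mixture at exactly 0.5 and lets EM re-estimate only the two means. The data (180 points around 0, 20 points around 10), the seed, and the function name em_fixed_equal_weights are all hypothetical choices for illustration.

```python
import math
import random


def em_fixed_equal_weights(xs, n_iter=50):
    """EM for a 1-D two-component Gaussian mixture (hypothetical helper).

    Assumptions for illustration: both variances are fixed at 1 and the
    mixing weights are pinned at exactly 0.5; only the means are estimated.
    """
    mu = [min(xs), max(xs)]  # crude initialization at the data extremes
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point. With equal
        # weights and variances, the weights cancel and the responsibility is
        # a logistic function of the difference in squared distances.
        resp0 = []
        for x in xs:
            d = ((x - mu[1]) ** 2 - (x - mu[0]) ** 2) / 2.0
            if d >= 0:  # numerically stable 1 / (1 + exp(-d))
                r = 1.0 / (1.0 + math.exp(-d))
            else:
                e = math.exp(d)
                r = e / (1.0 + e)
            resp0.append(r)
        # M-step: update the means only; the weights stay fixed at 0.5.
        w0 = sum(resp0)
        w1 = len(xs) - w0
        mu[0] = sum(r * x for r, x in zip(resp0, xs)) / w0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp0, xs)) / w1
    return mu


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(180)] + [rng.gauss(10.0, 1.0) for _ in range(20)]
mu = em_fixed_equal_weights(xs)

# Hard-assign each point to the nearer mean (the equal-weight, equal-variance
# posterior rule) and count the sample cluster sizes.
sizes = [0, 0]
for x in xs:
    sizes[0 if abs(x - mu[0]) <= abs(x - mu[1]) else 1] += 1
print("means:", mu, "sizes:", sizes)
```

With clusters this well separated, the fitted means should land near 0 and 10 and the assignments should split roughly 180/20 rather than 100/100: fixing the population mixing probabilities does not equalize the sample proportions.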

K-means and numerous similar methods implicitly assume that the
population mixing probabilities are exactly equal. One popular
method for forcing the sample mixing proportions to be more nearly
equal is given in: 

   DeSieno, D. (1988), "Adding a conscience to competitive learning,"
   Proc. IEEE Int. Conf. on Neural Networks, I, 117-124, IEEE Press.
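The idea of a conscience is that each unit keeps a running estimate of how often it wins, and units that win more than their share (1/k) are handicapped in the competition. The following is a simplified sketch in that spirit, not a transcription of the paper: the names conscience_cl, beta (frequency step size) and gamma (bias strength), their default values, the 1-D data, and the choice to update frequencies from the biased winner are all assumptions made for illustration, and details may differ from DeSieno's formulation.

```python
import random


def conscience_cl(data, k, epochs=20, lr=0.05, beta=0.001, gamma=10.0, seed=0):
    """Competitive learning with a DeSieno-style conscience (simplified sketch).

    data  : list of 1-D floats (1-D only to keep the sketch short)
    beta  : step size for the running win-frequency estimates (assumed value)
    gamma : strength of the conscience bias (assumed value)
    """
    rng = random.Random(seed)
    protos = rng.sample(data, k)   # initialize prototypes at random data points
    freq = [1.0 / k] * k           # running estimate of each unit's win rate
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            # Positive bias for units winning less than 1/k of the time,
            # negative for units winning more, so frequent winners are handicapped.
            bias = [gamma * (1.0 / k - f) for f in freq]
            winner = min(range(k), key=lambda i: (x - protos[i]) ** 2 - bias[i])
            # Move each frequency toward 1 for the winner and 0 for the losers;
            # this update keeps the frequencies summing to 1.
            for i in range(k):
                y = 1.0 if i == winner else 0.0
                freq[i] += beta * (y - freq[i])
            # Move only the winning prototype toward the input.
            protos[winner] += lr * (x - protos[winner])
    return protos, freq


data_rng = random.Random(1)
data = [data_rng.random() ** 3 for _ in range(1000)]  # skewed toward 0
protos, freq = conscience_cl(data, k=4)

# Count how many points are nearest (unbiased distance) to each prototype.
sizes = [0] * 4
for x in data:
    sizes[min(range(4), key=lambda i: abs(x - protos[i]))] += 1
print("prototypes:", sorted(protos), "win rates:", freq, "sizes:", sizes)
```

Without the bias term this reduces to ordinary online competitive learning (a k-means-like procedure); gamma controls how hard the method pushes toward equal win rates, which on a skewed distribution generally pulls prototypes into the dense region.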

Rather than try to force a false model on the data, it might be better
to transform the data to have an approximately uniform distribution.
But, as I said, it is impossible to know whether this is appropriate
without knowing the purpose of the analysis.
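One simple way to make each variable approximately uniform is a rank (empirical CDF) transform. The sketch below is an assumption-laden illustration, not a recommendation for any particular analysis: the helper names rank_uniform, uniformize_columns and equal_width_groups are hypothetical, ties get the average rank, and scores are (rank - 0.5) / n. Note that transforming each coordinate separately makes the marginals uniform but not, in general, the joint distribution; only in one dimension does cutting the scores into equal-width bins yield (nearly) equal-size groups.

```python
def rank_uniform(values):
    """Map values to (0, 1) by average rank: (rank - 0.5) / n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values order[i..j].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1..j+1
        for t in range(i, j + 1):
            scores[order[t]] = (avg_rank - 0.5) / n
        i = j + 1
    return scores


def uniformize_columns(points):
    """Rank-transform each coordinate of a list of tuples separately."""
    cols = [rank_uniform(list(c)) for c in zip(*points)]
    return [tuple(row) for row in zip(*cols)]


def equal_width_groups(scores, k):
    """Cut (0, 1) scores into k equal-width bins; one bin label per point."""
    return [min(int(s * k), k - 1) for s in scores]


skewed = [2 ** i for i in range(12)]  # heavily skewed 1-D data
labels = equal_width_groups(rank_uniform(skewed), 3)
print(labels)  # 4 points per group
```

On the raw values, an equal-width cut of this data would put nearly every point in the lowest bin; after the rank transform the three groups have four points each.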

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
* Do not send me unsolicited commercial, political, or religious email *




Copyright © 1998 Nautilus Systems, Inc. All Rights Reserved.
Email: nautilus-info@nautilus-systems.com