
Re: DM: Clustering algorithm for high-dimensional Boolean space


From: David L Dowe
Date: Fri, 5 Dec 1997 01:41:23 -0500 (EST)
> From owner-datamine-l@nessie.crosslink.net Fri Dec  5 08:58:55 1997
> From: "Rao, Bharat" <bharat@scr.siemens.com>
> To: datamine-l@nautilus-sys.com
> Subject: DM: Clustering algorithm for high-dimensional Boolean space
> Date: Thu, 4 Dec 1997 15:25:06 -0500
> 
> Hello,

   Bharat et al, Hi.


> 
> I'm looking to cluster a dataset where the
> a) data has high-dimensionality (50<n<1000)
> b) relatively few samples ( M=O(n), and occasionally M < n)
> c) and is completely Boolean (all variables are 0/1).
> 
>       [Obviously clustering will be hard, and quite possibly
>        I will end up with a bunch of singleton clusters.  But
>        I'd like to try running some existing algorithms on this
>        data, at least for benchmarking purposes, before trying
>        to develop new algorithms.]
> 
> Can anyone point me to some existing implemented algorithms that
> cluster Boolean data?  (I have already requested a copy of COBWEB
> from Doug Fisher, and realize that AutoClass is not suited for
> Boolean data.)

Snob (which uses MML, Minimum Message Length; by Chris Wallace and me)
http://www.cs.monash.edu.au/~dld/Snob.html
deals with Boolean data and should have no problems with the above.

You might also want to look at Hunt and Jorgensen's MULTIMIX.

Links to the Snob WWW page and to MULTIMIX are below.
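
For anyone who wants a quick baseline on 0/1 data before trying the programs above, here is a minimal sketch of mixture modelling for Boolean attributes: plain EM for a mixture of independent Bernoulli components, in pure Python. It is not Snob or MULTIMIX and does not use MML; the number of clusters k is fixed by the caller rather than estimated. The function name, its parameters, and the add-one smoothing constants are illustrative assumptions.

```python
import math
import random


def fit_bernoulli_mixture(data, k, iters=50, seed=0):
    """Fit a k-component mixture of independent Bernoullis by EM.

    data: list of equal-length lists of 0/1 values (one list per sample).
    Returns (weights, probs, labels), where labels[i] is the index of the
    most probable component for sample i.
    """
    rng = random.Random(seed)
    m, n = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Random start, kept away from 0 and 1 so every log below is finite.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n)] for _ in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample,
        # computed in log space to avoid underflow when n is large.
        resp = []
        for row in data:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for x, p in zip(row, probs[j]):
                    lp += math.log(p if x else 1.0 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights and per-attribute probabilities.
        # The +1 / +2 smoothing (an assumption of this sketch) keeps all
        # estimates strictly inside (0, 1) even when M is small.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + 1.0) / (m + k)
            for d in range(n):
                ones = sum(r[j] * row[d] for r, row in zip(resp, data))
                probs[j][d] = (ones + 1.0) / (nj + 2.0)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, probs, labels
```

Calling fit_bernoulli_mixture(rows, k=2) on a list of 0/1 rows returns the mixing weights, the per-cluster attribute probabilities, and a hard cluster label per row. With M close to n, the smoothing matters a good deal, and results can depend on the random start, so it is worth rerunning with several seeds.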


> 
> Also, any pointers to work on constructive induction that may be
> relevant for constructing new features to help clustering would be
> appreciated.
> 
> Thanks for any help,
> 
> Bharat
> 
>       [Obviously clustering will be hard, and most likely
>        I will end up with a bunch of singleton clusters.  But
>        I'd like to try running some existing algorithms on this
>        data, at least for benchmarking purposes, before trying
>        to develop new algorithms.]
> 
> R. Bharat Rao,          E-mail:bharat@scr.siemens.com [PGP WELCOME] 
> Adaptive Information & Signal Processing, Siemens Corporate Research
> US Mail: 755 College Road East, Princeton, NJ 08540
> Phones: (609)734-6531(O) (609)734-6565(F)
> <Please ask for my public key or get it from www.pgp.com keyserver.>

- David.

(Dr.) David Dowe, Dept of Computer Science, Monash University,
Clayton, Victoria 3168, Australia
dld@cs.monash.edu.au     Fax:+61 3 9905-5146
http://www.cs.monash.edu.au/~dld/
http://www.cs.monash.edu.au/~dld/Snob.html
http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html 


