
Re: DM: Clustering and categorical attributes


From: Ted Pedersen
Date: Mon, 12 Jan 1998 09:30:40 -0500 (EST)
> Hello-
>
> I have a question about non-numeric data - what are good algorithms to
> use when doing cluster analysis with attributes that have many
> categories?  For instance, I have a dataset in which several fields are
> alpha-numeric codes, and there are at least 1000 possible codes.
>
> I have read of a method that is based on k-means clustering.  Are there
> others?
>

I have used McQuitty's similarity analysis for non-numeric data.
The algorithm is described in:

@article{Mcquitty66,
        author = {McQuitty, L.},
        title = {Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data},
        journal = {Educational and Psychological Measurement},
        volume = {26},
        year = {1966},
        pages = {825--831}}

The algorithm isn't too hard to follow and there are some very
clear examples in the paper.

SAS supports this in PROC CLUSTER (METHOD=MCQUITTY). The algorithm is
also simple enough that it wouldn't be hard to implement yourself if
need be.
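
To give a feel for how little code is involved, here is a rough Python
sketch. It is not a transcription of the 1966 paper: it assumes the
weighted-average linkage update that is usually associated with
McQuitty's method, and it assumes a simple matching dissimilarity
(fraction of fields whose codes differ) for the categorical records.
The function names, the sample records, and the choice of stopping at a
fixed number of clusters are all made up for illustration.

```python
from itertools import combinations


def mismatch(a, b):
    """Simple matching dissimilarity: fraction of fields whose codes differ.

    This measure is an assumption for the sketch, not taken from the paper.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def weighted_average_clusters(records, n_clusters):
    """Agglomerative clustering with the weighted-average linkage update."""
    # Each cluster id maps to the list of record indices it contains.
    clusters = {i: [i] for i in range(len(records))}
    # Pairwise dissimilarities, keyed by (smaller id, larger id).
    dist = {(i, j): mismatch(records[i], records[j])
            for i, j in combinations(range(len(records)), 2)}
    next_id = len(records)

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of current clusters.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d(*p))
        merged = clusters.pop(i) + clusters.pop(j)
        # Distance from the merged cluster to any other cluster is the
        # plain average of the two old distances, ignoring cluster sizes.
        for k in clusters:
            dist[(k, next_id)] = (d(k, i) + d(k, j)) / 2
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical records: each field is a categorical code.
records = [
    ("A12", "X", "red"),
    ("A12", "X", "blue"),
    ("B07", "Y", "green"),
    ("B07", "Y", "red"),
    ("A12", "Z", "red"),
]
result = weighted_average_clusters(records, 2)
print(result)
```

Because only equality between codes is tested, having 1000 possible
codes per field costs nothing extra. The full pairwise table does grow
quadratically with the number of records, though, so this sketch is
only practical for modest datasets.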

Best of luck,
Ted

-- 
* Ted Pedersen                     pedersen@seas.smu.edu              *
*                                  http://www.seas.smu.edu/~pedersen/ *
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-3712 *



