
Re: DM: CHAID vs. CART®


From: Ronny Kohavi
Date: Wed, 24 Sep 1997 20:41:21 -0400 (EDT)

Torpy> Hi all, I joined this list about a week ago, and I have to say
Torpy> that I'm pretty impressed with the level of discussion here.
Torpy> I'm a technical marketing specialist in the marketing
Torpy> department at SPSS Inc.  Lately, we've been talking about the
Torpy> differences between CHAID algorithms and CART® algorithms, and I
Torpy> thought I'd see what you people think.

Torpy> 1) What are the pros and cons of CHAID and CART®?
Torpy> 2) What is your preference and why?
Torpy> 3) Would you always use one over the other, or would you use
Torpy> one in some situations and use the other in different
Torpy> situations?

Since the theory says that it's impossible for one algorithm to
uniformly beat any other on generalization accuracy in classification
tasks (where "uniformly" means across all possible target concepts),
the question is *when* (under what conditions) one algorithm is better
than another, not *whether* it is.

Decision tree induction is too hard to analyze analytically (i.e., to
arrive at conditions under which it outperforms other algorithms), so
there are no hard rules for when to apply one algorithm versus
another.

While some techniques have proven useful in practice (e.g., bagging,
boosting), the differences between CHAID and CART® are relatively
small compared to those between algorithms with completely different
hypothesis spaces (e.g., nearest neighbors, Bayesian classifiers).

My common answer to the above question is as follows:

  Since in practice the customer usually has specific datasets,
  the performance on THESE datasets is what matters.  Hence, just
  run several algorithms and pick the one with the best test-set
  accuracy.

This isn't perfect (and obviously the no-free-lunch theorem implies
that selecting among too many algorithms can't always be successful),
but it seems to work well in practice.
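
As a rough illustration of "run several algorithms and pick the
winner," here is a minimal sketch, assuming the scikit-learn Python
library as a stand-in; the dataset, candidate models, and split below
are illustrative placeholders, not anything from this post:

  # Sketch: fit several candidate classifiers and keep the one with
  # the best held-out test-set accuracy.  Library, dataset, and model
  # choices here are illustrative assumptions.
  from sklearn.datasets import load_breast_cancer
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier   # a CART-style tree
  from sklearn.naive_bayes import GaussianNB
  from sklearn.neighbors import KNeighborsClassifier

  X, y = load_breast_cancer(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.3, random_state=0)

  candidates = {
      "CART-style tree": DecisionTreeClassifier(random_state=0),
      "naive Bayes": GaussianNB(),
      "nearest neighbors": KNeighborsClassifier(),
  }

  # Score every candidate on the same held-out test set and keep
  # the best; this is exactly the "performance on THESE datasets"
  # criterion described above.
  scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in candidates.items()}
  best = max(scores, key=scores.get)
  print(scores)
  print("picking:", best)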

Projects such as Statlog, and the study we did,
  Kohavi, R., Sommerfield, D., and Dougherty, J., "Data Mining Using
  MLC++, a Machine Learning Library in C++," Tools with AI '96,
which is available from
  http://robotics.stanford.edu/users/ronnyk/ronnyk-bib.html
show that different decision-tree algorithms (C4.5 and CART® in the
above study) do about the same on average.  More important differences
show up between different types of algorithms.  As an example, in the
recent KDD-CUP, all the decision-tree based products performed poorly,
while Naive-Bayes, a relatively simple algorithm based on conditional
independence assumptions, did well (two of the top three entries used
it):
  http://www.epsilon.com/KDDCUP/index.htm
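
For intuition, here is a minimal from-scratch sketch of what the
conditional independence assumption buys: the posterior for class c is
scored as P(c) times the product of per-feature estimates P(x_i | c),
so each feature can be estimated separately.  The toy data, Laplace
smoothing, and helper names are illustrative assumptions, not anything
from the KDD-CUP entries:

  from collections import Counter, defaultdict

  def train_nb(rows, labels, alpha=1.0):
      """Estimate class priors and per-feature value counts."""
      priors = Counter(labels)              # counts for P(c)
      cond = defaultdict(Counter)           # (feature, class) -> value counts
      for row, c in zip(rows, labels):
          for i, v in enumerate(row):
              cond[(i, c)][v] += 1
      return priors, cond, alpha

  def predict_nb(model, row):
      priors, cond, alpha = model
      n = sum(priors.values())
      best_c, best_p = None, -1.0
      for c, pc in priors.items():
          p = pc / n                        # start from the prior P(c)
          for i, v in enumerate(row):
              counts = cond[(i, c)]
              # Laplace-smoothed P(x_i = v | c); independence lets us
              # simply multiply the per-feature estimates together.
              p *= (counts[v] + alpha) / (sum(counts.values())
                                          + alpha * (len(counts) + 1))
          if p > best_p:
              best_c, best_p = c, p
      return best_c

  # Toy usage: two categorical features, two classes.
  rows = [("y", "n"), ("y", "y"), ("n", "n"), ("n", "y")]
  labels = ["pos", "pos", "neg", "neg"]
  model = train_nb(rows, labels)
  print(predict_nb(model, ("y", "n")))      # -> "pos"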

Much depends on the dataset at hand!

Torpy> Please note that I'm not asking about specific products, only
Torpy> the algorithms.  Although if you have any strong feelings
Torpy> about certain products, I'd be interested in hearing those as
Torpy> well.

A point that many miss is that it's not just the algorithm itself
that matters, but also the presentation of results and the environment
in which it's integrated.  Visualizing the resulting model, for
example, is very important.  That's the great advantage of tools such
as MineSet
   http://www.sgi.com/Products/software/MineSet
and others (Angoss's Knowledge Seeker, IBM's Intelligent Miner, etc.).

--

   Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)
   Engineering Manager, Analytical Data Mining.


