Nautilus Systems, Inc. logo and menu bar Site Index Home
News Books
Button Bar Menu- Choices also at bottom of page About Nautilus Services Partners Case Studies Contact Us
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Subscribe]

Re: DM: CHAID vs. CARTŪ


From: Ronny Kohavi
Date: Fri, 26 Sep 1997 09:58:51 -0400 (EDT)

Mike> Ronny Kohavi wrote:

>> Since the theory says that it's impossible for one algorithm to
>> uniformly beat any other on generalization accuracy in
>> classification tasks (where uniformly is for all possible target
>> concepts), the question is *when* (under what conditions) one
>> algorithm is better than another, not *whether*.
>> 

Mike> Do you have a reference to the paper or book that describes such
Mike> a theory?  Thanks.

Nicholas mentioned Schaffer's paper which I won't repeat.  The theorem
is actually trivial and has been mentioned through the years in
various paper (Tom Mitchell's version spaces and some older refs in
the late 60's).

David Wolpert makes things very formal in his paper, including
interesting repercussions for meta-level learning:

   author = "David H. Wolpert",
   title = "The Relationship between {PAC}, the statistical physics
              framework, the {Bayesian} framework, and the {VC} 
framework",
   year = 1994,
   booktitle = {The Mathematics of Generalization},
   publisher = {Addison Wesley},
   editor = {David H. Wolpert}


The proof is based on the following simple idea for Boolean labels:
  Without any assumptions, if algorithm A beats algorithm B
  on a dataset, then there exists a dataset with the opposite labels 
and
  it is just as likely, so B will beat A on that.

In real life, there are smoothness assumptions, especially on
real-valued attributes, which is why the theorem isn't too interesting
and why work in the field is still going on.  Identifying those
smoothness assumptions (or other types of assumptions) is the big
problem.  There are very few rules of thumb as to when one
should use one algorithm over another.

--

   Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)
   Engineering Manager, Analytical Data Mining.
   Silicon Graphics, Inc.




[ Home | About Nautilus | Case Studies | Partners | Contact Nautilus ]
[ Subscribe to Lists | Recommended Books ]

logo Copyright © 1998 Nautilus Systems, Inc. All Rights Reserved.
Email: nautilus-info@nautilus-systems.com
Mail converted by MHonArc 2.2.0