Re[2]: DM: discretization

From: Troy_Haines
Date: Tue, 19 Aug 1997 04:02:30 -0400 (EDT)

     
     To my knowledge, discretization strategies are usually performed
     either with the value of an outcome variable explicitly taken into
     account (a bivariate framework maximizing some association metric)
     or by clustering on a (few) selected variables deemed important
     a priori.
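As a rough illustration of the bivariate approach, here is a minimal Python sketch that picks a single cut point for one continuous variable by maximizing information gain against the outcome. Information gain is only one possible association metric, and the names `entropy` and `best_cut` are made up for this example; they do not come from any of the papers or tools mentioned in this thread.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain.

    Candidate cuts are midpoints between adjacent distinct sorted values.
    Returns (None, 0.0) if no cut gives a positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Toy data: the outcome flips between 3 and 10.
cut, gain = best_cut([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
print(cut, gain)
```

Applying `best_cut` recursively to each resulting interval, with some stopping rule, would give a multi-interval discretization; the stopping rule is left out here.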
     
     The real trick is to design a discretization strategy that is
     optimal in a multivariate world, one that takes into consideration
     interaction among several variables simultaneously.  There is no
     reason to think that a discretization scheme optimized in a
     bivariate sense will be optimal for multivariate models (such as a
     multivariate logistic regression model).  Of course, if tree
     induction is the algorithm of choice, a bivariate discretization
     strategy optimized at each node may be appropriate.
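A small Python sketch of why this can matter, under made-up assumptions: two variables on a toy 4x4 grid and an XOR-style outcome, so the outcome depends only on the interaction. The helper names `entropy` and `best_cut` and the threshold 2.5 are hypothetical and chosen only for the illustration. The search is run once on the full sample and once inside a single tree node (cases with x2 below 2.5).

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Toy data: outcome is 1 when exactly one of x1, x2 exceeds 2.5 (an interaction).
grid = [1, 2, 3, 4]
points = list(product(grid, grid))
labels = [int((x1 > 2.5) != (x2 > 2.5)) for x1, x2 in points]

# Bivariate search on x1 alone, using the whole sample.
cut_all, gain_all = best_cut([x1 for x1, _ in points], labels)

# Same search restricted to one tree node: cases with x2 < 2.5.
node = [(x1, y) for (x1, x2), y in zip(points, labels) if x2 < 2.5]
cut_node, gain_node = best_cut([x1 for x1, _ in node], [y for _, y in node])

print("full sample:", cut_all, gain_all)
print("within node:", cut_node, gain_node)
```

On this toy grid the full-sample search finds no cut on x1 with positive gain, even though a cut at 2.5 is exactly what a model with an interaction term would need. Inside the node, the same search recovers the cut at 2.5 and separates the classes completely.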
     
     Troy.
     
     troy_haines@mail.amsinc.com


______________________________ Reply Separator ______________________________
Subject: Re: DM: discretization
Author:  ronnyk@cthulhu.engr.sgi.com at AMS-Internet
Date:    8/18/97 3:18 PM


     
Bob> as decision trees are much easier to induce than generalized
Bob> classifiers, many people automatically (and blindly) discretize
Bob> their continuous variables prior to the induction process.
     
Bob> does anyone know of general discussions of this discretizing or
Bob> quantizing process? how should variables that represent counts or
Bob> frequencies be treated?  what about the situation where all but
Bob> one of the cases have the same value for a variable, should it be
Bob> treated as continuous?
     
There's an overview paper of discretization methods in
     
Dougherty, J., Kohavi, R. and Sahami, M., Supervised and unsupervised 
discretization of continuous features. Machine Learning 1995.
     
and another paper that compares the newer optimal error minimizer T2 in
     
Kohavi, R., Sahami M., Error-Based and Entropy-Based Discretization of
Continuous Features. KDD-96.
     
Both are available at:
   http://robotics.stanford.edu/users/ronnyk/ronnyk-bib.html
     
--
     
   Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)
