
DM: Can you trust the splits produced by classification trees?


From: Tjen-Sien Lim
Date: Sun, 09 Jan 2000 23:16:29 -0600

The answer may be NO! I'm going to discuss the case where all attributes
are categorical. The exhaustive search algorithm (described in Breiman,
Friedman, Olshen & Stone, 1984) tends to select a categorical attribute
with many levels as the split variable. For a categorical attribute with c
levels, you need to evaluate up to 2^{c-1} - 1 possible binary splits. So,
the more levels an attribute has, the more candidate splits it offers, and
the more likely it is to be selected as the split variable just by chance.
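
To make the counting argument concrete, here is a minimal sketch in Python.
It is not the CART implementation: it assumes a two-class response, the
Gini criterion, and pure-noise data in which neither attribute carries any
information about the class. The function names and the default settings
(n = 100 cases, 200 replications, 2 versus 8 levels) are hypothetical
choices for illustration only.

```python
import random
from itertools import combinations

def binary_splits(levels):
    """Yield the left-hand subset of every distinct binary split of `levels`."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    # Keep the first level on the left so mirror-image splits are not counted
    # twice, and never put every level on the left. This gives 2^(c-1) - 1.
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            yield frozenset((first,) + combo)

def gini(counts):
    """Gini impurity of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((k / n) ** 2 for k in counts) if n else 0.0

def best_gain(x, y, n_levels):
    """Largest Gini decrease over all binary splits of attribute x (y is 0/1)."""
    tab = [[0, 0] for _ in range(n_levels)]  # tab[level] = [n(y=0), n(y=1)]
    for xi, yi in zip(x, y):
        tab[xi][yi] += 1
    n = len(y)
    total = [sum(t[j] for t in tab) for j in (0, 1)]
    parent = gini(total)
    best = 0.0
    for left in binary_splits(range(n_levels)):
        l = [sum(tab[v][j] for v in left) for j in (0, 1)]
        r = [total[j] - l[j] for j in (0, 1)]
        gain = parent - (sum(l) / n) * gini(l) - (sum(r) / n) * gini(r)
        best = max(best, gain)
    return best

def selection_rate(n=100, reps=200, few=2, many=8, seed=0):
    """Fraction of noise data sets in which the many-level attribute wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        y = [rng.randrange(2) for _ in range(n)]     # class, pure noise
        a = [rng.randrange(few) for _ in range(n)]   # attribute with few levels
        b = [rng.randrange(many) for _ in range(n)]  # attribute with many levels
        if best_gain(b, y, many) > best_gain(a, y, few):
            wins += 1
    return wins / reps

print(len(list(binary_splits(range(8)))))  # number of splits for c = 8
print(selection_rate())                    # share of runs won by the 8-level attribute
```

With no selection bias, the two attributes would each be chosen about half
the time; a rate well above one half for the 8-level attribute is the
chance effect described above.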

On the other hand, CHAID and its derivatives (Kass, 1980; Hawkins & Kass,
1982; Biggs, de Ville & Suen, 1991) tend to select categorical attributes
with few levels. These algorithms penalize categorical attributes with many
levels too severely. Of the adjustments used, the one proposed by Biggs et
al. (1991) seems to be the least conservative.
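
To give a feel for how quickly such a penalty can grow, here is a rough
sketch. It assumes the penalty is a Bonferroni multiplier and uses the form
usually quoted for a nominal predictor in Kass (1980): the number of ways
to merge c unordered categories into r non-empty groups. Both the function
names and the raw p-value of 0.001 are hypothetical, and this is not a
reimplementation of CHAID or of the Biggs et al. adjustment.

```python
from fractions import Fraction
from math import factorial

def nominal_multiplier(c, r):
    """Ways to merge c unordered categories into r non-empty groups.

    Assumed form of the Bonferroni multiplier for a nominal predictor:
    sum over i = 0..r-1 of (-1)^i (r - i)^c / (i! (r - i)!).
    """
    total = sum(
        Fraction((-1) ** i * (r - i) ** c, factorial(i) * factorial(r - i))
        for i in range(r)
    )
    return int(total)

def adjusted_p(raw_p, c, r):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, nominal_multiplier(c, r) * raw_p)

# Same raw p-value, attributes with 3 versus 10 levels, each merged to 2 groups.
for c in (3, 10):
    print(c, nominal_multiplier(c, 2), adjusted_p(0.001, c, 2))
```

For r = 2 the multiplier equals 2^{c-1} - 1, the same count of binary
splits that appears in the exhaustive search case.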

QUEST, CRUISE, and PLUS also tend to select categorical attributes with few
levels when all categorical attributes are "equally informative" with
respect to the dependent variable. This is an artifact of Pearson's
chi-square test for independence in a two-way contingency table.
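
For reference, the test in question can be sketched as below. This only
computes the Pearson statistic and its degrees of freedom for an attribute
by class table; it does not reproduce the variable selection step of QUEST,
CRUISE, or PLUS. The function name and the example tables are hypothetical,
and the sketch assumes every row and column total is positive.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-way table.

    `table` is a list of rows of counts (attribute levels by classes).
    Assumes no row or column total is zero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# A 2-level and a 4-level attribute, each cross-tabulated with two classes.
print(pearson_chi_square([[10, 20], [20, 10]]))
print(pearson_chi_square([[5, 10], [5, 10], [10, 5], [10, 5]]))
```

The point to notice is that the statistic for an attribute with c levels
and a response with J classes is referred to a chi-square distribution with
(c - 1)(J - 1) degrees of freedom, so attributes with different numbers of
levels are judged against different reference distributions.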

Hence, users of classification tree methods should exercise caution in
interpreting the resulting tree diagram when the categorical attributes
have different numbers of levels. The selection bias won't occur when all
categorical attributes have the same number of levels. Nor will there be
any serious bias when all attributes are numerical and have roughly
comparable numbers of distinct values.

The case of mixed attributes (numerical and categorical) is more
complicated and I haven't studied it deeply. My preliminary simulation
results (not for citation yet) can be downloaded from

    http://www.recursive-partitioning.com/plus/split.pdf

Thank you for your attention. I'd welcome any discussion or comments.

--
Tjen-Sien Lim
tslim@recursive-partitioning.com
www.Recursive-Partitioning.com