
Re: DM: boltzmann machine


From: Ronny Kohavi
Date: Mon, 29 Sep 1997 21:11:56 -0400 (EDT)

Ted> I have two unsupervised learning questions for the group.

Ted> 1) There are several bias-variance decompositions of
Ted> classification error proposed for supervised classification
Ted> methods. Has anyone applied such a decomposition to the
Ted> classification errors of an unsupervised learner? I think the
Ted> ideas of bias and variance still apply in the unsupervised case
Ted> although variance may have a different meaning and I think we
Ted> have to measure these quantities differently. Any thoughts or
Ted> questions on this general area would be of great interest. I can
Ted> provide more details of what I'm thinking of doing to anyone who
Ted> might be interested.

Many decompositions of error or other measures are possible, and I'm
sure you can cook some up for several unsupervised learning methods.
In fact, some unsupervised criteria are nicely decomposable.  For
example: the inter- and intra-cluster distances from centroids.
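To make that concrete, here's a quick NumPy sketch of the two terms
(the function name and the assumption that labels run 0..k-1 are just
illustrative conventions, not anyone's package API):

    import numpy as np

    def intra_inter(X, labels):
        """Intra- and inter-cluster distance terms of a clustering.

        X      -- (n, d) array of points
        labels -- (n,) integer cluster assignments, assumed 0..k-1
        Assumes at least two clusters.
        """
        k = labels.max() + 1
        centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(k)])
        # intra: mean distance from each point to its own centroid
        intra = np.linalg.norm(X - centroids[labels], axis=1).mean()
        # inter: mean pairwise distance between distinct centroids
        diffs = centroids[:, None, :] - centroids[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        inter = dists[np.triu_indices(k, k=1)].mean()
        return intra, inter

A good clustering makes intra small and inter large, and each term can
be examined on its own, which is what makes the criterion decomposable.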

The nice thing about the bias and variance decomposition for
regression (and some of the proposed decompositions for
classification) is that the two terms have a rather natural
interpretation as being "biased" from the "optimal" on average, and
"varying" around the average.  There is the usual tradeoff between the
two: when you increase the representation power of your hypothesis
space, you can sometimes reduce the bias, but you're likely to increase
the variance.  This is why many "surprising" phenomena can be
explained using the decomposition.
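You can see the tradeoff in a small simulation (plain NumPy; the true
function, the noise level 0.3, and the test point x0 = 0.5 are all
arbitrary choices for illustration): fit polynomials of increasing
degree to many resampled training sets, then measure how far the
average prediction sits from the truth (bias) and how much predictions
scatter around that average (variance).

    import numpy as np

    rng = np.random.default_rng(0)

    def true_f(x):
        return np.sin(3 * x)

    def fit_and_predict(degree, n_train=40):
        """Fit one noisy training sample; predict at x0 = 0.5."""
        x = rng.uniform(-1, 1, n_train)
        y = true_f(x) + rng.normal(0, 0.3, n_train)
        coeffs = np.polyfit(x, y, degree)
        return np.polyval(coeffs, 0.5)

    for degree in (1, 4, 9):
        preds = np.array([fit_and_predict(degree) for _ in range(500)])
        bias2 = (preds.mean() - true_f(0.5)) ** 2
        var = preds.var()
        print(f"degree {degree}: bias^2 = {bias2:.4f}, var = {var:.4f}")

As the degree (representation power) grows, bias^2 shrinks while the
variance grows; the remaining gap to the total error is the irreducible
noise term.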

In unsupervised algorithms, there is no obvious measure of bias
because there is no pre-specified target function to learn (if you
don't know what the "right" answer is, how can you know that you're
biased away from it?).

Clustering is an optimization problem once you define the minimization
criterion.  Supervised learning is different: sometimes it's better to
distance yourself from the minimum error on the training set to improve
the generalization accuracy (e.g., decision tree pruning, weight
sharing in neural networks, multiple nearest neighbors).
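To make the "optimization once you define the criterion" point
concrete, here is a minimal Lloyd's-algorithm sketch (plain NumPy;
names are illustrative).  Both steps can only lower the within-cluster
sum of squares, so the procedure is pure descent on the criterion;
there is no analog of pruning back from the training-set minimum.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Lloyd's algorithm: alternately minimize the within-cluster
        sum of squares over assignments and over centroids."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            # assignment step: each point to its nearest centroid
            d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
            labels = d.argmin(axis=1)
            # update step: each centroid to the mean of its cluster
            # (an empty cluster keeps its old centroid)
            new = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        sse = ((X - centroids[labels]) ** 2).sum()  # the criterion
        return labels, centroids, sse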

--

   Ronny Kohavi (ronnyk@sgi.com, http://robotics.stanford.edu/~ronnyk)
   Engineering Manager, Analytical Data Mining.
   Silicon Graphics, Inc.



