
Re: DM: Classification problem


From: Earl S. Harris, Jr.
Date: Tue, 30 May 2000 08:10:43 -0400
Organization: The MITRE Corporation




"T.S. Lim" wrote:
 >
 >  >From: "Yannis Kopanas" <ikopanas@ee.upatras.gr>
 >  >To: <datamine-l@nautilus-sys.com>
 >  >Subject: DM: Classification problem
 >  >Date: Thu, 25 May 2000 07:17:18 +0300
 >  >Reply-To: datamine-l@nautilus-sys.com
 >  >
 >  >
 >  >My problem has to do with the data set. I have two classes (the good guys
 >  >and the bad guys). Unfortunately, there are only 20 bad guys, while there
 >  >are 99980 good guys. Does anybody know how to deal with this?
 >  >Thanks in advance.
 >  >     Yannis
 >
 > Your case is very extreme. Usually, I'd suggest playing with the prior

Extremely uneven? Yes.  Extremely uncommon? That depends on your domain.

 > probabilities and misclassification costs. How important are those 20 "bad
 > guys"?
 >

Also, if your learner doesn't allow you to set prior probabilities or
misclassification costs, you might try adding 50 copies of each bad guy
to your training sample.  I wouldn't remove good guys from your sample,
because your sample isn't insanely large (and I believe removing them
encourages overfitting).

Basically, you want to tell the learner that classifying the bad guys is
important.
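
As a rough illustration of the duplication idea, here is a minimal Python
sketch. The function name, the "good"/"bad" labels, and the toy records are
my own assumptions for illustration, not part of any particular learner.
I read "adding 50 copies" as 50 extra copies on top of each original, and
the duplication should be applied to the training sample only, never to the
data you evaluate on.

```python
def oversample_minority(rows, labels, minority_label, copies=50):
    """Return new lists where each minority example gets `copies` extra copies.

    The inputs are left unchanged. All names here are hypothetical.
    """
    out_rows, out_labels = list(rows), list(labels)
    for row, label in zip(rows, labels):
        if label == minority_label:
            out_rows.extend([row] * copies)
            out_labels.extend([label] * copies)
    return out_rows, out_labels


# Toy training sample with the same class counts as in the question.
train_rows = [[i] for i in range(100000)]
train_labels = ["bad"] * 20 + ["good"] * 99980

new_rows, new_labels = oversample_minority(train_rows, train_labels, "bad")

# 20 originals + 20 * 50 copies = 1020 bad guys; the good guys are untouched.
print(new_labels.count("bad"), new_labels.count("good"))
```

Even after duplication the bad guys are still only about 1% of the sample,
so the number of copies is something to experiment with.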

Lastly, accuracy isn't a useful metric in this domain.  By saying
everyone is a good guy, you get high accuracy (99.98% here), but no insight
into catching bad guys.  Consider using precision and recall as your metrics
for measuring the effectiveness of your rules. Informally, if some rule
identifies X members as bad and Y of them are actually bad, the rule's
precision is Y/X. And if your sample has Z bad guys, that same rule's recall
is Y/Z.
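
To make the X, Y, Z definitions concrete, here is a small Python sketch. The
function names and the example rule (one that flags 40 records, 15 of them
truly bad) are made up for illustration; only the class counts come from the
original question.

```python
def precision_recall(predicted_bad, actual_bad):
    """Compute (precision, recall) from two equal-length lists of booleans.

    X = records the rule flags as bad, Y = flagged records that really are
    bad, Z = bad records in the sample.  A ratio is None when its
    denominator is zero (for example, a rule that flags nobody).
    """
    x = sum(predicted_bad)
    z = sum(actual_bad)
    y = sum(1 for p, a in zip(predicted_bad, actual_bad) if p and a)
    precision = y / x if x else None
    recall = y / z if z else None
    return precision, recall


def accuracy(predicted_bad, actual_bad):
    """Fraction of records whose predicted class matches the actual class."""
    correct = sum(1 for p, a in zip(predicted_bad, actual_bad) if p == a)
    return correct / len(actual_bad)


# 20 bad guys followed by 99980 good guys, as in the question.
actual = [True] * 20 + [False] * 99980

# "Everyone is a good guy": very high accuracy, yet no bad guy is caught.
all_good = [False] * 100000
baseline_accuracy = accuracy(all_good, actual)
baseline_pr = precision_recall(all_good, actual)

# Hypothetical rule: flags 40 records, of which 15 are truly bad.
rule = [True] * 15 + [False] * 5 + [True] * 25 + [False] * 99955
rule_accuracy = accuracy(rule, actual)
rule_pr = precision_recall(rule, actual)

print(baseline_accuracy, baseline_pr)
print(rule_accuracy, rule_pr)
```

The hypothetical rule has slightly lower accuracy than the "everyone is good"
baseline, but its precision (15/40) and recall (15/20) show that it actually
finds bad guys, which is the point of switching metrics.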

I hope this helps.

Earl Harris Jr.

 > --
 > T.S. Lim
 > tslim@recursive-partitioning.com
 > www.Recursive-Partitioning.com
 >
 > ------------------------------------------------------------
 > Get paid to write review! http://recursive-partitioning.epinions.com




