Re: DM: Classification problem
From: Earl S. Harris, Jr.
Date: Tue, 30 May 2000 08:10:43 -0400
"T.S. Lim" wrote:
>
> >From: "Yannis Kopanas" <ikopanas@ee.upatras.gr>
> >To: <datamine-l@nautilus-sys.com>
> >Subject: DM: Classification problem
> >Date: Thu, 25 May 2000 07:17:18 +0300
> >Reply-To: datamine-l@nautilus-sys.com
> >
> >
> >My problem has to do with the data set. I have two classes (the good guys
> >and the bad guys) unfortunately the bad guys are only 20 when the good guys
> >are 99980. Anybody who knows how to deal with it?
> >Thanks in advance.
> > Yannis
>
> Your case is very extreme. Usually, I'd suggest playing with the prior

Extremely uneven? Yes. Extremely uncommon? That depends on your domain.

> probabilities and misclassification costs. How important are those 20 "bad
> guys"?
>

Also, if your learner doesn't allow you to set prior probabilities or
misclassification costs, you might try adding 50 copies of each bad guy
to your training sample. I wouldn't remove good guys from your sample,
because your sample isn't insanely large (and I believe this practice
encourages overfitting). Basically, you want to tell the learner that
classifying the bad guys correctly is important.

Lastly, accuracy isn't an applicable metric in this domain. By saying
everyone is a good guy, you get high accuracy, but no insight into
catching bad guys. Consider using precision and recall as your metrics
for measuring the effectiveness of your rules.

Informally, if some rule identifies X members as bad and Y of them are
actually bad, the rule's precision is Y/X. And if your sample has Z bad
guys, that same rule's recall is Y/Z.

I hope this helps.

Earl Harris Jr.

> --
> T.S. Lim
> tslim@recursive-partitioning.com
> www.Recursive-Partitioning.com
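
To make the two suggestions in the reply above concrete, here is a minimal sketch in Python. It is only an illustration, not anything posted in the thread: the function names `oversample` and `precision_recall`, the convention that label 1 means "bad guy", and the choice to return 0.0 when a ratio is undefined are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Row = Tuple[list, int]  # (features, label); label 1 = bad guy (assumed convention)


def oversample(rows: Sequence[Row], bad_label: int = 1, copies: int = 50) -> List[Row]:
    """Return the training rows plus `copies` extra copies of each bad-guy row.

    Good-guy rows are kept as they are; nothing is removed from the sample.
    """
    out = list(rows)
    for row in rows:
        if row[1] == bad_label:
            out.extend([row] * copies)
    return out


def precision_recall(predicted: Sequence[int], actual: Sequence[int],
                     bad_label: int = 1) -> Tuple[float, float]:
    """Compute precision Y/X and recall Y/Z for the bad-guy class.

    X = members the rule identifies as bad
    Y = members identified as bad that are actually bad
    Z = bad guys in the sample
    A ratio with a zero denominator is reported as 0.0 (an assumed convention).
    """
    x = sum(1 for p in predicted if p == bad_label)
    y = sum(1 for p, a in zip(predicted, actual)
            if p == bad_label and a == bad_label)
    z = sum(1 for a in actual if a == bad_label)
    precision = y / x if x else 0.0
    recall = y / z if z else 0.0
    return precision, recall


if __name__ == "__main__":
    # Class sizes from the original question: 20 bad guys, 99980 good guys.
    actual = [1] * 20 + [0] * 99980

    # "Everyone is a good guy": high accuracy, but no bad guy is caught.
    all_good = [0] * len(actual)
    accuracy = sum(p == a for p, a in zip(all_good, actual)) / len(actual)
    print(accuracy, precision_recall(all_good, actual))

    # A made-up rule that flags 40 members, 15 of them actually bad.
    rule = [1] * 15 + [0] * 5 + [1] * 25 + [0] * 99955
    print(precision_recall(rule, actual))
```

On the 20 versus 99980 split, labelling everyone as good scores 99.98% accuracy while recall is 0 (and precision is reported as 0 under the convention above). The invented rule that flags 40 members, 15 of them truly bad, has precision 15/40 and recall 15/20. With `copies=50`, `oversample` adds 1000 bad-guy rows to the 20 already present and leaves the good guys untouched.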