Re: DM: Data Mining in Small Databases


From: Megan Conklin
Date: Sat, 08 Jan 2000 12:30:37 -0500

I am relatively new to data mining and KDD as well, but I believe the
emphasis on large databases is because a lot of the original algorithms
designed to find patterns in data are just too slow to run on large data
sets. A program relying on an O(n^2) algorithm can be all right on a small
dataset, but not on a large one.
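
To put rough numbers on that, here is a minimal sketch that counts how many
pairwise comparisons a quadratic algorithm would make if it compared every
record with every other record once. The function names and the dataset
sizes are only illustrative assumptions, not taken from any particular
algorithm or system.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def count_by_loop(n: int) -> int:
    """Same count, obtained the slow way with the nested loop itself."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1  # stands in for one record-to-record comparison
    return count


# Illustrative sizes only: a small, a medium, and a large dataset.
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>12,} records -> {pairwise_comparisons(n):>22,} comparisons")
```

The dataset grows by a factor of ten thousand between the first and last
line, but the number of comparisons grows by roughly a hundred million.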

For example, I am doing a large paper on clustering algorithms right now.
Clustering is an old technique used to find data elements which are
"similar" to each other in some way. And while there are tons of clustering
algorithms, and some of them are really old, a lot of them are simply
impractical for use on large databases. At the same time, as disk space
becomes cheaper, and data becomes easier to get (think: Internet),
databases just keep getting bigger.
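
As an illustration of the kind of older technique I mean, here is a minimal
sketch of naive single-linkage agglomerative clustering. It is my own toy
version, assuming points are tuples of numbers and Euclidean distance; the
function name and parameters are made up for the example. Every merge step
rescans all pairs of clusters, which takes on the order of n^2 distance
computations, so the whole run is roughly cubic in the number of points.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Naively merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because a < b, so index a is unaffected
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(single_linkage(points, 3))
```

This is fine for a handful of points, but the repeated rescan is exactly the
sort of cost that makes it impractical on a large database.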

So a lot of the algorithms have to be rethought to handle larger data. In
my opinion, this is why so much of the research (especially the newer
research) is on larger data sets.

-megan conklin
Nova Southeastern University
PhD student (computer science)

At 04:46 PM 1/7/00 +0200, Bostjan Brumen wrote:

 >Hi!
 >
 >I've been doing some research on Data Mining and have come into the
 >twilight zone: why is everybody talking only about "large" databases?
 >What about "small" databases - don't they have anything valuable inside?
 >Don't they hide nuggets, useful patterns?
 >
 >And, nobody (best to my knowledge) has come up with a definition of
 >"small" and "large" - not in terms of bits and bytes, but something more
 >persistent to the change.
 >
 >If you have an opinion about the themes I outlined in the questions
 >please drop me a note. I will appreciate your comments.
 >
 >Best,
 >Bostjan Brumen
 >

References:
- DM: Data Mining in Small Databases
  - From: "Bostjan Brumen" <Brumen@pori.tut.fi>
