Re: DM: My introduction


From: Daniel X. Pape
Date: Tue, 29 Jul 1997 17:28:04 -0400 (EDT)
> > What I use for these categorizations are modified (and optimized!)
> > self-organizing maps (SOMs). I understand that many other people are
> > using SOMs for their datamining, so I would love to see a number of
> > discussions about people's experiences with them.
>
> Why do you use SOMs? Why not ordinary VQ? What properties of your
> textual and image collections do you expect the dimensions of the
> SOM grid to reveal?

Well, what I specifically do in my research group is create SOMs in
order to automatically categorize a data collection so that the user
can _browse_ it. Once the SOM is created, I use it to build 2D and 3D
interfaces that let the user browse the collection graphically.
Another way I use SOMs is to automatically categorize a search result
set for easy subsequent browsing - for example, if you do a search on
AltaVista you might get 2000 results... a SOM could categorize the
results so the user could easily pick out the one or two hundred most
relevant ones.
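
A minimal sketch of the result-set idea follows. It is a plain
textbook-style SOM, not the modified and optimized variant mentioned
above. Everything in it is an assumption made for illustration: the
function names (train_som, best_matching_unit, categorize), the 2x2
grid, the learning-rate and neighbourhood schedules, and the toy
term-weight vectors standing in for search results.

```python
import math
import random
from collections import defaultdict


def best_matching_unit(weights, v):
    """Return the grid cell whose weight vector is closest to v."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)),
    )


def train_som(vectors, rows, cols, epochs=200, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        # Present the inputs in a fresh random order on every pass.
        for v in rng.sample(vectors, len(vectors)):
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly
            # Neighbourhood radius shrinks towards 0.5 grid units.
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
            step += 1
    return weights


def categorize(weights, labelled_vectors):
    """Group item labels by the grid cell of their best matching unit."""
    groups = defaultdict(list)
    for label, v in labelled_vectors:
        groups[best_matching_unit(weights, v)].append(label)
    return dict(groups)


# Hypothetical search results as weights for four made-up terms.
results = [
    ("result-1", [1.0, 0.9, 0.0, 0.0]),
    ("result-2", [1.0, 0.9, 0.0, 0.0]),
    ("result-3", [0.9, 1.0, 0.1, 0.0]),
    ("result-4", [0.0, 0.1, 1.0, 0.9]),
    ("result-5", [0.1, 0.0, 0.9, 1.0]),
]

weights = train_som([v for _, v in results], rows=2, cols=2)
groups = categorize(weights, results)
for cell, labels in sorted(groups.items()):
    print(cell, labels)
```

Each grid cell then acts as one browsable category: no labels or
training by the user are involved, and results with similar term
weights would typically end up in the same or neighbouring cells.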

From what I understand of vector quantization methods, there are two
reasons why I don't use them: One, for the most part, they are
_supervised learning_ methods. Since I am trying to categorize things
automatically, I have to rely on _unsupervised_ methods. Obviously the
user is not going to want to sit there and worry about training a VQ
during a search session. Two, the VQ methods are meant for statistical
classification or pattern recognition - not categorization. What I'm
trying to do with the SOMs is to cluster and visualize the collections
in a meaningful way. The VQ methods might eventually give more
accurate classifications, but I am looking for fast (maybe rough)
categorizations so the user can proceed with his task.

> I have tried to use the WEBSOM application at
> http://websom.hut.fi/websom/comp.ai.neural-nets/html/root.html
> to search for articles in comp.ai.neural-nets, and I found it quite
> useless. Dejanews works far better.

The results you got were different because the two tools you used
(WEBSOM and DejaNews) were designed for two different things. It
depends on how you were searching. If you were searching
comp.ai.neural-nets for a specific term or author or article, then of
course DejaNews will be better - DejaNews uses very powerful and very
fast search methods - but at their heart, they are just simple string
matching methods.

If you try to find a specific term or author or article with WEBSOM,
you will have a hard time finding it. But if you want to BROWSE the
collection of comp.ai.neural-nets to see what kind of articles are
in it, you will have an easier time with the WEBSOM. The WEBSOM
interface may be a bit confusing to the new user, so it might not be
that easy - but you could never BROWSE the c.a.nn collection with
DejaNews.


Dan

--
Daniel X. Pape
Digital Library Research Program
dpape@ncsa.uiuc.edu
