
DM: Web Data Mining


From: Devanshu Dhyani
Date: Mon, 30 Nov 1998 11:06:43 -0500 (EST)
I am involved in research aimed at extending the utility of data mining
to semi-structured data such as WWW documents. In applying standard KDD
operators (discovery of association and characteristic rules,
classification, and clustering) to generalization-based mining on an
information base derived from WWW documents, we are faced with the
following questions:

1. The information base itself must support storage, retrieval, and
querying operations on semi-structured data. In our search for an
appropriate data model we have come across XML/DOM (XML being
anticipated as the heir to HTML as markup for web documents) and the
somewhat similar Lore DBMS (by the Stanford Database Group). Are there
any other data models that are suitable for mining-related tasks on
semi-structured data?

2. Generalization-based data mining presupposes the availability of
domain-specific background knowledge. Although concept hierarchies
fulfil this requirement by aiding attribute-oriented induction and
concept tree ascension (in relational databases), their disadvantage
lies in the need to generate them manually for each domain. We are
exploring the possibility of adapting pre-existing, shared, reusable
ontologies (such as those in the Ontolingua system under the DARPA
Knowledge Sharing Effort) for this purpose. Would the use of available
ontologies improve the generalization process, especially since these
may cover greater domain knowledge (in both depth and extent of
concepts) than indigenous, hand-built concept hierarchies?
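
To make the two questions concrete, here is a minimal Python sketch of
concept tree ascension applied to records read from a small XML
document. It is only an illustration under stated assumptions, not the
method used in our research: the XML sample, the PARENT hierarchy, and
the function names ascend and generalize are all hypothetical, and
treating a missing attribute as the root concept "ANY" is one possible
design choice for semi-structured input.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical semi-structured input: the last page has no <city> element.
DOCS = """<pages>
  <page><title>A</title><city>Singapore</city></page>
  <page><title>B</title><city>Tokyo</city></page>
  <page><title>C</title><city>Boston</city></page>
  <page><title>D</title><city>Toronto</city></page>
  <page><title>E</title></page>
</pages>"""

# Hypothetical concept hierarchy: each concept maps to its parent.
# "ANY" is the root; a shared ontology could supply this mapping instead.
PARENT = {
    "Singapore": "Asia",
    "Tokyo": "Asia",
    "Boston": "North America",
    "Toronto": "North America",
    "Asia": "ANY",
    "North America": "ANY",
}

# One dict per <page>, keyed by child element name.
records = [
    {child.tag: child.text for child in page}
    for page in ET.fromstring(DOCS).iter("page")
]


def ascend(value, levels=1):
    """Climb `levels` steps up the hierarchy; unknown values and the root stay put."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


def generalize(records, attribute, threshold):
    """Ascend the hierarchy until at most `threshold` distinct values remain."""
    # Assumption: a missing attribute is treated as the most general concept.
    values = [r.get(attribute, "ANY") for r in records]
    while len(set(values)) > threshold:
        lifted = [ascend(v) for v in values]
        if lifted == values:  # nothing left to generalize
            break
        values = lifted
    return Counter(values)


print(generalize(records, "city", threshold=3))
```

With a threshold of 3 the four cities are lifted one level to their
regions; with a threshold of 1 every value collapses to the root. A
richer ontology would replace PARENT with a deeper mapping, which is
where the question of depth and extent of concepts comes in.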

   Thanks for your help.

Devanshu Dhyani
Undergraduate student,
Centre for Advanced Information Systems (CAIS),
Nanyang Technological University,
Singapore.



