
DM: teaching data mining


From: Joseph Albert Brady
Date: Sat, 20 Dec 1997 10:05:44 -0500 (EST)

I had written to this list asking for help. (See below).
I received many potentially helpful replies. I thank all
those who responded. 

List readers may be interested in the replies, which
are excerpted below. I have stripped the names of the senders
from the private correspondence. Company names were not
stripped in all cases.

My questions were:

I teach MIS subjects in the University of Delaware's
College of Business. We teach fundamental database
topics to our students. We'd like to move past that
stuff, to have our students do some datamining work. So 
far, I have not been able to figure out how to do that 
economically.
  
Three questions:
1. Are any of you aware of good introductory texts
        on datamining that include datamining software?

2. Are any of you aware of public domain datamining
        software? Failing that, very reasonably priced
        data mining software?

3. Are any of you aware of any industrial strength data
        mining packages that would be deeply discounted
        for educational institution use?

Excerpted replies follow:

***

Cognos Inc. has a reasonably priced Business Intelligence Suite
(Impromptu, PowerPlay, and Scenario) which might meet your needs.  I am
not sure if there is an educational discount, but you can contact the
sales department and ask them about the discount.

****

If you are interested in OLAP, I would suggest that you look at
PowerPlay. If you want to set up an advanced graphical query
environment, we offer a tool called Impromptu. If you are looking for
real datamining, then try Scenario.

Each of these packages is very sophisticated, flexible, and easy to use.

Scenario: Designed for spotting patterns and exceptions in business
data that might otherwise be missed, Scenario's sophisticated
interface allows users to readily visualize the business information
being uncovered. It automates the discovery and ranking of
critical factors impacting a business, exposes hidden relationships
between factors, and establishes thresholds and benchmarks.
An intuitive, cost-effective desktop tool, Scenario liberates data
mining from what is typically an expensive and time-consuming
process. Insights derived using Scenario are achieved directly by
those best positioned to use the knowledge and effect rapid change.
Scenario 1.0 runs on Windows 95 and Windows NT and requires an
IBM-compatible 486 PC and 8 MB of RAM.

http://www.cognos.com/busintell/products/scenario_overview.html
Point your browser to www.cognos.com for more information about our
software.

***

A good site to check out for software (free and commercial) is

http://www.kdnuggets.com

It also has links to conferences, data and other related sites.

***


You may consider WizSoft's data mining products --
(1) WizWhy for revealing rules and issuing predictions
(2) WizRule for revealing rules and discovering errors.

Both products --
1. Read ASCII, dBase, Access and any ODBC compliant database
2. Reveal ALL if-then rules (Association rules)
3. Reveal mathematical formula rules
4. Run on Win 95 / Win NT
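
For readers new to the term, an if-then (association) rule such as
"if bread then milk" is usually scored by its support (how often both
sides occur together) and its confidence (how often the "then" part
holds when the "if" part does). The following is only a minimal
sketch of that idea in Python. It is not WizSoft's algorithm; the toy
transactions, the function names, and the thresholds are all
assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each set is one transaction (e.g. a shopping basket).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (if_item, then_item, support, confidence) for one-item rules.

    Thresholds are illustrative assumptions, not values from any product.
    """
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        for lhs, rhs in ((a, b), (b, a)):
            lhs_support = support({lhs}, transactions)
            if lhs_support == 0:
                continue
            confidence = pair_support / lhs_support
            if pair_support >= min_support and confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


for lhs, rhs, s, c in rules(transactions):
    print(f"if {lhs} then {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Commercial tools search far larger rule spaces (multi-item conditions,
numeric fields), but the two scores above are the usual starting point
for classroom exercises.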

Both products have full working demo versions that are limited by the
number of records (WizWhy - 250 records, WizRule - 1,000 records).
These demo versions may be used in order to teach data mining.

The prices of the full versions are --
WizWhy - $ 3,995
WizRule - $ 1,395
However, as an educational institution you are entitled to a 90%
discount, and so in your case the prices are:
WizWhy - $ 399
WizRule - $ 139

***

Predictive Data Mining by Sholom Weiss and a co-author (sorry,
I do not have the book with me) is the best book I've read so far.
It also comes with software (though I have not tried it!)

Check out www.kdnuggets.com for software.

Probably most <vendors> will discount for academic use.  I've heard of
Silicon Graphics giving their software MineSet for free for
academic use.  I also know that Darwin has a 90% discount for
academic use (but that's 90% of $50K!)

***

Re 3. SAS costs educational institutions about 10% of the cost for
industrial organisations.

***

We have an economical solution for your entire campus.  For a
$10,000/year license fee we offer a campus a universal site license--

ALL campus machines, UNIX or PC
copies for ALL faculty and staff, no limit

Each PC licensee is required to purchase the documentation and disks
from the campus bookstore (about $60 wholesale--the bookstore will
mark up).

A single point of contact for tech support on campus is required
(we will offer several trainings per year for campus tech support
folks here in San Diego at no charge for the class).

This is a brand new program.  If you can enlist the computer center
and the departments which have interest (computer science, Business
IT, statistics, economics, medical school, any department with an
applied stat component) you might be able to get this one going.

A similar program offered to a large mid-western campus was financed
by the bookstore charging $125 for the package and returning $50 to
the university to cover the site license.  They had about 300
registrants in the first year.
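
The financing arithmetic in this reply can be checked with a few lines
of Python. This is only a rough sketch: it assumes the mid-western
campus paid the same $10,000/year fee quoted above for the new
program, which the reply does not actually state, and the function
names are made up for illustration.

```python
import math

# Figures quoted in the reply above.
LICENSE_FEE = 10_000        # dollars per year (assumed to apply to both campuses)
RETURNED_PER_COPY = 50      # dollars the bookstore returns to the university
REGISTRANTS = 300           # approximate first-year registrants


def license_funding(registrants, returned_per_copy=RETURNED_PER_COPY):
    """Dollars returned to the university by bookstore sales."""
    return registrants * returned_per_copy


def break_even_registrants(license_fee=LICENSE_FEE,
                           returned_per_copy=RETURNED_PER_COPY):
    """Smallest number of registrants whose returns cover the license fee."""
    return math.ceil(license_fee / returned_per_copy)


print(license_funding(REGISTRANTS))    # dollars raised at 300 registrants
print(break_even_registrants())        # registrants needed to cover the fee
```

Under that assumption, about 300 registrants return $15,000, and 200
registrants would be enough to cover a $10,000 license.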

 *---------------------------+---------------------------------*
 | Salford Systems           | VOICE (619) 543-8880            |
 | 8880 Rio San Diego Dr     | FAX   (619) 543 8888            |
 | Suite 1045                |                                 |
 | San Diego, CA 92108       | email:dstein@salford-systems.com|
 |                           | web  :   www.salford-systems.com|
 *---------------------------+---------------------------------*
 | Developers of CART® (tm) for Windows, DOS, MacOS, Unix      |
 |                                                             |
 | Comprehensive Statistical Consulting and Database Services  |
 |               Database Mining Solutions                     |
 |               Discrete Choice Experiment Design  & Analysis |
 *-------------------------------------------------------------*
 
*** 

For a data mining tool that uses rule induction, visit
http://www.azmy.com
You can download SuperQuery and try it for 7 days.  The Office
edition of SuperQuery costs only $49.95!  And the Discovery Edition
is $449.95.  You can download only one edition per PC.  SuperQuery is
a new professional commercial product that is very reasonably priced.

There is a white paper http://www.azmy.com/wp1.htm that explains rule
induction and the principles behind the Inference Engine in
SuperQuery.

Let me know your comments after you download SuperQuery and try it.
It contains examples and a step-by-step tutorial.

AZMY Thinkware, Inc.
1450 Palisade Ave. #M1D
Fort Lee, NJ 07024
http://www.azmy.com
201 947 1881

***

Dear data mining and database marketing instructors,

Our company, Megaputer Intelligence, is a world leader in providing
data mining software and solutions. Megaputer is the developer of
PolyAnalyst - one of the most popular and powerful data mining
systems on the market. I would like to offer you the cooperation of
Megaputer Intelligence in educating students, as well as the broad
business community, about the new opportunities opened by the
introduction of automated machine learning technology in the fields
of database marketing, risk analysis, quality control, etc.

The objective of this offer is not to sell a huge number of copies of
the software, but rather to become a part of the educational process.
We are ready to discuss any form of cooperation. We have special very
low educational rates for PolyAnalyst for universities who become our
partners.

As an example, PolyAnalyst is being used at Kelley School of Business
at Indiana University as the main data mining tool for a course in
database marketing. Megaputer is a provider of data mining solutions
for The Center for Education and Research in Retailing at IU,
sponsored by Sears.

The Megaputer team could furnish its thorough expertise in data
mining, carry out some sample data exploration projects, as well as
provide you with the latest version of the next generation data
mining solution - PolyAnalyst - at a special introductory educational
rate. In addition, a FREE evaluation copy of PolyAnalyst 3.2 is
available for downloading from

http://www.megaputer.ru

PolyAnalyst represents a technological breakthrough in the field of
knowledge discovery in databases. The system automatically discovers
the EXPLICIT SYMBOLIC FORM OF RELATIONS hidden in data.

As a first step of the proposed cooperation, please visit our
website to learn more about PolyAnalyst and its applications, and
download the tutorial and the program itself. Next we could discuss
what joint efforts we are ready to undertake for promoting the new
leading edge technology.

Megaputer Intelligence, USA
http://www.megaputer.ru
812-325-3026 tel (not available 12/19/97 - 01/14/98)
812-339-1646 FAX
mailto:megaputers@aol.com or megaputer@glas.apc.org

***

One of the better books I have seen on the subject just came out.
It is called "Predictive Data Mining: A Practical Guide".  It was
written by Sholom M. Weiss & Nitin Indurkhya.  ISBN 1-55860-478-2
Morgan Kaufmann publishes it.  The nice thing about it is that you
can order it with a software option.  They have a bunch of command
line tools for neural networks, decision trees, and association rules
along with some data reduction techniques.  I just bought the book
last week with the software option.  The book is $39.95 and the codes
to download the software are $24.95.

You can reach the publisher at 1-800-745-7323 or you can look at
the book's website at: http://www.data-miner.com

***

See Snob on my mixture modelling page.

Dept of Computer Science, Monash University, Clayton,
Victoria 3168, Australia  dld@cs.monash.edu.au     Fax:+61 3 9905-5146
http://www.cs.monash.edu.au/~dld/
http://www.cs.monash.edu.au/~dld/Snob.html
http://www.cs.monash.edu.au/~dld/mixture.modelling.page.html 

****

I know some good texts:
1. "Computer Systems That Learn" by Weiss and Kulikowski
2. "Machine Learning" by Tom Mitchell
3. "Predictive Data Mining" by Weiss and Indurkhya

***

There is a new book out:
Data Mining: A Hands-On Approach for Business Professionals by
Robert Groth.
Published by Prentice Hall PTR in their Data Warehouse Institute
series.
It is written at a relatively elementary level but gives a good
overview.
The most interesting piece is the inclusion of a CD Rom with three
commercial products in student versions:
Data Mind
Angoss Knowledge Seeker
Neural Network Predict

The ISBN Number is 013-756412-0

---
The best source of ongoing information about KDD is the KDD
newsletter.  It contains announcements of both free and low cost
software.  The following is copied from this newsletter.  If you
don't subscribe, you should.  It is archived so you can go into the
files and get back issues.

Knowledge Discovery Nuggets (tm) is a free electronic newsletter for
the Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to gps@kdnuggets.com. 
Please keep CFP and meetings announcements short and provide 
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

***

I work for ISL, the producers and suppliers of Clementine.
I know that we offer substantial educational discounts
but I'm not sure what the situation is in the US.
You may find it worthwhile to contact our US office:

 ISL Decision Systems Inc
 630 Freedom Business Center
 Suite 314
 King of Prussia
 PA 19406 

 Contact: Frank V. Borrelli
 Tel +1 610 768 7725
 Fax +1 610 768 7774
 Email: isldsi@isl.co.uk 

***

One interesting book which I am currently reviewing for "PC AI"
magazine: "Predictive Data Mining", co-authored by Sholom Weiss,
published by Morgan Kaufmann (approx. $35).  I would recommend either
that book or "Computer Systems That Learn" by Weiss and Kulikowski,
easily a classic in the literature, but very readable.

The cheapest usable software of which I am aware is DMSK ("Data
Mining Software Toolkit"), weighing in at approx. $25, which is a
disk companion to "Predictive Data Mining" from Morgan Kaufmann.  The
interface is command-line driven, so it's not the most inviting
software, especially for a generation that has never used DOS, but it
is not that bad.  The underlying modeling algorithms are fairly
capable and the whole thing runs off of data files in text format, so
it should be capable of handling very large data sets.

DMSK provides a nice mix of data mining technologies (many commercial
tools concentrate on one or two), including neural networks, decision
tree-induction, rule-induction, clustering, text mining, association
rule discovery and a variety of data preparation methods.  There is a
Web site for the book and software, which should not be too hard to
find.

Barring DMSK, I think your next stop on the price scale would be
roughly at $200, where one can purchase BrainMaker from California
Scientific Software.  This is a pretty nice neural network package.
I use the $795 BrainMaker Professional version for professional work,
so I can vouch for this tool.  Some companies do offer either an
academic discount or some sort of site license, but you'd have to
contact vendors directly for that information.  I will check with
some friends of mine at Unica (who make PRW ["Pattern Recognition
Workbench"]) to see what sort of deal they might be willing to make.

***

MLC++ can be used freely for research purposes, such as your course.
   http://www.sgi.com/Technology/mlc/
Compiled versions are available for SGI, SUN, NT.
Source is available, but it's not trivial to compile.

MineSet from Silicon Graphics is under a varsity agreement that
makes it extremely cheap for Universities ($20,000 otherwise).
It requires Silicon Graphics hardware.
See mineset.sgi.com/ under more information.

***

Our book, "Data Mining Techniques for Marketing, Sales, and Customer
Support" (John Wiley, ISBN 0-471-17980-9), covers data mining from
both the business perspective and various algorithms, with case
studies.  It is being used for similar courses at Rice and UBC.

***

Thinking Machines has a 90% educational discount program.  That would
make our Darwin data mining software (Windows NT/95 client, UNIX
server, parallel algorithms, neural net, CART® and k-nearest
neighbor), regularly $50k, cost the university only $5k, a real value.

Thinking Machines Corporation           phone:          781.238.3418
16 New England Executive Park           fax:            781.238.3440
Burlington, MA  01803                   web:    http://www.think.com

***

   Our book, "Data Mining Techniques for Marketing, Sales, and
   Customer Support" (John Wiley, ISBN 0-471-17980-9), covers data
   mining from both the business perspective and various algorithms,
   with case studies.  It is being used for similar courses at Rice
   and UBC.
   
A course using our book is also being taught at the business school of
Dalhousie University. The syllabus is on the web at
http://ttg.sba.dal.ca/Courses/mba6522/ complete with homework
assignments and everything.




Copyright © 1998 Nautilus Systems, Inc. All Rights Reserved.
Email: nautilus-info@nautilus-systems.com