
DM: RE: Genetic Algorithms


From: A.N.Pryke
Date: Tue, 12 Aug 1997 13:15:55 -0400 (EDT)

Sarab <ss.anand@ulst.ac.uk> wrote:

> A fairly comprehensive GA related web site is:
> http://www.shef.ac.uk/~gaipp/galinks.html However, these are not
> necessarily Data Mining. The only people I am aware of that are
> working in Data Mining using GAs are Prof. Vic Rayward-Smith of
> Univ. of East Anglia and Quadstone Ltd. in Edinburgh.


My work involves a flexible search engine for data mining, which can
operate as a symbolic GA.

The system discovers classification, association and cluster
rules. Unfortunately, I don't have anything online on the GA side at
the moment. Some information on the visualisation side is
at: http://www.cs.bham.ac.uk/~anp/haiku - this is a bit out of date,
but the pictures are still pretty!


Other systems of relevance are (with apologies to non-latex speakers):

GABIL \cite{dejong:learning-concept:91} learns classification rules
 from examples with symbolic attributes.

HDBPCS (High Dimensionality Binary Pattern Classification System)
\cite{pei.ea:classification-feature:95}- a system for
 discovering classification rules and subsequent feature extraction
 from binary data.

Beagle uses a genetic algorithm on symbolic rules to generate
classification rules \cite{forsyth:inductive-learning:89} of the form:
IF $((5*pressure) > temperature)$ THEN item is in class $C$ .
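
As a rough illustration of what a rule of this form amounts to, here is a
minimal Python sketch. The Rule class, the sample records and the
agreement-based score are all assumptions made for the example; they are
not taken from Beagle itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical container for one IF <condition> THEN <class> rule."""
    description: str
    condition: Callable[[Record], bool]
    target_class: str

    def matches(self, record: Record) -> bool:
        return self.condition(record)


def rule_accuracy(rule: Rule, records: List[Record], labels: List[str]) -> float:
    """Fraction of records where 'rule fires' agrees with 'label is the target class'.

    This scoring is an assumption for illustration, not Beagle's own measure.
    """
    agree = sum(
        rule.matches(rec) == (lab == rule.target_class)
        for rec, lab in zip(records, labels)
    )
    return agree / len(records)


# The rule quoted above: IF ((5*pressure) > temperature) THEN item is in class C
rule = Rule(
    description="(5*pressure) > temperature",
    condition=lambda r: 5 * r["pressure"] > r["temperature"],
    target_class="C",
)

# Made-up records, purely for demonstration.
records = [
    {"pressure": 3.0, "temperature": 10.0},
    {"pressure": 1.0, "temperature": 10.0},
    {"pressure": 2.0, "temperature": 12.0},
]
labels = ["C", "other", "C"]

print(rule_accuracy(rule, records, labels))
```

A GA over such rules would mutate and recombine the condition expressions
and use a score like this as fitness.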

COGIN (COverage-based Genetic INduction)
\cite{greene.ea:cogin-symbolic:92} is a GA-based system for the
induction of classification rules.

SIA \cite{venturini:sia-supervised:93} learns conjunctive
classification rules from pre-classified examples. SIA is similar to
the AQ algorithm \cite{michalski.ea:multi-purpose-incremental:86} in
that it generates new rules using uncovered examples as a seed.
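
The idea of seeding new rules from uncovered examples can be sketched as
follows. This is only an illustration: the greedy condition-dropping step
stands in for SIA's genetic search, and the function names and toy data
are my own assumptions rather than anything from SIA or AQ.

```python
from typing import Dict, List, Tuple

Attrs = Dict[str, str]
Example = Tuple[Attrs, str]   # (attribute values, class label)
Rule = Dict[str, str]         # conjunction: attribute -> required value


def covers(rule: Rule, attrs: Attrs) -> bool:
    """A conjunctive rule covers an example if every condition holds."""
    return all(attrs.get(a) == v for a, v in rule.items())


def generalise(seed_attrs: Attrs, seed_class: str, examples: List[Example]) -> Rule:
    """Start from the most specific rule (the seed itself) and drop conditions.

    Greedy stand-in for a genetic search: a condition is dropped only if the
    shorter rule still covers no example of another class.
    """
    rule = dict(seed_attrs)
    for attr in list(rule):
        trial = {a: v for a, v in rule.items() if a != attr}
        if not any(covers(trial, x) and c != seed_class for x, c in examples):
            rule = trial
    return rule


def seed_covering(examples: List[Example]) -> List[Tuple[Rule, str]]:
    """Repeatedly pick an uncovered example as seed and build a rule from it."""
    rules = []
    uncovered = list(examples)
    while uncovered:
        seed_attrs, seed_class = uncovered[0]
        rule = generalise(seed_attrs, seed_class, examples)
        rules.append((rule, seed_class))
        # The seed is always covered by its own rule, so this loop terminates.
        uncovered = [
            (x, c) for x, c in uncovered
            if not (c == seed_class and covers(rule, x))
        ]
    return rules


# Toy data, invented for the example.
examples = [
    ({"colour": "red", "shape": "round"}, "apple"),
    ({"colour": "green", "shape": "round"}, "apple"),
    ({"colour": "yellow", "shape": "long"}, "banana"),
]

for rule, cls in seed_covering(examples):
    print(rule, "->", cls)
```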

SIA01 \cite{augier.ea:learning-first:95} learns First Order Logic
(FOL) rules for binary classification.

In a paper entitled ``Co-operation through Hierarchical Competition in
Genetic Data Mining''\cite{radcliffe.ea:co-operation-throught:94},
Radcliffe and Surry discuss a two-level hierarchical approach which
finds rule sets with good coverage of the data. The low-level GA is
used to discover individual rules. The high-level GA is then applied
to create rulesets from these.
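
To make the two-level idea concrete, here is a small Python sketch. It is
not Radcliffe and Surry's algorithm: the rule encoding, both fitness
functions, the truncation selection and the toy data are all assumptions
chosen to keep the example short.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]
Rule = Tuple[int, ...]   # per attribute: 0 or 1 required, -1 means "don't care"


def matches(rule: Rule, rec: Record) -> bool:
    return all(r == -1 or r == x for r, x in zip(rule, rec))


def rule_fitness(rule: Rule, data: Sequence[Record], labels: Sequence[int]) -> int:
    """Assumed score: positives matched minus negatives matched."""
    return sum((1 if y else -1) for rec, y in zip(data, labels) if matches(rule, rec))


def ruleset_fitness(mask, rules, data, labels) -> float:
    """Assumed score: correctly classified records, with a small size penalty."""
    chosen = [r for r, keep in zip(rules, mask) if keep]
    correct = sum(
        any(matches(r, rec) for r in chosen) == bool(y)
        for rec, y in zip(data, labels)
    )
    return correct - 0.1 * len(chosen)


def evolve(pop, fitness, mutate, rng, generations=30):
    """Generic loop: keep the better half, refill by crossover plus mutation."""
    size, length = len(pop), len(pop[0])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: size // 2]
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(length)
            child[i] = mutate(child[i])         # single-gene mutation
            children.append(tuple(child))
        pop = parents + children
    return pop


def evolve_rules(data, labels, rng, pop_size=20) -> List[Rule]:
    """Low level: a population of individual rules."""
    n = len(data[0])
    pop = [tuple(rng.choice((-1, 0, 1)) for _ in range(n)) for _ in range(pop_size)]
    return evolve(pop, lambda r: rule_fitness(r, data, labels),
                  lambda _gene: rng.choice((-1, 0, 1)), rng)


def evolve_ruleset(rules, data, labels, rng, pop_size=20) -> List[Rule]:
    """High level: each individual is a bit mask selecting rules from the pool."""
    pop = [tuple(rng.random() < 0.3 for _ in rules) for _ in range(pop_size)]
    score = lambda mask: ruleset_fitness(mask, rules, data, labels)
    best = max(evolve(pop, score, lambda gene: not gene, rng), key=score)
    return [r for r, keep in zip(rules, best) if keep]


# Toy binary data, invented for the example (label 1 = positive class).
data = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 0)]
labels = [1, 1, 0, 0, 1, 0]

rng = random.Random(0)
pool = evolve_rules(data, labels, rng)
ruleset = evolve_ruleset(pool, data, labels, rng)
print(ruleset)
```

The point of the split is that the low-level search only has to find rules
that are individually good, while the high-level search is judged on how
well a whole set covers the data.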

GA-Miner \cite{flockhart.ea:genetic-algorithm-based:96} uses a genetic
algorithm to discover three types of pattern: predictive rules with
expressions on both LHS and RHS; ``distribution shift patterns'' which
indicate that a particular attribute has a different distribution in a
subset of the data; and ``correlation patterns'' which assert that two
attributes are correlated in a particular subset.

I believe Ultragem also have a GA based data mining system.


If anyone else is working in this field and knows of other relevant
systems, please email me (or the group) and tell me about them. 

Thanks, 

  Andy


References
----------




@InProceedings{augier.ea:learning-first:95,
  author =       "S. Augier and G. Venturini and Y. Kodratoff",
  title =        "Learning First Order Logic Rules with a Genetic
                 Algorithm",
  booktitle =    "Proceedings of the First International Conference on
                 Knowledge Discovery and Data Mining (KDD'95)",
  year =         "1995",
  pages =        "21--26",

}

@InProceedings{bala.ea:using-genetic:91,
  author =       "J. Bala and K. DeJong and P. Pachowicz",
  title =        "Using Genetic Algorithms to improve the performance
                 of classification rules produced by symbolic inductive
                 methods",
  editor =       "Z. W. Ras and M. Zemankova",
  pages =        "286--295",
  booktitle =    "Proceedings of 6th International Symposium
                 Methodologies for Intelligent Systems ISMIS'91",
  year =         "1991",
  publisher =    "Springer-Verlag, Berlin, Germany",
  address =      "Charlotte, NC",
  month =        "16-19 " # oct,
}

@Article{bala.ea:using-genetic:91a,
  key_modifier = "a",
  author =       "J. Bala and K. DeJong and P. Pachowicz",
  title =        "Using genetic algorithms to improve the performance
                 of classification rules produced by symbolic inductive
                 method",
  journal =      "Lecture Notes in Computer Science",
  volume =       "542",
  pages =        "286--295",
  year =         "1991",
  ISSN =         "0302-9743",
}

@InProceedings{dejong:learning-concept:91,
  author =       "W. M. Spears and K. A. DeJong",
  title =        "Learning Concept Classification Rules Using Genetic
                 Algorithms",
  year =         "1991",
  booktitle =    "Proceedings of the International Joint Conference on
                 Artificial Intelligence",
  address =      "Sydney, Australia",
  pages =        "651--656",
  keywords =     "GABIL, pittsburgh approach, binary representation",
}

@InProceedings{flockhart.ea:genetic-algorithm-based:96,
  author =       "I. W. Flockhart and N. J. Radcliffe",
  title =        "A Genetic Algorithm-Based Approach to Data Mining",
  booktitle =    "The Second International Conference on Knowledge
                 Discovery and Data Mining (KDD-96)",
  editor =       "Evangelos Simoudis and Jia Wei Han and Usama Fayyad",
  year =         "1996",
  month =        aug # " 2-4",
  keywords =     "GA-Miner, Genetic Algorithms, Quadstone",
  address =      "Portland, Oregon, USA",
  publisher =    "AAAI",
  annote =       "KDD-96
                 http://www.aaai.org:80/Press/Proceedings/KDD/1996/kdd-96.html",
}

@InProceedings{greene.ea:cogin-symbolic:92,
  author =       "D. P. Greene and S. F. Smith",
  title =        "{COGIN}: Symbolic Induction with Genetic Algorithms",
  year =         "1992",
  booktitle =    "Proc.\ of AAAI-92",
  pages =        "111--116",
  keywords =     "GA",
}

@Article{greene.ea:competition-based-induction:93,
  author =       "D. P. Greene and S. F. Smith",
  address =      "Carnegie Mellon Univ, Sch Comp Sci, Inst Robot,
                 Pittsburgh, Pa, 15213",
  title =        "Competition-based induction of decision-models from
                 examples",
  journal =      "Machine Learning",
  year =         "1993",
  volume =       "13",
  issue =        "2-3",
  pages =        "229--257",
  abstract =     "Symbolic induction is a promising approach to
                 constructing decision models by extracting regularities
                 from a data set of examples. The predominant type of
                 model is a classification rule (or set of rules) that
                 maps a set of relevant environmental features into
                 specific categories or values. Classifying loan risk
                 based on borrower profiles, consumer choice from
                 purchase data, or supply levels based on operating
                 conditions are all examples of this type of
                 model-building task. Although current inductive
                 approaches, such as ID3 and CN2, perform well on certain
                 problems, their potential is limited by the incremental
                 nature of their search. Genetic algorithms (GA) have
                 shown great promise on complex search domains, and hence
                 suggest a means for overcoming these limitations.
                 However, effective use of genetic search in this context
                 requires a framework that promotes the fundamental
                 model-building objectives of predictive accuracy and
                 model simplicity. In this article we describe COGIN, a
                 GA-based inductive system that exploits the conventions
                 of induction from examples to provide this framework.
                 The novelty of COGIN lies in its use of training set
                 coverage to simultaneously promote competition in
                 various classification niches within the model and
                 constrain overall model complexity. Experimental
                 comparisons with NewID and CN2 provide evidence of the
                 effectiveness of the COGIN framework and the viability
                 of the GA approach.",
  keywords =     "GENETIC ALGORITHMS, SYMBOLIC INDUCTION, CONCEPT
                 LEARNING",
}

@Article{janikow:knowledge-intensive-genetic:93,
  author =       "C. Z. Janikow",
  address =      "Umsl, Dept Math \& Comp Sci, St Louis, Mo, 63121",
  title =        "A knowledge-intensive genetic algorithm for supervised
                 learning",
  journal =      "Machine Learning",
  year =         "1993",
  volume =       "13",
  issue =        "2-3",
  pages =        "189--228",
  abstract =     "Supervised learning in attribute-based spaces is one
                 of the most popular machine learning problems studied
                 and, consequently, has attracted considerable attention
                 of the genetic algorithm community. The full-memory
                 approach developed here uses the same high-level
                 descriptive language that is used in rule-based
                 systems. This allows for an easy utilization of
                 inference rules of the well-known inductive learning
                 methodology, which replace the traditional
                 domain-independent operators and make the search
                 task-specific. Moreover, a closer relationship between
                 the underlying task and the processing mechanisms
                 provides a setting for an application of more powerful
                 task-specific heuristics. Initial results obtained with
                 a prototype implementation for the simplest case of
                 single concepts indicate that genetic algorithms can be
                 effectively used to process high-level concepts and
                 incorporate task-specific knowledge. The method of
                 abstracting the genetic algorithm to the problem level,
                 described here for the supervised inductive learning,
                 can be also extended to other domains and tasks, since
                 it provides a framework for combining recently popular
                 genetic algorithm methods with traditional
                 problem-solving methodologies. Moreover, in this
                 particular case, it provides a very powerful tool
                 enabling study of the widely accepted but not so well
                 understood inductive learning methodology.",
  keywords =     "GENETIC ALGORITHMS, MACHINE LEARNING, SYMBOLIC
                 LEARNING, SUPERVISED LEARNING",
}

@TechReport{pei.ea:classification-feature:95,
  author =       "Min Pei and Ying Ding and William F Punch(III) and
                 Erik D Goodman",
  title =        "Classification and Feature Extraction of
                 High-Dimensionality Binary Patterns using a {GA} to
                 Evolve Rule",
  institution =  "Michigan State University",
  year =         "1995",
  annote =       "Uses std GA to develop classifier system type rules",
}

@TechReport{radcliffe.ea:co-operation-throught:94,
  author =       "N. J. Radcliffe and P. D. Surry",
  title =        "Co-operation through Hierarchical Competition in
                 Genetic Data Mining",
  institution =  "Edinburgh Parallel Computing Centre",
  type =         "Technical Report",
  number =       "EPCC-TR94-09",
  year =         "1994",
}

@InProceedings{vafaie.ea:improving-performance:91,
  author =       "H. Vafaie and K. DeJong",
  title =        "Improving the performance of a rule induction system
                 using genetic algorithms",
  editor =       "R. S. Michalski and G. Tecuci",
  pages =        "305--315",
  booktitle =    "Proceedings of the First International Workshop on
                 Multistrategy Learning MSL-91",
  year =         "1991",
  organization = "Center for Artificial Intelligence, Fairfax, VA",
  address =      "Harpers Ferry, WV",
  month =        "7-9 " # nov,
}

@InProceedings{venturini:sia-supervised:93,
  author =       "Gilles Venturini",
  title =        "{SIA}: {A} Supervised Induction Algorithm with Genetic
                 Search for Learning Attributes based Concepts",
  booktitle =    "European Conference on Machine Learning (ECML-93)",
  publisher =    "Springer-Verlag",
  year =         "1993",
  keywords =     "GA, Rules, Induction, Comparison",
}


@InCollection{forsyth:inductive-learning:89,
  author =       "Richard Forsyth",
  title =        "Inductive Learning for Expert Systems",
  booktitle =    "Expert Systems Principles and Case Studies",
  publisher =    "Chapman and Hall, New York",
  year =         "1989",
}


@InProceedings{michalski.ea:multi-purpose-incremental:86,
  author =       "Ryszard S. Michalski and Igor Mozetic and Jiarong Hong
                 and Nada Lavrac",
  title =        "The multi-purpose incremental learning system {AQ15}
                 and its testing application to three medical domains",
  booktitle =    "Proceedings of the 5th national conference on
                 Artificial Intelligence",
  pages =        "1041--1045",
  address =      "Philadelphia",
  year =         "1986",
}


--
   Andy Pryke, Research Student, Computer Science, Birmingham University
Data Mining Information - http://www.cs.bham.ac.uk/~anp/TheDataMine.html

