Nautilus Systems, Inc. logo and menu bar Site Index Home
News Books
Button Bar Menu- Choices also at bottom of page About Nautilus Services Partners Case Studies Contact Us
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Subscribe]

DM: MU CS Seminar, Minimum Message Length (Dr. David Dowe)


From: David L Dowe
Date: Tue, 14 Oct 1997 01:28:31 -0400 (EDT)
> From dong@cs.mu.oz.au Tue Oct 14 10:22:19 1997
> Date: Tue, 14 Oct 97 10:05:39 +1200
> Subject: MU CS Seminar, Minimum Message Length (Dr. David Dowe)


The University of Melbourne, Computer Science Seminar

2:15pm, Tuesday, 21 October 1997
Theatre 2, Basement, SEECS Building
221 Bouverie Street, Carlton

Minimum Message Length theory and some machine learning applications

Dr. David Dowe
Dept of Computer Science
Monash University
dld@cs.monash.edu.au

Abstract:

Many statistical, machine learning and ``data mining'' techniques for
modelling data are based on Maximum Likelihood (ML), which means that 
they
typically suffer bias in small samples and, for more difficult 
problems, they
can actually converge to the wrong answer (a problem known in 
statistics as
inconsistency).  Wallace's Minimum Message Length (MML) principle 
(1968) gives
a universal objective function which is statistically consistent 
(i.e. always
converges to the right answer) and invariant, and suffers at worst 
very little
(or no) small sample bias.

We give a brief derivation of the MML principle using Bayes's theorem.
For data D, the MML theory is the hypothesis H which does all of the 
following
equivalent things:
(i)   maximises the posterior probability of H,  Pr(H|D)
(ii)  maximises the product of the prior and the likelihood,  Pr(H) . 
Pr(D|H)
(iii) minimises -log_2(Pr(H)) - log_2(Pr(D|H))

By elementary information theory, (iii) above is the length of a 
two-part
message transmitting both the hypothesis and then the data in light 
of the
hypothesis.   The objective of MML is to find the theory, H, which 
does the
three equivalent things above.

We outline in some detail how MML is used for statistical parameter
estimation and how it uses the Fisher information to determine the 
size of
the uncertainty region for its parameter estimates.  We then show the 
fairly
pronounced empirical success of MML versus classical rivals in 
estimating
the directions and concentration of (circular) angular data.

In being universally applicable, MML offers a paradigm for comparing 
models of
differing degrees of complexity, as arises in clustering and mixture 
modelling.
We show how to derive a message length for mixture models, both for 
Normal
distributions and for circular data.  This is used in the Snob 
program :
http://www.cs.monash.edu.au/~dld/Snob.html .

We also discuss results from a probabilistic football prediction 
competition
developed by the presenter and others at Monash in 1995 which has 
recently been
receiving media coverage.  Issues of optimally combining predictors 
are not
only relevant to general prediction and forecasting, but are also 
relevant to
(e.g.) DNA compression and the optimal combination of stocks in a 
market
portfolio.

The principles behind MML and Kolmogorov-Solomonoff complexity 
(Solomonoff,
1964; Kolmogorov, 1965; Chaitin, 1966) also tell us that markets are 
in general
inefficient (although the act of finding arbitrage opportunities 
might be
computationally very intensive).

Depending upon time constraints and subject to audience preference, 
options at
this point will include some or all of :
an application of Snob to clustering of protein dihedral angles,
MML inference of decision trees and (with disjunctions) decision 
graphs,
MML inference of Probabilistic Finite State Automata (PFSAs), etc.


Web pages at <URL:http://www.cs.mu.oz.au/info/seminars/seminars.html>
Enquiries to Guozhu Dong, Seminar Coordinator
dong@cs.mu.oz.au                  Phone: +61-3-9344-9101,-9166
<URL:http://www.cs.mu.oz.au/~dong>        Fax: +61-3-9348-1184



[ Home | About Nautilus | Case Studies | Partners | Contact Nautilus ]
[ Subscribe to Lists | Recommended Books ]

logo Copyright © 1998 Nautilus Systems, Inc. All Rights Reserved.
Email: nautilus-info@nautilus-systems.com
Mail converted by MHonArc 2.2.0