Bayesian Networks
By udhayaforu
@udhayaforu (8)
India
October 12, 2006 2:15am CST
50 AI MAGAZINE
u n d e r s t a n d i n g
(Charniak and Goldman
1989a, 1989b;
Goldman 1990),
vision (Levitt, Mullin,
and Binford 1989),
heuristic search
(Hansson and Mayer
1989), and so on. It
is probably fair to
say that Bayesian
networks are to a
large segment of
the AI-uncertainty
community what
resolution theorem
proving is to the AIlogic
community.
Nevertheless, despite what seems to be their
obvious importance, the ideas and techniques
have not spread much beyond the research
community responsible for them. This is
probably because the ideas and techniques are
not that easy to understand. I hope to rectify
this situation by making Bayesian networks
more accessible to
the probabilistically
unso-
Over the last few
years, a method of
reasoning using
probabilities, variously
called belief
networks, Bayesian
networks, knowledge
maps, probabilistic
causal
networks, and so on,
has become popular
within the AI probability
and uncertainty
community. This
method is best summarized
in Judea
Pearl’s (1988) book,
but the ideas are a
product of many hands. I adopted Pearl’s
name, Bayesian networks, on the grounds
that the name is completely neutral about
the status of the networks (do they really represent
beliefs, causality, or what?). Bayesian
networks have been applied to problems in
medical diagnosis (Heckerman 1990; Spiegelhalter,
Franklin, and Bull 1989), map learning
(Dean 1990), language
Bayesian Networks
without Tears
Eugene Charniak
I give an introduction to Bayesian networks for
AI researchers with a limited grounding in probability
theory. Over the last few years, this
method of reasoning using probabilities has
become popular within the AI probability and
uncertainty community. Indeed, it is probably
fair to say that Bayesian networks are to a large
segment of the AI-uncertainty community what
resolution theorem proving is to the AI-logic
community. Nevertheless, despite what seems to
be their obvious importance, the ideas and
techniques have not spread much beyond the
research community responsible for them. This is
probably because the ideas and techniques are
not that easy to understand. I hope to rectify this
situation by making Bayesian networks more
accessible to the probabilistically unsophisticated.
0738-4602/91/$4.00 ©1991 AAAI
…making
Bayesian
networks more
accessible to
the probabilistically
unsophisticated.
phisticated. That is, this article tries to make
the basic ideas and intuitions accessible to
someone with a limited grounding in probability
theory (equivalent to what is presented
in Charniak and McDermott [1985]).
An Example Bayesian Network
The best way to understand Bayesian networks
is to imagine trying to model a situation in
which causality plays a role but where our
understanding of what is actually going on is
incomplete, so we need to describe things
probabilistically. Suppose when I go home at
night, I want to know if my family is home
before I try the doors. (Perhaps the most convenient
door to enter is double locked when
nobody is home.) Now, often when my wife
leaves the house, she turns on an outdoor
light. However, she sometimes turns on this
light if she is expecting a guest. Also, we have
a dog. When nobody is home, the dog is put
in the back yard. The same is true if the dog
has bowel troubles. Finally, if the dog is in the
backyard, I will probably hear her barking (or
what I think is her barking), but sometimes I
can be confused by other dogs barking. This
example, partially inspired by Pearl’s (1988)
earthquake example, is illustrated in figure 1.
There we find a graph not unlike many we see
in AI. We might want to use such diagrams to
predict what will happen (if my family goes
out, the dog goes out) or to infer causes from
observed effects (if the light is on and the dog
is out, then my family is probably out).
The important thing to note about this
example is that the causal connections are
not absolute. Often, my family will have left
without putting out the dog or turning on a
light. Sometimes we can use these diagrams
anyway, but in such cases, it is hard to know
what to infer when not all the evidence points
the same way. Should I assume the family is
out if the light is on, but I do not hear the
dog? What if I hear the dog, but the light is
out? Naturally, if we knew the relevant probabilities,
such as P(family-out | light-on, ¬ hearbark),
then we would be all set. However,
typically, such numbers are not available for
all possible combinations of circumstances.
Bayesian networks allow us to calculate them
from a small set of probabilities, relating only
neighboring nodes.
Bayesian networks are directed acyclic graphs
(DAGs) (like figure 1), where the nodes are
random variables, and certain independence
assumptions hold, the nature of which I discuss
later. (I assume without loss of generality
that DAG is connected.) Often, as in figure 1,
the random variables can be thought of as
states of affairs, and the variables have two
possible values, true and false. However, this
need not be the case. We could, say, have a
node denoting the intensity of an earthquake
with values no-quake, trembler, rattler, major,
and catastrophe. Indeed, the variable values
do not even need to be discrete. For example,
the value of the variable earthquake might be
a Richter scale number. (However, the algorithms
I discuss only work for discrete values,
so I stick to this case.)
In what follows, I use a sans serif font for
the names of random variables, as in earthquake.
I use the name of the variable in italics
to denote the proposition that the variable
takes on some particular value (but where we
are not concerned with which one), for example,
earthquake. For the special case of Boolean
variables (with values true and false), I use the
variable name in a sans serif font to denote
the proposition that the variable has the
value true (for example, family-out). I also
show the arrows pointing downward so that
“above” and “below” can be understood to
indicate arrow direction.
The arcs in a Bayesian network specify the
independence assumptions that must hold
between the random variables. These independence
assumptions determine what probability
information is required to specify the
probability distribution among the random
variables in the network. The reader should
note that in informally talking about DAG, I
said that the arcs denote causality, whereas in
the Bayesian network, I am saying that they
specify things about the probabilities. The
next section resolves this conflict.
To specify the probability distribution of a
Bayesian network, one must give the prior
probabilities of all root nodes (nodes with no
predecessors) and the conditional probabilities
Articles
WINTER 1991 51
light-on
family-out
dog-out
bowel-problem
hear-bark
Figure 1. A Causal Graph.
The nodes denote states of affairs, and the arcs can be interpreted as causal connections.
Articles
52 AI MAGAZINE
of all nonroot nodes given all possible combinations
of their direct predecessors. Thus,
figure 2 shows a fully specified Bayesian network
corresponding to figure 1. For example,
it states that if family members leave our
house, they will turn on the outside light 60
percent of the time, but the light will be turned
on even when they do not leave 5 percent of
the time (say, because someone is expected).
Bayesian networks allow one to calculate
the conditional probabilities of the nodes in
the network given that the values of some of
the nodes have been observed. To take the
earlier example, if I observe that the light is
on (light-on = true) but do not hear my dog
(hear-bark = false), I can calculate the conditional
probability of family-out given these
pieces of evidence. (For this case, it is .5.) I
talk of this calculation as evaluating the
Bayesian network (given the evidence). In
more realistic cases, the networks would consist
of hundreds or thousands of nodes, and
they might be evaluated many times as new
information comes in. As evidence comes in,
it is tempting to think of the probabilities of
the nodes changing, but, of course, what is
changing is the conditional probability of the
nodes given the changing evidence. Sometimes
people talk about the belief of the node
changing. This way of talking is probably
harmless provided that one keeps in mind
that here, belief is simply the conditional
probability given the evidence.
In the remainder of this article, I first
describe the independence assumptions
implicit in Bayesian networks and show how
they relate to the causal interpretation of arcs
(Independence Assumptions). I then show
that given these independence assumptions,
the numbers I specified are, in fact, all that
are needed (Consistent Probabilities). Evaluating
Networks describes how Bayesian networks
are evaluated, and the next section
describes some of their applications.
Independence Assumptions
One objection to the use of probability
theory is that the complete specification of a
probability distribution requires absurdly
many numbers. For example, if there are n
binary random variables, the complete distribution
is specified by 2n-1 joint probabilities.
(If you do not know where this 2n-1 comes
from, wait until the next section, where I
define joint probabilities.) Thus, the complete
distribution for figure 2 would require 31
values, yet we only specified 10. This savings
might not seem great, but if we doubled the
…the
complete
specification
of a
probability
distribution
requires
absurdly
many
numbers.
light-on (lo)
family-out (fo)
dog-out (do)
bowel-problem (bp)
hear-bark(hb)
Figure 2. A Bayesian Network for the family-out Problem.
I added the prior probabilities for root nodes and the posterior probabilities for nonroots given all pos
1 response