Bayesian Statistics

Graphical Model

A graphical model uses a graph — directed, undirected, or mixed — to represent the conditional independence structure of a joint probability distribution, enabling efficient probabilistic reasoning, learning, and communication of complex multivariate relationships.

P(x₁, …, xₙ) = ∏ᵢ P(xᵢ | parents(xᵢ)) [Bayesian network]

High-dimensional probability distributions are unwieldy objects. A joint distribution over n binary variables requires 2ⁿ − 1 parameters in full generality — a number that grows exponentially. Graphical models tame this complexity by exploiting conditional independence: if most variables are conditionally independent of most others given a small set of neighbors, the joint distribution factorizes into a product of local functions, each involving only a few variables. The graph encodes which variables interact directly, and every conditional independence in the distribution can be read from the graph by simple graphical criteria.
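
To make the savings concrete, here is a small sketch in Python that compares the parameter count of the full joint with that of a factorized model; the chain structure, where every variable is binary and has at most one parent, is a hypothetical example chosen for illustration.

    # Parameter counting: full joint vs. factorized model over binary variables.
    # The chain structure below is a hypothetical example.

    def full_joint_params(n):
        # A full joint over n binary variables has 2**n - 1 free parameters.
        return 2**n - 1

    def factorized_params(parent_counts):
        # A binary node with k binary parents needs 2**k free parameters
        # (one Bernoulli probability per parent configuration).
        return sum(2**k for k in parent_counts)

    n = 20
    chain = [0] + [1] * (n - 1)       # x1 has no parents, each later x_i has one parent
    print(full_joint_params(n))       # 1048575
    print(factorized_params(chain))   # 39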

Two Families of Graphical Models

Bayesian Network (Directed) P(x₁, …, xₙ) = ∏ᵢ₌₁ⁿ P(xᵢ | pa(xᵢ))

Markov Random Field (Undirected) P(x₁, …, xₙ) = (1/Z) · ∏_{c ∈ cliques} ψ_c(x_c)
where Z = Σₓ ∏_c ψ_c(x_c) is the partition function

In a Bayesian network (directed acyclic graph), each node has a conditional probability distribution given its parents. The joint distribution is the product of these local conditionals. The directed structure naturally represents causal or generative stories: a disease causes symptoms, genes influence traits, latent topics generate words.
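
A minimal sketch of this factorization, assuming a hypothetical two-node disease/symptom network with made-up probabilities: the joint is just the product of the local conditionals, and a posterior follows by summing the joint over the unobserved case.

    # Minimal directed factorization: P(d, s) = P(d) * P(s | d), binary variables.
    # The disease/symptom numbers are made up for illustration.

    p_d = {1: 0.01, 0: 0.99}                        # prior P(d)
    p_s_given_d = {1: {1: 0.90, 0: 0.10},           # P(s | d = 1)
                   0: {1: 0.05, 0: 0.95}}           # P(s | d = 0)

    def joint(d, s):
        # The joint is the product of the local conditionals.
        return p_d[d] * p_s_given_d[d][s]

    # Posterior P(d = 1 | s = 1) by summing the joint over the hidden alternative.
    posterior = joint(1, 1) / (joint(1, 1) + joint(0, 1))
    print(round(posterior, 3))                      # 0.154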

In a Markov random field (undirected graph), edges represent symmetric interactions. The joint distribution is a product of potential functions over cliques, normalized by the partition function Z. The undirected structure naturally represents spatial correlations (pixels in an image), physical interactions (spins in a magnet), and compatibility constraints.
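
A minimal sketch of an undirected model, assuming a hypothetical three-node Ising-style chain with pairwise potentials; the partition function Z is computed by brute-force enumeration, which is only feasible because the example is tiny.

    import itertools
    import math

    # Tiny pairwise Markov random field on a path with spins in {-1, +1}.
    # P(x) = (1/Z) * prod over edges of psi(x_i, x_j), psi(x_i, x_j) = exp(J * x_i * x_j).
    # The coupling J and the graph are made-up illustrative choices.

    J = 0.5
    edges = [(0, 1), (1, 2)]

    def unnormalized(x):
        return math.exp(sum(J * x[i] * x[j] for i, j in edges))

    states = list(itertools.product([-1, 1], repeat=3))
    Z = sum(unnormalized(x) for x in states)        # partition function, by enumeration
    probs = {x: unnormalized(x) / Z for x in states}
    print(Z, sum(probs.values()))                   # the probabilities sum to 1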

1920s–1930s

Sewall Wright develops path analysis, using directed graphs to represent causal relationships among variables — an early precursor to Bayesian networks.

1971

Hammersley and Clifford prove their theorem (circulated as an unpublished manuscript; proofs were later published by others) establishing the equivalence between strictly positive Markov random fields and Gibbs distributions, laying the mathematical foundation for undirected graphical models.

1988

Judea Pearl publishes Probabilistic Reasoning in Intelligent Systems, introducing Bayesian networks and the message-passing algorithm (belief propagation) to the AI community.

2001

Variational methods and loopy belief propagation bring approximate inference to large-scale graphical models where exact junction tree inference is intractable. Wainwright and Jordan later provide a unified variational framework.

2000s–present

Graphical models become foundational in machine learning (topic models, CRFs, VAEs), genetics (GWAS, gene regulatory networks), and causal inference (do-calculus, structural causal models).

Reading Independence from the Graph

The power of graphical models lies in their ability to make conditional independence visible. In directed models, the d-separation criterion (Pearl, 1988) determines whether two sets of variables are conditionally independent given a third set — by examining whether all paths between them are "blocked." In undirected models, the simpler graph separation criterion applies: two sets are conditionally independent given a third if the third separates them in the graph.
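
The sketch below implements the d-separation test using the standard equivalence with graph separation in the moralized ancestral graph; the dict-of-parents graph encoding and the function names are illustrative, not drawn from any particular library.

    from collections import deque

    # d-separation via the moralized ancestral graph: X and Y are d-separated by Z
    # in a DAG iff Z separates X and Y in the moral graph of the ancestral subgraph
    # of X, Y, Z.

    def ancestors(parents, nodes):
        seen, stack = set(nodes), list(nodes)
        while stack:
            for p in parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def d_separated(parents, xs, ys, zs):
        keep = ancestors(parents, set(xs) | set(ys) | set(zs))
        adj = {v: set() for v in keep}
        for v in keep:                            # moralize: marry parents, drop arrows
            ps = [p for p in parents.get(v, ()) if p in keep]
            for p in ps:
                adj[v].add(p)
                adj[p].add(v)
            for i in range(len(ps)):
                for j in range(i + 1, len(ps)):
                    adj[ps[i]].add(ps[j])
                    adj[ps[j]].add(ps[i])
        reached = set(xs) - set(zs)               # conditioning nodes block all paths
        frontier = deque(reached)
        while frontier:
            for nb in adj[frontier.popleft()]:
                if nb not in reached and nb not in zs:
                    reached.add(nb)
                    frontier.append(nb)
        return reached.isdisjoint(ys)

    # Collider example A -> C <- B: A and B are independent marginally,
    # but become dependent once the common effect C is observed.
    parents = {"A": [], "B": [], "C": ["A", "B"]}
    print(d_separated(parents, {"A"}, {"B"}, set()))   # True
    print(d_separated(parents, {"A"}, {"B"}, {"C"}))   # False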

Graphical Models and Causality

A Bayesian network can be interpreted in two ways: as a factorization of a joint distribution (statistical), or as a structural causal model (causal). The causal interpretation, championed by Pearl and by Spirtes, Glymour, and Scheines, adds the ability to reason about interventions — what happens when we set a variable rather than observe it. The do-calculus provides rules for computing interventional distributions from observational data and the causal graph. This distinction between "seeing" and "doing" has profound implications for medicine, policy, and AI safety.
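
As an illustration of the seeing/doing distinction, the following sketch applies the backdoor adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z) to a hypothetical confounded model Z → X, Z → Y, X → Y with made-up numbers; the observational and interventional quantities come out different, and the gap is exactly the confounding bias that adjustment removes.

    # Hypothetical structural model Z -> X, Z -> Y, X -> Y with binary variables
    # and made-up probabilities. "Seeing" conditions on X in the joint; "doing"
    # applies the backdoor adjustment P(y | do(x)) = sum_z P(y | x, z) P(z).

    p_z = {1: 0.3, 0: 0.7}
    p_x_given_z = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}             # P(x | z)
    p_y1_given_xz = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1} # P(y=1 | x, z)

    def joint(z, x, y):
        py1 = p_y1_given_xz[(x, z)]
        return p_z[z] * p_x_given_z[z][x] * (py1 if y == 1 else 1 - py1)

    # Observational: P(y=1 | x=1) from the joint distribution.
    p_obs = (sum(joint(z, 1, 1) for z in (0, 1))
             / sum(joint(z, 1, y) for z in (0, 1) for y in (0, 1)))

    # Interventional: cut the Z -> X edge and average over P(z).
    p_do = sum(p_y1_given_xz[(1, z)] * p_z[z] for z in (0, 1))

    print(round(p_obs, 3), round(p_do, 3))   # 0.789 vs 0.69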

Inference Algorithms

Exact inference in graphical models involves computing marginals or conditionals of the joint distribution. For tree-structured graphs, belief propagation (the sum-product algorithm) computes all marginals in linear time. For general graphs, the junction tree algorithm transforms the graph into a tree of cliques, enabling exact inference at a cost exponential in the treewidth (one less than the size of the largest clique in the triangulated graph).
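
A minimal sketch of sum-product message passing on a chain (the simplest tree), assuming arbitrary positive pairwise potentials; the message-passing marginals are checked against brute-force enumeration.

    import itertools
    import numpy as np

    # Sum-product message passing on a chain x1 - x2 - x3 - x4 with random positive
    # pairwise potentials (an illustrative tree-structured example).
    # p(x) is proportional to prod_i phi[i](x_i, x_{i+1}).

    K, n = 3, 4                                       # K states per node, n nodes
    rng = np.random.default_rng(0)
    phi = [rng.random((K, K)) + 0.1 for _ in range(n - 1)]

    fwd = [np.ones(K)]                                # message into x1 from the left
    for i in range(n - 1):
        fwd.append(phi[i].T @ fwd[i])                 # m_{i -> i+1}(x_{i+1})
    bwd = [np.ones(K)]                                # message into x_n from the right
    for i in reversed(range(n - 1)):
        bwd.insert(0, phi[i] @ bwd[0])                # m_{i+1 -> i}(x_i)

    marginals = [f * b for f, b in zip(fwd, bwd)]
    marginals = np.array([m / m.sum() for m in marginals])

    # Brute-force check by enumerating all K**n configurations.
    brute = np.zeros((n, K))
    for x in itertools.product(range(K), repeat=n):
        w = np.prod([phi[i][x[i], x[i + 1]] for i in range(n - 1)])
        for i in range(n):
            brute[i, x[i]] += w
    brute /= brute.sum(axis=1, keepdims=True)
    print(np.allclose(marginals, brute))              # True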

When exact inference is intractable, approximate methods are used. Loopy belief propagation applies the sum-product algorithm to graphs with cycles, often producing excellent approximations despite lacking formal guarantees. Variational inference (mean-field, structured variational) optimizes a tractable approximation to the true posterior. MCMC methods (Gibbs sampling, Metropolis-Hastings) provide asymptotically exact samples from the posterior.
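
A minimal Gibbs sampling sketch for a hypothetical 2×2 Ising-style grid with made-up coupling and field values; because the model is tiny, the sampled marginals can be compared against exact enumeration.

    import itertools
    import math
    import random

    # Gibbs sampling on a tiny 2x2 Ising-style grid with coupling J and node fields h
    # (an illustrative example). The model is small enough to check by enumeration.

    J = 0.7
    h = [0.2, -0.1, 0.3, 0.0]                     # per-node external fields
    edges = [(0, 1), (2, 3), (0, 2), (1, 3)]      # 2x2 grid, nodes 0..3, spins in {-1, +1}
    nbrs = {i: [j for e in edges for j in e if i in e and j != i] for i in range(4)}

    def gibbs_estimate(n_sweeps=50000, burn_in=5000, seed=0):
        random.seed(seed)
        x = [random.choice([-1, 1]) for _ in range(4)]
        count_plus = [0] * 4
        for sweep in range(n_sweeps):
            for i in range(4):                    # resample each node given its neighbors
                field = h[i] + sum(J * x[j] for j in nbrs[i])
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))   # P(x_i = +1 | rest)
                x[i] = 1 if random.random() < p_plus else -1
            if sweep >= burn_in:
                for i in range(4):
                    count_plus[i] += (x[i] == 1)
        return [c / (n_sweeps - burn_in) for c in count_plus]

    def exact_marginals():                        # brute-force P(x_i = +1) for comparison
        Z, plus = 0.0, [0.0] * 4
        for x in itertools.product([-1, 1], repeat=4):
            w = math.exp(sum(J * x[i] * x[j] for i, j in edges)
                         + sum(h[i] * x[i] for i in range(4)))
            Z += w
            for i in range(4):
                plus[i] += w * (x[i] == 1)
        return [p / Z for p in plus]

    print(gibbs_estimate())
    print(exact_marginals())   # the estimates should agree to roughly two decimals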

Learning

Learning a graphical model involves two tasks: parameter estimation (given a known structure, estimate the conditional distributions) and structure learning (infer the graph itself from data). Parameter estimation with complete data is often straightforward, using maximum likelihood or conjugate Bayesian updates. Structure learning is harder — the space of possible DAGs grows super-exponentially with the number of nodes. Score-based methods (BIC, marginal likelihood), constraint-based methods (PC algorithm, FCI), and hybrid approaches are actively researched.
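
A sketch of maximum-likelihood parameter estimation for a known discrete structure with complete data: each conditional probability is a normalized count, with an optional pseudocount playing the role of a conjugate Dirichlet prior. The structure and records below are hypothetical.

    from collections import Counter

    # Maximum-likelihood CPT estimation for a known discrete Bayesian network from
    # complete data: each conditional is a normalized count (with an optional
    # pseudocount for Bayesian smoothing). Structure and data here are illustrative.

    structure = {"rain": [], "sprinkler": ["rain"], "wet": ["rain", "sprinkler"]}

    data = [  # fully observed records (hypothetical)
        {"rain": 1, "sprinkler": 0, "wet": 1},
        {"rain": 0, "sprinkler": 1, "wet": 1},
        {"rain": 0, "sprinkler": 0, "wet": 0},
        {"rain": 1, "sprinkler": 0, "wet": 1},
        {"rain": 0, "sprinkler": 1, "wet": 0},
    ]

    def fit_cpts(structure, data, values=(0, 1), alpha=1.0):
        cpts = {}
        for node, parents in structure.items():
            counts = Counter()
            for row in data:
                parent_cfg = tuple(row[p] for p in parents)
                counts[(parent_cfg, row[node])] += 1
            # P(node = v | parents = cfg) = (count + alpha) / (total + alpha * |values|)
            cpt = {}
            for cfg in {cfg for (cfg, _) in counts}:
                total = sum(counts[(cfg, v)] for v in values)
                cpt[cfg] = {v: (counts[(cfg, v)] + alpha) / (total + alpha * len(values))
                            for v in values}
            cpts[node] = cpt
        return cpts

    cpts = fit_cpts(structure, data)
    print(cpts["sprinkler"][(0,)])   # P(sprinkler | rain = 0) from the toy data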

"The graphical model formalism provides a marriage between probability theory and graph theory. It is a language for talking about complex distributions that is both mathematically rigorous and intuitively visual." — Michael I. Jordan, on the unifying role of graphical models
