Bayesian Statistics

De Finetti's Theorem

De Finetti's representation theorem proves that any infinitely exchangeable sequence of random variables behaves as if it were generated by first choosing a parameter from some prior distribution and then drawing observations independently — providing the deepest justification for Bayesian modeling.

P(X₁, …, Xₙ) = ∫ ∏ᵢ₌₁ⁿ f(Xᵢ|θ) dμ(θ)

Bruno de Finetti's representation theorem, first published in 1931 and refined over the following decades, answers a question that lies at the heart of statistical reasoning: why is it legitimate to model observations as "independent and identically distributed given a parameter"? The theorem shows that this modeling choice is not an arbitrary assumption but an inevitable mathematical consequence of a far weaker and more intuitive judgment — that the order of the observations does not matter.

If you believe that the joint probability of a sequence of outcomes is unchanged by any permutation of the observations — a property called exchangeability — then there must exist a prior distribution over some latent parameter such that, conditional on that parameter, the observations are independent and identically distributed. The prior is not postulated; it is derived.

De Finetti's Representation Theorem

P(X₁ = x₁, …, Xₙ = xₙ) = ∫ ∏ᵢ₌₁ⁿ f(xᵢ | θ) dμ(θ)

Where X₁, X₂, …   →  An infinitely exchangeable sequence of random variables
θ              →  A latent parameter (or mixing variable)
f(xᵢ | θ)    →  The conditional distribution of each observation given θ
μ(θ)         →  A unique prior (mixing) measure over θ

Exchangeability: The Core Assumption

A sequence of random variables X₁, X₂, …, Xₙ is exchangeable if the joint distribution is invariant under any permutation of the indices. That is, for every permutation σ:

Exchangeability Condition

P(X₁ = x₁, …, Xₙ = xₙ) = P(X₁ = x_{σ(1)}, …, Xₙ = x_{σ(n)})

Exchangeability is strictly weaker than independence. Independent and identically distributed (i.i.d.) random variables are always exchangeable, but exchangeable variables need not be independent. Consider drawing balls from an urn without replacement: the draws are exchangeable (any ordering is equally likely), but they are not independent — each draw changes the composition of the urn and thus affects the probabilities of subsequent draws.
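
A minimal sketch of the urn calculation in Python (the composition of three red and two blue balls is illustrative): it confirms that every ordering of the same colors has the same probability, even though the draws are not independent.

    from fractions import Fraction
    from itertools import permutations

    def sequence_prob(seq, reds, blues):
        # Probability of drawing exactly this color sequence
        # without replacement from an urn of reds and blues.
        p = Fraction(1)
        for color in seq:
            total = reds + blues
            if color == "R":
                p *= Fraction(reds, total)
                reds -= 1
            else:
                p *= Fraction(blues, total)
                blues -= 1
        return p

    # Every permutation of two reds and one blue is equally likely...
    for seq in sorted(set(permutations("RRB"))):
        print("".join(seq), sequence_prob(seq, reds=3, blues=2))   # all 1/5

    # ...but the draws are not independent:
    # P(2nd red | 1st red) = 2/4, whereas P(2nd red) = 3/5.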

The sequence is infinitely exchangeable if the exchangeability condition holds for every finite subsequence, no matter how long. This stronger condition is what de Finetti's theorem requires.

Exchangeability vs. Independence

Independence says: knowing the outcome of one observation tells you nothing about others. Exchangeability says: the observations carry the same kind of information, and their order is irrelevant — but they may well be correlated. De Finetti's theorem reveals that this correlation arises precisely because the observations share a common unknown parameter θ. Once θ is known, the correlation vanishes and the observations become independent.
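
A short simulation, assuming NumPy (the Beta(2, 2) prior is an arbitrary illustration), makes this concrete: pairs of flips that share an unknown θ are correlated marginally, but the correlation disappears once we condition on θ.

    import numpy as np

    rng = np.random.default_rng(0)
    n_pairs = 200_000

    # Exchangeable pairs: draw a shared bias theta from a prior,
    # then flip two coins with that bias.
    theta = rng.beta(2, 2, size=n_pairs)
    x1 = (rng.random(n_pairs) < theta).astype(float)
    x2 = (rng.random(n_pairs) < theta).astype(float)

    # Marginally the flips are correlated through the shared theta...
    print("marginal corr:   ", np.corrcoef(x1, x2)[0, 1])   # about 0.2

    # ...but within a thin slice of theta values the correlation
    # vanishes, just as the theorem predicts.
    near_half = np.abs(theta - 0.5) < 0.01
    print("conditional corr:", np.corrcoef(x1[near_half], x2[near_half])[0, 1])  # near 0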

The Theorem in Detail

De Finetti proved that if X₁, X₂, … is an infinitely exchangeable sequence of binary (0/1) random variables, then there exists a unique probability measure μ on [0,1] such that:

Binary Case

P(X₁ = x₁, …, Xₙ = xₙ) = ∫₀¹ θ^k (1−θ)^(n−k) dμ(θ)

Where k = x₁ + x₂ + … + xₙ   (number of successes)
μ(θ)   →  A probability measure on [0,1]

The integrand is exactly the likelihood function for n Bernoulli trials with success probability θ. The measure μ plays the role of a prior distribution over θ. The joint probability of any specific sequence of outcomes is therefore a weighted average of i.i.d. Bernoulli likelihoods, weighted by the prior.
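
A quick numerical check of this reading, assuming SciPy and taking an arbitrary Beta(2, 3) prior as the mixing measure:

    from scipy.integrate import quad
    from scipy.special import beta as beta_fn

    a, b = 2.0, 3.0   # illustrative Beta prior as the mixing measure
    n, k = 10, 4      # one particular sequence with 4 successes

    # Closed form: the Beta mixture of Bernoulli likelihoods is
    # B(k+a, n-k+b) / B(a, b).
    closed = beta_fn(k + a, n - k + b) / beta_fn(a, b)

    # The same quantity by direct numerical integration of the mixture.
    integrand = lambda t: (t**k * (1 - t)**(n - k)
                           * t**(a - 1) * (1 - t)**(b - 1) / beta_fn(a, b))
    numeric, _ = quad(integrand, 0.0, 1.0)

    print(closed, numeric)   # the two values agree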

The generalization beyond binary variables came in later work, notably that of Hewitt and Savage (1955), who extended the result to arbitrary exchangeable sequences. For real-valued exchangeable sequences, the mixing measure lives over the space of all probability distributions — a random distribution generates the data, and the prior is a distribution over distributions.

Why This Matters for Bayesian Statistics

The theorem resolves what had been a persistent philosophical objection to Bayesian methods: the apparent arbitrariness of the prior. Critics argued that Bayesians simply assume a prior and a likelihood, then mechanically apply Bayes' theorem. De Finetti showed that the prior and the i.i.d. likelihood are not assumptions at all — they are mathematical consequences of the far more modest judgment of exchangeability.

A scientist who asserts "the order in which I observe my data should not affect my conclusions" has, whether she knows it or not, committed to a model in which:

1. There exists a latent parameter governing the data-generating process.
2. Given that parameter, the observations are conditionally i.i.d.
3. Uncertainty about the parameter is described by a prior distribution.

This is precisely the Bayesian setup. De Finetti's theorem shows it is the only setup compatible with exchangeability.

The Constructive Meaning of the Prior

De Finetti was a radical subjectivist who denied the existence of objective probabilities. His famous dictum — "Probability does not exist" — meant that there is no such thing as a "true" probability θ waiting to be discovered. There are only observable sequences and the judgments we make about them.

From this perspective, the prior μ(θ) is not a description of our ignorance about a real parameter. It is a mathematical artifact that the representation theorem guarantees must exist whenever our beliefs about observables satisfy exchangeability. The parameter θ is a convenient fiction — but a fiction that is mathematically indispensable.

"Probability does not exist. It is a subjective description of a person's uncertainty, not an objective property of the world." — Bruno de Finetti, Theory of Probability (1970)

This view has profound consequences. It means the Bayesian framework does not require belief in "true parameters" or "objective chance." All that is required is coherent betting behavior — and exchangeability is a natural consequence of coherence in symmetric situations.

Formal Statement and Conditions

The theorem requires infinite exchangeability; finite exchangeability alone is not sufficient. A sequence that is exchangeable only up to some fixed length N does not necessarily admit a representation as a mixture of i.i.d. distributions. Diaconis and Freedman (1980) studied the gap between finite and infinite exchangeability, showing that finite exchangeable sequences can be approximated by mixtures of i.i.d. distributions, with the approximation improving as N grows relative to n.

Finite Exchangeability — Approximate Representation

For an N-exchangeable sequence with n ≤ N observations:
‖P − Q_mix‖ ≤ n(n−1) / (2N)

Where Q_mix   →  Closest mixture of i.i.d. distributions
‖·‖     →  Total variation distance

This bound shows that for large populations (N much larger than n), even finite exchangeability approximately justifies the Bayesian setup. This is the practical regime in most statistical applications.
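
For a sense of scale, a one-line computation of the bound at survey-like sizes (the numbers are arbitrary):

    def diaconis_freedman_bound(n: int, N: int) -> float:
        # Bound on the total variation distance between the law of n draws
        # from an N-exchangeable sequence and the nearest i.i.d. mixture.
        return n * (n - 1) / (2 * N)

    print(diaconis_freedman_bound(n=50, N=100_000))   # 0.01225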

Connection to Sufficient Statistics

In the binary case, the representation theorem implies that the number of successes k = Σxᵢ is a sufficient statistic. Since the joint probability depends on the individual observations only through their sum, all sequences with the same number of ones have the same probability. This is exchangeability in action — and it connects de Finetti's theorem to the classical theory of sufficiency.

More generally, for exchangeable sequences from exponential families, the sufficient statistics emerge naturally from the representation. The prior over θ and the sufficient statistics together fully determine the posterior — a fact that makes conjugate Bayesian analysis possible.
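
A minimal sketch of the Beta-Bernoulli conjugate update, which touches the data only through the sufficient statistic (k, n):

    def beta_bernoulli_update(a: float, b: float, k: int, n: int):
        # Posterior of a Beta(a, b) prior after k successes in n trials.
        # The particular sequence is irrelevant: only (k, n) enters.
        return a + k, b + n - k

    # Every sequence with 7 successes in 10 trials gives the same posterior:
    print(beta_bernoulli_update(1, 1, k=7, n=10))   # (8, 4), i.e. Beta(8, 4)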

Extensions and Generalizations

Partial Exchangeability

De Finetti himself recognized that full exchangeability is sometimes too strong. In many problems, observations fall into groups, and exchangeability holds within but not across groups. He developed the notion of partial exchangeability, which leads to hierarchical Bayesian models: observations within groups are conditionally i.i.d. given group-level parameters, and the group-level parameters are themselves exchangeable, leading to a higher-level prior.
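
A generative sketch of that hierarchy in NumPy (the Beta distributions and group sizes are illustrative choices, not dictated by the theorem):

    import numpy as np

    rng = np.random.default_rng(1)

    # Group-level parameters are themselves exchangeable:
    # i.i.d. draws from a shared higher-level prior.
    n_groups, n_per_group = 5, 200
    group_theta = rng.beta(3, 3, size=n_groups)

    # Within each group, observations are conditionally i.i.d.
    # given that group's parameter.
    for g, theta in enumerate(group_theta):
        obs = rng.random(n_per_group) < theta
        print(f"group {g}: theta={theta:.2f}, observed rate={obs.mean():.2f}")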

Exchangeable Arrays

The Aldous–Hoover theorem extends de Finetti's result to exchangeable arrays (random matrices whose distribution is invariant under permutations of rows and columns). This generalization underpins Bayesian nonparametric models for relational data, including stochastic block models and latent feature models.
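
A toy generative sketch of the Aldous-Hoover structure (the two-block function W is an arbitrary illustration): each index receives a latent uniform variable, and array entries are then drawn independently through W.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 8

    # Each row/column index gets a latent uniform variable; entries
    # are generated independently through a fixed function W, which
    # is the structure Aldous-Hoover guarantees for exchangeable arrays.
    u = rng.random(n)

    def W(a, b):
        # Illustrative two-block pattern: dense within blocks, sparse across.
        return 0.8 if (a < 0.5) == (b < 0.5) else 0.1

    A = np.array([[rng.random() < W(u[i], u[j]) for j in range(n)]
                  for i in range(n)])
    print(A.astype(int))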

Quantum de Finetti Theorems

Analogues of de Finetti's theorem have been established in quantum mechanics, where exchangeable quantum states admit representations as mixtures of i.i.d. quantum states. These results are fundamental in quantum information theory and quantum cryptography.

Historical Context

1931

Bruno de Finetti publishes "Funzione caratteristica di un fenomeno aleatorio" in Atti della R. Accademia Nazionale dei Lincei, presenting the representation theorem for binary exchangeable sequences.

1937

De Finetti delivers his famous lecture "La prévision: ses lois logiques, ses sources subjectives" at a Paris colloquium. The written version, not widely translated until 1964, becomes a foundational text of subjective probability.

1955

Hewitt and Savage generalize the representation theorem to arbitrary exchangeable sequences of random variables, not just binary ones.

1970

De Finetti's two-volume Teoria delle Probabilità (translated into English in 1974–1975) presents the complete philosophical and mathematical framework for subjective probability built on exchangeability.

1980

Diaconis and Freedman quantify the gap between finite and infinite exchangeability, establishing error bounds for the finite approximation.

A Worked Example: Coin Flips

Suppose you observe a sequence of coin flips and believe the outcomes are exchangeable — any reordering of a given sequence is equally likely. You do not assume the coin is fair, or that flips are independent. You only assert that the probability of "HHTHT" is the same as "THHTH" or any other permutation with two tails and three heads.

Exchangeable Coin Flips

P(HHTHT) = P(THHTH) = P(HTHHT) = …

De Finetti's Representation

P(k heads in n flips) = ∫₀¹ C(n,k) · θ^k · (1−θ)^(n−k) dμ(θ)

Example: Uniform Prior, μ(θ) = 1

P(k heads in n flips) = ∫₀¹ C(n,k) · θ^k · (1−θ)^(n−k) dθ = C(n,k) · B(k+1, n−k+1)
                      = 1 / (n+1)

With a uniform prior, every number of heads from 0 to n is equally likely. After observing k heads in n flips, the posterior for θ is Beta(k+1, n−k+1), and the predictive probability of heads on the next flip is (k+1)/(n+2), which is Laplace's rule of succession. None of this required assuming the coin has a "true" bias. It followed entirely from exchangeability and the choice of prior.
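
A small check of these numbers, assuming SciPy (the values of n and k are arbitrary):

    from math import comb
    from scipy.special import beta as beta_fn

    n, k = 5, 3   # illustrative: 3 heads in 5 flips

    # Under the uniform prior, C(n,k) * B(k+1, n-k+1) collapses to 1/(n+1).
    p_k = comb(n, k) * beta_fn(k + 1, n - k + 1)
    print(p_k, 1 / (n + 1))   # both 0.1666...

    # Laplace's rule of succession: probability of heads on the next flip.
    print((k + 1) / (n + 2))  # 0.5714...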

The Deep Lesson

De Finetti's theorem tells us that the entire Bayesian apparatus — priors, likelihoods, posteriors, and parameters — is not an invention imposed on data by subjective choice. It is the unique mathematical structure implied by the judgment that observations are exchangeable. To believe that order does not matter is, inescapably, to be a Bayesian.

Example: Customer Satisfaction Surveys

A restaurant chain surveys customers at 50 locations. Each customer rates their experience as "satisfied" or "not satisfied." Across all locations, about 78% of customers report satisfaction. The manager believes that the order in which surveys arrive doesn't matter — a satisfied response from Location #12 on Monday carries the same information as one from Location #37 on Friday. In other words, the responses are exchangeable.

What De Finetti's Theorem Guarantees

The moment the manager makes this exchangeability judgment, De Finetti's theorem kicks in. The theorem says the data must behave as if there is some underlying satisfaction rate θ drawn from a prior distribution, and each customer independently reports "satisfied" with probability θ.

De Finetti Representation

P(X₁ = x₁, ..., Xₙ = xₙ) = ∫₀¹ θˢ(1 − θ)ⁿ⁻ˢ dF(θ)

where s = number of satisfied responses, n = total responses, F = prior distribution over θ

After observing 390 satisfied out of 500 total responses, the manager can update her beliefs about θ — the true underlying satisfaction rate. If she started with a uniform prior (every rate equally likely), her posterior concentrates tightly around 0.78.
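
A sketch of that update, assuming SciPy (the uniform prior and the 95% interval are illustrative choices):

    from scipy.stats import beta

    s, n = 390, 500   # satisfied responses out of total surveyed

    # Uniform prior Beta(1, 1) -> posterior Beta(s + 1, n - s + 1).
    posterior = beta(s + 1, n - s + 1)

    print(posterior.mean())          # ~0.779, close to the raw rate 0.78
    print(posterior.interval(0.95))  # roughly (0.74, 0.81)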

The Key Insight

The manager never assumed a "true parameter" existed. She only assumed exchangeability — that customer order doesn't matter. De Finetti's theorem proved that this assumption automatically implies that the data behave as if generated by a hidden parameter with a prior. The Bayesian framework isn't a philosophical choice layered on top of the data; it's the inevitable mathematical consequence of treating observations as exchangeable.

Sequential Updating

Each new coin flip result (heads or tails) refines the picture. If you judge the flips exchangeable — the order doesn't matter — De Finetti's theorem guarantees the data behave as if generated by a hidden bias parameter θ with a prior distribution, and the Beta posterior over θ narrows as data accumulate.
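
A minimal sequential-update sketch, assuming NumPy (the simulated flips and the Beta(1, 1) starting prior are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    flips = rng.random(100) < 0.7   # simulated flips from a 0.7-bias coin

    a, b = 1.0, 1.0                 # uniform Beta(1, 1) prior
    for i, heads in enumerate(flips, start=1):
        a, b = a + heads, b + (not heads)
        if i in (1, 10, 100):
            mean = a / (a + b)
            sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
            print(f"after {i:3d} flips: posterior mean {mean:.3f}, sd {sd:.3f}")

The posterior standard deviation shrinks roughly as 1/√n, which is the narrowing described above.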
