Bayesian Statistics

Admissible Decision Rule

A decision rule is admissible if no competing rule does at least as well for every parameter value and strictly better for some, a concept deeply linked to Bayesian procedures through Wald's complete class theorem.

δ is admissible iff ∄ δ′: R(θ, δ′) ≤ R(θ, δ) ∀θ, with strict inequality for some θ

In statistical decision theory, an admissible decision rule is one that cannot be uniformly improved upon. Formally, a rule δ is admissible if there is no alternative rule δ′ such that R(θ, δ′) ≤ R(θ, δ) for all θ with strict inequality for at least one θ. If such a δ′ exists, δ is said to be inadmissible, and δ′ dominates δ.

Admissibility is the most basic requirement of a good decision rule: using an inadmissible rule means accepting extra risk at some parameter values with no compensating gain anywhere else. Yet the concept has profound consequences for the relationship between Bayesian and frequentist statistics, because of a remarkable theorem due to Abraham Wald.

Admissibility: δ is admissible iff there is no δ′ with:
R(θ, δ′) ≤ R(θ, δ)  for all θ ∈ Θ
R(θ₀, δ′) < R(θ₀, δ)  for some θ₀ ∈ Θ

Risk Function: R(θ, δ) = E_{Y|θ}[L(θ, δ(Y))]
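
The risk function can be approximated by simulation. Below is a minimal Python sketch (NumPy; the sample size, the squared error loss, and the two candidate rules are illustrative choices, not taken from the text above) that estimates R(θ, δ) over a grid of θ values and compares two rules:

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(estimator, theta, n=10, reps=20_000):
    """Monte Carlo estimate of R(theta, delta) = E[(delta(Y) - theta)^2]
    for Y_1, ..., Y_n i.i.d. N(theta, 1) under squared error loss."""
    y = rng.normal(theta, 1.0, size=(reps, n))
    return np.mean((estimator(y) - theta) ** 2)

# Two candidate rules: the sample mean, and a rule shrinking it halfway toward 0.
sample_mean = lambda y: y.mean(axis=1)
half_shrink = lambda y: 0.5 * y.mean(axis=1)

for theta in np.linspace(-3, 3, 7):
    r1 = risk(sample_mean, theta)
    r2 = risk(half_shrink, theta)
    # Neither rule dominates the other: shrinkage wins near theta = 0,
    # the sample mean wins when theta is far from 0.
    print(f"theta={theta:+.1f}  R(sample mean)={r1:.3f}  R(half shrink)={r2:.3f}")
```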

Wald's Complete Class Theorem

The central result connecting admissibility to Bayesian inference is Wald's complete class theorem (1950): under regularity conditions, the class of Bayes decision rules (those that minimize the integrated risk for some prior) and limits of Bayes rules forms a complete class — every admissible rule is either a Bayes rule or a limit of Bayes rules.

The converse is also informative: every Bayes rule with finite Bayes risk is admissible (provided the prior assigns positive mass to every open set). This means that if you solve a Bayesian optimization problem — minimize the expected loss averaged over a proper prior — the resulting rule automatically satisfies the frequentist criterion of admissibility.

Bayes Rule: δ_Bayes(y) = arg min_a ∫ L(θ, a) · p(θ | y) dθ
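
Under squared error loss the minimizer of the posterior expected loss is the posterior mean. A small numerical check, sketched in Python with hypothetical normal-normal conjugate numbers:

```python
import numpy as np

# Conjugate normal model (illustrative numbers): theta ~ N(0, tau^2), y | theta ~ N(theta, sigma^2).
tau2, sigma2, y = 4.0, 1.0, 2.5

# Closed-form posterior: theta | y ~ N(post_mean, post_var).
post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
post_mean = post_var * (y / sigma2)

# Posterior expected squared error loss for an action a:
# E[(theta - a)^2 | y] = post_var + (post_mean - a)^2, minimized at a = post_mean.
actions = np.linspace(-1, 4, 2001)
expected_loss = post_var + (post_mean - actions) ** 2
best_action = actions[np.argmin(expected_loss)]

print("posterior mean:", post_mean)              # the Bayes rule delta_Bayes(y)
print("argmin of posterior loss:", best_action)  # matches, up to grid resolution
```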

Complete Class Theorem (Wald, 1950):
Every admissible rule is a Bayes rule or a limit of Bayes rules.
Every Bayes rule with respect to a prior with full support (and finite Bayes risk) is admissible.

Famous Examples of Inadmissibility

The most celebrated example of inadmissibility is Charles Stein's 1956 result: the sample mean is inadmissible for estimating a multivariate normal mean in dimension p ≥ 3 under squared error loss. The James-Stein estimator, which shrinks the sample mean toward zero (or any fixed point), dominates it. From the Bayesian perspective, this is natural: the James-Stein estimator is approximately Bayes with respect to a hierarchical prior that pools information across coordinates.
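
A brief simulation sketch of Stein's phenomenon (Python/NumPy; the dimension, the true mean vector, and the use of the positive-part variant of the estimator are illustrative choices): on average, the James-Stein estimator's total squared error is smaller than the MLE's once p ≥ 3.

```python
import numpy as np

rng = np.random.default_rng(1)

def james_stein(y):
    """Positive-part James-Stein estimator shrinking toward 0, for Y ~ N_p(theta, I), p >= 3."""
    p = y.shape[-1]
    shrink = 1.0 - (p - 2) / np.sum(y ** 2, axis=-1, keepdims=True)
    return np.maximum(shrink, 0.0) * y

p, reps = 10, 50_000
theta = np.full(p, 0.5)                                  # illustrative true mean vector
y = rng.normal(theta, 1.0, size=(reps, p))

sse_mle = np.mean(np.sum((y - theta) ** 2, axis=1))      # risk of the MLE is exactly p
sse_js = np.mean(np.sum((james_stein(y) - theta) ** 2, axis=1))

print(f"average total squared error, MLE:         {sse_mle:.2f}")
print(f"average total squared error, James-Stein: {sse_js:.2f}")   # strictly smaller
```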

Other notable inadmissibility results include: the usual estimators of a normal variance (both the MLE and the unbiased estimator are dominated under squared error loss by a version with a different scaling constant), the usual F-test in certain ANOVA settings, and minimax rules that are not Bayes. Each case illustrates the same lesson: procedures that ignore the possibility of borrowing information across parameters or strata tend to be inadmissible.
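
For the variance example, here is a short Monte Carlo sketch (assuming squared error loss; the sample size and true variance are arbitrary) showing that dividing the sum of squared deviations by n + 1 yields smaller mean squared error than the MLE's divisor n or the unbiased divisor n − 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 8, 2.0, 200_000

y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)  # sum of squared deviations

for divisor, label in [(n - 1, "unbiased (n-1)"), (n, "MLE (n)"), (n + 1, "scaled (n+1)")]:
    mse = np.mean((ss / divisor - sigma2) ** 2)
    print(f"{label:15s} MSE = {mse:.4f}")   # the n+1 divisor gives the smallest MSE
```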

Admissibility and the Foundations of Statistics

Wald's theorem has been described as the most important result in the foundations of statistics, because it provides a frequentist justification for Bayesian methods. A committed frequentist who insists on admissibility is forced to use Bayes rules (or their limits). Conversely, a Bayesian who minimizes expected loss automatically satisfies the frequentist requirement. The theorem thus reveals that the Bayesian and frequentist approaches, far from being contradictory, are two perspectives on the same underlying structure. As Lawrence Brown put it: "The complete class theorem is the fundamental link between Bayesian and frequentist statistics."

Practical Implications

In practice, admissibility guides the choice between competing procedures. When two estimators are available and one dominates the other, the dominated one should never be used. But admissibility alone does not determine which admissible rule to prefer — there may be infinitely many admissible rules with different risk profiles. Additional criteria such as minimax risk, Bayes risk under a specific prior, or computational simplicity are needed to choose among them.

In high-dimensional settings — genomics, imaging, large-scale A/B testing — inadmissibility of naive procedures is the rule rather than the exception. Shrinkage estimators (empirical Bayes, Bayesian hierarchical models, penalized likelihood) are the standard remedy, and their superiority is a direct consequence of the complete class theorem.
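
A compact sketch of the empirical Bayes version of this idea (illustrative numbers; a normal-normal model with known unit noise variance): estimate the prior variance from the data, then shrink each observed mean toward the grand mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# Many parallel units (e.g. arms of a large experiment), each observed with noise variance 1.
k = 500
true_effects = rng.normal(0.0, 0.6, size=k)          # illustrative heterogeneity across units
y = rng.normal(true_effects, 1.0)                    # one noisy observation per unit

# Empirical Bayes (normal-normal): estimate prior mean and variance from the data,
# then shrink each observation toward the grand mean.
grand_mean = y.mean()
tau2_hat = max(y.var() - 1.0, 0.0)                   # method-of-moments prior variance
shrinkage = tau2_hat / (tau2_hat + 1.0)
eb_estimates = grand_mean + shrinkage * (y - grand_mean)

print("total squared error, raw means:    ", np.sum((y - true_effects) ** 2))
print("total squared error, EB shrinkage: ", np.sum((eb_estimates - true_effects) ** 2))
```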

Technical Conditions

Wald's theorem requires certain regularity conditions: the parameter space should be sufficiently regular (e.g., a Borel subset of Euclidean space), the loss function should be bounded below, and the class of decision rules should include all randomized rules. When these conditions fail — for example, in infinite-dimensional parameter spaces or with improper priors — admissibility results become more delicate. Blyth (1951) and later Brown (1971) refined the conditions under which improper Bayes rules are admissible, showing that the connection to Bayesian methods extends even to the improper prior case.

"The complete class theorem tells us that the search for good decision rules is, in essence, a search over priors. This is true whether or not one is a Bayesian." — James O. Berger, Statistical Decision Theory and Bayesian Analysis (2nd ed., 1985)

