Bayesian Efficiency

Bayesian efficiency measures how close a statistical procedure comes to the performance of the optimal Bayes rule, providing a unified criterion for evaluating estimators, tests, and decision rules relative to the best possible Bayesian benchmark.

Bayesian efficiency quantifies the relative performance of a decision rule δ by comparing its integrated (Bayes) risk to that of the optimal Bayes rule δ_Bayes under a given prior π. A procedure with Bayesian efficiency close to 1 is nearly optimal; one with low efficiency is wasting information. This concept bridges Bayesian and frequentist perspectives: even a frequentist who does not accept a prior as a genuine belief can use Bayesian efficiency as a benchmark for evaluating estimators.

Bayesian Efficiency e(δ, π)  =  R(δ_Bayes, π) / R(δ, π)

Where R(δ, π)  =  ∫ R(θ, δ) · π(θ) dθ   (integrated / Bayes risk)
R(θ, δ)  =  E_y|θ[L(θ, δ(y))]   (frequentist risk at θ)
L(θ, a)   →  Loss function for action a when truth is θ

Efficiency range 0  <  e(δ, π)  ≤  1, with equality iff δ is Bayes-optimal under π
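A concrete illustration may help: in the conjugate Normal model both risks have closed forms, so the efficiency of the ordinary sample mean can be written down directly. The short Python sketch below is illustrative only; the N(θ, σ²) likelihood, the N(0, τ²) prior, and the values σ² = 1 and τ² = 10 are assumptions chosen for the example, not quantities taken from the formulas above.

# Bayesian efficiency of the sample mean in a conjugate Normal-Normal model
# (illustrative sketch; the model and parameter values are assumptions).
# Data: X_1, ..., X_n ~ N(theta, sigma2), prior: theta ~ N(0, tau2), squared error loss.
#   Bayes rule = posterior mean, with Bayes risk 1 / (n/sigma2 + 1/tau2)
#   Sample mean has constant frequentist risk sigma2/n, so its integrated risk is sigma2/n
#   Efficiency e = [1 / (n/sigma2 + 1/tau2)] / (sigma2/n) = n*tau2 / (n*tau2 + sigma2)

def efficiency_sample_mean(n, sigma2, tau2):
    """Bayesian efficiency of the sample mean under a N(0, tau2) prior."""
    bayes_risk = 1.0 / (n / sigma2 + 1.0 / tau2)  # integrated risk of the Bayes rule
    mean_risk = sigma2 / n                        # integrated risk of the sample mean
    return bayes_risk / mean_risk

for n in (1, 5, 20, 100):
    print(n, round(efficiency_sample_mean(n, sigma2=1.0, tau2=10.0), 4))
# Output rises toward 1 (0.9091, 0.9804, 0.995, 0.999): the sample mean loses the most
# relative to the Bayes rule when the sample is small and the prior is informative.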

Relationship to Admissibility

Bayesian efficiency is intimately connected to the concept of admissibility. A decision rule is admissible if no other rule has uniformly smaller risk across all parameter values. Wald's complete class theorem establishes that, under regularity conditions, every admissible rule is either a Bayes rule (optimal for some prior) or a limit of Bayes rules. Consequently, any inadmissible rule has Bayesian efficiency strictly less than 1 for every prior — there is always a Bayes rule that outperforms it on average.

This result is remarkable because it gives a Bayesian justification for what appears to be a purely frequentist criterion. Admissibility, defined without reference to priors, turns out to be a fundamentally Bayesian property.

Asymptotic Efficiency

In large-sample settings, Bayesian efficiency connects to classical asymptotic efficiency. The maximum likelihood estimator (MLE) is asymptotically efficient in the frequentist sense: it attains the Cramér-Rao lower bound as n → ∞. Under a smooth prior, the Bayes estimator (the posterior mean under squared error loss) is also asymptotically efficient, the two estimators converge, and the Bayesian efficiency of each approaches 1 as n → ∞.
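A small simulation makes the convergence visible. The sketch below assumes a N(θ, 1) likelihood with a N(0, 10) prior (arbitrary choices for illustration): as n grows, the posterior mean and the MLE agree to more and more decimal places, and both mean squared errors track the Cramér-Rao bound σ²/n.

import numpy as np

rng = np.random.default_rng(0)
theta, sigma2, tau2, reps = 1.5, 1.0, 10.0, 10_000   # assumed true mean, variances, replications

for n in (5, 50, 500):
    x = rng.normal(theta, np.sqrt(sigma2), size=(reps, n))
    mle = x.mean(axis=1)                              # sample mean (MLE)
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)      # posterior weight on the data
    bayes = w * mle                                   # posterior mean (prior mean = 0)
    print(f"n={n:3d}  mean|bayes - mle|={np.mean(np.abs(bayes - mle)):.4f}  "
          f"MSE(mle)={np.mean((mle - theta)**2):.4f}  "
          f"MSE(bayes)={np.mean((bayes - theta)**2):.4f}  "
          f"Cramer-Rao bound={sigma2 / n:.4f}")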

However, in finite samples the Bayes estimator can substantially outperform the MLE, especially when the parameter space is high-dimensional or the MLE falls on the boundary of the parameter space. Stein famously showed that, for estimating a multivariate normal mean in dimension p ≥ 3, the MLE is inadmissible, so its Bayesian efficiency is strictly less than 1 for every prior. The James-Stein estimator, which shrinks toward zero, dominates it uniformly.

The Stein Paradox and Bayesian Shrinkage

Charles Stein's 1956 result shocked the statistical world: the sample mean is inadmissible for estimating a three-or-more-dimensional normal mean under squared error loss. The James-Stein estimator, which shrinks toward an arbitrary point, uniformly dominates it. From a Bayesian perspective, this is entirely natural — shrinkage corresponds to using an informative prior, and the resulting estimator has higher Bayesian efficiency because it borrows strength across coordinates. This result helped legitimize Bayesian shrinkage methods and paved the way for empirical Bayes, ridge regression, and modern regularization techniques.
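The dominance is easy to verify numerically. The following sketch (the dimension, true mean vector, and replication count are arbitrary choices, not values from the text) estimates the total squared-error risk of the MLE and of the James-Stein estimator shrinking toward zero; the James-Stein risk comes out strictly smaller.

import numpy as np

rng = np.random.default_rng(1)
p, reps = 10, 50_000                            # assumed dimension and replications
theta = np.full(p, 2.0)                         # an arbitrary true mean vector

x = rng.normal(theta, 1.0, size=(reps, p))      # one N_p(theta, I) draw per replication
norm2 = np.sum(x**2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norm2) * x                # James-Stein shrinkage toward 0

risk_mle = np.mean(np.sum((x - theta)**2, axis=1))    # close to p = 10
risk_js = np.mean(np.sum((js - theta)**2, axis=1))    # strictly smaller whenever p >= 3
print(f"MLE risk ~ {risk_mle:.2f}, James-Stein risk ~ {risk_js:.2f}")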

Computing Bayesian Efficiency

In practice, computing Bayesian efficiency requires:

(1) Specifying a loss function L(θ, a) — commonly squared error for estimation, 0-1 loss for testing, or a custom decision-theoretic loss.

(2) Computing the integrated risk R(δ, π) for the procedure under evaluation, which may require Monte Carlo integration over both the prior and the sampling distribution.

(3) Computing (or bounding) the Bayes risk R(δ_Bayes, π), which is the minimum integrated risk achievable by any procedure.

For conjugate models, the Bayes risk often has a closed form. For complex models, simulation-based approaches are necessary. The efficiency can also be computed pointwise — as a function of θ — giving a profile that reveals where a procedure is most and least efficient.
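The sketch below walks through these three steps for a simple case in which the procedure under evaluation has no convenient closed-form risk: the sample median as an estimator of a Normal mean, scored against the conjugate Normal-Normal Bayes rule. The model (σ² = 1, τ² = 10, n = 20) and the choice of the median as the procedure being evaluated are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(2)
n, sigma2, tau2, reps = 20, 1.0, 10.0, 100_000   # assumed model and simulation size

# Step 1: squared error loss, L(theta, a) = (a - theta)^2.

# Step 2: integrated risk of the sample median, by Monte Carlo over the
# prior pi = N(0, tau2) and the sampling distribution N(theta, sigma2).
theta = rng.normal(0.0, np.sqrt(tau2), size=reps)
x = rng.normal(theta[:, None], np.sqrt(sigma2), size=(reps, n))
median_risk = np.mean((np.median(x, axis=1) - theta) ** 2)

# Step 3: in this conjugate model the Bayes risk is known in closed form
# (the posterior variance); in general it too would need simulation or a bound.
bayes_risk = 1.0 / (n / sigma2 + 1.0 / tau2)

print(f"Bayes risk = {bayes_risk:.4f}, median risk ~ {median_risk:.4f}, "
      f"efficiency ~ {bayes_risk / median_risk:.2f}")
# The efficiency lands well below 1, reflecting the median's known inefficiency
# relative to the mean for Normal data (asymptotically 2/pi, about 0.64).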

Applications

Bayesian efficiency has been used to evaluate clinical trial designs (how close is an adaptive design to the theoretically optimal Bayesian design?), signal detection algorithms (how much information is lost by using a suboptimal detector?), and survey sampling strategies (how efficiently does a particular sampling scheme estimate population parameters relative to the Bayes-optimal scheme?). In machine learning, the concept appears implicitly in regret bounds for online learning algorithms, where the comparison to the best policy in hindsight parallels the comparison to the Bayes-optimal rule.

1950

Abraham Wald publishes Statistical Decision Functions, establishing the decision-theoretic framework in which Bayesian efficiency is defined.

1956

Charles Stein proves the inadmissibility of the sample mean in dimensions ≥ 3, demonstrating that the MLE can have low Bayesian efficiency.

1961

James and Stein construct an explicit dominating estimator (the James-Stein estimator), making the efficiency gap concrete.

"Every admissible procedure is Bayes, and every Bayes procedure is admissible. This is the deepest connection between the Bayesian and frequentist worlds." — Abraham Wald, Statistical Decision Functions (1950)

Worked Example: Comparing Estimator MSE for a Normal Mean

We observe 20 values from a Normal distribution and compare the mean squared error (MSE) of the sample mean (MLE) versus a Bayesian shrinkage estimator with an N(0, 10) prior.

Given 20 observations: 2.3, 1.8, 3.1, 2.7, 1.5, 2.9, 3.5, 2.1, 1.9, 2.6,
3.0, 2.4, 1.7, 2.8, 3.2, 2.0, 2.5, 1.6, 3.3, 2.2

Step 1: MLE (Sample Mean) x̄ = 2.455, s² = 0.3605

Step 2: Bayesian Estimator Prior: N(0, τ² = 10)
Posterior precision: n/s² + 1/τ² = 20/0.3605 + 0.1 = 55.48 + 0.1 = 55.58
Posterior mean: (n·x̄/s² + 0/τ²) / 55.58 = (20·2.455/0.3605) / 55.58 = 2.451

Step 3: MSE Comparison MLE MSE = s²/n = 0.3605/20 = 0.01803
Bayes MSE = 1/55.58 = 0.01799
Relative efficiency = 0.01799/0.01803 = 0.998

With 20 observations and a true mean far from the prior mean of 0, the Bayesian estimator (2.451) is nearly identical to the MLE (2.455): the data overwhelm the prior. The relative efficiency of 0.998 means the Bayes estimator has slightly lower MSE due to minimal shrinkage. With fewer observations or a true mean closer to 0, the Bayesian advantage would be more pronounced.
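The arithmetic above can be reproduced with a few lines of NumPy. The snippet below is a sketch that follows the same steps, plugging the sample variance in for the unknown σ² exactly as the example does.

import numpy as np

x = np.array([2.3, 1.8, 3.1, 2.7, 1.5, 2.9, 3.5, 2.1, 1.9, 2.6,
              3.0, 2.4, 1.7, 2.8, 3.2, 2.0, 2.5, 1.6, 3.3, 2.2])
n = x.size
xbar = x.mean()                    # MLE of the mean
s2 = x.var(ddof=1)                 # sample variance, used as a plug-in for sigma^2
tau2, prior_mean = 10.0, 0.0       # N(0, 10) prior from the example

post_prec = n / s2 + 1.0 / tau2                        # posterior precision
post_mean = (n * xbar / s2 + prior_mean / tau2) / post_prec
mle_mse, bayes_mse = s2 / n, 1.0 / post_prec

print(f"x_bar = {xbar:.3f}, s^2 = {s2:.4f}, posterior mean = {post_mean:.3f}")
print(f"MLE MSE = {mle_mse:.5f}, Bayes MSE = {bayes_mse:.5f}, "
      f"relative efficiency = {bayes_mse / mle_mse:.3f}")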

Interactive Calculator

Each row is a numeric value from a Normal population. The calculator compares three estimators of the mean: the sample mean (MLE), a Bayesian estimator with Normal prior (shrinkage toward 0), and a trimmed mean. It computes bias, variance, and MSE for each, demonstrating Bayesian efficiency gains.
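The interactive tool itself is not reproduced here, but the comparison it performs can be sketched in a few lines. In the snippet below, the population mean and variance, the prior variance, and the trimming fraction are assumptions chosen for illustration; bias, variance, and MSE are estimated by repeated sampling.

import numpy as np

rng = np.random.default_rng(3)
true_mean, sigma2, n = 2.5, 0.36, 20          # assumed Normal population and sample size
tau2, reps = 10.0, 50_000                     # assumed prior variance and replications

x = rng.normal(true_mean, np.sqrt(sigma2), size=(reps, n))
w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)  # shrinkage weight toward the prior mean 0

estimators = {
    "sample mean (MLE)":   x.mean(axis=1),
    "Bayes (shrink to 0)": w * x.mean(axis=1),
    "10% trimmed mean":    np.mean(np.sort(x, axis=1)[:, 2:-2], axis=1),  # drop 2 per tail
}
for name, est in estimators.items():
    bias, var = est.mean() - true_mean, est.var()
    print(f"{name:20s} bias={bias:+.4f}  variance={var:.5f}  MSE={bias**2 + var:.5f}")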

