Precision, typically denoted τ, is defined as the reciprocal of variance: τ = 1/σ². While variance measures the spread of a distribution, precision measures its concentration. A high-precision distribution is tightly clustered around its mean; a low-precision distribution is diffuse. This seemingly simple reparameterization has profound consequences for Bayesian computation and interpretation.
The precision parameterization is not merely cosmetic. In the conjugate analysis of normal models — the workhorse of Bayesian statistics — precision transforms multiplicative operations into additive ones, making the mechanics of Bayesian updating transparent and elegant.
In the multivariate setting, the analogous quantity is the precision matrix Λ = Σ⁻¹, where Σ is the covariance matrix.
Conjugate Analysis with Precision
Consider the fundamental Bayesian problem: observing data y₁, …, yₙ from a Normal(μ, τ⁻¹) distribution with known precision τ, and inferring the mean μ under a Normal(μ₀, τ₀⁻¹) prior. The posterior is Normal(μₙ, τₙ⁻¹), where:
τₙ = τ₀ + n·τ (posterior precision = prior precision + data precision)
μₙ = (τ₀·μ₀ + n·τ·ȳ) / τₙ (precision-weighted average of prior mean and sample mean)
The posterior precision is simply the sum of the prior precision and the total data precision nτ. This additive structure has a beautiful interpretation: information accumulates. Each observation contributes τ units of precision, and the prior contributes τ₀ units. The posterior mean is the precision-weighted average of the prior mean and the sample mean, with weights proportional to their respective precisions.
This additive property does not hold for variance. In variance parameterization, the update rule involves reciprocals of sums of reciprocals — the harmonic mean structure that obscures the simple information-accumulation story.
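To make the additive update concrete, here is a minimal sketch in Python with NumPy; the function name `posterior_normal_known_precision` and the illustrative numbers are ours, not from any standard library.

```python
import numpy as np

def posterior_normal_known_precision(y, tau, mu0, tau0):
    """Posterior for the mean of Normal(mu, 1/tau) with known precision tau,
    under a Normal(mu0, 1/tau0) prior. Returns (mu_n, tau_n)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    tau_n = tau0 + n * tau                               # precisions add
    mu_n = (tau0 * mu0 + n * tau * y.mean()) / tau_n     # precision-weighted average
    return mu_n, tau_n

# Illustrative numbers: a weak prior and 10 observations of unit precision.
y = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=10)
mu_n, tau_n = posterior_normal_known_precision(y, tau=1.0, mu0=0.0, tau0=0.1)
print(mu_n, tau_n)   # tau_n == 0.1 + 10 * 1.0 == 10.1, regardless of the data
```

Note that the posterior precision depends only on n and τ, never on the observed values: each observation adds exactly τ units of information.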
The Gamma Prior for Precision
When precision itself is unknown, the conjugate prior is the Gamma distribution. If τ ~ Gamma(α₀, β₀) in the shape-rate parameterization, and we observe n values from Normal(μ, τ⁻¹) with known mean μ, the posterior is τ ~ Gamma(αₙ, βₙ) with:
αₙ = α₀ + n/2
βₙ = β₀ + ½ Σᵢ (yᵢ − μ)²
This is the natural conjugate analysis. When both μ and τ are unknown, the joint conjugate prior is the Normal-Gamma distribution, and the resulting posterior retains the same form. This entire conjugate family is most naturally expressed in precision rather than variance.
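A minimal sketch of this Gamma update (Python/NumPy; the helper name `posterior_gamma_precision` is ours):

```python
import numpy as np

def posterior_gamma_precision(y, mu, alpha0, beta0):
    """Conjugate update for tau ~ Gamma(alpha0, beta0) (shape-rate), with data
    from Normal(mu, 1/tau) and known mean mu. Returns (alpha_n, beta_n);
    the posterior mean of tau is alpha_n / beta_n."""
    y = np.asarray(y, dtype=float)
    alpha_n = alpha0 + y.size / 2.0
    beta_n = beta0 + 0.5 * np.sum((y - mu) ** 2)
    return alpha_n, beta_n
```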
The choice between precision and variance is not merely aesthetic: it affects prior specification, computational efficiency, and interpretability. The precision parameterization is standard in graphical models and message-passing algorithms, where information from multiple sources is combined additively. The variance parameterization is more common in applied reporting and visualization. BUGS and JAGS parameterize the normal distribution by precision; Stan parameterizes it by standard deviation. When in doubt, follow the principle: use precision for analysis, report variance for communication.
Fisher Information and Precision
The connection between statistical precision and information runs deep. The Fisher information for a single observation from Normal(μ, σ²) with respect to μ is exactly τ = 1/σ². For n independent observations, the total Fisher information is nτ. The Cramér-Rao bound states that no unbiased estimator of μ can have variance less than 1/(nτ), which is precisely the posterior variance under a flat prior.
This is no coincidence. Under regularity conditions, the Bayesian posterior concentrates around the true parameter with a precision that matches the Fisher information. The Bernstein-von Mises theorem formalizes this: for large samples, the posterior is approximately Normal with precision equal to the observed Fisher information, regardless of the prior.
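A quick numerical check of this correspondence (a sketch with made-up numbers: data precision τ = 4, n = 50): the sampling variance of the sample mean, the MLE of μ, attains the Cramér-Rao bound 1/(nτ), which matches the posterior variance under a nearly flat prior.

```python
import numpy as np

# Simulate many samples of size n from Normal(1, 1/tau) and check that the
# variance of the sample mean hits the Cramer-Rao bound 1/(n * tau).
rng = np.random.default_rng(1)
tau, n, reps = 4.0, 50, 20000
ybar = rng.normal(loc=1.0, scale=tau ** -0.5, size=(reps, n)).mean(axis=1)
print(ybar.var())       # ~0.005, the Monte Carlo estimate
print(1.0 / (n * tau))  # 0.005, the Cramer-Rao bound = flat-prior posterior variance
```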
Precision Matrices in Multivariate Models
In multivariate normal models, the precision matrix Λ = Σ⁻¹ plays a central role. Its entries encode conditional independence structure: Λij = 0 if and only if variables i and j are conditionally independent given all other variables. This is the foundation of Gaussian graphical models and Gaussian Markov random fields (GMRFs).
For spatial and temporal models, the precision matrix is typically sparse — most pairs of variables are conditionally independent given their neighbors — even though the covariance matrix is dense. Working with precision matrices enables efficient computation via sparse Cholesky factorization, a key technique exploited by the INLA (Integrated Nested Laplace Approximation) methodology for fast approximate Bayesian inference.
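A small illustration of this sparsity (Python/NumPy; the AR(1) example and its numbers are ours, chosen because its Markov structure makes the pattern easy to see):

```python
import numpy as np

# Stationary Gaussian AR(1): x_t = phi * x_{t-1} + e_t, e_t ~ Normal(0, 1).
# Its precision matrix is tridiagonal: Lambda[i, j] = 0 for |i - j| > 1,
# i.e. x_i and x_j are conditionally independent given the other variables.
phi, n = 0.8, 6
Lam = np.zeros((n, n))
np.fill_diagonal(Lam, 1.0 + phi ** 2)
Lam[0, 0] = Lam[-1, -1] = 1.0          # boundary entries of the chain
idx = np.arange(n - 1)
Lam[idx, idx + 1] = Lam[idx + 1, idx] = -phi

Sigma = np.linalg.inv(Lam)
print(np.round(Lam, 2))    # sparse: zeros outside the tridiagonal band
print(np.round(Sigma, 2))  # dense: entries decay like phi**|i-j| but are nonzero
```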
Historical Context
Carl Friedrich Gauss develops the method of least squares for astronomical data, implicitly working with precision-weighted combinations of observations.
R. A. Fisher introduces the concept of information (now Fisher information), formally connecting precision to the amount of information a sample provides about a parameter.
The BUGS software adopts precision parameterization for all normal distributions, cementing its use in applied Bayesian modeling for a generation of practitioners.
Rue, Martino, and Chopin introduce INLA, which exploits sparse precision matrices for fast approximate Bayesian inference in latent Gaussian models.
"Using the precision rather than the variance is not a mere notational convention. It expresses the deep fact that in normal models, information is additive." — José M. Bernardo and Adrian F. M. Smith, Bayesian Theory (1994)
Worked Example: Comparing Measurement Precision Across Laboratories
Three laboratories measure the concentration of a chemical standard. Lab A uses a modern instrument, Lab B uses an older model, and Lab C uses a handheld device. We want to compare their measurement precision using a Bayesian framework with a Gamma(1, 1) prior on precision.
Lab B (n=8): 12.1, 8.5, 14.2, 7.8, 11.5, 9.2, 13.0, 8.0
Lab C (n=8): 5.0, 5.1, 4.9, 5.0, 5.1, 5.0, 4.9, 5.0
Step 1: Sample Variance and Precision
Lab A: x̄ = 10.15, s² = 0.057, τ = 1/s² = 17.54
Lab B: x̄ = 10.54, s² = 6.103, τ = 1/s² = 0.164
Lab C: x̄ = 5.00, s² = 0.0057, τ = 1/s² = 175.0
Step 2: Posterior Precision (Gamma Update)
Posterior: Gamma(α₀ + n/2, β₀ + SS/2), where SS = Σᵢ (yᵢ − x̄)² (the sample mean x̄ stands in for the known μ of the conjugate update)
Lab A: Gamma(1 + 4, 1 + 0.20) = Gamma(5, 1.20), E[τ] = 4.17
Lab B: Gamma(1 + 4, 1 + 21.36) = Gamma(5, 22.36), E[τ] = 0.22
Lab C: Gamma(1 + 4, 1 + 0.02) = Gamma(5, 1.02), E[τ] = 4.90
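These updates are easy to check numerically; a sketch using the Lab B and Lab C readings above (Lab A's raw values are not listed, so it is omitted; the helper name is ours):

```python
import numpy as np

def gamma_precision_update(y, alpha0=1.0, beta0=1.0):
    """Gamma(alpha0, beta0) prior on tau; SS taken about the sample mean."""
    y = np.asarray(y, dtype=float)
    ss = np.sum((y - y.mean()) ** 2)
    alpha_n = alpha0 + y.size / 2.0
    beta_n = beta0 + ss / 2.0
    return alpha_n, beta_n, alpha_n / beta_n   # last entry is E[tau | y]

lab_b = [12.1, 8.5, 14.2, 7.8, 11.5, 9.2, 13.0, 8.0]
lab_c = [5.0, 5.1, 4.9, 5.0, 5.1, 5.0, 4.9, 5.0]
print(gamma_precision_update(lab_b))  # ~(5.0, 22.36, 0.22)
print(gamma_precision_update(lab_c))  # ~(5.0, 1.02, 4.90)
```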
Lab C has the highest sample precision (τ ≈ 175): its handheld device produces remarkably consistent readings, even though its accuracy (reading 5.0 rather than the true 10.0) is poor. Lab B's old instrument has very low precision (τ ≈ 0.16), producing wildly variable readings. Note also how strongly the Gamma(1, 1) prior shrinks the posterior: Lab C's posterior mean E[τ] = 4.90 sits far below its sample precision because the prior rate β₀ = 1 dwarfs the tiny sum of squares. The Bayesian analysis underscores that precision and accuracy are distinct: Lab C is the most precise but the least accurate.