Epidemiology & Disease Modeling

Bayesian disease transmission models estimate key parameters like the basic reproduction number R₀ from noisy, incomplete surveillance data, enabling public health authorities to forecast outbreaks and evaluate interventions in real time.

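R₀ = β · c · D
where β is the probability of transmission per contact, c is the contact rate, and D is the mean duration of infectiousness.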

Epidemiology confronts a persistent challenge: the data most needed for decision-making arrive late, incomplete, and biased. Case counts underestimate true infections. Testing capacity varies. Reporting delays distort epidemic curves. Bayesian methods address these realities by treating unknown quantities — true incidence, transmission rates, reporting probabilities — as random variables with posterior distributions that reflect both data and prior knowledge about disease dynamics.

Compartmental Models and Bayesian Estimation

The workhorse of infectious disease modeling is the compartmental model — SIR, SEIR, and their extensions — which divides a population into states (Susceptible, Exposed, Infectious, Recovered) and tracks transitions governed by rate parameters. In a Bayesian framework, these parameters are assigned prior distributions informed by previous outbreaks or biological knowledge, and the posterior is computed by fitting the model to observed case, hospitalization, or mortality data.

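SIR Model Differential Equations

dS/dt = −β · S · I / N
dI/dt = β · S · I / N − γ · I
dR/dt = γ · I

Basic Reproduction Number

R₀ = β / γ

In these equations β is the transmission rate (effective contacts per person per unit time), γ is the recovery rate, and N = S + I + R is the total population. The SIR β therefore plays the role of the product β · c in the formula R₀ = β · c · D, and 1/γ plays the role of the infectious duration D.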
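As a minimal sketch of how the SIR equations behave, the following Python code integrates them with a fixed-step Runge-Kutta scheme. The function names, the parameter values (β = 0.3 and γ = 0.1 per day), and the population of 10,000 are illustrative assumptions, not estimates for any real disease.

```python
def sir_rates(s, i, r, beta, gamma):
    """Right-hand side of the SIR equations: (dS/dt, dI/dt, dR/dt)."""
    n = s + i + r
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, beta, gamma, dt):
    """Advance (S, I, R) by one classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(x + scale * dx for x, dx in zip(state, k))

    k1 = sir_rates(*state, beta, gamma)
    k2 = sir_rates(*shifted(k1, dt / 2), beta, gamma)
    k3 = sir_rates(*shifted(k2, dt / 2), beta, gamma)
    k4 = sir_rates(*shifted(k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, s0, i0, days, steps_per_day=10):
    """Return a list of daily (S, I, R) tuples, starting at day 0."""
    state = (float(s0), float(i0), 0.0)
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, beta, gamma, dt)
        trajectory.append(state)
    return trajectory


# Illustrative (assumed) parameters: R0 = beta / gamma = 3.
beta, gamma = 0.3, 0.1
trajectory = simulate_sir(beta, gamma, s0=9990, i0=10, days=160)
peak_day, peak_state = max(enumerate(trajectory), key=lambda item: item[1][1])
print(f"R0 = {beta / gamma:.1f}")
print(f"peak infectious: {peak_state[1]:.0f} on day {peak_day}")
print(f"final attack rate: {trajectory[-1][2] / 10000:.2f}")
```

In a Bayesian fit, a simulator of this kind would sit inside the likelihood: each proposed (β, γ) pair produces a trajectory that is compared with the observed data.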

The basic reproduction number R₀ — the average number of secondary infections produced by one infectious individual in a fully susceptible population — is perhaps the most important quantity in outbreak response. Bayesian estimation of R₀ and its time-varying counterpart Rₜ enables authorities to track whether an epidemic is growing or shrinking, and to assess the impact of interventions such as vaccination, social distancing, or quarantine.
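One common way to estimate Rₜ is a conjugate Gamma-Poisson update over a sliding window, in the spirit of the renewal-equation approach used by tools such as EpiEstim. The sketch below assumes that daily incidence is Poisson with mean Rₜ times the serial-interval-weighted past incidence, that Rₜ is constant within each window, and that the serial-interval distribution is known. The function names, the prior (shape 1, scale 5), and the example serial interval are assumptions made for illustration.

```python
def infection_pressure(incidence, serial_interval, t):
    """Past incidence on day t, weighted by the serial-interval distribution.

    serial_interval[s - 1] is the assumed probability that a secondary case
    arises s days after its infector.
    """
    return sum(
        incidence[t - s] * weight
        for s, weight in enumerate(serial_interval, start=1)
        if t - s >= 0
    )


def estimate_rt(incidence, serial_interval, window=7,
                prior_shape=1.0, prior_scale=5.0):
    """Posterior mean and standard deviation of Rt on each day t >= window."""
    estimates = []
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        cases = sum(incidence[d] for d in days)
        pressure = sum(
            infection_pressure(incidence, serial_interval, d) for d in days
        )
        # Gamma prior plus Poisson likelihood gives a Gamma posterior.
        shape = prior_shape + cases
        rate = 1.0 / prior_scale + pressure
        estimates.append((t, shape / rate, shape ** 0.5 / rate))
    return estimates


# Hypothetical data: an outbreak that grows and then declines.
serial_interval = [0.2, 0.5, 0.3]
incidence = [10, 12, 15, 18, 22, 27, 33, 40, 48, 55,
             60, 62, 60, 55, 48, 40, 33, 27, 22, 18]
for day, mean, sd in estimate_rt(incidence, serial_interval):
    print(f"day {day:2d}: Rt = {mean:.2f} (sd {sd:.2f})")
```

A posterior mean above 1 indicates growth and one below 1 indicates decline. The posterior standard deviation shrinks as more cases fall inside the window.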

Real-Time Outbreak Analysis

During the COVID-19 pandemic, Bayesian models were used widely in outbreak response. The Imperial College team used a Bayesian hierarchical model to estimate the impact of non-pharmaceutical interventions across European countries, pooling information across nations while allowing for country-specific effects. The Institute for Health Metrics and Evaluation (IHME) used Bayesian curve-fitting to project hospitalizations. Open-source R packages such as EpiNow2 provided real-time Rₜ estimation, with Stan as the computational backend for Bayesian inference.

Bayesian Nowcasting

Reporting delays mean that recent case counts are always incomplete. Bayesian nowcasting corrects for this by modeling the delay distribution and estimating the true number of cases that have already occurred but not yet been reported. This was critical during COVID-19, when policy decisions depended on understanding the current state of the epidemic rather than the state from two weeks ago.
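The core idea can be sketched in a few lines under strong simplifying assumptions: the delay distribution is known, each true case has been reported independently with probability equal to the cumulative delay probability so far, and the prior on the true count is flat. Under those assumptions the number of still-unreported cases follows a negative binomial distribution. The function names and all numbers below are hypothetical; operational nowcasting models usually estimate the delay distribution jointly with the counts.

```python
import math


def hidden_case_pmf(observed, p_reported, k):
    """P(k cases still unreported | observed so far), flat prior on the total."""
    log_pmf = (
        math.lgamma(observed + k + 1)
        - math.lgamma(k + 1)
        - math.lgamma(observed + 1)
        + k * math.log1p(-p_reported)
        + (observed + 1) * math.log(p_reported)
    )
    return math.exp(log_pmf)


def nowcast(observed, p_reported, level=0.95):
    """Posterior mean and equal-tailed credible interval for the true count."""
    if not 0.0 < p_reported < 1.0:
        raise ValueError("p_reported must lie strictly between 0 and 1")
    lower_tail = (1.0 - level) / 2.0
    upper_tail = 1.0 - lower_tail
    cumulative, lower, upper, k = 0.0, None, None, 0
    while upper is None:
        cumulative += hidden_case_pmf(observed, p_reported, k)
        if lower is None and cumulative >= lower_tail:
            lower = observed + k
        if cumulative >= upper_tail:
            upper = observed + k
        k += 1
    # Closed-form mean of the negative binomial posterior.
    mean = observed + (observed + 1) * (1.0 - p_reported) / p_reported
    return mean, lower, upper


# Hypothetical delay distribution: P(report arrives d days after onset).
delay_pmf = [0.3, 0.3, 0.2, 0.1, 0.1]
# Hypothetical counts reported so far, keyed by days since symptom onset.
reported_so_far = {0: 28, 1: 61, 2: 83, 3: 88}
for age, count in reported_so_far.items():
    completeness = sum(delay_pmf[: age + 1])
    mean, lower, upper = nowcast(count, completeness)
    print(f"onset {age} days ago: reported {count}, "
          f"nowcast {mean:.0f} ({lower} to {upper})")
```

The most recent days have the lowest completeness, so their nowcasts are both inflated the most and carry the widest intervals.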

Seroprevalence and Underreporting

Bayesian methods are well suited to interpreting seroprevalence surveys, where the imperfect sensitivity and specificity of antibody tests must be accounted for. A Bayesian model jointly estimates the true prevalence and the test performance characteristics, propagating uncertainty about both. The classical Rogan-Gladen estimator, (apparent prevalence + specificity − 1) / (sensitivity + specificity − 1), is the point-estimate correction obtained when sensitivity and specificity are treated as known; a full Bayesian treatment additionally allows for partial pooling across regions and demographic groups.
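A simplified version of this adjustment can be sketched with Monte Carlo draws. The code below is not a full joint model: it places independent Beta(1, 1) priors on apparent prevalence, sensitivity, and specificity, draws from the resulting Beta posteriors, pushes each draw through the Rogan-Gladen formula, and truncates the result to [0, 1]. The function name, the priors, and the survey and validation counts are assumptions chosen for illustration.

```python
import random


def adjusted_prevalence(n_tested, n_positive,
                        true_pos, n_known_pos,
                        true_neg, n_known_neg,
                        draws=20000, seed=1):
    """Posterior mean and 95% interval for test-adjusted prevalence.

    n_known_pos / n_known_neg are validation samples of known status;
    true_pos / true_neg are how many of them the test classified correctly.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        apparent = rng.betavariate(1 + n_positive, 1 + n_tested - n_positive)
        sensitivity = rng.betavariate(1 + true_pos, 1 + n_known_pos - true_pos)
        specificity = rng.betavariate(1 + true_neg, 1 + n_known_neg - true_neg)
        youden = sensitivity + specificity - 1.0
        if youden <= 0.0:
            continue  # test no better than chance: correction undefined
        corrected = (apparent + specificity - 1.0) / youden
        samples.append(min(1.0, max(0.0, corrected)))
    samples.sort()
    mean = sum(samples) / len(samples)
    lower = samples[int(0.025 * len(samples))]
    upper = samples[int(0.975 * len(samples)) - 1]
    return mean, lower, upper


# Hypothetical survey: 100 of 1000 positive; validation data for the assay.
mean, lower, upper = adjusted_prevalence(
    n_tested=1000, n_positive=100,
    true_pos=90, n_known_pos=100,
    true_neg=190, n_known_neg=200,
)
print(f"adjusted prevalence: {mean:.3f} ({lower:.3f} to {upper:.3f})")
```

With an imperfect specificity, a sizeable share of the apparent positives are false positives, so the adjusted prevalence falls below the raw 10% and its interval is wider than the survey's sampling error alone would suggest.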

Phylodynamic and Genomic Epidemiology

The integration of pathogen genomic data with epidemiological models, known as phylodynamics, relies heavily on Bayesian inference. Tools like BEAST (Bayesian Evolutionary Analysis Sampling Trees) jointly infer phylogenetic trees, evolutionary rates, and population dynamics from molecular sequence data. During COVID-19, Bayesian phylogeographic analyses traced the spread of variants across the globe, alongside platforms such as Nextstrain, whose real-time pipeline is built on maximum-likelihood rather than Bayesian tree inference.

"All models are wrong, but some are useful — and Bayesian inference tells us precisely how uncertain we should be about each model's predictions." — adapted from George E. P. Box, widely applied in epidemiological modeling

Current Frontiers

Agent-based models calibrated with Approximate Bayesian Computation (ABC) allow simulation of heterogeneous populations where compartmental assumptions break down. Bayesian optimal experimental design guides the placement of sentinel surveillance sites. And the fusion of wastewater surveillance data with clinical case data through Bayesian data assimilation promises earlier detection of emerging outbreaks.
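To illustrate the ABC idea, the sketch below uses rejection sampling with a toy individual-level simulator standing in for a real agent-based model. It draws R₀ from a uniform prior, simulates a chain-binomial outbreak in which each susceptible person is infected independently, and keeps only the draws whose simulated final outbreak size lands close to the observed one. The simulator, the prior range, the tolerance, and the observed value are all assumptions chosen for illustration.

```python
import math
import random


def simulate_final_size(r0, population, initial_infected, rng):
    """Chain-binomial outbreak: total number ever infected."""
    susceptible = population - initial_infected
    infectious = initial_infected
    total = initial_infected
    while infectious > 0 and susceptible > 0:
        # Each susceptible escapes every infectious contact independently.
        p_infect = 1.0 - math.exp(-r0 * infectious / population)
        new_cases = sum(rng.random() < p_infect for _ in range(susceptible))
        susceptible -= new_cases
        total += new_cases
        infectious = new_cases
    return total


def abc_rejection(observed_final_size, population, initial_infected,
                  draws=2000, tolerance=5, seed=0):
    """Return the prior draws of R0 whose simulations match the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        r0 = rng.uniform(0.5, 5.0)  # assumed uniform prior on R0
        simulated = simulate_final_size(r0, population, initial_infected, rng)
        if abs(simulated - observed_final_size) <= tolerance:
            accepted.append(r0)
    return accepted


# Hypothetical observation: 80 of 100 people infected, one index case.
posterior = abc_rejection(observed_final_size=80, population=100,
                          initial_infected=1)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean R0 = {posterior_mean:.2f}")
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is why ABC suits simulators whose likelihood is intractable. Shrinking the tolerance improves the approximation at the cost of fewer accepted draws.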

