Bayesian Statistics

Andrew Gelman

Andrew Gelman is a leading figure in modern Bayesian statistics, known for developing the Stan probabilistic programming language, advancing hierarchical models, and co-authoring the widely used textbook Bayesian Data Analysis.

Andrew Gelman is a professor of statistics and political science at Columbia University whose work has shaped the theory, practice, and culture of Bayesian statistics over three decades. His research spans hierarchical modeling, prior specification, posterior predictive checking, and the philosophy of statistical practice. As a principal developer of the Stan probabilistic programming language and author of several landmark textbooks, Gelman has influenced how an entire generation of researchers thinks about and performs Bayesian inference.

Life and Career

1965

Born in New York City. Develops early interests in mathematics and social science.

1990

Earns his Ph.D. in statistics from Harvard University under Donald Rubin, with a thesis on iterative simulation methods for Bayesian inference.

1995

Publishes the first edition of Bayesian Data Analysis (with John Carlin, Hal Stern, and Donald Rubin), which becomes the standard reference for applied Bayesian statistics.

1992

Co-develops the R-hat convergence diagnostic (the potential scale reduction factor) with Donald Rubin, providing a practical tool for assessing MCMC convergence that becomes universally adopted; a minimal computation is sketched just after this timeline.

2012

Leads the development of Stan, a probabilistic programming language using Hamiltonian Monte Carlo, enabling efficient inference for complex hierarchical models.

2020

Publishes the influential paper "Bayesian Workflow" (with collaborators), articulating a complete framework for iterative model building, checking, and revision.
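
As a rough illustration of the R-hat diagnostic mentioned above, here is a minimal sketch of the original Gelman-Rubin computation in Python with NumPy, assuming several chains of equal length for a single scalar parameter (modern implementations, such as the split, rank-normalized variant used in Stan, refine this):

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor (R-hat).

    chains: array of shape (m, n) -- m independent MCMC chains,
    each holding n draws of a single scalar parameter.
    """
    chains = np.asarray(chains)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # B/n: between-chain variance of the chain means
    B = n * chain_means.var(ddof=1)
    # W: average within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

# Example: well-mixed chains give R-hat near 1; separated chains do not.
rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))
bad = good + np.arange(4)[:, None]  # chains stuck in different regions
print(gelman_rubin_rhat(good))  # close to 1.00
print(gelman_rubin_rhat(bad))   # substantially above 1
```

Values close to 1 indicate the chains are exploring the same distribution; values well above 1 signal that the sampler has not converged.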

Hierarchical Models and Partial Pooling

Gelman's most enduring methodological contribution is his systematic development and advocacy of hierarchical (multilevel) models. In these models, parameters are treated neither as entirely separate (no pooling) nor as identical (complete pooling), but as draws from a common population distribution (partial pooling). The resulting shrinkage toward the population mean borrows strength across units, producing more stable estimates, especially when group-level sample sizes are small.
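
To make the shrinkage concrete, here is a minimal NumPy sketch of the precision-weighted partial-pooling estimate for a set of group means, assuming known standard errors and fixed, purely illustrative values for the population mean mu and scale tau (in a full hierarchical analysis, mu and tau would themselves be estimated from the data):

```python
import numpy as np

# Hypothetical group-level estimates and their standard errors
# (values are illustrative, not from any real study)
y = np.array([28.0, 8.0, -3.0, 7.0])       # observed group means
sigma = np.array([15.0, 10.0, 16.0, 11.0])  # known standard errors
mu, tau = 8.0, 5.0                          # assumed population mean and scale

# Conditional posterior mean of each group effect given mu and tau:
# a precision-weighted average of the group's own estimate and mu.
precision_data = 1.0 / sigma**2
precision_prior = 1.0 / tau**2
theta_hat = (precision_data * y + precision_prior * mu) / (
    precision_data + precision_prior
)

print(theta_hat)  # each estimate is pulled ("shrunk") toward mu
```

Groups measured with more noise (larger sigma) are pulled further toward mu, which is exactly the borrowing of strength described above.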

His textbook Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill) demonstrated how to apply these ideas across the social sciences, from education research to political polling. Gelman's applications of multilevel modeling to election forecasting, in particular, showed how Bayesian methods could produce calibrated uncertainty estimates for complex real-world problems.

The Folk Theorem of Statistical Computing

Gelman is known for the aphorism: "When you have computational problems, often there's a problem with your model." This "folk theorem" captures a deep insight: difficulties with MCMC convergence or numerical stability frequently signal that the statistical model is poorly specified, rather than that the algorithm needs better tuning. The solution is to improve the model, not to fight the sampler.

Posterior Predictive Checks

Gelman championed the use of posterior predictive checks as a practical tool for model criticism. The idea is simple but powerful: if a model is adequate, data simulated from its posterior predictive distribution should resemble the observed data. Any systematic discrepancy identifies a feature of the data that the model fails to capture. By defining test statistics sensitive to particular aspects of the data, researchers can diagnose specific model failures and guide model improvement.
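
The mechanics fit in a few lines. The sketch below, with illustrative data and faked posterior draws standing in for a real fit, simulates replicated datasets from a normal model and compares a test statistic (here, the sample maximum) between observed and replicated data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data (illustrative) and posterior draws of (mu, sigma);
# the draws are faked here so the example is self-contained.
y_obs = rng.normal(loc=1.0, scale=2.0, size=100)
mu_draws = rng.normal(y_obs.mean(), 0.2, size=2000)
sigma_draws = np.abs(rng.normal(y_obs.std(), 0.15, size=2000))

# Test statistic chosen to probe a feature of interest
def T(y):
    return y.max()

# Simulate one replicated dataset per posterior draw and compute T
T_rep = np.array([
    T(rng.normal(mu, sigma, size=y_obs.size))
    for mu, sigma in zip(mu_draws, sigma_draws)
])

# Posterior predictive p-value: P(T(y_rep) >= T(y_obs))
ppp = (T_rep >= T(y_obs)).mean()
print(f"T(y_obs) = {T(y_obs):.2f}, posterior predictive p-value = {ppp:.3f}")
```

A posterior predictive p-value near 0 or 1 means the model rarely reproduces the observed value of the test statistic, flagging a specific aspect of the data the model fails to capture.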

Stan and Computational Innovation

The Stan probabilistic programming language, which Gelman co-developed with Bob Carpenter, Matt Hoffman, and others, represents a major advance in Bayesian computation. Stan implements the No-U-Turn Sampler (NUTS), an adaptive variant of Hamiltonian Monte Carlo that eliminates the need for hand-tuning trajectory length. This automation, combined with Stan's expressive modeling language and automatic differentiation capabilities, has made sophisticated Bayesian inference accessible to researchers across dozens of fields.
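
Stan's samplers are considerably more sophisticated, but the Hamiltonian dynamics at their core can be sketched briefly. The following illustrative Python implementation runs fixed-length leapfrog HMC on a standard normal target; the hand-chosen n_steps is precisely the tuning burden that NUTS replaces with an adaptive stopping rule:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: standard normal, so U(q) = q^2 / 2 and grad_U(q) = q
def grad_U(q):
    return q

def hmc_step(q, step_size=0.1, n_steps=20):
    """One HMC transition with a fixed-length leapfrog trajectory."""
    p = rng.normal()  # resample momentum
    q_new, p_new = q, p
    # Leapfrog integration of the Hamiltonian dynamics
    p_new -= 0.5 * step_size * grad_U(q_new)
    for _ in range(n_steps - 1):
        q_new += step_size * p_new
        p_new -= step_size * grad_U(q_new)
    q_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_U(q_new)
    # Metropolis accept/reject corrects the integration error
    current_H = 0.5 * q**2 + 0.5 * p**2
    proposed_H = 0.5 * q_new**2 + 0.5 * p_new**2
    if rng.random() < np.exp(current_H - proposed_H):
        return q_new
    return q

q, draws = 0.0, []
for _ in range(5000):
    q = hmc_step(q)
    draws.append(q)
print(np.mean(draws), np.std(draws))  # should be near 0 and 1
```

Too few leapfrog steps and the sampler behaves like a random walk; too many and computation is wasted retracing the trajectory, which is the trade-off NUTS navigates automatically.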

"Bayesian inference is not about having the right prior. It's about building models that are good enough to be useful, checking them against data, and improving them when they fail." — Andrew Gelman

Philosophy of Statistical Practice

Gelman has been an outspoken critic of common statistical malpractice, including p-hacking, the garden of forking paths, and the misinterpretation of statistical significance. His blog, Statistical Modeling, Causal Inference, and Social Science, has been a major forum for discussing statistical methodology and its application. He advocates a workflow-oriented approach to statistics in which model building, checking, and revision are iterative rather than one-shot procedures.

Legacy

Gelman's influence extends beyond any single method or software tool. Through his textbooks, software, blog, and mentorship, he has shaped how applied researchers think about uncertainty, model building, and the responsible use of statistical methods. His insistence that good Bayesian practice requires good models, not just good algorithms, has become a guiding principle for the field.
