Donald B. Rubin

Donald B. Rubin developed the potential outcomes framework for causal inference and invented multiple imputation, two ideas that have become foundational to modern statistics and evidence-based policy.

Donald Bruce Rubin is an American statistician whose work on causal inference, missing data, and Bayesian methods has had a transformative impact across statistics, economics, epidemiology, and the social sciences. The Rubin Causal Model, which formalizes causal effects through potential outcomes, provides the conceptual foundation for virtually all modern work on treatment effects and program evaluation. His invention of multiple imputation gave practitioners a principled, widely adopted method for handling missing data. Throughout his career, Rubin has championed the Bayesian perspective as a natural framework for both causal reasoning and the treatment of incomplete information.

Life and Career

1943

Born in Washington, D.C. Studies psychology and physics as an undergraduate at Princeton University.

1970

Earns his Ph.D. from Harvard University under William Cochran, focusing on matched sampling for causal effects in observational studies.

1974

Publishes "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," introducing the potential outcomes framework to a broad audience.

1976

Publishes "Inference and Missing Data" in Biometrika, formalizing the missing-data mechanisms (missing completely at random, missing at random) that underpin modern missing-data analysis. (The propensity score itself was introduced later, in his 1983 paper with Paul Rosenbaum.)

1987

Publishes Multiple Imputation for Nonresponse in Surveys, providing the theoretical and practical foundations for a method now standard in government statistics and clinical research.

2005

Receives the Fisher Lectureship from COPSS, among many honors. Continues to develop Bayesian approaches to causal inference.

The Rubin Causal Model

The Rubin Causal Model (RCM) defines causal effects in terms of potential outcomes. For each unit, there exists a potential outcome under treatment, Y(1), and a potential outcome under control, Y(0). The individual causal effect is their difference, Y(1) − Y(0). The fundamental problem of causal inference is that we can observe at most one of these potential outcomes for any individual, making the individual effect inherently unobservable. Causal inference thus requires assumptions that allow estimation of average effects from observed data.

Rubin Causal Model — Key Quantities

Individual causal effect:   τᵢ = Yᵢ(1) − Yᵢ(0)
Average treatment effect:   ATE = E[Y(1) − Y(0)]
Average treatment effect on the treated:   ATT = E[Y(1) − Y(0) | T = 1]
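These quantities can be illustrated with a small simulation (hypothetical data; the effect size of 2.0 is chosen purely for illustration). Each unit reveals only one of its two potential outcomes, so τᵢ is never observed, but under randomized assignment the simple difference in group means recovers the ATE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes for every unit (known only because we
# generated them; in real data at most one is ever observed per unit).
y0 = rng.normal(0.0, 1.0, n)                 # Y(0), outcome under control
y1 = y0 + 2.0 + rng.normal(0.0, 0.5, n)      # Y(1), outcome under treatment

true_ate = (y1 - y0).mean()                  # close to 2.0 by construction

# Fundamental problem of causal inference: only one outcome is revealed.
t = rng.integers(0, 2, n)                    # randomized treatment assignment
y_obs = np.where(t == 1, y1, y0)

# Under randomization, the difference in observed group means estimates ATE.
ate_hat = y_obs[t == 1].mean() - y_obs[t == 0].mean()
```

Note that `true_ate` is computable only inside the simulation, where both potential outcomes exist; `ate_hat` uses only the data a real study would observe.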

Ignorability (Unconfoundedness) Assumption

(Y(0), Y(1)) ⊥ T | X   (treatment assignment is independent of the potential outcomes given covariates X)

The ignorability assumption, also called unconfoundedness or selection on observables, is the key identification condition. When it holds, as in randomized experiments, the average causal effect can be estimated by comparing treated and control group means. In observational studies, techniques such as matching, propensity score weighting, and regression adjustment aim to make this assumption more plausible by adjusting for observed confounders.
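A minimal sketch of one of the adjustment techniques mentioned above, inverse-propensity weighting, on simulated confounded data. For simplicity the true propensity score is used directly; in practice it would be estimated, for example by logistic regression on X:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(0.0, 1.0, n)            # observed confounder X
e = 1.0 / (1.0 + np.exp(-x))           # true propensity score P(T=1 | X)
t = rng.binomial(1, e)                 # treatment more likely when x is high

y0 = x + rng.normal(0.0, 1.0, n)       # Y(0) also depends on x (confounding)
y1 = y0 + 1.0                          # constant treatment effect of 1.0
y = np.where(t == 1, y1, y0)           # observed outcome

# Naive comparison is biased upward: treated units have higher x on average.
naive = y[t == 1].mean() - y[t == 0].mean()

# Inverse-propensity weighting recovers the ATE under ignorability given X.
ipw = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
```

The naive estimate mixes the treatment effect with the effect of x, while the weighted estimate is close to the true effect of 1.0.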

Potential Outcomes vs. Structural Equations

The potential outcomes framework (Rubin) and the structural causal model (Pearl) are the two dominant frameworks for causal inference. While they can be shown to be logically equivalent under certain conditions, they emphasize different aspects: Rubin's framework focuses on the design of studies and the plausibility of assumptions needed to identify treatment effects, while Pearl's emphasizes graphical representations of causal mechanisms. The debate between these approaches has been productive, clarifying the foundations of causal reasoning in statistics.

Multiple Imputation

Missing data is ubiquitous in practice, and naive approaches such as complete-case analysis or mean imputation can produce severely biased estimates. Rubin's multiple imputation addresses this by creating several plausible completed datasets, analyzing each one separately, and combining the results using rules that properly account for both within-imputation and between-imputation variability. The method is fundamentally Bayesian: each imputed dataset represents a draw from the posterior predictive distribution of the missing values given the observed data.
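The combining step can be sketched directly. Rubin's rules pool the m completed-data estimates and account for both variance components; the per-imputation numbers below are purely illustrative, not from any real analysis:

```python
import numpy as np

# Illustrative results from m = 5 completed-data analyses of the same
# quantity: point estimates and their within-imputation variances.
estimates = np.array([2.1, 1.9, 2.3, 2.0, 2.2])
variances = np.array([0.04, 0.05, 0.04, 0.06, 0.05])

m = len(estimates)
q_bar = estimates.mean()                 # pooled point estimate
w = variances.mean()                     # average within-imputation variance
b = estimates.var(ddof=1)                # between-imputation variance

# Rubin's rule: total variance adds a finite-m correction to B.
total_var = w + (1.0 + 1.0 / m) * b
se = np.sqrt(total_var)                  # pooled standard error
```

The between-imputation term B is what naive single imputation omits: it captures the extra uncertainty due to the missing values themselves, so confidence intervals built from `total_var` have closer to nominal coverage.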

Bayesian Perspective

Rubin has consistently advocated for Bayesian methods as a natural framework for statistical inference. His approach to causal inference treats the unobserved potential outcomes as missing data, unifying causal reasoning and missing data theory under a common Bayesian umbrella. This perspective has been enormously influential, shaping how researchers think about identification, estimation, and sensitivity analysis in causal studies.

"Causal inference is fundamentally a missing data problem: we wish to compare potential outcomes that cannot all be simultaneously observed." — Donald B. Rubin
