Bayesian Statistics

A/B Testing & Experimentation

Bayesian A/B testing replaces p-values and fixed sample sizes with direct probability statements about which variant is better, enabling continuous monitoring, early stopping, and business-relevant decision-making.

P(μ_B > μ_A | data) = ∫∫ I(μ_B > μ_A) · p(μ_A, μ_B | data) dμ_A dμ_B

A/B testing — randomizing users between a control (A) and treatment (B) to measure the causal effect of a change — is the gold standard for data-driven decisions in technology, marketing, and product development. Traditional frequentist A/B testing requires pre-specifying a sample size, avoiding peeking at results before the test concludes, and interpreting p-values that do not directly answer the business question. Bayesian A/B testing addresses all three limitations by computing the posterior distribution over the treatment effect and making decisions based on direct probability statements.

The Bayesian Framework for A/B Testing

In a Bayesian A/B test, each variant's conversion rate (or mean outcome) is assigned a prior distribution — typically a Beta distribution for conversion rates or a Normal distribution for continuous metrics. As data accumulate, the posteriors update via conjugate Bayesian inference. The key output is the posterior probability that variant B is better than variant A, computed by integrating over the joint posterior.

Beta-Binomial Model for Conversion Rates

Prior:   θ_A ~ Beta(α₀, β₀),   θ_B ~ Beta(α₀, β₀)
Data:   x_A conversions in n_A trials,   x_B conversions in n_B trials
Posterior:   θ_A | data ~ Beta(α₀ + x_A, β₀ + n_A − x_A),   θ_B | data ~ Beta(α₀ + x_B, β₀ + n_B − x_B)
P(θ_B > θ_A | data) = probability that B is better than A

The posterior probability P(B > A) can be computed analytically for Beta posteriors, or by Monte Carlo simulation for more complex models. Analysts can also compute the posterior distribution of the lift (relative improvement), the expected loss from choosing the wrong variant, and the probability that the effect exceeds a minimum detectable threshold.
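As a minimal sketch of the Monte Carlo route, the snippet below draws from each variant's Beta posterior and estimates P(B > A) by the fraction of draws in which B's sampled rate exceeds A's. The conversion counts are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed data: x conversions out of n trials per variant
x_A, n_A = 120, 1000
x_B, n_B = 145, 1000

# Uniform Beta(1, 1) prior; the conjugate posterior is Beta(1 + x, 1 + n - x)
alpha0, beta0 = 1, 1
samples_A = rng.beta(alpha0 + x_A, beta0 + n_A - x_A, size=100_000)
samples_B = rng.beta(alpha0 + x_B, beta0 + n_B - x_B, size=100_000)

# Posterior probability that B's conversion rate exceeds A's
p_b_beats_a = np.mean(samples_B > samples_A)
print(f"P(B > A | data) ≈ {p_b_beats_a:.3f}")

# The same draws also give the posterior distribution of the relative lift
lift_samples = samples_B / samples_A - 1
print(f"posterior mean lift ≈ {np.mean(lift_samples):.1%}")
```

The same sampling approach extends unchanged to non-conjugate models: replace the `rng.beta` draws with samples from whatever posterior the model produces.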

Continuous Monitoring and Early Stopping

The most practical advantage of Bayesian A/B testing is that results can be monitored continuously without inflating error rates. In frequentist testing, repeated peeking increases the false positive rate; in Bayesian testing, the posterior is valid at every time point. Decision rules based on posterior probabilities (e.g., "stop when P(B > A) > 0.95") or expected loss (e.g., "stop when the expected loss from choosing the current leader is less than $0.01 per user") allow tests to conclude as soon as sufficient evidence accumulates.
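An expected-loss stopping rule like the one described above can be sketched as follows. The interim counts and the 0.001 tolerance (in conversion-rate points rather than dollars) are illustrative assumptions, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_loss(x_a, n_a, x_b, n_b, alpha0=1, beta0=1, draws=100_000):
    """Monte Carlo estimate of the expected loss (in conversion-rate
    points) incurred by shipping each variant, under Beta posteriors."""
    th_a = rng.beta(alpha0 + x_a, beta0 + n_a - x_a, draws)
    th_b = rng.beta(alpha0 + x_b, beta0 + n_b - x_b, draws)
    loss_choose_a = np.mean(np.maximum(th_b - th_a, 0.0))  # regret if B is truly better
    loss_choose_b = np.mean(np.maximum(th_a - th_b, 0.0))  # regret if A is truly better
    return loss_choose_a, loss_choose_b

# Hypothetical interim data; stop as soon as the leader's expected loss
# falls below a pre-chosen tolerance.
loss_a, loss_b = expected_loss(480, 5000, 530, 5000)
threshold = 0.001
if min(loss_a, loss_b) < threshold:
    print("Stop:", "ship B" if loss_b < loss_a else "ship A")
else:
    print("Keep collecting data")
```

Because the posterior is recomputed from the full counts at each look, this check can run after every batch of traffic without any multiple-testing correction.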

Thompson Sampling for Multi-Armed Bandits

Thompson sampling is a Bayesian algorithm for the multi-armed bandit problem — the generalization of A/B testing to multiple simultaneous variants with adaptive allocation. At each step, a sample is drawn from each variant's posterior, and the variant with the highest sample is shown to the next user. This naturally balances exploration (learning about uncertain variants) and exploitation (favoring apparently better variants), and achieves asymptotically optimal regret bounds for Bernoulli bandits. Companies like Google, Netflix, and Spotify use Thompson sampling for real-time personalization and experimentation.
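The sample-then-pick-the-max loop can be written in a few lines. Here the three "true" conversion rates are hypothetical and hidden from the algorithm; each arm keeps Beta posterior counts that start at the uniform Beta(1, 1) prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true conversion rates (unknown to the algorithm)
true_rates = np.array([0.10, 0.12, 0.15])
K = len(true_rates)
successes = np.ones(K)  # Beta(1, 1) prior -> alpha counts start at 1
failures = np.ones(K)   #                    beta counts start at 1

pulls = np.zeros(K, dtype=int)
for _ in range(10_000):
    # Draw one conversion rate from each arm's current posterior ...
    theta = rng.beta(successes, failures)
    # ... and show the next user the arm with the highest draw
    k = int(np.argmax(theta))
    pulls[k] += 1
    if rng.random() < true_rates[k]:
        successes[k] += 1
    else:
        failures[k] += 1

print("allocation per arm:", pulls)
```

Over time the posterior of the best arm concentrates and receives most of the traffic, while clearly inferior arms are sampled only rarely — the exploration/exploitation trade-off emerges from the posterior draws themselves rather than from a tuned schedule.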

Business Decision Theory

Bayesian A/B testing integrates naturally with business decision theory. Rather than asking "is the effect statistically significant?", the Bayesian analyst asks "what is the expected revenue impact of shipping variant B?" The expected loss from choosing the wrong variant provides a direct monetary criterion for decision-making. This shifts the focus from statistical significance to practical significance and business value.

Multi-Variate and Hierarchical Experiments

Bayesian methods extend gracefully to factorial experiments (testing multiple factors simultaneously), sequential experiments (where the next test builds on previous results), and hierarchical experiments (where effects vary across user segments or geographies). Bayesian hierarchical models share information across segments, providing stable segment-level estimates even when individual segments have little data.
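The information-sharing idea can be illustrated with a deliberately simplified stand-in for a full hierarchical model: an empirical-Bayes step that fits a shared Beta prior to the segment-level rates by moment matching, then shrinks each segment's estimate toward the pooled rate. The segment names and counts are hypothetical.

```python
import numpy as np

# Hypothetical per-segment data: (conversions, visitors)
segments = {"US": (30, 200), "DE": (4, 40), "JP": (1, 15)}

x = np.array([c for c, _ in segments.values()], dtype=float)
n = np.array([v for _, v in segments.values()], dtype=float)
rates = x / n  # raw per-segment conversion rates

# Fit a shared Beta(a, b) prior by matching the mean and variance of the
# observed segment rates (a crude empirical-Bayes surrogate for jointly
# inferring the hyperparameters in a full hierarchical model)
m, v = rates.mean(), rates.var()
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# Each segment's posterior mean is shrunk toward the pooled rate, with
# small segments (small n) shrunk the most
posterior_means = (a + x) / (a + b + n)
for name, raw, shrunk in zip(segments, rates, posterior_means):
    print(f"{name}: raw {raw:.3f} -> shrunk {shrunk:.3f}")
```

The smallest segment (here "JP", with 15 visitors) moves furthest toward the pooled estimate, which is exactly the stabilization a hierarchical model provides for sparse segments.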

"The goal of an experiment is not to achieve a p-value below 0.05 — it is to make the best possible decision. Bayesian methods align statistical analysis with this goal." — Chris Stucchio, author of Bayesian A/B testing frameworks at VWO

Current Frontiers

Bayesian causal inference methods extend A/B testing to observational settings where randomization is impractical. Bayesian methods for interference effects (when one user's treatment affects another's outcome) and for long-term outcome estimation (when the metric of interest takes months to observe) address the most challenging problems in modern experimentation. And automated Bayesian experimentation platforms are making rigorous testing accessible to organizations without dedicated statisticians.

Interactive Calculator

Each row specifies a variant label (A or B), visitors (the number of visitors), and conversions (the number who converted). The calculator fits independent Beta-Binomial models to each variant with a uniform Beta(1,1) prior, computes posterior conversion rates with credible intervals, estimates P(B > A) via numerical integration, and computes expected lift.
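The calculator's computation can be reproduced in a short script. The identity P(θ_B > θ_A) = ∫ pdf_B(t) · cdf_A(t) dt turns the double integral into a one-dimensional numerical integration; the input counts below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical calculator inputs: conversions and visitors per variant
x_A, n_A = 58, 600
x_B, n_B = 75, 600

# Uniform Beta(1, 1) prior, as the calculator uses
post_A = stats.beta(1 + x_A, 1 + n_A - x_A)
post_B = stats.beta(1 + x_B, 1 + n_B - x_B)

# Posterior means and 95% credible intervals
mean_A, ci_A = post_A.mean(), post_A.ppf([0.025, 0.975])
mean_B, ci_B = post_B.mean(), post_B.ppf([0.025, 0.975])

# P(B > A) by numerical integration on a fine grid:
# P(theta_B > theta_A) = ∫ pdf_B(t) · cdf_A(t) dt
t = np.linspace(0.0, 1.0, 20_001)
p_b_beats_a = np.sum(post_B.pdf(t) * post_A.cdf(t)) * (t[1] - t[0])

# Expected lift, approximated here by the ratio of posterior means
lift = mean_B / mean_A - 1

print(f"A: {mean_A:.4f}  95% CI [{ci_A[0]:.4f}, {ci_A[1]:.4f}]")
print(f"B: {mean_B:.4f}  95% CI [{ci_B[0]:.4f}, {ci_B[1]:.4f}]")
print(f"P(B > A) = {p_b_beats_a:.3f}, expected lift = {lift:.1%}")
```

For Beta posteriors this grid integration agrees with the exact closed-form sum to several decimal places, and it generalizes to any pair of posteriors whose pdf and cdf can be evaluated.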

Click Calculate to see results, or Animate to watch the statistics update one record at a time.
