The Dirichlet process (DP) is the workhorse of Bayesian nonparametric mixture modeling, providing a prior that supports an unbounded number of mixture components. However, the standard DP produces a single random measure: it cannot model situations where the mixing distribution changes with a covariate, across time, or over space. The dependent Dirichlet process (DDP) family addresses this limitation by constructing collections of random measures {Gₓ : x ∈ X} that are marginally Dirichlet processes but exhibit dependence across the index set X.
Motivation and Core Idea
Consider modeling disease subtypes across different hospitals. A standard DP mixture would assume the same cluster structure everywhere. But subtype prevalence varies geographically. A DDP allows each hospital's mixture to have its own weights and possibly its own atom locations, while sharing statistical strength: hospitals in similar regions will have similar mixture distributions.
Single-weights DDP: Gₓ = Σₖ₌₁^∞ wₖ · δ_{θₖ(x)} (weights shared across x, atoms vary)
Single-atoms DDP: Gₓ = Σₖ₌₁^∞ wₖ(x) · δ_{θₖ} (atoms shared across x, weights vary)
In the most general formulation, both the stick-breaking weights wₖ(x) and the atom locations θₖ(x) depend on the covariate x. In practice, restricted versions are common: in the single-atoms (or common-atoms) model, the atoms are shared across all covariate values and only the weights vary, which ensures that groups share the same cluster identities but differ in prevalence.
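To make the two restricted forms concrete, here is a minimal Python sketch that draws a truncated version of each. Everything specific is an illustrative assumption: the truncation level K, the linear atom paths θₖ(x) = aₖ + bₖ·x (in the spirit of the linear DDP), and the logistic stick dependence, which is a toy mechanism rather than any published construction.

```python
import numpy as np

rng = np.random.default_rng(0)
K, alpha = 50, 1.0  # truncation level and DP concentration (illustrative)

def stick_to_weights(v):
    # w_k = v_k * prod_{j<k} (1 - v_j): convert stick proportions to weights
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# --- Single-weights DDP: sticks shared, atoms move with x ---
w_shared = stick_to_weights(rng.beta(1.0, alpha, size=K))  # V_k ~ Beta(1, alpha)
a, b = rng.normal(size=K), rng.normal(size=K)

def atoms_at(x):
    return a + b * x  # theta_k(x) = a_k + b_k * x (linear-DDP-style atom paths)

# --- Single-atoms DDP: atoms shared, sticks move with x ---
theta_shared = rng.normal(size=K)           # theta_k drawn once from G0 = N(0, 1)
c, d = rng.normal(size=K), rng.normal(size=K)

def weights_at(x):
    v = 1.0 / (1.0 + np.exp(-(c + d * x)))  # toy logistic stick proportions
    return stick_to_weights(v)

# Nearby covariate values induce similar mixing distributions.
print(np.abs(weights_at(0.20) - weights_at(0.21)).sum())  # small
print(np.abs(weights_at(0.20) - weights_at(5.00)).sum())  # larger
```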
Steven MacEachern (1999, 2000) introduces the general framework of dependent Dirichlet processes, providing the first systematic treatment of covariate-dependent nonparametric priors.
De Iorio, Müller, Rosner, and MacEachern (2004) develop the ANOVA-DDP and linear DDP models, enabling practical regression with nonparametric error distributions that vary by group.
Griffin and Steel (2006) introduce order-based dependent stick-breaking processes, providing flexible covariate dependence through transformations of a common set of random variables.
The kernel stick-breaking process (Dunson and Park, 2008) and the probit stick-breaking process (Rodriguez and Dunson, 2011) further expand the DDP toolkit, offering different trade-offs between flexibility and computational tractability.
Key Constructions
Kernel Stick-Breaking Process
The kernel stick-breaking process (KSBP) modifies the standard stick-breaking representation by making the stick proportions depend on x through kernel functions. Specifically, each Vₖ(x) = Uₖ · Kₕ(x, ξₖ), where Uₖ ~ Beta(aₖ, bₖ) and Kₕ is a kernel with bandwidth h, bounded in [0, 1] and centered at a random location ξₖ, so that sticks are long near ξₖ and shrink with distance from it. The resulting weights wₖ(x) = Vₖ(x) · Πⱼ₌₁^{k−1} (1 − Vⱼ(x)) vary smoothly over the covariate space while preserving the marginal DP-like structure.
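A minimal sketch of this construction, assuming a Gaussian kernel and arbitrary Beta parameters, bandwidth, and kernel locations. Note that truncation leaves more unassigned mass here than in a standard DP, because the kernel shrinks every stick away from its location ξₖ.

```python
import numpy as np

rng = np.random.default_rng(0)
K, a_k, b_k, h = 40, 1.0, 1.0, 0.15   # truncation, Beta(a, b) sticks, bandwidth

u = rng.beta(a_k, b_k, size=K)        # U_k ~ Beta(a_k, b_k)
xi = rng.uniform(0.0, 1.0, size=K)    # kernel locations xi_k in the covariate space

def ksbp_weights(x):
    # V_k(x) = U_k * K_h(x, xi_k) with a Gaussian kernel bounded in [0, 1]
    v = u * np.exp(-0.5 * ((x - xi) / h) ** 2)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

w = ksbp_weights(0.5)
# Components whose locations xi_k sit near x = 0.5 receive most of the mass;
# weights decay smoothly as x moves away from each xi_k.
print(w.sum())  # < 1 under truncation; leftover mass can go on one extra atom
```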
Probit Stick-Breaking Process
The probit stick-breaking process uses latent Gaussian processes: Vₖ(x) = Φ(ψₖ(x)), where each ψₖ is drawn from a GP and Φ is the standard normal CDF. The GP covariance function controls the length-scale and smoothness of the variation in the mixture weights; with a squared-exponential covariance, for example, the dependence of the weights on the covariates is infinitely differentiable.
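A sketch of how such weight surfaces can be simulated on a grid, assuming a squared-exponential covariance and an arbitrary length-scale; the jitter term is a standard numerical stabilizer for the Cholesky factorization, not part of the model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
K = 25                               # truncation level (illustrative)
xs = np.linspace(0.0, 1.0, 100)      # covariate grid

def se_cov(x, ell=0.2):
    # Squared-exponential covariance; ell sets how fast weights vary with x.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Draw K independent GP paths psi_k over the grid (jitter for stability).
L = np.linalg.cholesky(se_cov(xs) + 1e-8 * np.eye(xs.size))
psi = L @ rng.normal(size=(xs.size, K))

V = norm.cdf(psi)                    # V_k(x) = Phi(psi_k(x)), values in (0, 1)

# Stick-breaking across components, done at every grid point at once:
# W[i, k] = V[i, k] * prod_{j<k} (1 - V[i, j])
remaining = np.concatenate(
    [np.ones((xs.size, 1)), np.cumprod(1.0 - V, axis=1)[:, :-1]], axis=1)
W = V * remaining

print(W.sum(axis=1).min())           # ~1 for this K: little mass lost to truncation
```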
The hierarchical Dirichlet process (HDP) of Teh, Jordan, Beal, and Blei (2006) is a related but distinct construction. In the HDP, each group has its own DP, but all DPs share a common base measure that is itself a DP draw. This guarantees that groups share exactly the same atoms (cluster parameters), differing only in mixing weights. DDPs are more general: they can allow both atoms and weights to vary with covariates, and the dependence structure can be continuous rather than discrete. However, the HDP's shared-atom property makes it particularly natural for topic models and other settings where discrete cluster identity must be preserved across groups.
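For contrast, a minimal truncated sketch of the HDP's two-level stick-breaking, following the conditional-stick form in Teh et al. (2006); the truncation level and concentration values here are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
K, gamma, alpha0 = 40, 1.0, 5.0   # truncation and concentrations (illustrative)

def stick_to_weights(v):
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# Top level: global weights beta ~ GEM(gamma) over shared atoms theta_k ~ H.
beta = stick_to_weights(rng.beta(1.0, gamma, size=K))
atoms = rng.normal(size=K)

def group_weights():
    # pi_j ~ DP(alpha0, beta) via conditional sticks (Teh et al., 2006):
    # V_jk ~ Beta(alpha0 * beta_k, alpha0 * (1 - sum_{l<=k} beta_l))
    tail = np.maximum(1.0 - np.cumsum(beta), 1e-12)
    v = rng.beta(alpha0 * beta, alpha0 * tail)
    return stick_to_weights(v)

pi_1, pi_2 = group_weights(), group_weights()
# Both groups reuse exactly the same atoms; only their weights differ.
print(atoms[np.argmax(pi_1)], atoms[np.argmax(pi_2)])
```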
Inference
Posterior inference for DDP models is typically carried out via MCMC, using truncated stick-breaking representations (retaining K components), slice sampling, or retrospective sampling algorithms. Variational inference has also been developed for certain DDP constructions, particularly those based on the probit stick-breaking representation where the latent GPs admit variational approximations.
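For the truncated approach, a common rule of thumb chooses K from the expected leftover stick mass: with marginally Beta(1, α) sticks, E[Πₖ₌₁^K (1 − Vₖ)] = (α/(1+α))^K, so K can be set to push this below a tolerance, as in this small check (the tolerance value is an arbitrary choice):

```python
import math

def truncation_level(alpha, eps=1e-6):
    # For V_k ~ Beta(1, alpha), the expected stick mass left after K breaks
    # is (alpha / (1 + alpha))**K; pick the smallest K driving it below eps.
    return math.ceil(math.log(eps) / math.log(alpha / (1.0 + alpha)))

print(truncation_level(1.0))   # alpha = 1 -> K = 20
print(truncation_level(5.0))   # alpha = 5 -> K = 76
```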
A key computational challenge is that the infinite-dimensional nature of the DDP interacts with the covariate space: the posterior must learn not only the number and location of clusters but also how cluster prevalence varies across the covariate domain. This typically requires more data than a standard DP mixture to achieve comparable posterior concentration.
"The Dirichlet process is a beautiful tool, but life is not exchangeable. To model the real world, we must let our nonparametric priors change with context." — Steven MacEachern, motivating the DDP framework
Applications
DDPs have been applied to spatial epidemiology (modeling disease risk surfaces), longitudinal data analysis (evolving cluster structure over time), genomics (covariate-dependent clustering of gene expression profiles), and econometrics (heterogeneous treatment effects with nonparametric error distributions). Their flexibility in capturing distributional heterogeneity while borrowing strength across related conditions makes them a natural choice whenever a "one-size-fits-all" mixture model is too restrictive.