Languages change over time in ways that parallel biological evolution: words mutate, grammatical structures are inherited, and populations of speakers diverge and occasionally merge. Bayesian methods have entered linguistics from two directions — phylogenetic methods borrowed from evolutionary biology, and probabilistic models native to computational linguistics — producing a synthesis that has transformed our understanding of language history, structure, and cognition.
Bayesian Phylogenetic Linguistics
The comparative method in historical linguistics reconstructs language family trees from cognate sets — words in different languages that descend from a common ancestor. Bayesian phylogenetic methods, adapted from molecular evolution, place this reconstruction on a rigorous statistical footing. Languages are treated as taxa, cognate presence/absence as characters, and the posterior distribution over trees represents the uncertainty in the family's evolutionary history.
P(τ, μ, θ | D) ∝ P(D | τ, μ, θ) · P(τ) · P(μ) · P(θ)
where τ = tree topology and branch lengths
μ = rates of lexical replacement (cognate gain/loss)
θ = model parameters (rate variation, borrowing)
D = cognate data matrix (languages × word meanings)
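The likelihood term P(D | τ, μ, θ) is usually computed one cognate column at a time with Felsenstein's pruning algorithm. The sketch below assumes the simplest character model: a two-state continuous-time Markov chain in which a cognate is gained at rate `gain` and lost at rate `loss`, together playing the role of μ. Published analyses typically add rate variation across meanings, covarion or stochastic-Dollo variants, and a correction for unobservable all-absent columns; none of that appears here. The tree encoding, branch lengths, rates, and data are hypothetical.

```python
import math

# Hypothetical tree encoding: a leaf is a language name (str); an internal node
# is a list of (child, branch_length) pairs. Branch lengths are in arbitrary time units.

def transition_probs(gain, loss, t):
    """P[i][j] = Pr(end in state j | start in state i) along a branch of length t
    for a two-state chain with 0->1 rate `gain` and 1->0 rate `loss`."""
    s = gain + loss
    decay = math.exp(-s * t)
    pi0, pi1 = loss / s, gain / s          # stationary frequencies of absent/present
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


def partials(node, column, gain, loss):
    """Felsenstein pruning: [L(state 0), L(state 1)] for the subtree below `node`."""
    if isinstance(node, str):              # leaf
        state = column.get(node)           # 0, 1, or None when the cell is missing
        if state is None:
            return [1.0, 1.0]              # missing data: both states allowed
        return [1.0 - state, float(state)]
    result = [1.0, 1.0]
    for child, t in node:
        P = transition_probs(gain, loss, t)
        below = partials(child, column, gain, loss)
        for i in (0, 1):
            result[i] *= P[i][0] * below[0] + P[i][1] * below[1]
    return result


def site_likelihood(tree, column, gain, loss):
    """Likelihood of one cognate column; root state drawn from the stationary distribution."""
    s = gain + loss
    root = partials(tree, column, gain, loss)
    return (loss / s) * root[0] + (gain / s) * root[1]


def log_likelihood(tree, columns, gain, loss):
    """Sum of per-column log-likelihoods (columns assumed to evolve independently)."""
    return sum(math.log(site_likelihood(tree, col, gain, loss)) for col in columns)


tree = [([("English", 0.2), ("German", 0.2)], 0.5), ("French", 0.7)]
columns = [{"English": 1, "German": 1, "French": 0},
           {"English": 0, "German": 0, "French": 1},
           {"English": 1, "German": None, "French": 1}]
print(log_likelihood(tree, columns, gain=0.3, loss=0.7))
```

In an MCMC analysis, a function like this would be evaluated at each proposed tree and rate setting and added to the log prior; the sampler itself is not shown.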
The landmark 2003 study by Gray and Atkinson used Bayesian phylogenetics to date the origin of the Indo-European language family to approximately 8,000-9,500 years ago, consistent with the Anatolian farming hypothesis. Subsequent studies have applied these methods to Austronesian, Bantu, Sino-Tibetan, and other language families, producing dated phylogenies that can be compared with archaeological and genetic evidence for human migration.
Bayesian phylolinguistic studies typically use standardized word lists (such as the Swadesh list of 100-200 basic meanings) coded for cognacy across languages. The binary cognate matrix — does Language A use a cognate of Language B's word for "water"? — provides the data that drives phylogenetic inference. Bayesian methods handle the inevitable missing data, uncertain cognacy judgments, and rate variation across meanings through hierarchical modeling and MCMC.
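A minimal sketch of how cognate judgments become the binary matrix described above, assuming each language has at most one cognate class per meaning (real datasets also code synonyms, which can put a 1 in several columns of the same meaning). Each (meaning, cognate class) pair becomes one column: a language scores 1 if its word belongs to that class, 0 if it has a word for the meaning in a different class, and ? if the meaning is unattested. The class labels are illustrative, and Italian's entry for "hand" is deliberately blanked to show missing data.

```python
from __future__ import annotations

from typing import Optional

MISSING = "?"


def binary_cognate_matrix(judgments: dict[str, dict[str, Optional[str]]]):
    """Turn {meaning: {language: cognate_class or None}} into binary columns.

    Returns (column_names, {language: [cell, ...]}) with cells "1", "0" or "?".
    """
    languages = sorted({lang for by_lang in judgments.values() for lang in by_lang})
    columns: list[str] = []
    matrix: dict[str, list[str]] = {lang: [] for lang in languages}
    for meaning in sorted(judgments):
        by_lang = judgments[meaning]
        classes = sorted({c for c in by_lang.values() if c is not None})
        for cls in classes:                       # one column per cognate set
            columns.append(f"{meaning}_{cls}")
            for lang in languages:
                observed = by_lang.get(lang)      # absent key or None -> missing
                if observed is None:
                    matrix[lang].append(MISSING)
                else:
                    matrix[lang].append("1" if observed == cls else "0")
    return columns, matrix


judgments = {
    "water": {"English": "A", "German": "A", "French": "B", "Italian": "B"},
    "hand": {"English": "A", "German": "A", "French": "B", "Italian": None},
}
cols, matrix = binary_cognate_matrix(judgments)
print(cols)                # ['hand_A', 'hand_B', 'water_A', 'water_B']
print(matrix["Italian"])   # ['?', '?', '0', '1']
```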
Dating Language Divergence
Bayesian molecular clock models, adapted for linguistic data, estimate when proto-languages were spoken and when daughter languages diverged. The relaxed clock model allows rates of lexical change to vary across branches — some lineages innovate faster than others — while calibration dates from historical records (e.g., the date of Latin texts) anchor the timescale. The posterior distribution of divergence dates provides honest uncertainty about events that occurred thousands of years before written records.
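A toy numerical illustration of dating with an uncertain clock, under loudly simplified assumptions: two sister languages, a list of n meanings of which k still share a cognate, and each lineage losing a meaning's ancestral cognate at its own rate r, so that a cognate survives in both with probability exp(-(r1 + r2)t). Drawing r1 and r2 independently from one lognormal prior is a two-branch caricature of a relaxed clock, and the prior's location stands in for the information calibrations would supply. Real analyses (for example in BEAST) work on full trees with MCMC; this grid approximation only shows how rate uncertainty feeds into the posterior on t. All numbers, including a rate of 0.2 losses per meaning per millennium, are hypothetical.

```python
import math
import random


def divergence_time_posterior(n_meanings, n_shared, t_grid, log_rate_mean, log_rate_sd,
                              n_draws=2000, seed=0):
    """Grid posterior over the split time t of two sister languages (toy model)."""
    rng = random.Random(seed)
    # Branch-specific rates for the two lineages, drawn independently (relaxed-clock flavour).
    rate_pairs = [(math.exp(rng.gauss(log_rate_mean, log_rate_sd)),
                   math.exp(rng.gauss(log_rate_mean, log_rate_sd)))
                  for _ in range(n_draws)]
    log_choose = (math.lgamma(n_meanings + 1) - math.lgamma(n_shared + 1)
                  - math.lgamma(n_meanings - n_shared + 1))
    unnormalized = []
    for t in t_grid:                      # every t must be > 0
        like = 0.0
        for r1, r2 in rate_pairs:         # Monte Carlo average over the rate prior
            log_p = -(r1 + r2) * t        # log Pr(cognate retained in both lineages)
            like += math.exp(log_choose + n_shared * log_p
                             + (n_meanings - n_shared) * math.log1p(-math.exp(log_p)))
        unnormalized.append(like / n_draws)   # uniform prior over grid points
    z = sum(unnormalized)
    return [w / z for w in unnormalized]


grid = [0.1 * i for i in range(1, 51)]          # 0.1 to 5.0 millennia
post = divergence_time_posterior(200, 100, grid, math.log(0.2), 0.3)
best = max(range(len(grid)), key=lambda i: post[i])
print(f"posterior mode ~ {grid[best]:.1f} millennia")
```

Raising `log_rate_sd` (less certain calibrations, more rate variation between lineages) spreads the posterior over t, which is the "honest uncertainty" the paragraph refers to.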
Models of Language Change
Beyond phylogenetics, Bayesian methods model the dynamics of language change at finer scales. Bayesian models of sound change estimate the rates and pathways of phonological shifts. Models of grammatical change track the rise and fall of syntactic constructions (e.g., the decline of V2 word order in English). And Bayesian sociolinguistic models relate language variation to social factors — age, class, region — through hierarchical regression, revealing the social mechanisms of change in progress.
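The key idea behind the hierarchical models mentioned above is partial pooling: groups with little data are pulled toward the overall rate, while well-sampled groups keep estimates close to their own data. The sketch below shows this for the rate of an innovative variant across regions, using a Beta prior centred on the pooled rate with an assumed, fixed concentration. That is a deliberate simplification: a full hierarchical regression would learn the concentration, include predictors such as age and class, and be fitted by MCMC. The region names and counts are hypothetical.

```python
def partial_pool(counts, concentration):
    """counts: {group: (tokens_with_innovative_variant, total_tokens)}.

    Returns the posterior mean rate per group under a Beta prior centred on the
    pooled rate; `concentration` acts as a number of pseudo-tokens (assumed, not fitted).
    """
    total_uses = sum(y for y, _ in counts.values())
    total_tokens = sum(n for _, n in counts.values())
    pooled = total_uses / total_tokens
    a = concentration * pooled            # Beta prior pseudo-counts for the variant
    b = concentration * (1 - pooled)      # ... and against it
    return {g: (y + a) / (n + a + b) for g, (y, n) in counts.items()}


counts = {"North": (45, 50), "South": (3, 4), "East": (30, 100)}   # hypothetical
for region, rate in partial_pool(counts, concentration=20.0).items():
    y, n = counts[region]
    print(f"{region}: raw {y / n:.2f} -> pooled estimate {rate:.2f}")
```

The South, with only four tokens, moves much further toward the overall rate than the North, which is the behaviour that keeps sparse cells from producing spurious social effects.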
"Languages are the pedigrees of nations. Bayesian phylogenetics lets us read those pedigrees with rigour, dating the divergences and testing the hypotheses that historical linguists have debated for centuries." — Russell Gray, Max Planck Institute for Evolutionary Anthropology
Probabilistic Models of Grammar
Bayesian approaches to grammar induction learn the structure of language from data. Bayesian probabilistic context-free grammars place priors on rule probabilities and infer them from parsed corpora or, by sampling over possible parses, from unannotated text. Nonparametric Bayesian models such as the infinite hidden Markov model and adaptor grammars go further, discovering the number of syntactic categories and the morphological patterns from the data itself rather than requiring them to be specified in advance.
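For the supervised case, where parse trees are available, the Bayesian estimate of PCFG rule probabilities has a closed form: with a symmetric Dirichlet(α) prior on the expansions of each nonterminal, the posterior mean of a rule is (count + α) / (total count for that left-hand side + K·α), where K is the number of candidate expansions. The sketch below assumes a tiny hand-made treebank encoded as nested tuples, and a hypothetical `candidate_rules` argument for rules the grammar allows but the data never uses; inference from raw text, which requires sampling over parses, is not shown.

```python
from collections import Counter, defaultdict


def count_rules(tree, counts):
    """Add the rules used in one parse tree to `counts`.
    A tree is (label, child, ...); a child is either a subtree or a word (str)."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def dirichlet_rule_probs(trees, alpha=1.0, candidate_rules=()):
    """Posterior mean rule probabilities under a symmetric Dirichlet(alpha)
    prior on the expansions of each left-hand side."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    for rule in candidate_rules:          # unseen rules still receive prior mass
        counts.setdefault(rule, 0)
    by_lhs = defaultdict(list)
    for (lhs, rhs), c in counts.items():
        by_lhs[lhs].append((rhs, c))
    probs = {}
    for lhs, expansions in by_lhs.items():
        total = sum(c for _, c in expansions) + alpha * len(expansions)
        for rhs, c in expansions:
            probs[(lhs, rhs)] = (c + alpha) / total
    return probs


treebank = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "sleeps"))),
]
probs = dirichlet_rule_probs(treebank, alpha=1.0, candidate_rules=[("NP", ("NP", "PP"))])
print(probs[("NP", ("N",))])        # 0.4
```

The unseen rule NP → NP PP receives probability 0.2 rather than zero; smaller values of α shrink that share and favour sparser grammars.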
Language Universals and Typology
Bayesian phylogenetic comparative methods test hypotheses about language universals — for instance, whether verb-object order tends to co-evolve with preposition use. By modeling trait evolution on the posterior distribution of phylogenies, these methods control for the non-independence of related languages, addressing a fundamental confound in typological studies. The result is a more rigorous understanding of which linguistic patterns reflect cognitive constraints and which reflect shared ancestry.
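A sketch of the likelihood computation at the core of such tests, in the style of Pagel's discrete model for two binary traits (assumed here; the text does not name a specific model). The joint state (x, y) evolves on the tree under either an independent model, where each trait's gain and loss rates ignore the other trait, or a dependent model, where each of the eight single-trait changes has its own rate. A Bayesian analysis would place priors on the rates, sample them by MCMC across a posterior sample of trees, and compare the two models with a Bayes factor; the code below only evaluates each model's likelihood on one fixed, hypothetical tree. The matrix exponential is a simple scaling-and-squaring Taylor series written out to keep the block self-contained.

```python
import numpy as np

# Joint state of two binary traits (x, y) is encoded as index 2*x + y.
# Illustrative reading: x = 1 for verb-object order, y = 1 for prepositions.


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** s
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ B / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def rate_matrix(rates):
    """4x4 generator from {(from_state, to_state): rate}; one trait changes at a time."""
    Q = np.zeros((4, 4))
    for (i, j), r in rates.items():
        Q[i, j] = r
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


def independent_rates(gain_x, loss_x, gain_y, loss_y):
    """Independent model: each trait's rates are the same whatever the other trait is."""
    rates = {}
    for y in (0, 1):                      # x changes while y is held fixed
        rates[(y, 2 + y)] = gain_x
        rates[(2 + y, y)] = loss_x
    for x in (0, 1):                      # y changes while x is held fixed
        rates[(2 * x, 2 * x + 1)] = gain_y
        rates[(2 * x + 1, 2 * x)] = loss_y
    return rates


def partial_likelihood(node, Q):
    """Pruning over a tree whose leaves are joint-state ints and whose internal
    nodes are lists of (child, branch_length) pairs."""
    if isinstance(node, int):
        vec = np.zeros(4)
        vec[node] = 1.0
        return vec
    vec = np.ones(4)
    for child, t in node:
        vec = vec * (expm(Q * t) @ partial_likelihood(child, Q))
    return vec


def log_likelihood(tree, rates):
    """Log-likelihood with a uniform prior (1/4) on the root's joint state."""
    Q = rate_matrix(rates)
    return float(np.log(0.25 * partial_likelihood(tree, Q).sum()))


tree = [([(3, 0.4), (3, 0.4)], 0.6), ([(0, 0.5), (1, 0.5)], 0.5)]   # hypothetical
indep = independent_rates(0.5, 0.5, 0.5, 0.5)
dep = dict(indep)
dep[(2, 3)] = 2.0   # y gained faster when x = 1 (hypothetical coupling)
dep[(1, 0)] = 2.0   # y lost faster when x = 0
print(log_likelihood(tree, dep) - log_likelihood(tree, indep))
```

Because related languages share branches of the tree, their trait values are not counted as independent observations; that is how these methods address the non-independence problem described above.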