
Markov Logic Network

A Markov logic network combines first-order logic with Markov random fields, attaching real-valued weights to logical formulas so that worlds satisfying more groundings of heavily weighted formulas are exponentially more probable.

P(x) = (1/Z) · exp(Σᵢ wᵢ · nᵢ(x))

Traditional logic is brittle: a single violated rule renders an entire knowledge base inconsistent. Traditional probabilistic models are flexible but lack the expressive power of first-order logic — they cannot naturally represent relational structure, quantifiers, or rules that apply to variable numbers of entities. Markov logic networks (MLNs) bridge this gap by treating first-order logical formulas as soft constraints in a Markov random field, where each formula has a weight reflecting its importance, and the probability of a possible world depends on how many formulas it satisfies and how heavily those formulas are weighted.

Formal Definition

Markov Logic Network: An MLN is a set of pairs {(Fᵢ, wᵢ)}, where Fᵢ is a formula in first-order logic and wᵢ ∈ ℝ is a weight.

Probability of a World: P(x) = (1/Z) · exp(Σᵢ wᵢ · nᵢ(x))
where nᵢ(x) = number of true groundings of formula Fᵢ in world x
Z = Σₓ exp(Σᵢ wᵢ · nᵢ(x)) is the partition function

Given a finite set of constants (entities), each first-order formula is "grounded" by substituting all possible entity tuples for its variables. Each grounding becomes a feature in an exponential-family (log-linear) model. If a formula has weight +∞, it becomes a hard constraint — any world violating any of its groundings has probability zero, recovering standard logical entailment as a special case. Finite weights create soft constraints: violations are penalized but not forbidden.
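To make grounding and the probability formula concrete, here is a minimal Python sketch of a hypothetical one-rule MLN; the rule Smokes(x) ⇒ Cancer(x), the weight 1.5, and the two constants are invented for illustration, not drawn from the text above. It enumerates every possible world to compute Z exactly, which is only feasible for toy domains.

    from itertools import product
    from math import exp

    # Tiny hypothetical MLN: one soft rule "Smokes(x) => Cancer(x)" with weight 1.5
    # over two constants. All names and numbers are invented for illustration.
    constants = ["Anna", "Bob"]
    weight = 1.5
    atoms = [("Smokes", c) for c in constants] + [("Cancer", c) for c in constants]

    def n_true_groundings(world):
        """Count true groundings of Smokes(x) => Cancer(x) in a world (dict: atom -> bool)."""
        return sum((not world[("Smokes", c)]) or world[("Cancer", c)] for c in constants)

    def unnormalized(world):
        """exp(sum_i w_i * n_i(x)); here there is a single formula."""
        return exp(weight * n_true_groundings(world))

    # Enumerate all 2^|atoms| possible worlds to compute the partition function Z exactly.
    worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
    Z = sum(unnormalized(w) for w in worlds)

    # Probability of one particular world: Anna smokes and has cancer, Bob does neither.
    x = {("Smokes", "Anna"): True, ("Cancer", "Anna"): True,
         ("Smokes", "Bob"): False, ("Cancer", "Bob"): False}
    print(unnormalized(x) / Z)  # P(x) = exp(w * n(x)) / Z, with n(x) = 2 here

In this toy domain both groundings of the rule are satisfied in the chosen world, so its unnormalized weight is exp(1.5 · 2); dividing by Z turns that into a probability.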

2006

Matthew Richardson and Pedro Domingos introduce Markov logic networks in their seminal paper, unifying logical and statistical AI in a single representational framework.

2007–2009

The Alchemy system provides an open-source implementation. Singla and Domingos develop lifted inference methods that exploit the symmetry of first-order structure to avoid grounding out all clauses.

2010s

Weight learning via pseudo-likelihood, discriminative training, and structure learning algorithms mature. MLNs are applied to information extraction, social network analysis, and natural language understanding.

2016–present

Probabilistic soft logic, neural approaches to relational reasoning, and graph neural networks offer complementary and competing paradigms. MLNs remain influential as a conceptual bridge between logic and probability.

Inference

Inference in MLNs — computing marginal or conditional probabilities — is #P-hard in general, inheriting the difficulty of both logical satisfiability and probabilistic inference in Markov random fields. Practical inference relies on approximations:

MC-SAT combines ideas from MCMC sampling and satisfiability testing. A slice-sampling algorithm, it draws each new state with a SAT-based sampler from the assignments that satisfy a randomly chosen subset of the clauses the current state satisfies, handling deterministic (hard) and soft constraints together.

Lifted inference exploits the symmetry inherent in first-order structure. Instead of grounding every clause and reasoning about every entity individually, lifted algorithms group identical entities and perform computations on classes, achieving exponential speedups when the domain exhibits sufficient symmetry.

Variational methods and belief propagation can also be applied to the ground Markov network, though scalability remains a challenge for domains with many entities and complex formulas.
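As a concrete (if deliberately simplified) illustration of sampling-based inference on the ground network, the sketch below runs a plain Gibbs sampler rather than MC-SAT, reusing the invented one-rule example from the Formal Definition section to estimate the marginal probability of each ground atom.

    import random
    from math import exp

    # Plain Gibbs sampling over the ground network of the same invented one-rule MLN
    # (Smokes(x) => Cancer(x), weight 1.5, constants Anna and Bob). Illustrative only;
    # practical systems use MC-SAT, lifted inference, or belief propagation instead.
    constants = ["Anna", "Bob"]
    weight = 1.5
    atoms = [("Smokes", c) for c in constants] + [("Cancer", c) for c in constants]

    def n_true(world):
        return sum((not world[("Smokes", c)]) or world[("Cancer", c)] for c in constants)

    def gibbs_marginals(num_samples=10000, burn_in=1000, seed=0):
        """Estimate P(atom = True) for every ground atom by Gibbs sampling."""
        rng = random.Random(seed)
        world = {a: rng.random() < 0.5 for a in atoms}
        counts = {a: 0 for a in atoms}
        for step in range(burn_in + num_samples):
            for a in atoms:
                # Resample atom a conditioned on the current values of all other atoms.
                world[a] = True
                w_true = exp(weight * n_true(world))
                world[a] = False
                w_false = exp(weight * n_true(world))
                world[a] = rng.random() < w_true / (w_true + w_false)
            if step >= burn_in:
                for a in atoms:
                    counts[a] += world[a]
        return {a: counts[a] / num_samples for a in atoms}

    print(gibbs_marginals())  # Cancer atoms end up above 0.5: the rule rewards worlds containing them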

Bayesian Perspective on MLN Weights

From a Bayesian standpoint, the weights wᵢ are parameters of an exponential-family model, and one can place priors on them. A Gaussian prior on weights corresponds to L₂ regularization and penalizes extreme weights, encoding the belief that no single logical rule should dominate the distribution. Bayesian weight learning, while computationally demanding, provides uncertainty over the weights and enables model averaging. The marginal likelihood can serve as a criterion for structure learning — selecting which formulas to include — implementing a Bayesian Occam's razor that balances formula complexity against data fit.
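Concretely, placing an independent Gaussian prior wᵢ ~ N(0, σ²) on each weight and taking logs gives the MAP objective (a standard exponential-family identity, stated here for reference):

log P(w | x) = Σᵢ wᵢ · nᵢ(x) - log Z(w) - Σᵢ wᵢ² / (2σ²) + const

The last sum is the L₂ penalty; larger σ² means weaker regularization, and σ² → ∞ recovers plain maximum-likelihood weight learning.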

Weight Learning

Given data (a set of ground atoms whose truth values are observed), the weights can be learned by maximizing the conditional log-likelihood or pseudo-log-likelihood. The gradient of the log-likelihood involves the difference between the observed count of true groundings and the expected count under the model — a familiar exponential-family identity. Because computing the expected counts requires inference, learning and inference are intertwined, and scalable learning often relies on pseudo-likelihood or contrastive divergence approximations.
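The sketch below illustrates that gradient on the invented one-rule example, computing the expected count exactly by enumerating all worlds (feasible only for toy domains) and including the Gaussian-prior term from the previous section; real systems replace the exact expectation with pseudo-likelihood or sampling approximations.

    from itertools import product
    from math import exp

    # Gradient-ascent weight learning on the invented one-rule MLN, with exact expected
    # counts obtained by enumerating all worlds. Real learners approximate the expectation
    # (pseudo-likelihood, contrastive divergence) because the enumeration is intractable.
    constants = ["Anna", "Bob"]
    atoms = [("Smokes", c) for c in constants] + [("Cancer", c) for c in constants]

    def n_true(world):
        return sum((not world[("Smokes", c)]) or world[("Cancer", c)] for c in constants)

    all_worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]

    def expected_count(w):
        """E_w[n(X)] under the current weight, by brute-force enumeration."""
        scores = [exp(w * n_true(x)) for x in all_worlds]
        Z = sum(scores)
        return sum(s * n_true(x) for s, x in zip(scores, all_worlds)) / Z

    # Observed world (the "training database", invented): every smoker has cancer.
    observed = {("Smokes", "Anna"): True, ("Cancer", "Anna"): True,
                ("Smokes", "Bob"): False, ("Cancer", "Bob"): False}

    w, lr, sigma2 = 0.0, 0.5, 10.0
    for _ in range(200):
        # observed count minus expected count, plus the Gaussian-prior (L2) gradient term
        grad = n_true(observed) - expected_count(w) - w / sigma2
        w += lr * grad
    print(round(w, 3))  # the learned weight of the soft rule

Without the prior term the weight would diverge, since the observed world satisfies every grounding; the L₂ penalty pins it at a finite value.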

Applications

MLNs have been applied to entity resolution (determining which database records refer to the same real-world entity), information extraction (jointly predicting entities and relations from text), social network modeling (predicting links and attributes using relational rules), and activity recognition (combining logical rules about object interactions with probabilistic observations). Their ability to compactly represent complex relational dependencies using the familiar syntax of first-order logic makes them a natural tool for knowledge-rich domains.

"A Markov logic network is just a set of weighted formulas in first-order logic. But this simple idea unifies two of the most powerful formalisms in AI — logic and graphical models — and that unification opens up a vast space of possibilities." — Pedro Domingos, The Master Algorithm (2015)

Related Topics