Graph Cuts in Computer Vision & AI

Graph cuts reformulate pixel labeling in computer vision as minimum cut / maximum flow problems on graphs, providing efficient globally optimal or near-optimal solutions to energy minimization problems that admit a natural Bayesian interpretation as MAP estimation in Markov random fields.

E(L) = Σᵢ Dᵢ(lᵢ) + Σ_{(i,j)∈𝒩} Vᵢⱼ(lᵢ, lⱼ)

Image segmentation, stereo matching, and denoising can all be formulated as labeling problems: assign each pixel a label (foreground/background, depth, clean intensity) to minimize an energy function that balances data fidelity against spatial smoothness. When this energy takes the form of a pairwise Markov random field, finding the minimum-energy labeling is equivalent to computing the MAP estimate of a Bayesian posterior — and for certain classes of energies, graph cuts provide an exact or provably near-optimal solution in polynomial time.

Energy Minimization Framework

MRF Energy Function E(L) = Σᵢ Dᵢ(lᵢ) + λ · Σ_{(i,j)∈𝒩} Vᵢⱼ(lᵢ, lⱼ)

Bayesian Interpretation P(L | I) ∝ exp(−E(L))
Dᵢ(lᵢ) = −log P(Iᵢ | lᵢ) (negative log-likelihood)
Vᵢⱼ(lᵢ, lⱼ) = −log P(lᵢ, lⱼ) (prior on spatial smoothness)

The data term Dᵢ(lᵢ) measures how well label lᵢ explains the observed intensity at pixel i — it is the negative log-likelihood. The smoothness term Vᵢⱼ penalizes neighboring pixels that have different labels, encoding a spatial prior that favors coherent regions. The parameter λ controls the relative weight of data versus prior, directly analogous to the balance between likelihood and prior in Bayesian inference. Minimizing E(L) is equivalent to finding the MAP labeling under the MRF posterior.

1989

Greig, Porteous, and Seheult demonstrate that MAP estimation in a binary MRF for image restoration can be solved exactly via graph cuts (min-cut/max-flow), noting the connection to combinatorial optimization.

2001

Boykov, Veksler, and Zabih introduce the α-expansion and α-β-swap algorithms, extending graph cuts to multi-label problems with provable approximation guarantees (within a factor of 2 of optimal for metric smoothness terms).

2004

Boykov and Kolmogorov develop efficient max-flow algorithms tailored to vision graphs, achieving significant speedups over generic network flow algorithms on grid-structured problems.

2004–2008

GrabCut (Rother, Kolmogorov, Blake) combines graph cuts with iterative Gaussian mixture model estimation, enabling interactive foreground extraction. Graph cuts become a standard tool in commercial image editing software.

Binary Graph Cuts

For binary labeling problems (e.g., foreground vs. background), the graph is constructed as follows: create a node for each pixel, connect neighboring pixels with edges weighted by the smoothness penalty, and add two terminal nodes (source S = foreground, sink T = background) with edges from each pixel to S and T weighted by the respective data terms. The minimum s-t cut of this graph partitions pixels into foreground and background, exactly minimizing the MRF energy. The max-flow/min-cut theorem guarantees that this can be solved in polynomial time.

Submodularity and Optimality

Graph cuts provide exact MAP solutions only when the energy function is submodular — that is, when the pairwise terms satisfy Vᵢⱼ(0,0) + Vᵢⱼ(1,1) ≤ Vᵢⱼ(0,1) + Vᵢⱼ(1,0). This condition holds for all "attractive" potentials that penalize disagreement (such as the Ising model). For non-submodular energies (repulsive interactions), the problem becomes NP-hard, and graph cuts provide only partial solutions with unlabeled pixels. The multi-label α-expansion algorithm extends the binary case by solving a sequence of binary subproblems, each of which is submodular, yielding a solution within a known factor of optimal.

Multi-Label Extensions

Many vision problems require more than two labels — depth estimation, texture segmentation, motion analysis. The α-expansion algorithm iterates over labels: at each step, every pixel can either keep its current label or switch to α, and this binary decision is solved optimally via graph cut. The algorithm cycles through labels until convergence, producing a labeling guaranteed to be within a factor of 2 of the global optimum for metric smoothness terms (Vᵢⱼ(a,b) ≤ Vᵢⱼ(a,c) + Vᵢⱼ(c,b) for all a, b, c).

Beyond MAP: Marginal Inference

Graph cuts compute MAP estimates, not marginals. For applications requiring posterior marginals or uncertainty estimates — How confident are we that this pixel is foreground? — other methods are needed. Belief propagation, variational inference, and MCMC can compute marginals on the same MRF energy. Some hybrid approaches use graph cuts within an MCMC sampler (the Swendsen-Wang cut algorithm) to make large, coordinated moves in the label space while maintaining correct sampling from the full posterior.

"Graph cuts showed the vision community that exact or near-exact MAP inference in MRFs was not a pipe dream — it was a polynomial-time computation, and it worked beautifully in practice." — Yuri Boykov, on the impact of graph cuts in computer vision

Energy Minimization Framework

Binary Graph Cuts

Multi-Label Extensions

Beyond MAP: Marginal Inference

Related Topics