Formal verification of the 5th Busy Beaver value

Tristan Stérin, Maja Kądziołka
bbchallenge.org

Notes for those viewing the slides online:

  • a recording of this talk is available on YouTube
    • in particular, some explanations rely on things being drawn live, with spoken commentary
  • during testing, we have noticed that having the Bitwarden extension installed makes things choppy, at least on Firefox
    • if you have it installed, consider using a private tab or a different browser profile

The busy beaver function

Tibor Radó, 1895 — 1965

\(BB(n)\): maximum number of steps done by a halting 2-symbol Turing machine with \(n\) states starting from an all-0 tape.

T. Radó. On Non-computable Functions. Bell System Technical Journal, 41(3):877–884. 1962.

BB(n) = "Maximum algorithmic bang for your buck"

UNCOMPUTABLE

BB(5) winner, first 20,000 steps

Small busy beaver values:

  • BB(1) = 1, BB(2) = 6 [Radó, 1962]
  • BB(3) = 21 [Radó and Lin, 1963]
  • BB(4) = 107 [Brady, 1983]
  • BB(5) ≥ 47,176,870 [Marxen and Buntrock, 1989]
  • BB(5) = 47,176,870 [bbchallenge, 2024]

For each 4-state Turing machine, witness that it halts within 107 steps, or prove it never halts (which can be hard in general)

Mathematical hardness of busy beaver values

Any \(\Pi_1\) statement (i.e. \(\forall x, P(x)\) for computable \(P\)) can be converted into a Turing machine that halts iff the statement is false, by searching for counterexamples.

Every positive even integer \(n > 2\) can be written as a sum of two primes.

\(4 = 2 + 2 \quad 6 = 3 + 3 \quad 8 = 5 + 3 \quad 10 = 7 + 3 \quad 12 = 7 + 5 \quad 14 = 11 + 3 \quad \ldots\)

  1. Enumerate all even numbers \(n\)
  2. For each \(n\), enumerate all primes \(p < n\) and test all sums
  3. Halt if no sum reaches the current \(n\)

There is a 25-state Turing machine that implements the above. [anonymous, 2016] Verified in Lean! [lengyijun, 2024]

Hence, knowing BB(25) is "at least as hard" as solving Goldbach's conjecture.

Other examples:

  • Riemann Hypothesis \(\rightarrow BB(744)\) [Matiyassevich and O'Rear, 2016]
  • Consistency of ZF \(\rightarrow BB(748)\) [O'Rear, 2017] [Riebel, 2024]

Georgi Georgiev's bbfind program

Georgi "Skelet" Georgiev

In 2003, Georgi Georgiev's (Skelet) publishes his bbfind program:

  • solves all 5-state machines except ~150 hard ones
    • best result since 1989 [Marxen and Buntrock, 1989]
  • ~6000 lines of undocumented Pascal code
    • concerns about correctness
  • Note: failed to find a better machine

In 2020, Scott Aaronson conjectured that there is no better machine: Marxen and Buntrock's machine is the BB(5) winner.

bbchallenge.org: The Busy Beaver Challenge

  • Created by Tristan in 2022
  • Online, asynchronous, almost exclusively communicating on Discord
  • No "management": entropically driven research
    • Even the formal proof happened through a grassroots initiative
  • 800+ members on Discord
  • ~50 active contributors
  • ~15 contributors whose contributions were directly used in the Coq proof
  • Galaxy of ~25 GitHub repositories, many languages used: C++, Python, Rust, Go, Haskell, Pascal, Coq, Lean, Dafny

Proof outline

1. Enumerate all 5-state Turing machines

  • There are \(21^{10} \approx 1.67 \cdot 10^{13}\) possible 5-state Turing machines
  • Only \(181,385,789\) after symmetries and pruning

2. Try automated proof strategies

  • Deciders: programs that, given a TM, output one of

HALTS

DOESN'T HALT

UNKNOWN \(\Downarrow\)
try another decider

\[ \forall \mathrm{tm}. \mathrm{decider}(\mathrm{tm}) = \mathtt{NONHALT} \Longrightarrow \text{tm doesn't halt} \]

3. Manually inspect what's left

  • Holdouts: machines that the decider pipeline couldn't deal with
    • Usually the most interesting ones

1. Efficient TM enumeration

Naive enumeration has many redundant machines:

  • states can be renamed and rearranged without change of behavior

  • machines can differ in unreachable parts of their code

Allen Brady, 1934 — 2024

Solution: enumerate-as-you-go, choosing transitions only when the TM actually reaches them.

This is known as Tree Normal Form enumeration [Brady, 1966]

Options for the next transition: \(\{\mathtt 0, \mathtt 1\} \times \{\mathtt L, \mathtt R\} \times \{\mathtt A, \mathtt B, \mathtt C\}\)

Rinse & repeat™

TNF enumeration is unreasonably effective

  • \(21^{10} \approx 1.67 \cdot 10^{13}\) possible Turing machines
  • Back-of-the envelope calculation: \(\frac{21^{10}}{4! \cdot 2 \cdot 2} \approx 1.74 \cdot 10^{11}\) after symmetry
  • Actually enumerated: \(181,385,789 \approx 1.81 \cdot 10^8\)

2. Deciding the undecidable: a quick tutorial

This is impossible! Let's do it anyway.

Simplest idea: there's gotta be some TMs that just enter a cycle

A bit more general: Translated Cyclers

However, the tape left behind isn't always clean

Translated Cyclers

  • This decider was first used for BB(3) [Radó and Lin, 1963]
    • originally known as Lin recurrence
  • However, we can do better:

Consider the sequence of (current state, symbol under head) pairs the machine visits during its execution. Two repetitions in this sequence are enough to declare this machine a (translated) cycler.

Side conditions apply. Not a legally binding algorithm description. Consult a mathematician before use.

Translated Cyclers can get bigger

A complex translated cycler with a large translation pattern

  • 10,000-step space-time diagram
  • Behaviour starts after about 1000 steps
  • Period is about 600 steps

They can get a LOT worse

Skelet #1, a ginormous Translated Cycler

Skelet #1 translated cycler

  • Discovered by Georgi Georgiev (a.k.a Skelet) in 2003
  • Behavior starts after \(10^{51}\) steps
  • Period of more than 8 billion steps
  • Required an individual Coq proof of nonhalting
  • One of the most complex 5-state TMs

More deciders

Outside of Cyclers, there are two strategies:

Example of a Bouncer, one such concrete class

  1. choose a specific class of machines, tailor a decider to recognize them
  1. generic methods that attempt to approximate the behavior in some way (abstract interpretation)

\(n\)-gram Closed Position Set

Machines decided by \(n\)-gram CPS

So, where are we at?

3. What are the holdouts like?

Skelet #10: counts in base Fibonacci on two sides of the tape, halts if the two counters are out of sync

All natural numbers can be expressed as a sum of Fibonacci numbers in exactly one way, if you forbid using numbers immediately adjacent in the Fibonacci sequence.

\[ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144 \] \[ 17 = 1 + 3 + 13 \]

Skelet #1: very chaotic, starts cycling after \(10^{51}\) steps

Skelet #17: complex discrete math problem involving counting in Gray code [Chris Xu, 2024, arXiv:2407.02426]

The formal proof

Initially, formal verification was considered too ambitious and out of scope.

Original correctness strategy:

  • independent, interoperable implementations of each component
  • consensus-based development
  • modular pipeline with very loose interlinking
    • in place of the decider pipeline, the TNF enumeration used a very simple criterion:
      • if it halts within 47,176,870, then it halts
      • otherwise, mark it as undecided

In 2022, Nathan Fenner wrote formally verified versions of some of the deciders in Dafny:

  • Finite Automata Reduction
  • Halting Segment (since obsoleted by other work)

In 2023, mei joins the project.

  • always been meaning to get into formal verification (seemed interesting)
  • previous experience: small stuff in Isabelle/HOL
  • read Adam Chlipala's Certified Programming with Dependent Types and forgot most of it
  • learned Coq through Software Foundations
  • found bbchallenge through OEIS immediately afterwards

Note: due to time constraints, the rest of the talk was given using preexisting slides outside of slipshow.

See continuation on Google Slides \(\Longrightarrow\)

0