Formal verification of the 5th Busy Beaver value

Tristan Stérin, Maja Kądziołka
bbchallenge.org

Notes for those viewing the slides online:

a recording of this talk is available on YouTube
- in particular, some explanations rely on things being drawn live, with spoken commentary
during testing, we have noticed that having the Bitwarden extension installed makes things choppy, at least on Firefox
- if you have it installed, consider using a private tab or a different browser profile

The busy beaver function

Tibor Radó, 1895 — 1965

\(BB(n)\): maximum number of steps done by a halting 2-symbol Turing machine with \(n\) states starting from an all-0 tape.

T. Radó. On Non-computable Functions. Bell System Technical Journal, 41(3):877–884. 1962.

BB(n) = "Maximum algorithmic bang for your buck"

UNCOMPUTABLE

BB(5) winner, first 20,000 steps

Small busy beaver values:

BB(1) = 1, BB(2) = 6 [Radó, 1962]
BB(3) = 21 [Radó and Lin, 1963]
BB(4) = 107 [Brady, 1983]
BB(5) ≥ 47,176,870 [Marxen and Buntrock, 1989]
BB(5) = 47,176,870 [bbchallenge, 2024]

For each 4-state Turing machine, witness that it halts within 107 steps, or prove it never halts (which can be hard in general)

Mathematical hardness of busy beaver values

Any \(\Pi_1\) statement (i.e. \(\forall x, P(x)\) for computable \(P\)) can be converted into a Turing machine that halts iff the statement is false, by searching for counterexamples.

Every positive even integer \(n > 2\) can be written as a sum of two primes.

\(4 = 2 + 2 \quad 6 = 3 + 3 \quad 8 = 5 + 3 \quad 10 = 7 + 3 \quad 12 = 7 + 5 \quad 14 = 11 + 3 \quad \ldots\)

Enumerate all even numbers \(n\)
For each \(n\), enumerate all primes \(p < n\) and test all sums
Halt if no sum reaches the current \(n\)

There is a 25-state Turing machine that implements the above. [anonymous, 2016] Verified in Lean! [lengyijun, 2024]

Hence, knowing BB(25) is "at least as hard" as solving Goldbach's conjecture.

Other examples:

Riemann Hypothesis \(\rightarrow BB(744)\) [Matiyassevich and O'Rear, 2016]
Consistency of ZF \(\rightarrow BB(748)\) [O'Rear, 2017] [Riebel, 2024]

Georgi Georgiev's bbfind program

Georgi "Skelet" Georgiev

In 2003, Georgi Georgiev's (Skelet) publishes his bbfind program:

solves all 5-state machines except ~150 hard ones
- best result since 1989 [Marxen and Buntrock, 1989]
~6000 lines of undocumented Pascal code
- concerns about correctness
Note: failed to find a better machine

In 2020, Scott Aaronson conjectured that there is no better machine: Marxen and Buntrock's machine is the BB(5) winner.

bbchallenge.org: The Busy Beaver Challenge

Created by Tristan in 2022
Online, asynchronous, almost exclusively communicating on Discord
No "management": entropically driven research
- Even the formal proof happened through a grassroots initiative
800+ members on Discord
~50 active contributors
~15 contributors whose contributions were directly used in the Coq proof
Galaxy of ~25 GitHub repositories, many languages used: C++, Python, Rust, Go, Haskell, Pascal, Coq, Lean, Dafny

Proof outline

1. Enumerate all 5-state Turing machines

There are \(21^{10} \approx 1.67 \cdot 10^{13}\) possible 5-state Turing machines
Only \(181,385,789\) after symmetries and pruning

2. Try automated proof strategies

Deciders: programs that, given a TM, output one of

HALTS

DOESN'T HALT

UNKNOWN \(\Downarrow\)
try another decider

\[ \forall \mathrm{tm}. \mathrm{decider}(\mathrm{tm}) = \mathtt{NONHALT} \Longrightarrow \text{tm doesn't halt} \]

3. Manually inspect what's left

Holdouts: machines that the decider pipeline couldn't deal with
- Usually the most interesting ones

1. Efficient TM enumeration

Naive enumeration has many redundant machines:

states can be renamed and rearranged without change of behavior

machines can differ in unreachable parts of their code

Allen Brady, 1934 — 2024

Solution: enumerate-as-you-go, choosing transitions only when the TM actually reaches them.

This is known as Tree Normal Form enumeration [Brady, 1966]

Options for the next transition: \(\{\mathtt 0, \mathtt 1\} \times \{\mathtt L, \mathtt R\} \times \{\mathtt A, \mathtt B, \mathtt C\}\)

Rinse & repeat™

TNF enumeration is unreasonably effective

\(21^{10} \approx 1.67 \cdot 10^{13}\) possible Turing machines
Back-of-the envelope calculation: \(\frac{21^{10}}{4! \cdot 2 \cdot 2} \approx 1.74 \cdot 10^{11}\) after symmetry
Actually enumerated: \(181,385,789 \approx 1.81 \cdot 10^8\)

2. Deciding the undecidable: a quick tutorial

This is impossible! Let's do it anyway.

Simplest idea: there's gotta be some TMs that just enter a cycle

A bit more general: Translated Cyclers

However, the tape left behind isn't always clean

Translated Cyclers

This decider was first used for BB(3) [Radó and Lin, 1963]
- originally known as Lin recurrence
However, we can do better:

Consider the sequence of (current state, symbol under head) pairs the machine visits during its execution. Two repetitions in this sequence are enough to declare this machine a (translated) cycler.

Side conditions apply. Not a legally binding algorithm description. Consult a mathematician before use.

Translated Cyclers can get bigger

A complex translated cycler with a large translation pattern

10,000-step space-time diagram
Behaviour starts after about 1000 steps
Period is about 600 steps

They can get a LOT worse

Skelet #1, a ginormous Translated Cycler

Skelet #1 translated cycler

Discovered by Georgi Georgiev (a.k.a Skelet) in 2003
Behavior starts after \(10^{51}\) steps
Period of more than 8 billion steps
Required an individual Coq proof of nonhalting
One of the most complex 5-state TMs

More deciders

Outside of Cyclers, there are two strategies:

Example of a Bouncer, one such concrete class

choose a specific class of machines, tailor a decider to recognize them

generic methods that attempt to approximate the behavior in some way (abstract interpretation)

\(n\)-gram Closed Position Set

Machines decided by \(n\)-gram CPS

So, where are we at?

3. What are the holdouts like?

Skelet #10: counts in base Fibonacci on two sides of the tape, halts if the two counters are out of sync

All natural numbers can be expressed as a sum of Fibonacci numbers in exactly one way, if you forbid using numbers immediately adjacent in the Fibonacci sequence.

\[ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144 \] \[ 17 = 1 + 3 + 13 \]

Skelet #1: very chaotic, starts cycling after \(10^{51}\) steps

Skelet #17: complex discrete math problem involving counting in Gray code [Chris Xu, 2024, arXiv:2407.02426]

The formal proof

Initially, formal verification was considered too ambitious and out of scope.

Original correctness strategy:

independent, interoperable implementations of each component
consensus-based development
modular pipeline with very loose interlinking
- in place of the decider pipeline, the TNF enumeration used a very simple criterion:
  - if it halts within 47,176,870, then it halts
  - otherwise, mark it as undecided

In 2022, Nathan Fenner wrote formally verified versions of some of the deciders in Dafny:

Finite Automata Reduction
Halting Segment (since obsoleted by other work)

In 2023, mei joins the project.

always been meaning to get into formal verification (seemed interesting)
previous experience: small stuff in Isabelle/HOL
read Adam Chlipala's Certified Programming with Dependent Types and forgot most of it
learned Coq through Software Foundations
found bbchallenge through OEIS immediately afterwards

Note: due to time constraints, the rest of the talk was given using preexisting slides outside of slipshow.

See continuation on Google Slides \(\Longrightarrow\)

←

⇱

→