Fokker Challenge: Smallest one-point basis in Lean 4

Searching for smallest one-point basis and verify in Lean4

A one-point basis is a single lambda term that can generate the entire untyped lambda calculus by itself through repeated application.

The currently known smallest one-point basis for untyped lambda calculus (in terms of Fokker size) have fokker size 7.

α = λλλ2 0 (1 (λ1))
B = λλλ2 (1 0) = α(α(α(α(α α) α))(α α) α)(α(α(α α)))
C = λλλ2 0 1 = α(α α α(α α α)(α α)(α α)) α α
K = λλ1 = α α(α(α α) α α α) α α
W = λλ1 0 0 = α α(α(α(α α) α))
I = λ0 = α(α(α(α α) α))(α(α α) α)
S = λλλ (2 0) (1 0) = α(α(α α(α α(α α))(α(α(α α(α α))))))α α

The Fokker size of a closed term is the number of abstractions and applications.

def fokker_size : (Term String) -> Nat
| bvar _ => 0
| fvar _ => 0
| abs t => 1 + fokker_size t
| app t1 t2 => 1 + fokker_size t1 + fokker_size t2

Our goal

We are a collaborative project aim to

Verify α is the smallest basis OR
Find the smallest basis

We use Lean 4 to formalize that all smaller closed lambda terms can't be a basis.

Our proof is based on cslib.

What we have done

There are 5420 terms with Fokker size < 7. We have proved % are not bases by 4 simple deciders.

Decider/EveryBvarUsed.lean: If every var is used, it cannot form a basis — unable to discard variable.
Decider/OnlyOneVarUsed.lean: If only one var is used, it cannot form a basis - unable to reorder.
Decider/NoDuplicate.lean: If every variable appears at most once (zero or one time), it cannot form a basis — variables cannot be duplicated.
Decider/All0.lean: In de Bruijn index, if every number is 0, it cannot form a basis — one variable is not enough, unable to construct complicated terms.

Next steps

Deal with terms without redex first
Filter terms always terminate
Filter terms always diverge
Formalize some papers
Find more deciders: especially ability to reorder variables
Investigate patterns of undecided terms

Looking for Participation

This repository is an intentional experiment in AI-native development.

We are especially hoping for AI assistance to help prove the remaining ~2000 terms are not bases.

How AI Can Help (High Priority)

This project is quite suitable for AI-driven autonomous proof.

Enrich cslib: We are blocked by formalization of standardization theorem.
Autonomous Solve Known Trival Cases: Proof of finite terms are quite tedious. AI is quite good at this.
Automated Search Unknown Trival cases
Pattern Recognition: Analyze undecided terms to identify common structures or properties, guiding the discovery of new deciders.
Mass Analysis: Systematically analyze thousands of undecided terms in parallel.
Formalization of related papers
Benchmark Platform: Our project is an excellent platform for testing LLMs' capabilities.

Proof Status Summary

Counts for proved and unproved terms, grouped by priority and difficulty.

Category	Detail	Count
Total	All terms with Fokker size < 7	5420
Proved	Verified by the 4 deciders	2814
Unproved	Remaining cases
Generation is FinSet	Priority: medium; Difficulty: easy
Two vars are enough	Priority: high; Difficulty: medium	23
With beta-redex or eta-redex	Priority: low	1469
Others	Priority: high; Difficulty: unknown	993

Although some terms are labelled as low priority, if you proved any of them, PRs are very welcome.

Input Term

Examples:

Search History: