Instruction booklets = proofs
- Step-by-step guides that show how to snap bricks together to build bigger structures (e.g., Pythagoras’ theorem).
Lean
- Lean = a super-precise robot that reads the instruction booklets and shouts “ERROR!” the instant you try to snap together two bricks that don’t fit.
- Lean is a proof assistant:
- A compiler for math.
- Input: theorem and a proof
- Output: “✅ Proof accepted” or “❌ Line 7: type mismatch.”
- Example toy theorem:
- theorem silly (n : ℕ) : n + 0 = n := by rfl
- Lean checks that rfl (“reflexivity”) is a legal tactic here and that the types line up.
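The accept/reject behavior can be sketched in Lean itself. The second theorem, silly', is an illustrative addition (not from the slides): rfl alone would be rejected on it, so a stronger tactic is needed.

```lean
import Mathlib

-- ✅ Accepted: `n + 0` reduces to `n` by definition, so `rfl` type-checks.
theorem silly (n : ℕ) : n + 0 = n := by rfl

-- ❌ `rfl` alone would be rejected here: `0 + n` does not reduce to `n`
-- definitionally, so a real proof step (e.g. `simp`, which knows
-- `zero_add`) is needed.
theorem silly' (n : ℕ) : 0 + n = n := by simp
```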
Mathematicians
- Mathematicians = creative types who invent new bricks (conjectures) and write new booklets (proofs).
Prover LLMs
- Prover LLMs = apprentices who watch the mathematicians and try to invent their own bricks and booklets.
- They learn by imitation.
Data Scarcity
- The problem: the apprentices run out of Lego pieces to practice with.
- Existing formal math datasets (e.g., Mathlib) are tiny compared to the trillions of tokens used to train general LLMs.
- Not enough humans are formalizing theorems.
Scaling up
- How do we scale up new Lego production?
The Pipeline - Generation
- Generation: ask an LLM for new Lego bricks.
- Seed: pick 40 random files from Mathlib.
- Prompt: “Given the theorems in this file, create as many new theorems as possible that are similar but not identical. Do NOT write proofs; just the statements.”
- Postprocess
- Strip prefixes like @[simp] or protected.
- Auto-add imports (import Mathlib, import Aesop).
- Supply necessary context (variables, open namespaces).
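The postprocessing step above can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the regexes and the fixed header are assumptions about what "strip prefixes" and "auto-add imports" look like in practice.

```python
import re

# Assumed standard header prepended so each generated statement
# compiles on its own (per the slide: import Mathlib, import Aesop).
HEADER = "import Mathlib\nimport Aesop\n\n"

def postprocess(statement: str) -> str:
    """Strip attribute/modifier prefixes and prepend imports (sketch)."""
    # Remove attribute annotations such as @[simp].
    s = re.sub(r"@\[[^\]]*\]\s*", "", statement)
    # Remove visibility modifiers such as `protected` or `private`.
    s = re.sub(r"^\s*(protected|private)\s+", "", s)
    return HEADER + s

print(postprocess("@[simp] protected theorem foo : 1 + 1 = 2 := sorry"))
```

Supplying variables and open namespaces (the third bullet) would similarly be text prepended per seed file.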
The Pipeline - Evaluation
- Filter each generated statement on three criteria:
- Novelty: use Lean’s exact? tactic to check whether Mathlib or a previous run already contains it.
- Syntactic validity: use the Lean parser to check whether the statement even compiles.
- Non-triviality: use Lean’s aesop tactic to check whether it is too easy (auto-provable in under 1 s).
- Outcome: novel, non-trivial conjectures.
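The three filters combine into a single keep/drop decision. In this hypothetical sketch the actual Lean calls (parser, exact?, aesop) are stubbed as booleans, since invoking Lean is a subprocess detail the slides don't specify.

```python
def keep(statement: str, compiles: bool,
         already_known: bool, aesop_proves_it: bool) -> bool:
    """Keep a conjecture only if it passes all three filters (sketch)."""
    if not compiles:        # syntactic validity: Lean parser rejects it
        return False
    if already_known:       # novelty: exact? finds it in Mathlib / prior runs
        return False
    if aesop_proves_it:     # non-triviality: auto-provable in < 1 s
        return False
    return True

# A statement survives only when it compiles, is new, and resists aesop.
print(keep("theorem t : ...", True, False, False))  # survives
```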
The Pipeline - Iteration
- Concatenate the filtered conjectures into a new file and use it as the seed for another round of generation and filtering.
- Repeat 15 times.
- Outcome: 100+ novel, non-trivial conjectures per run
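The iteration loop can be sketched like this. The generate and filter_conjectures callables are hypothetical stand-ins for the LLM generation step and the three-filter evaluation above.

```python
def run_pipeline(seed, generate, filter_conjectures, rounds=15):
    """Seed -> generate -> filter -> reseed, repeated (sketch).

    Each round's survivors both accumulate in the output pool and
    become the seed file for the next round.
    """
    pool = []
    for _ in range(rounds):
        candidates = generate(seed)          # LLM proposes statements
        kept = filter_conjectures(candidates)  # novelty/validity/triviality
        pool.extend(kept)
        seed = kept                          # filtered output seeds next round
    return pool
```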
The Pipeline - Verification
- Use DeepSeek-Prover-V2 (88.9 % success rate on MiniF2F) to attempt full proofs.
- Conjectures that pass → verified conjectures (true theorems).
- Conjectures that fail → still useful as training data (they are plausible puzzles).
- Funnel: 12k+ raw conjectures → 11k syntactically valid → 4k novel → 3.6k non-trivial (aesop fails).
The Pipeline - Reinforcement Learning
- GRPO (Group Relative Policy Optimization):
- Reward = 1 if a verified proof is found, 0 otherwise.
- Fine-tune DeepSeek-Prover-V2 on the generated conjectures.
- Train for 10 epochs.
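The "group relative" part of GRPO can be sketched with the binary reward from the slide: several proofs are sampled per conjecture, and each proof's advantage is its reward normalized against the group. This is only the advantage computation; the full GRPO objective (policy-ratio clipping, KL penalty) is omitted.

```python
def grpo_advantages(rewards):
    """Group-relative advantages for one conjecture's sampled proofs.

    rewards: one 0/1 verification reward per sampled proof.
    Each advantage = (reward - group mean) / group std.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # if all rewards are equal, avoid dividing by 0
    return [(r - mean) / std for r in rewards]

# Proofs that succeed where siblings failed get a positive advantage.
print(grpo_advantages([1, 0, 0, 1]))
```

With an all-0 (or all-1) group the advantages are all zero, i.e., the conjecture provides no gradient signal that round; this is why non-trivial but provable conjectures are the valuable ones.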
Past works
- STP (Dong & Ma): self-play loop between a prover and a conjecturer, each training the other. Eliminates the human-data bottleneck; proves 26.3 % of LeanWorkbook vs. a 13.2 % baseline.
- Safe (Liu et al.): uses Lean to verify every single reasoning step in a natural-language chain of thought, not just the final answer. Step-level formal verification reduces hallucinations; introduces the 30k-problem “FormalStep” benchmark.
- DeepSeek-Prover-V2 (Ren et al.): 7 B model + RL + Monte Carlo tree search; SOTA on MiniF2F (61.7 %). Supplies the prover side that LeanConjecturer later exploits.
- PutnamBench (Tsoukalas et al.): 644 undergraduate competition problems formalized in Lean. A stress test for new provers and conjecturers.
- LeanNavigator (Yin & Gao): generates 4.7 M new theorems by state-graph exploration (breadth-first search through tactic states). Massive synthetic data; complementary to LLM-based conjecturing.
- Comparison of data-generation approaches (approach / data source / strength / weakness):
- Autoformalization (Ying et al.): natural-language Olympiad problems; cheap and huge; limited to textbook-style problems.
- State-graph exploration (LeanNavigator): existing Lean proofs; scales to millions; mostly shallow variants.
- Self-play (STP): self-generated data; the curriculum adapts; risk of mode collapse.
Issues
- Reliance on DeepSeek-Prover to verify conjectures; a false positive occasionally slips through.
- Early iterations cluster around the topics of the seed files.
- 16 GPU-hours per GRPO epoch; this can probably be sped up using vLLM rollouts.
Citations & Further Reading