Proofreader

Proofreader is an agentic LLM tool for pre-submission self-review of formal, proof-based papers. It helps authors find the holes in their own proofs before a reviewer, or a later reader, does. The name is a deliberate double entendre. The tool proofreads, in the copy-editor’s sense, by reading proofs, in the formal sense.

Overview

Peer review is the primary safeguard for correctness in proof-based research, but it is slow, scarce, and increasingly strained. Subtle flaws in formal arguments, a wrong-direction inequality substitution buried three pages deep in an induction, a constant silently assumed to equal one, a restated theorem whose preconditions have quietly drifted, sometimes survive review and propagate into the work that builds on them.

Proofreader addresses this gap by giving authors an adversarial first-pass reviewer they can run on demand. It is a Claude Code plugin that reads a paper, inventories its theorems and lemmas, audits each proof for correctness, attempts to construct counterexamples to suspect results, and then adjudicates each
finding through an independent defender-and-arbiter chain before reporting it. The intended use is for an author to run it on their own draft before submission, or on their own prior work, so that flaws are caught and fixed before anyone else encounters them.

Importantly, Proofreader scrutinizes the correctness of the formal arguments rather than prose quality, and it is not mechanized proof checking in Lean, Rocq, or Coq. This is natural-language proof scrutiny, performed by an LLM with optional Python execution for counterexample search. Proofreader screens, it does not certify. Its output is neither sound nor complete. It can flag correct results erroneously and it can miss genuine flaws. We have nonetheless found it useful in practice, including cases where independent authors confirmed the findings.

How it works

Proofreader decomposes review into separately invocable stages, mirroring the roles a careful reviewer plays:

Evaluate the paper in a first-pass read that produces quality scores and a complete inventory of theorems and lemmas, each with a per-result verdict.
Audit each flagged result in depth, listing issues by severity.
Search for counterexamples to suspect results, writing and running Python verification scripts to confirm or deny an issue.
Defend and arbitrate each finding through two independent stages. A defender mounts the strongest legitimate defense of the paper, and a separate arbiter, reading the audit and the defense as documents with fresh eyes, renders an independent verdict.

The defender, arbiter, and counterexample-hunter run as fresh-context subagents. This structural independence is deliberate. A defender that anticipated the arbiter’s rebuttal would soften its case, and an arbiter that had authored the audit could not judge it impartially. Running each role in an isolated context preserves the adversarial separation that helps alleviate hallucinations.

Provenance

Proofreader’s flaw-pattern library is empirically grounded. It was distilled from a retrospective screening pass over a corpus of real-time systems papers, in which a precursor pipeline acted as a mechanical first-pass reviewer to surface candidate flaws for human follow-up. Proofreader is the author-facing evolution of that work. Rather than screening published papers retrospectively, it is meant to be run by authors on their own drafts before a reviewer or later reader has the chance to find the same issues. Issues identified in this first screening pass are currently being reviewed by human authors, but legitimate flaws in the published literature have been identified.

Paper and code

Proofreader is described in Proofreader: Informally Auditing Formal Proofs with LLMs (Ward, 2026). Download the paper (PDF).

Proofreader is an open-source Claude Code plugin, released under the MIT license. See the github page for installation and usage.

If you find Proofreader useful either in identifying issues with your papers, please cite the following in subsequent errata.

@misc{ward2026proofreader,
  author       = {Ward, Bryan C.},
  title        = {{Proofreader: Informally Auditing Formal Proofs with 
                   LLMs}},
  year         = 2026,
  publisher    = {Zenodo},
  version      = {v1.0.0},
  doi          = {10.5281/zenodo.21302109},
  url          = {https://doi.org/10.5281/zenodo.21302109}
}