Zero-Knowledge Circuit Soundness Verification

Abstract

Zero-knowledge proof systems derive their security guarantees from two properties: completeness (an honest prover can convince the verifier of a true statement) and soundness (a dishonest prover cannot convince the verifier of a false statement). While the underlying proof systems (Groth16, PLONK, Halo2) provide these properties when instantiated correctly, the circuits that define the relation being proved frequently do not. An under-constrained circuit — one where the constraint system admits more than one satisfying witness for a given public input — allows a malicious prover to generate valid proofs for false statements. This paper formalizes the notion of circuit soundness as witness uniqueness relative to the intended computation, develops a taxonomy of under-constraint vulnerabilities observed in deployed circuits, and presents detection algorithms based on degree-of-freedom analysis, constraint graph reachability, and witness fuzzing. We pay particular attention to the computation-constraint gap inherent in Circom’s two-phase execution model, which is the root cause of a large share of circuit bugs in practice.

Introduction

A zero-knowledge proof attests that a prover knows a witness w such that some relation R(x, w) = 1 holds, where x is the public input. The proof reveals nothing about w beyond the fact that it exists. The security argument depends on two things: the cryptographic proof system itself, and the circuit that encodes R. Substantial research effort has gone into the first — Groth16’s knowledge soundness is well-understood [Groth, 2016], PLONK’s permutation argument has formal security proofs [Gabizon et al., 2019], and Halo2’s recursive composition has been studied extensively. The second receives less formal attention, despite being where most bugs occur.

Circuit bugs are subtle because the proof system does exactly what it promises. A valid SNARK proof for an under-constrained circuit is valid — the prover genuinely knows a satisfying witness. The problem is that the set of satisfying witnesses is larger than intended. When a circuit meant to prove “I know a valid Merkle path for leaf l in tree with root r” also accepts a witness encoding an invalid path, the circuit is under-constrained, and the soundness guarantee that applications rely on is broken.

The consequences are concrete. The Tornado Cash circuit vulnerability (reported via bug bounty in 2019) arose from an under-constrained Pedersen hash implementation: the prover could construct a second, distinct witness that satisfied all constraints but encoded a different nullifier, enabling double-spending. The BigMod circuit in circomlib had a missing range check that allowed witnesses where intermediate values exceeded the field modulus, producing incorrect modular arithmetic results. These are not hypothetical — they are bugs in audited, deployed circuits.

This paper provides formal definitions for what it means for a circuit to be sound, catalogs the vulnerability patterns that lead to unsoundness, and describes algorithms for detecting them.

Preliminaries

Arithmetic Circuits and R1CS

An arithmetic circuit over a finite field F_p consists of addition and multiplication gates connected by wires carrying field elements. The circuit takes public inputs x = (x_1, …, x_l) and private inputs (the witness) w = (w_1, …, w_m), and the combined signal vector is s = (1, x_1, …, x_l, w_1, …, w_m) of length n = 1 + l + m.

A Rank-1 Constraint System (R1CS) is a system of constraints of the form:

(a_i · s) * (b_i · s) = (c_i · s) for i = 1, …, k

where a_i, b_i, c_i are vectors in F_p^n and · denotes the inner product. Each constraint enforces that the product of two linear combinations of signals equals a third linear combination. The R1CS is satisfiable for public input x if there exists a witness w such that all k constraints hold simultaneously.

Equivalently, define matrices A, B, C in F_p^{k x n} where the i-th row of A is a_i (and likewise for B, C). Then the R1CS is satisfiable when:

(A · s) ∘ (B · s) = C · s

where ∘ denotes the Hadamard (element-wise) product.

The Prover-Verifier Model

In a zk-SNARK for an R1CS instance, the prover holds the full signal vector s and produces a proof π. The verifier holds only the public inputs x and the verification key (derived from the constraint matrices during a trusted setup or transparent setup phase). The verifier checks π against x and accepts or rejects.

Completeness: If the prover is honest and s satisfies the R1CS, the verifier accepts.

Soundness: If no valid s exists (i.e., no witness w makes the R1CS satisfiable for the given x), then no polynomial-time prover can produce a proof that the verifier accepts, except with negligible probability.

The critical subtlety: soundness is a property of the proof system with respect to the constraint system. It says that the prover cannot cheat the constraints. It says nothing about whether the constraints correctly capture the intended relation.

Witness Generation

In Circom, a circuit is written as a template that combines computation (JavaScript-like imperative code that calculates signal values) with constraints (quadratic equations over signals). The witness generator is a separate program compiled from the computation portion. It takes public and private inputs, executes the computation, and produces the full signal vector s. The constraints are compiled into an R1CS that the proof system uses.

This separation is the source of a large class of bugs, discussed in Section 7.

Formal Definition of Circuit Soundness

Let C be a circuit with public input space X and witness space W. Let f: X x W → {0, 1} be the intended relation — the function the circuit designer meant to compute. Let R_C: X x W → {0, 1} be the actual relation — the predicate that holds when all R1CS constraints are satisfied.

Definition 1 (Circuit Soundness). A circuit C is sound with respect to f if for all x in X:

{ w in W : R_C(x, w) = 1 } ⊆ { w in W : f(x, w) = 1 }

That is, every witness accepted by the constraints is also accepted by the intended relation. Equivalently: the constraint system does not admit any satisfying witness that the intended relation rejects.

Definition 2 (Under-Constrained Signal). A signal s_j in a circuit C is under-constrained with respect to public input x if there exist two distinct witnesses w and w’ such that R_C(x, w) = R_C(x, w’) = 1, the witnesses agree on all signals except s_j, and f(x, w) = 1 but f(x, w’) = 0.

Informally: an under-constrained signal can take multiple values that satisfy the constraints, and at least one of those values corresponds to a false statement.

Definition 3 (Witness Uniqueness). A circuit C has witness uniqueness for public input x if:

|{ w in W : R_C(x, w) = 1 }| = 1

Witness uniqueness is a sufficient (but not necessary) condition for circuit soundness with respect to any intended relation that accepts the unique witness. In practice, many circuits intentionally allow multiple valid witnesses (e.g., a signature scheme where multiple valid signatures exist for the same message). The relevant question is whether the unintended witnesses correspond to false statements.

Taxonomy of Under-Constraint Vulnerabilities

Five patterns account for the majority of under-constraint bugs in deployed circuits. They are not mutually exclusive — a single bug may involve multiple patterns.

5.1 Unconstrained Output Signals

A signal is computed in the witness generator but never appears in any constraint. Since the proof system only enforces constraints, the signal can take any value in the proof without affecting validity.

flowchart LR
    classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
    classDef attack fill:#331a1a,stroke:#d97777,stroke-width:2px,color:#f0c0c0
    classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0

    A["Input Signal a"]:::data
    B["Input Signal b"]:::data
    C["Witness Generator: out = a * b"]:::process
    D["Constraint System"]:::process
    E["out is unconstrained"]:::attack

    A --> C
    B --> C
    C --> D
    D --> E

In Circom, this occurs when a signal is assigned with <-- (computation only) instead of <== (computation + constraint). The <-- operator populates the witness but generates no R1CS constraint. The honest witness generator will compute the correct value, but a malicious prover can substitute any field element.

5.2 Missing Range Checks

Arithmetic in R1CS operates over F_p, where p is typically a 254-bit prime (the BN254 scalar field, p ≈ 2^254). When a circuit intends for a signal to represent, say, an 8-bit unsigned integer, it must explicitly constrain the signal to lie in [0, 2^8 - 1]. Without this range check, the signal can take any value in F_p, and field arithmetic wraps modulo p.

A range check for n bits requires constraining a signal v by decomposing it into bits b_0, …, b_{n-1} and enforcing:

Each b_i is binary: b_i * (1 - b_i) = 0
The bits reconstruct v: Σ b_i * 2^i = v

This costs n constraints. The temptation to omit range checks for “obviously small” intermediate values is a frequent source of bugs. The BigMod vulnerability in circomlib involved exactly this: a quotient signal in modular reduction was not range-checked, allowing values larger than the modulus that caused the modular arithmetic to produce incorrect results.

5.3 Nondeterministic Witness Computation

Some circuits use the witness generator to compute values that are difficult to express as constraints — integer division, modular inverses, or conditional branching. The constraints then verify the result rather than the computation. This is correct when the verification constraints fully determine the result, but bugs arise when they do not.

Consider computing the integer quotient q and remainder r of a divided by b. The witness generator computes q = floor(a / b) and r = a - q * b. The constraints enforce:

a = q * b + r r * (r - 1) * … * (r - b + 1) = 0 (range check: 0 ≤ r < b)

If the range check on r is missing, or the range check on q is missing, multiple (q, r) pairs satisfy a = q * b + r over F_p. A malicious prover can choose q and r values that are technically consistent modulo p but represent a completely different quotient.

5.4 Signal Aliasing

In finite field arithmetic, distinct bit representations can encode the same field element, or conversely, the same application-level value can have multiple field representations. For example, in BN254, both v and p - v represent additive inverses but might correspond to the same “absolute value” in a context where the circuit processes signed integers.

A circuit performing a Merkle proof verification that compares hash outputs must ensure that the comparison is on canonical representations. If the hash function’s output is constrained to 254 bits but the comparison allows either v or v + p (which are equal in F_p but have different bit decompositions), the prover may exploit the aliasing to substitute a different path.

5.5 Incomplete Conditional Logic

When a circuit branches on a condition — selecting between two sub-computations based on a selector bit — both branches must be fully constrained. A common pattern is:

out = selector * branch_1 + (1 - selector) * branch_0

If only one branch is constrained (e.g., branch_1 is constrained but branch_0 is a free signal), a prover who sets selector = 0 obtains an unconstrained output.

Detection Algorithms

6.1 Degree-of-Freedom Analysis

Given an R1CS with k constraints over n signals (of which l are public inputs and 1 is the constant signal), the number of private signals is m = n - l - 1. If k < m, the system is underdetermined — there are more unknowns than equations, and for any public input, the solution set (if nonempty) has dimension at least m - k over F_p.

This provides a necessary (but not sufficient) condition for under-constraint: if rank(constraint system) < number of private signals, the circuit is definitely under-constrained. The converse does not hold — a circuit can have k ≥ m and still be under-constrained if constraints are redundant.

function DegreeOfFreedomCheck(R1CS):
    n_private = total_signals - public_signals - 1
    n_constraints = number of R1CS constraints
    if n_constraints < n_private:
        return UNDER_CONSTRAINED(deficit = n_private - n_constraints)
    else:
        return POSSIBLY_SOUND  // requires further analysis

Computing the actual rank of the constraint system (as a polynomial system) is harder — it requires analyzing the Jacobian of the constraint equations, which depends on the evaluation point. For linear constraints, standard linear algebra suffices. For the general quadratic case, rank analysis at random evaluation points provides a probabilistic answer.

6.2 Constraint Graph Reachability

Model the circuit as a bipartite graph G = (S ∪ C, E) where S is the set of signals, C is the set of constraints, and an edge (s, c) exists when signal s appears in constraint c. A signal s is constrained if it is reachable from at least one public input signal through a path in G. A signal that is not reachable from any public input is definitively unconstrained — its value cannot affect the verifier’s decision.

flowchart TD
    classDef system fill:#1a3333,stroke:#5ba8a8,stroke-width:2px,color:#c0e8e8
    classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
    classDef attack fill:#331a1a,stroke:#d97777,stroke-width:2px,color:#f0c0c0
    classDef success fill:#1a331a,stroke:#a8c686,stroke-width:2px,color:#c8e8b0

    X["Public Input x"]:::system
    C1["Constraint 1: a * b = c"]:::process
    C2["Constraint 2: c + d = out"]:::process
    S_a["Signal a"]:::success
    S_b["Signal b"]:::success
    S_c["Signal c"]:::success
    S_d["Signal d"]:::attack
    S_out["Output out"]:::success

    X --> C1
    S_a --> C1
    S_b --> C1
    C1 --> S_c
    S_c --> C2
    S_d -.->|"unreachable from public input"| C2
    C2 --> S_out

The algorithm proceeds in three steps:

Build the constraint graph from the R1CS matrices.
Run BFS/DFS from all public input signals, traversing through constraints to reach private signals.
Flag unreachable signals as unconstrained.

This analysis is linear in the size of the R1CS and catches the most straightforward class of bugs: signals that were computed but never constrained. It does not detect subtler issues like missing range checks, where the signal is constrained but insufficiently.

6.3 Witness Fuzzing for Uniqueness

Given a public input x and an honestly generated witness w, attempt to find a second witness w’ ≠ w that also satisfies the R1CS. If one is found, the circuit is under-constrained (though not necessarily unsound — uniqueness is stricter than soundness).

function WitnessFuzz(R1CS, x, w, iterations):
    for i in 1..iterations:
        j = random signal index (private only)
        w' = w with w'[j] = random field element
        // propagate: attempt to solve remaining constraints
        // given the perturbed signal
        if SolveConstraints(R1CS, x, w', fixed={j}):
            return UNDER_CONSTRAINED(signal=j, witness_1=w, witness_2=w')
    return NO_BUG_FOUND  // not a proof of soundness

The SolveConstraints step attempts to find values for all non-fixed private signals such that the R1CS is satisfied. Because R1CS constraints are quadratic, each constraint with one unknown is a quadratic equation with at most two solutions over F_p. When the constraint graph is acyclic (common in practice), the system can be solved greedily by topological order.

Witness fuzzing is a probabilistic technique — it can find bugs but cannot prove their absence. Its strength is that it requires no understanding of the circuit’s intended semantics; it operates purely on the R1CS.

The Computation-Constraint Gap

Circom’s design separates witness computation from constraint generation. A template contains imperative code (using <-- for assignment and standard control flow) that computes signal values, and declarative constraints (using ===) that the R1CS enforces. The shorthand <== combines both: it assigns the computed value and generates a constraint. This design is the root cause of a large class of bugs.

The gap manifests in three ways:

Computation without constraint. A developer writes out <-- a * b intending to constrain the output, but forgets the corresponding out === a * b constraint. The honest witness generator produces the correct value, so all tests pass. But the R1CS does not enforce the relationship, and a malicious prover can set out to any value.

Constraint weaker than computation. The computation is out <-- a / b (integer division), but the constraint is out * b === a. This constraint is satisfied by the correct quotient, but also by any out such that out * b ≡ a (mod p), which includes field-arithmetic solutions unrelated to integer division. A range check on out is needed to exclude the field-arithmetic solutions.

Branching in computation, not in constraints. The witness generator uses an if-else branch: if (condition) { out <-- a } else { out <-- b }. But the constraint system cannot express conditional logic directly — it must use arithmetic multiplexing: out === condition * a + (1 - condition) * b. If the constraint does not match the branching logic, or if condition is not constrained to be binary, the prover has freedom to choose either branch regardless of the condition’s actual value.

Noir takes a different approach. Its compiler generates both the witness computation and the constraints from the same source code, eliminating the manual synchronization problem. However, Noir circuits can still be under-constrained through unsafe operations (unchecked casts, explicit witness hints via std::unsafe) and through the use of assert() statements that the compiler may not translate into R1CS constraints in all optimization paths.

flowchart TD
    classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
    classDef attack fill:#331a1a,stroke:#d97777,stroke-width:2px,color:#f0c0c0
    classDef success fill:#1a331a,stroke:#a8c686,stroke-width:2px,color:#c8e8b0
    classDef highlight fill:#332519,stroke:#e8a87c,stroke-width:2px,color:#f0d8c0

    subgraph Circom ["Circom: Manual Synchronization"]
        C1["Computation Code"]:::process
        C2["Constraint Code"]:::process
        C3["Developer must keep in sync"]:::attack
    end

    subgraph Noir ["Noir: Unified Source"]
        N1["Single Source Code"]:::success
        N2["Compiler generates both"]:::success
        N3["Unsafe operations still possible"]:::highlight
    end

    C1 ---|"separate"| C2
    C2 --> C3
    N1 --> N2
    N2 --> N3

On-Chain Verifier Analysis

A deployed SNARK verifier on Ethereum is a smart contract that checks proofs against a hardcoded verification key. The verification key encodes the constraint system (indirectly, through elliptic curve points derived from the R1CS during the trusted setup). Analyzing whether a deployed verifier corresponds to a sound circuit requires working backward from the verification key.

Verification key extraction. The verifier contract stores elliptic curve points (for Groth16: α, β, γ, δ points and the IC array; for PLONK: commitment points and evaluation challenges). These can be extracted from the contract’s storage or bytecode through standard EVM analysis — reading immutable constructor arguments or storage slots.

Circuit reconstruction. The verification key does not directly reveal the R1CS. However, given a candidate R1CS (e.g., compiled from a published Circom source), one can rerun the trusted setup to derive a verification key and check whether it matches the deployed key. This is the standard approach to circuit verification: publish the source, let anyone compile it, and compare verification keys.

Verifier bytecode correctness. A separate concern is whether the verifier contract correctly implements the verification algorithm. Bugs in the verifier (incorrect pairing checks, wrong field arithmetic, missing input validation) can break soundness independently of the circuit. EVM-level analysis can verify that the verifier contract’s pairing precompile calls (to address 0x08 for BN254) match the expected verification equation: e(A, B) = e(α, β) · e(IC, γ) · e(C, δ), where the IC term incorporates the public inputs.

Input validation. The verifier must check that all public inputs are valid field elements (less than the field modulus p). Missing input validation is a verifier-level soundness bug: a prover can submit inputs x ≥ p that reduce modulo p to a different value, potentially bypassing application-level checks that depend on the input’s numeric value.

Pailoor et al. [2023] introduced automated detection of under-constrained circuits through a technique that extracts constraints from Circom compilation artifacts and checks for signal uniqueness using an SMT solver. Their tool, QED^2 (later Picus), formalizes the uniqueness property as a quantified constraint and uses Z3 to find counterexamples. The approach is sound but does not scale to large circuits due to solver timeouts.

Ozdemir and Boneh [2022] formalized the notion of circuit-value problem and showed that verifying whether a given R1CS is satisfiable is NP-complete, while checking whether it has a unique solution (for fixed public inputs) is coNP-complete. This result implies that no efficient algorithm can certify circuit soundness in general, motivating the heuristic and pattern-based approaches described in Section 6.

The Ecne tool [Feng et al., 2023] takes a different approach: it checks whether a circuit’s constraints uniquely determine each signal as a function of the public inputs, using algebraic techniques based on Groebner bases. When the constraint system is sufficiently structured (as most real circuits are), this approach terminates efficiently.

Open-source Circom source-level analyzers perform static analysis on Circom source code (before R1CS compilation) to detect common vulnerability patterns: unconstrained signals, unused outputs, and non-deterministic control flow. They operate at the source level rather than the constraint level, catching bugs earlier in the development process but missing vulnerabilities that arise from the interaction between constraint compilation and field arithmetic.

The broader context is formal verification of cryptographic implementations. Barbosa et al. [2021] formalized game-based cryptographic proofs in EasyCrypt, and the CertiCrypt and Jasmin projects have verified implementations of specific cryptographic primitives. Applying these techniques to circuit soundness — proving that a circuit’s constraint system is a refinement of its intended specification — remains an open research direction, with early work by Pailoor et al. demonstrating feasibility for small circuits.

References

J. Groth, “On the Size of Pairing-based Non-interactive Arguments,” EUROCRYPT 2016. Available: https://eprint.iacr.org/2016/260
A. Gabizon, Z. J. Williamson, and O. Ciobotaru, “PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge,” IACR ePrint 2019/953. Available: https://eprint.iacr.org/2019/953
S. Pailoor, Y. Chen, F. Wang, C. Dalla, and I. Dillig, “Automated Detection of Under-Constrained Circuits in Zero-Knowledge Proofs,” IACR ePrint 2023/190. Available: https://eprint.iacr.org/2023/190
A. Ozdemir and D. Boneh, “Experimenting with Collaborative zk-SNARKs: Zero-Knowledge Proofs for Distributed Secrets,” USENIX Security 2022. Available: https://eprint.iacr.org/2021/1530
C. Feng, A. J. Menezes, and Z. L. Jiang, “Ecne: Checking the Correctness of Circuits,” CCS 2023. Available: https://github.com/nicholastmosher/ecne
Circom documentation, “Signals and Constraints.” Available: https://docs.circom.io/circom-language/signals/
Noir documentation, “Language Concepts.” Available: https://noir-lang.org/docs
Open-source Circom static analysis tooling (Circomspect and related projects). Referenced via project documentation.
G. Barbosa, B. Grégoire, V. Laporte, and B. Schmidt, “Mechanized Proofs of Adversarial Complexity and Application to Universal Composability,” CCS 2021. Available: https://eprint.iacr.org/2021/156
Ethereum Foundation, “EVM Precompiled Contracts — ecPairing (0x08).” Available: https://www.evm.codes/precompiled#0x08
iden3, “circomlib — Circuit Libraries for Circom.” Available: https://github.com/iden3/circomlib
en.wikipedia.org, “Zero-knowledge proof.” Available: https://en.wikipedia.org/wiki/Zero-knowledge_proof