E-Graph Constraint Satisfaction

The standard approach to automated smart contract analysis borrows from formal verification: encode the program as logical constraints, hand them to an SMT solver, and ask whether certain properties can or cannot hold. Z3 and CVC5 are capable solvers, and this approach works—but it has a predictability problem. Analysis time varies from seconds to minutes depending on constraint complexity, and some queries simply time out. That unpredictability makes it hard to integrate into development workflows where fast feedback matters.

Sigvex’s CST (Constraint Satisfaction via Term-rewriting) engine takes a different approach. Instead of asking “can this expression equal this value?” it asks “does this program contain any expression equivalent to a known vulnerable pattern?” That is a structurally different question, and e-graphs answer it efficiently.

What E-Graphs Are

An e-graph (equality graph) is a data structure that represents multiple equivalent expressions simultaneously. Expressions are grouped into equivalence classes called e-classes. Each e-class contains e-nodes, where each e-node is an operator applied to child e-classes.

The core operation is merging: given the equation a = b, merge the e-classes containing a and b. No enumeration of consequences is required up front—the merged class now represents both expressions, and any expression containing either as a subterm automatically gains equivalents.

Equality saturation is the process of repeatedly applying rewrite rules to an e-graph until no new equivalences can be added. When the e-graph is saturated, it simultaneously represents every expression equivalent to the input under the given rewrite rules.

For an arithmetic example: starting from (a + b) * 1, saturation under standard algebraic rules produces a single e-graph that represents a + b, b + a, (b + a) * 1, and their further equivalents—all without choosing a canonical form.

For vulnerability detection, the e-graph’s property of representing all equivalent forms simultaneously is what makes it useful. Instead of checking one program path at a time, the analysis saturates the e-graph and then checks whether any expression in the entire equivalence structure matches a vulnerability pattern. This is a membership test, not a satisfiability query.

Rewrite Rules in the CST Engine

The CST engine’s rewrite rules fall into three categories, applied during saturation.

Arithmetic rules capture algebraic identities that the EVM executes:

ADD(a, b)   =>  ADD(b, a)       // commutativity
ADD(a, 0)   =>  a               // identity
MUL(a, 0)   =>  0               // zero product
SUB(a, a)   =>  0               // self-subtraction
DIV(a, 1)   =>  a               // identity

Comparison and control flow rules expose patterns relevant to access control:

EQ(a, a)          =>  1
ISZERO(ISZERO(a)) =>  GT(a, 0)

Vulnerability detection rules introduce marker nodes. When an addition lacks overflow protection, the rule fires and adds an OVERFLOW_MARKER to the equivalence class containing that addition:

ADD(a, b)                                          =>  OVERFLOW_MARKER
SUB(a, b)                                          =>  UNDERFLOW_MARKER
SEQ(CALL(gas, addr, val, ...), SSTORE(slot, val))  =>  REENTRANCY_MARKER

After saturation, detecting whether a vulnerability class is present reduces to asking whether the corresponding marker node exists anywhere in the e-graph. This is O(1) lookup, not solver invocation.

The Three-Phase Pipeline

Phase 1: Pattern Compilation

Eight vulnerability categories are pre-compiled at startup into structured rules, each bundling a match expression, condition functions, and reporting metadata:

Integer overflow: arithmetic addition without bounds checking
Integer underflow: subtraction producing wrap-around values (relevant for pre-0.8 contracts)
Reentrancy: external call followed by state modification
Unchecked call return: CALL return value discarded
Unauthorized access: sensitive operations without caller validation
Arbitrary jump: user-controlled JUMP destination
Price manipulation: oracle price used without spot-price manipulation resistance
Flash loan vulnerability: flash loan accepted without callback protection

Phase 2: Program Ingestion and Saturation

Contract HIR (the high-level IR from the decompilation pipeline) is translated into S-expressions that map directly to EVM operations:

(SEQ (CALL gas addr val args) (SSTORE slot val))
(ADD (CALLDATALOAD 4) (SLOAD 0))
(JUMPI dest (EQ (CALLER) (SLOAD owner_slot)))

The e-graph engine applies all rewrite rules until saturation or configurable resource limits—iteration count, node count, wall-clock time—are reached. The limits ensure predictable termination even for complex contracts. After saturation, the e-graph contains all expressions semantically equivalent to the input, including any vulnerability markers introduced during saturation.

Phase 3: Pattern Matching and Condition Checking

The matcher searches the saturated e-graph for vulnerability markers. Finding a marker triggers the associated condition functions, which check for protective mitigations before reporting a finding.

Condition functions are specific to each vulnerability class:

The reentrancy condition checks for Checks-Effects-Interactions ordering and for a canonical ReentrancyGuard mutex pattern.
The access control condition verifies that a CALLER/SLOAD comparison precedes the sensitive operation.
The unchecked call condition confirms that the CALL return value is consumed by some expression.

This two-stage design—wide-net saturation followed by condition-filtered matching—keeps both recall and precision high. Saturation finds all potential matches; condition functions eliminate cases that are protected.

Data Flow Analysis via E-Class Tracking

The CST engine also includes a taint analysis that tracks how untrusted data flows through the program. It operates in three steps:

Mark sources: E-classes containing untrusted data sources are tagged. Sources include CALLDATALOAD, CALLER, ORIGIN, CALLVALUE, CALLDATASIZE, and external call return data.

Propagate taint: When operand e-classes are tainted, the result e-class inherits the taint level. Because the e-graph’s equivalence structure means all equivalent expressions share an e-class, taint propagates through all equivalent forms simultaneously without separate per-form passes.

Check sinks: Tainted e-classes reaching security-relevant operations produce findings. The relevant sinks are SSTORE with a tainted slot parameter (arbitrary storage write), CALL/DELEGATECALL with a tainted target (arbitrary call target), and JUMP/JUMPI with a tainted destination (arbitrary control flow).

Performance Comparison

The performance difference relative to SMT-based analysis stems from a structural difference, not implementation quality. SMT-based symbolic execution produces one logical formula per execution path: a function with n conditional branches generates up to 2^n paths, each requiring a separate solver query. E-graph saturation has no path explosion—all paths through a function are represented within the same set of equivalence classes. Adding a branch merges two e-classes rather than doubling the state space.

Metric	SMT-based tools	CST engine
Typical analysis	10–60 seconds	< 1 second
Worst case	Minutes to hours	< 5 seconds
Memory scaling	Exponential with branch count	Linear with e-class count
Parallelization	Limited (sequential solver)	Full (independent e-graphs per function)
WASM deployment	Not possible	Supported

The speed improvement is not hand-wavy—it follows from this structural difference. The magnitude depends on contract complexity: simple contracts show a modest gain, while contracts with many branches (where SMT path counts grow fastest) show the largest. We have measured order-of-magnitude differences on branch-heavy contracts, but the exact ratio is contract-specific and we avoid quoting a single headline multiplier.

Protection Pattern Recognition

A practical concern with pattern-based detection is false positives—flagging code that is actually protected. The CST engine’s condition functions handle this through explicit protection recognition:

Canonical access-control patterns: Ownable (single SLOAD with CALLER comparison before sensitive operations), ReentrancyGuard (storage lock checked and set atomically around external calls), AccessControl (role-based SLOAD/CALLER combination), Pausable (boolean storage flag checked on entry).

Checks-Effects-Interactions: When state modifications precede external calls in the e-graph ordering, the reentrancy condition function recognizes the protective pattern.

SafeMath: Pre-0.8 contracts using SafeMath include characteristic overflow checks—require(a + b >= a) for addition. The condition function recognizes these and downgrades overflow findings accordingly.

All of this runs within e-graph analysis. No source code or external metadata is needed.

What CST Does Not Do

Being specific about limitations avoids misapplying the technique:

CST determines whether a vulnerability pattern exists; it does not produce the concrete inputs needed to trigger it. For a reentrancy finding, CST will identify the unsafe call/store ordering—but it won’t generate the transaction sequence and attacker contract that proves exploitability. That requires the fuzzing phase.

CST is better at reachability than constraint satisfaction. “Can this pattern occur?” is easy. “What specific input values satisfy these constraints simultaneously?” is harder and benefits from symbolic execution or fuzzing.

CST produces no false negatives for its covered patterns, but coverage is bounded. Patterns not in the rule set are not detected. Business logic vulnerabilities that depend on protocol economics cannot be expressed as structural e-graph patterns.

The Sigvex pipeline structures these phases explicitly to use each technique where it excels:

CST (< 1 second): fast e-graph pattern matching for known vulnerability classes
Static analysis (< 10 seconds): deep dataflow analysis for complex, context-dependent cases
Fuzzing (< 5 minutes): dynamic validation and exploit generation

Deployment Implications

The WASM compatibility and zero-dependency deployment change where security analysis can run:

CI/CD integration: Analysis that completes in under a second can run on every pull request without extending build times perceptibly. A 30-second analysis gate is a deterrent; a sub-second one is transparent.

Real-time monitoring: Detecting vulnerable contracts as they land on mainnet requires processing thousands of contracts daily. SMT-based analysis at 30–60 seconds per contract requires a large cluster. Sub-second CST analysis fits on fewer machines.

Browser and serverless environments: The WASM build runs in constrained environments where deploying native solver binaries is not practical. The Sigvex WASM fuzzer packages use this capability directly.

Implementation Notes

The CST engine is implemented entirely in Rust using an embedded e-graph library. There are no external solver processes, no FFI bridges, no JNI overhead. Functions are analyzed with independent e-graphs and run concurrently across available cores.

The symbolic expression representation is a simple S-expression tree. Adding new rewrite rules requires defining the pattern and its right-hand side; adding new vulnerability patterns requires defining the marker rule and the condition function. Neither requires modifying core infrastructure.

References

egg: Fast and Extensible Equality Saturation — Willsey et al., POPL 2021
Equality Saturation: A New Approach to Optimization — Tate et al., POPL 2009
Satisfiability Modulo Theories: Introduction and Applications — De Moura & Bjørner
EVM Opcodes Reference