Sigvex Platform Capabilities
Bytecode-Native Analysis Across EVM and SVM Runtimes
An architectural overview of the Sigvex analysis pipeline: bytecode-native design across ~95 packages, 361 detectors spanning EVM, SVM, and ZK runtimes, and e-graph constraint satisfaction for security analysis without verified source code.
Most static analysis tools for smart contracts start from source. That design choice seems natural — source code is readable, well-structured, and carries the developer’s intent. But it creates a hard constraint: you can only analyze contracts where a verified source file exists.
On Ethereum mainnet, a substantial fraction of deployed contract bytecode has no verified source on Etherscan or any other registry. Proxy contracts add a second problem: a proxy’s source tells you almost nothing about what its current implementation actually does. Security analysis that stops at source verification misses a large portion of what is actually running on-chain.
Sigvex is designed around the opposite assumption: the bytecode is the ground truth. The decompiler, detectors, fuzzer, and exploit generator all operate on compiled bytecode. Source code, when available, is treated as supplementary context rather than a prerequisite.
What “Bytecode-Native” Means in Practice
Bytecode-native analysis requires solving problems that source-based tools skip entirely.
Function boundary detection must be inferred from the dispatcher jump table rather than read from a function declaration. The frontend parses the bytecode, identifies the selector dispatch pattern, and reconstructs function entry points. For fallback and receive functions, there is no selector — the boundary is detected structurally.
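The dispatcher pattern emitted by solc is regular enough to scan for directly. The sketch below (simplified; real bytecode requires full disassembly so that PUSH immediates are not misread as opcodes, and other compilers emit different dispatch shapes) recovers selector/entry-point pairs from the `DUP1 PUSH4 <sel> EQ PUSH2 <dest> JUMPI` idiom:

```rust
// Simplified sketch: recover (selector, entry_offset) pairs from the
// solc-style dispatcher pattern `DUP1 PUSH4 <sel> EQ PUSH2 <dest> JUMPI`.
fn scan_dispatcher(code: &[u8]) -> Vec<(u32, usize)> {
    let mut entries = Vec::new();
    let mut i = 0;
    while i + 10 < code.len() {
        if code[i] == 0x80          // DUP1 (duplicate the loaded selector)
            && code[i + 1] == 0x63  // PUSH4 <selector>
            && code[i + 6] == 0x14  // EQ
            && code[i + 7] == 0x61  // PUSH2 <jump destination>
            && code[i + 10] == 0x57 // JUMPI
        {
            let sel = u32::from_be_bytes([code[i + 2], code[i + 3], code[i + 4], code[i + 5]]);
            let dest = usize::from(u16::from_be_bytes([code[i + 8], code[i + 9]]));
            entries.push((sel, dest));
            i += 11;
        } else {
            i += 1;
        }
    }
    entries
}
```

For the `transfer(address,uint256)` selector `0xa9059cbb`, this scan yields the selector together with the byte offset of the function body's entry block.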
Type information is not present in EVM bytecode. Every stack value is a 256-bit word. Type inference runs during the HIR transformation phase, assigning address, uint256, bool, bytes32, and other types based on how values are used: whether they are masked to 160 bits before CALL, whether they pass through ISZERO, whether they are used as array lengths.
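Conceptually, the inference pass accumulates usage facts per value and maps them to a type. A minimal sketch of that mapping (the struct and field names here are hypothetical, not the shipped API):

```rust
// Illustrative sketch: infer a type for a 256-bit stack word from how it
// is used, in the spirit of the HIR type-inference pass.
#[derive(Debug, PartialEq)]
enum InferredType { Address, Bool, Uint256 }

struct Usage {
    masked_to_160_bits: bool,   // e.g. ANDed with a 20-byte mask before CALL
    flows_through_iszero: bool, // consumed by ISZERO / conditional jumps
}

fn infer_type(u: &Usage) -> InferredType {
    if u.masked_to_160_bits {
        InferredType::Address
    } else if u.flows_through_iszero {
        InferredType::Bool
    } else {
        InferredType::Uint256 // default: an untyped 256-bit word
    }
}
```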
Storage layout reconstruction reverse-engineers slot assignments from SLOAD and SSTORE patterns. A mapping’s slot is the keccak256 hash of the key concatenated with the base slot; the decompiler recognizes this hash pattern and reconstructs the mapping relationship. Packed struct fields are identified from partial-word reads and writes.
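The mapping-access idiom is recognizable as a shape in the IR: a KECCAK256 over a 64-byte region holding the key followed by a constant base slot. A toy version of that pattern match (the `Value` enum is a hypothetical stand-in for the real IR):

```rust
// Hypothetical IR fragment: recognize the mapping-access idiom
// keccak256(key ++ base_slot), which Solidity emits for `mapping[key]`.
#[derive(Debug, PartialEq)]
enum Value {
    Const(u64),                     // storage slot constant (narrowed for the sketch)
    Input(&'static str),            // an opaque key value
    Concat(Box<Value>, Box<Value>), // 64-byte memory region fed to KECCAK256
    Keccak(Box<Value>),
}

/// If `v` is keccak256(key ++ Const(slot)), return the mapping's base slot.
fn mapping_base_slot(v: &Value) -> Option<u64> {
    if let Value::Keccak(inner) = v {
        if let Value::Concat(_key, base) = inner.as_ref() {
            if let Value::Const(slot) = base.as_ref() {
                return Some(*slot);
            }
        }
    }
    None
}
```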
Proxy resolution is handled as a first-class concern. When the bytecode contains a DELEGATECALL to an address read from storage, the decompiler reads the implementation slot (EIP-1967, EIP-1167, or EIP-2535 patterns), fetches the implementation bytecode via RPC, and merges it into a unified analysis view. Nested proxies are resolved recursively.
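For EIP-1967 proxies, the implementation address lives at a well-known slot (`keccak256("eip1967.proxy.implementation") - 1`), so recognition reduces to matching that constant once the decompiler knows which slot the DELEGATECALL target was loaded from. A sketch of the slot match (EIP-1167 minimal proxies are a different case: the implementation address is embedded directly in the bytecode, not read from storage):

```rust
// Sketch of well-known proxy slot matching, assuming the decompiler has
// already extracted the storage slot an SLOAD feeds into a DELEGATECALL.
// EIP-1967 logic slot = keccak256("eip1967.proxy.implementation") - 1.
const EIP1967_IMPL_SLOT: &str =
    "360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc";

#[derive(Debug, PartialEq)]
enum ProxyKind { Eip1967, Unknown }

fn classify_proxy_slot(slot_hex: &str) -> ProxyKind {
    if slot_hex.eq_ignore_ascii_case(EIP1967_IMPL_SLOT) {
        ProxyKind::Eip1967
    } else {
        ProxyKind::Unknown
    }
}
```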
None of this is novel in isolation — academic work on EVM decompilation goes back to at least 2018 (see Porosity, Mythril, Panoramix). What the Sigvex design adds is an end-to-end pipeline where the decompiled representation feeds directly into detectors, a fuzzer, and an exploit generator, all within the same Rust workspace.
Architecture: Eight Layers, Two Runtimes
The codebase is organized into an 8-layer dependency hierarchy across approximately 95 packages. The constraint is strict: a package in layer N may only depend on packages in layers 0 through N-1. There are no upward dependencies.
- Layer 7 — Presentation: CLI, REST APIs, web portal
- Layer 6 — Application: EVM and SVM analysis pipeline orchestration
- Layer 5 — Infrastructure: RPC clients, storage backends (local/S3/Azure)
- Layer 4 — Domain Services: exploit generation, fuzzing, security validation
- Layer 3 — Domain Logic: detectors, call graph analysis, proxy resolution
- Layer 2 — Transformation: Bytecode → LIR → HIR → Solidity/Yul
- Layer 1 — Domain Model: EVM/SVM types, signatures, symbolic engine
- Layer 0 — Foundation: error handling, async utilities, constants
EVM and SVM runtimes are isolated from each other at layers 0–5. They share nothing except chain-agnostic abstractions (error types, async helpers, detector trait definitions, storage interfaces). This isolation means a change to the SVM decompiler cannot accidentally break an EVM detector.
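The layering rule itself is mechanical and easy to check in CI: given a package-to-layer map and the dependency edges, any edge that points at the same layer or higher is a violation. A sketch (package names are hypothetical):

```rust
use std::collections::HashMap;

// Sketch: enforce "a package in layer N may only depend on layers 0..N-1"
// over a package -> layer map and a dependency edge list.
fn upward_deps<'a>(
    layer_of: &HashMap<&'a str, u8>,
    deps: &[(&'a str, &'a str)], // (package, dependency)
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .filter(|&&(pkg, dep)| layer_of[dep] >= layer_of[pkg])
        .copied()
        .collect()
}
```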
The benefit of this structure shows up in testing. Each layer can be tested against its immediate dependencies without standing up the layers above it. The CFG construction in layer 2 can be unit-tested by feeding raw bytecode directly, without instantiating the HTTP API server in layer 7.
The Decompilation Pipeline
The EVM decompilation pipeline runs four stages:
Stage 1 — Frontend (Bytecode → LIR). The bytecode is disassembled into opcodes and a control flow graph is built. Function boundaries are identified from the selector dispatch table. The output is a low-level IR that is still essentially stack-based, with explicit PUSH, POP, and SWAP operations.
Stage 2 — HIR Transformation (LIR → HIR). The stack-based LIR is converted to a register-based HIR in static single assignment (SSA) form. Type inference runs here. Storage slot patterns are analyzed and mapped to variable declarations. Semantic patterns — require, assert, checked arithmetic, SafeMath calls — are lifted to first-class HIR nodes.
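The core of the stack-to-register conversion can be illustrated by replaying the stack symbolically and emitting a fresh register per defining operation. This is a greatly simplified sketch (two toy opcodes, string output), not the real HIR:

```rust
// Greatly simplified sketch of the LIR -> HIR idea: replay the stack
// symbolically and emit SSA-style register assignments for each operation.
#[derive(Clone, Debug)]
enum Lir { Push(u64), Add }

fn to_ssa(ops: &[Lir]) -> Vec<String> {
    let mut stack: Vec<String> = Vec::new(); // symbolic stack of register names
    let mut code = Vec::new();
    let mut next = 0usize;
    for op in ops {
        match op {
            Lir::Push(n) => {
                let r = format!("v{next}");
                next += 1;
                code.push(format!("{r} = {n}"));
                stack.push(r);
            }
            Lir::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                let r = format!("v{next}");
                next += 1;
                code.push(format!("{r} = {a} + {b}"));
                stack.push(r);
            }
        }
    }
    code
}
```

Because every result gets a fresh name, the output is in SSA form by construction; the real pass additionally handles control-flow joins with phi nodes.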
Stage 3 — Optimization Passes. Eight or more passes run over the HIR: constant folding, dead code elimination, common subexpression elimination, variable renaming (semantic names where the function signature database has a match), control flow simplification, and loop recognition. The passes are composable and order-dependent.
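A composable pass pipeline is naturally expressed as a trait plus an ordered list. The sketch below shows the shape with a toy expression type and a constant-folding pass (the trait name and structure are illustrative, not the shipped API):

```rust
// Sketch of composable, order-dependent passes over a toy expression HIR.
#[derive(Clone, Debug, PartialEq)]
enum Expr { Const(u64), Add(Box<Expr>, Box<Expr>) }

trait Pass { fn run(&self, e: Expr) -> Expr; }

struct ConstFold;
impl Pass for ConstFold {
    fn run(&self, e: Expr) -> Expr {
        match e {
            Expr::Add(a, b) => {
                let (a, b) = (self.run(*a), self.run(*b));
                if let (Expr::Const(x), Expr::Const(y)) = (&a, &b) {
                    Expr::Const(x.wrapping_add(*y)) // EVM arithmetic wraps mod 2^256
                } else {
                    Expr::Add(Box::new(a), Box::new(b))
                }
            }
            other => other,
        }
    }
}

fn run_pipeline(passes: &[&dyn Pass], mut e: Expr) -> Expr {
    for p in passes { e = p.run(e); } // order matters
    e
}
```

Order dependence falls out of this structure: constant folding may create dead code that a later elimination pass removes, so swapping the two changes the result of a single pipeline run.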
Stage 4 — Backend (HIR → Source). The HIR is lowered to either Solidity or Yul output. Contract structure is reconstructed: state variables, events, modifiers where recoverable, and function bodies. The Solidity output is not guaranteed to recompile — the goal is human readability for security review, not roundtrip fidelity.
SVM analysis follows a parallel structure adapted to eBPF/BPF bytecode. The Solana runtime differs significantly from the EVM: there is no persistent storage slot model, accounts are passed explicitly, and the execution model is a register machine from the start. The SVM pipeline handles these differences at layers 1–2 without requiring changes to the shared detector infrastructure.
Detection: Techniques and Tradeoffs
Sigvex runs 361 detectors across EVM, SVM, and ZK runtimes, organized by severity. Details on specific detectors and the detection techniques (CFG traversal, data flow analysis, taint tracking, symbolic execution, and e-graph constraint satisfaction) are covered in Vulnerability Detection Framework and E-Graph Constraint Satisfaction. This section focuses on the design decisions that affect the pipeline as a whole.
Confidence scoring. Each finding carries a 0.0–1.0 confidence score. The score accounts for pattern match strength, whether semantic validation confirmed the pattern, context analysis (does the function have a reentrancy guard?), and historical correlation with known exploit patterns. High-confidence findings are prioritized in the report; low-confidence findings are surfaced but labeled.
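The components listed above combine naturally into a weighted score. The weights and field names in this sketch are assumptions for illustration, not the shipped formula:

```rust
// Illustrative weighted-score sketch; weights and components are assumptions.
struct FindingSignals {
    pattern_strength: f64,        // 0.0..=1.0, how exact the pattern match was
    semantically_confirmed: bool, // semantic validation confirmed the pattern
    has_guard: bool,              // mitigating context, e.g. a reentrancy guard
    matches_known_exploit: bool,  // correlates with a historical exploit pattern
}

fn confidence(s: &FindingSignals) -> f64 {
    let mut score = 0.5 * s.pattern_strength;
    if s.semantically_confirmed { score += 0.3; }
    if s.matches_known_exploit { score += 0.2; }
    if s.has_guard { score *= 0.4; } // mitigating context lowers confidence
    score.clamp(0.0, 1.0)
}
```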
Analysis modes. The detector suite supports four modes with different performance/depth tradeoffs:
- Fast (under 1 second): Pattern-based scanning only, using the e-graph constraint satisfaction (CST) engine
- Standard (under 10 seconds): Full detector suite including CFG, data flow, and semantic analysis
- Deep (under 5 minutes): Standard analysis plus fuzzing campaigns and exploit generation
- Exhaustive: Extended fuzzing with adaptive input generation; runtime is contract-dependent
Fast mode is suitable for CI/CD gates and high-throughput monitoring pipelines where latency matters more than depth. Deep mode is appropriate for manual security audits where the goal is to produce exploitable proof-of-concept findings.
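One way to encode the mode tradeoffs is as a plan with a time budget and the set of enabled stages. The enum and field names here are a hypothetical sketch of that mapping:

```rust
use std::time::Duration;

// Sketch mapping the four analysis modes to time budgets and enabled stages.
#[derive(Clone, Copy)]
enum Mode { Fast, Standard, Deep, Exhaustive }

struct Plan {
    budget: Option<Duration>, // None = contract-dependent runtime
    fuzzing: bool,
    exploit_gen: bool,
}

fn plan(mode: Mode) -> Plan {
    match mode {
        Mode::Fast       => Plan { budget: Some(Duration::from_secs(1)),   fuzzing: false, exploit_gen: false },
        Mode::Standard   => Plan { budget: Some(Duration::from_secs(10)),  fuzzing: false, exploit_gen: false },
        Mode::Deep       => Plan { budget: Some(Duration::from_secs(300)), fuzzing: true,  exploit_gen: true },
        Mode::Exhaustive => Plan { budget: None,                           fuzzing: true,  exploit_gen: true },
    }
}
```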
The e-graph CST engine. Most EVM analysis tools call out to an external SMT solver (Z3 or CVC5) for constraint solving. SMT solvers are powerful but introduce external binary dependencies, can run for seconds or minutes on complex constraints, and cannot run in WebAssembly environments. The Sigvex CST engine uses e-graphs and equality saturation instead. Vulnerability patterns are expressed as rewrite rules; after saturation, detection becomes a graph membership test. The engine is implemented entirely in Rust with no external dependencies and compiles to WASM for browser-based analysis. The performance difference against SMT-based tools is covered in detail in the e-graph research article.
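The saturate-then-query shape can be illustrated without a full e-graph. The sketch below is a heavily simplified stand-in: it saturates a set of facts under rewrite rules with a worklist, then answers detection as a membership test. A real e-graph additionally shares subterms via a congruence-closed union-find, which this sketch does not capture:

```rust
use std::collections::HashSet;

// Heavily simplified stand-in for equality saturation: saturate a set of
// facts under rewrite rules, then answer detection as a membership test.
fn saturate(seed: &[&str], rules: &[fn(&str) -> Option<String>]) -> HashSet<String> {
    let mut facts: HashSet<String> = seed.iter().map(|s| s.to_string()).collect();
    let mut work: Vec<String> = facts.iter().cloned().collect();
    while let Some(f) = work.pop() {
        for rule in rules {
            if let Some(derived) = rule(&f) {
                if facts.insert(derived.clone()) {
                    work.push(derived); // newly derived facts may enable more rules
                }
            }
        }
    }
    facts
}

// Hypothetical rule: an external call ordered before a state write rewrites
// to a named vulnerability pattern.
fn reentrancy_rule(f: &str) -> Option<String> {
    if f == "call(untrusted) before sstore" {
        Some("pattern:reentrancy".to_string())
    } else {
        None
    }
}
```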
Dynamic Analysis: Fuzzing and Exploit Generation
Static analysis identifies candidate vulnerability locations. Dynamic analysis validates whether those candidates are actually exploitable.
The fuzzer generates test inputs targeting the functions flagged by static detectors. Coverage-guided mutation expands the input corpus to explore branches not reached by the initial cases. For a reentrancy candidate, the fuzzer would generate a caller contract that re-enters the flagged function and attempt to trigger the state change before the initial call completes.
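The coverage-guided loop itself is simple: an input earns a place in the corpus only if executing it reaches a branch not seen before. A minimal sketch, where `execute` and `mutate` are stand-ins for running the contract and mutating calldata:

```rust
use std::collections::HashSet;

// Minimal coverage-guided loop sketch: keep a mutated input in the corpus
// only if executing it covers a previously unseen branch id.
fn fuzz_round(
    corpus: &mut Vec<Vec<u8>>,
    covered: &mut HashSet<u32>,
    execute: impl Fn(&[u8]) -> Vec<u32>, // returns covered branch ids
    mutate: impl Fn(&[u8]) -> Vec<u8>,
) {
    let seeds: Vec<Vec<u8>> = corpus.clone();
    for seed in &seeds {
        let input = mutate(seed);
        let branches = execute(&input);
        let new_coverage = branches.iter().any(|b| !covered.contains(b));
        if new_coverage {
            covered.extend(branches);
            corpus.push(input); // interesting input becomes a future seed
        }
    }
}
```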
When a fuzzing campaign produces a crashing or anomalous result, the exploit generator constructs a proof-of-concept Solidity contract. The generated exploit includes:
- The attacker contract implementing the attack sequence
- A transaction sequence with expected intermediate state
- An estimated profit or fund drain amount where quantifiable
The exploit output is for security validation — confirming that a flagged vulnerability is genuinely exploitable, not a false positive. Automated exploit generation is discussed in more detail in Automated Exploit Generation.
Storage Design
Each analyzed contract produces data across three storage directories with different mutability characteristics:
- bytecode/ — immutable data derived from on-chain bytecode: the bytecode itself, decompiled functions, storage access patterns, IR representations
- state/ — mutable on-chain state: storage slot values with block-level history, inferred storage layouts, balance and nonce snapshots
- analysis/ — derived results that can be regenerated: findings, fuzzing results, call graphs, exploit output, metrics
Separating these three categories matters for cache invalidation. If a detector is updated, only the analysis/ output needs to be regenerated. The expensive decompilation step (which writes to bytecode/) does not need to re-run. If on-chain state changes, state/ is updated independently of the static analysis results.
Storage backends are swappable: local filesystem (default), AWS S3, and Azure Blob Storage are all supported via a factory pattern. The same storage interface is used regardless of backend.
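The swappable-backend idea reduces to a trait plus a factory that selects an implementation by name. The trait, in-memory stand-in, and factory below are illustrative, not the real API:

```rust
use std::collections::HashMap;

// Sketch of the swappable-backend idea behind the factory pattern.
trait Storage {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
}

// In-memory stand-in; real backends are local filesystem, S3, Azure Blob.
struct MemoryStorage { map: HashMap<String, Vec<u8>> }

impl Storage for MemoryStorage {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.map.get(key).map(|v| v.as_slice())
    }
}

// Factory: pick a backend by name; an S3 or Azure variant would slot in here.
fn make_storage(kind: &str) -> Option<Box<dyn Storage>> {
    match kind {
        "memory" | "local" => Some(Box::new(MemoryStorage { map: HashMap::new() })),
        _ => None,
    }
}
```

Callers hold a `Box<dyn Storage>`, so the backend choice is a configuration detail rather than something the analysis layers need to know about.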
What This Article Does Not Cover
Each component of the pipeline has its own research article with technical depth that would not fit here:
- Decompilation pipeline internals: Decompilation Pipeline
- Semantic lifting from bytecode: Semantic Lifting
- E-graph constraint satisfaction: E-Graph Constraint Satisfaction
- Vulnerability detector design: Vulnerability Detection Framework
- Automated exploit generation: Automated Exploit Generation
- Coverage-guided fuzzing: Coverage-Guided Fuzzing
- Cross-contract analysis: Cross-Contract Analysis
- Solana (SVM) pipeline: SVM Analysis Pipeline
- Historical exploit pattern matching: Attack Pattern Intelligence
References
- Mossberg, M. et al. “Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts.” IEEE ASE 2019.
- Grech, N. et al. “Gigahorse: Thorough, Declarative Decompilation of Smart Contracts.” ICSE 2019.
- Brent, L. et al. “Vandal: A Scalable Security Analysis Framework for Smart Contracts.” arXiv 2018.
- Willsey, M. et al. “egg: Fast and Extensible Equality Saturation.” POPL 2021.
- EVM Opcodes Reference
- Smart Contract Weakness Classification (SWC)