Sigvex Platform Capabilities
Bytecode-Native Analysis Across EVM and SVM Runtimes
An architectural overview of the Sigvex analysis pipeline: bytecode-native design across ~95 packages, 361 detectors spanning EVM, SVM, and ZK runtimes, and e-graph constraint satisfaction for security analysis without verified source code.
Most static analysis tools for smart contracts start from source. That design choice seems natural — source code is readable, well-structured, and carries the developer’s intent. But it creates a hard constraint: you can only analyze contracts where a verified source file exists.
On Ethereum mainnet, a substantial fraction of deployed contract bytecode has no verified source on Etherscan or any other registry. Proxy contracts add a second problem: a proxy’s source tells you almost nothing about what its current implementation actually does. Security analysis that stops at source verification misses a large portion of what is actually running on-chain.
Sigvex is designed around the opposite assumption: the bytecode is the ground truth. The decompiler, detectors, fuzzer, and exploit generator all operate on compiled bytecode. Source code, when available, is treated as supplementary context rather than a prerequisite.
What “Bytecode-Native” Means in Practice
Bytecode-native analysis requires solving problems that source-based tools skip entirely.
Function boundary detection must be inferred from the dispatcher jump table rather than read from a function declaration. The frontend parses the bytecode, identifies the selector dispatch pattern, and reconstructs function entry points. For fallback and receive functions, there is no selector — the boundary is detected structurally.
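The dispatcher pattern emitted by solc is regular enough to scan for directly. The sketch below (simplified; real bytecode requires full disassembly so that PUSH immediates are not misread as opcodes, and other compilers emit different dispatch shapes) recovers selector/entry-point pairs from the `DUP1 PUSH4 <sel> EQ PUSH2 <dest> JUMPI` idiom:

```rust
// Simplified sketch: recover (selector, entry_offset) pairs from the
// solc-style dispatcher pattern `DUP1 PUSH4 <sel> EQ PUSH2 <dest> JUMPI`.
fn scan_dispatcher(code: &[u8]) -> Vec<(u32, usize)> {
    let mut entries = Vec::new();
    let mut i = 0;
    while i + 10 < code.len() {
        if code[i] == 0x80          // DUP1 (duplicate the loaded selector)
            && code[i + 1] == 0x63  // PUSH4 <selector>
            && code[i + 6] == 0x14  // EQ
            && code[i + 7] == 0x61  // PUSH2 <jump destination>
            && code[i + 10] == 0x57 // JUMPI
        {
            let sel = u32::from_be_bytes([code[i + 2], code[i + 3], code[i + 4], code[i + 5]]);
            let dest = usize::from(u16::from_be_bytes([code[i + 8], code[i + 9]]));
            entries.push((sel, dest));
            i += 11;
        } else {
            i += 1;
        }
    }
    entries
}
```

For the `transfer(address,uint256)` selector `0xa9059cbb`, this scan yields the selector together with the byte offset of the function body's entry block.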
Type information is not present in EVM bytecode. Every stack value is a 256-bit word. Type inference runs during the HIR transformation phase, assigning address, uint256, bool, bytes32, and other types based on how values are used: whether they are masked to 160 bits before CALL, whether they pass through ISZERO, whether they are used as array lengths.
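Conceptually, the inference pass accumulates usage facts per value and maps them to a type. A minimal sketch of that mapping (the struct and field names here are hypothetical, not the shipped API):

```rust
// Illustrative sketch: infer a type for a 256-bit stack word from how it
// is used, in the spirit of the HIR type-inference pass.
#[derive(Debug, PartialEq)]
enum InferredType { Address, Bool, Uint256 }

struct Usage {
    masked_to_160_bits: bool,   // e.g. ANDed with a 20-byte mask before CALL
    flows_through_iszero: bool, // consumed by ISZERO / conditional jumps
}

fn infer_type(u: &Usage) -> InferredType {
    if u.masked_to_160_bits {
        InferredType::Address
    } else if u.flows_through_iszero {
        InferredType::Bool
    } else {
        InferredType::Uint256 // default: an untyped 256-bit word
    }
}
```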
Storage layout reconstruction reverse-engineers slot assignments from SLOAD and SSTORE patterns. A mapping’s slot is the keccak256 hash of the key concatenated with the base slot; the decompiler recognizes this hash pattern and reconstructs the mapping relationship. Packed struct fields are identified from partial-word reads and writes.
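The mapping-access idiom is recognizable as a shape in the IR: a KECCAK256 over a 64-byte region holding the key followed by a constant base slot. A toy version of that pattern match (the `Value` enum is a hypothetical stand-in for the real IR):

```rust
// Hypothetical IR fragment: recognize the mapping-access idiom
// keccak256(key ++ base_slot), which Solidity emits for `mapping[key]`.
#[derive(Debug, PartialEq)]
enum Value {
    Const(u64),                     // storage slot constant (narrowed for the sketch)
    Input(&'static str),            // an opaque key value
    Concat(Box<Value>, Box<Value>), // 64-byte memory region fed to KECCAK256
    Keccak(Box<Value>),
}

/// If `v` is keccak256(key ++ Const(slot)), return the mapping's base slot.
fn mapping_base_slot(v: &Value) -> Option<u64> {
    if let Value::Keccak(inner) = v {
        if let Value::Concat(_key, base) = inner.as_ref() {
            if let Value::Const(slot) = base.as_ref() {
                return Some(*slot);
            }
        }
    }
    None
}
```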
Proxy resolution is handled as a first-class concern. When the bytecode contains a DELEGATECALL to an address read from storage, the decompiler reads the implementation slot (EIP-1967, EIP-1167, or EIP-2535 patterns), fetches the implementation bytecode via RPC, and merges it into a unified analysis view. Nested proxies are resolved recursively.
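For EIP-1967 proxies, the implementation address lives at a well-known slot (`keccak256("eip1967.proxy.implementation") - 1`), so recognition reduces to matching that constant once the decompiler knows which slot the DELEGATECALL target was loaded from. A sketch of the slot match (EIP-1167 minimal proxies are a different case: the implementation address is embedded directly in the bytecode, not read from storage):

```rust
// Sketch of well-known proxy slot matching, assuming the decompiler has
// already extracted the storage slot an SLOAD feeds into a DELEGATECALL.
// EIP-1967 logic slot = keccak256("eip1967.proxy.implementation") - 1.
const EIP1967_IMPL_SLOT: &str =
    "360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc";

#[derive(Debug, PartialEq)]
enum ProxyKind { Eip1967, Unknown }

fn classify_proxy_slot(slot_hex: &str) -> ProxyKind {
    if slot_hex.eq_ignore_ascii_case(EIP1967_IMPL_SLOT) {
        ProxyKind::Eip1967
    } else {
        ProxyKind::Unknown
    }
}
```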
None of this is novel in isolation — academic work on EVM decompilation goes back to at least 2018 (see Porosity, Mythril, Panoramix). What the Sigvex design adds is an end-to-end pipeline where the decompiled representation feeds directly into detectors, a fuzzer, and an exploit generator, all within the same Rust workspace.
Architecture: Eight Layers, Two Runtimes
The codebase is organized into an 8-layer dependency hierarchy across approximately 95 packages. The constraint is strict: a package in layer N may only depend on packages in layers 0 through N-1. There are no upward dependencies.
- Layer 7 — Presentation: CLI, REST APIs, web portal
- Layer 6 — Application: EVM and SVM analysis pipeline orchestration
- Layer 5 — Infrastructure: RPC clients, storage backends (local/S3/Azure)
- Layer 4 — Domain Services: exploit generation, fuzzing, security validation
- Layer 3 — Domain Logic: detectors, call graph analysis, proxy resolution
- Layer 2 — Transformation: Bytecode → LIR → HIR → Solidity/Yul
- Layer 1 — Domain Model: EVM/SVM types, signatures, symbolic engine
- Layer 0 — Foundation: error handling, async utilities, constants
EVM and SVM runtimes are isolated from each other at layers 0–5. They share nothing except chain-agnostic abstractions (error types, async helpers, detector trait definitions, storage interfaces). This isolation means a change to the SVM decompiler cannot accidentally break an EVM detector.
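The layering rule itself is mechanical and easy to check in CI: given a package-to-layer map and the dependency edges, any edge that points at the same layer or higher is a violation. A sketch (package names are hypothetical):

```rust
use std::collections::HashMap;

// Sketch: enforce "a package in layer N may only depend on layers 0..N-1"
// over a package -> layer map and a dependency edge list.
fn upward_deps<'a>(
    layer_of: &HashMap<&'a str, u8>,
    deps: &[(&'a str, &'a str)], // (package, dependency)
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .filter(|&&(pkg, dep)| layer_of[dep] >= layer_of[pkg])
        .copied()
        .collect()
}
```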
The benefit of this structure shows up in testing. Each layer can be tested against its immediate dependencies without standing up the layers above it. The CFG construction in layer 2 can be unit-tested by feeding raw bytecode directly, without instantiating the HTTP API server in layer 7.
The Decompilation Pipeline
The EVM decompilation pipeline runs four stages:
Stage 1 — Frontend (Bytecode → LIR). The bytecode is disassembled into opcodes and a control flow graph is built. Function boundaries are identified from the selector dispatch table. The output is a low-level IR that is still essentially stack-based, with explicit PUSH, POP, and SWAP operations.
Stage 2 — HIR Transformation (LIR → HIR). The stack-based LIR is converted to a register-based HIR in static single assignment (SSA) form. Type inference runs here. Storage slot patterns are analyzed and mapped to variable declarations. Semantic patterns — require, assert, checked arithmetic, SafeMath calls — are lifted to first-class HIR nodes.
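The core of the stack-to-register conversion can be illustrated by replaying the stack symbolically and emitting a fresh register per defining operation. This is a greatly simplified sketch (two toy opcodes, string output), not the real HIR:

```rust
// Greatly simplified sketch of the LIR -> HIR idea: replay the stack
// symbolically and emit SSA-style register assignments for each operation.
#[derive(Clone, Debug)]
enum Lir { Push(u64), Add }

fn to_ssa(ops: &[Lir]) -> Vec<String> {
    let mut stack: Vec<String> = Vec::new(); // symbolic stack of register names
    let mut code = Vec::new();
    let mut next = 0usize;
    for op in ops {
        match op {
            Lir::Push(n) => {
                let r = format!("v{next}");
                next += 1;
                code.push(format!("{r} = {n}"));
                stack.push(r);
            }
            Lir::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                let r = format!("v{next}");
                next += 1;
                code.push(format!("{r} = {a} + {b}"));
                stack.push(r);
            }
        }
    }
    code
}
```

Because every result gets a fresh name, the output is in SSA form by construction; the real pass additionally handles control-flow joins with phi nodes.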
Stage 3 — Optimization Passes. Eight or more passes run over the HIR: constant folding, dead code elimination, common subexpression elimination, variable renaming (semantic names where the function signature database has a match), control flow simplification, and loop recognition. The passes are composable and order-dependent.
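A composable pass pipeline is naturally expressed as a trait plus an ordered list. The sketch below shows the shape with a toy expression type and a constant-folding pass (the trait name and structure are illustrative, not the shipped API):

```rust
// Sketch of composable, order-dependent passes over a toy expression HIR.
#[derive(Clone, Debug, PartialEq)]
enum Expr { Const(u64), Add(Box<Expr>, Box<Expr>) }

trait Pass { fn run(&self, e: Expr) -> Expr; }

struct ConstFold;
impl Pass for ConstFold {
    fn run(&self, e: Expr) -> Expr {
        match e {
            Expr::Add(a, b) => {
                let (a, b) = (self.run(*a), self.run(*b));
                if let (Expr::Const(x), Expr::Const(y)) = (&a, &b) {
                    Expr::Const(x.wrapping_add(*y)) // EVM arithmetic wraps mod 2^256
                } else {
                    Expr::Add(Box::new(a), Box::new(b))
                }
            }
            other => other,
        }
    }
}

fn run_pipeline(passes: &[&dyn Pass], mut e: Expr) -> Expr {
    for p in passes { e = p.run(e); } // order matters
    e
}
```

Order dependence falls out of this structure: constant folding may create dead code that a later elimination pass removes, so swapping the two changes the result of a single pipeline run.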
Stage 4 — Backend (HIR → Source). The HIR is lowered to either Solidity or Yul output. Contract structure is reconstructed: state variables, events, modifiers where recoverable, and function bodies. The Solidity output is not guaranteed to recompile — the goal is human readability for security review, not roundtrip fidelity.
SVM analysis follows a parallel structure adapted to eBPF/BPF bytecode. The Solana runtime differs significantly from the EVM: there is no persistent storage slot model, accounts are passed explicitly, and the execution model is a register machine from the start. The SVM pipeline handles these differences at layers 1–2 without requiring changes to the shared detector infrastructure.
Detection: Techniques and Tradeoffs
Sigvex runs 361 detectors across EVM, SVM, and ZK runtimes, organized by severity. Details on specific detectors and the detection techniques (CFG traversal, data flow analysis, taint tracking, symbolic execution, and e-graph constraint satisfaction) are covered in Vulnerability Detection Framework and E-Graph Constraint Satisfaction. This section focuses on the design decisions that affect the pipeline as a whole.
Confidence scoring. Each finding carries a 0.0–1.0 confidence score. The score accounts for pattern match strength, whether semantic validation confirmed the pattern, context analysis (does the function have a reentrancy guard?), and historical correlation with known exploit patterns. High-confidence findings are prioritized in the report; low-confidence findings are surfaced but labeled.
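The components listed above combine naturally into a weighted score. The weights and field names in this sketch are assumptions for illustration, not the shipped formula:

```rust
// Illustrative weighted-score sketch; weights and components are assumptions.
struct FindingSignals {
    pattern_strength: f64,        // 0.0..=1.0, how exact the pattern match was
    semantically_confirmed: bool, // semantic validation confirmed the pattern
    has_guard: bool,              // mitigating context, e.g. a reentrancy guard
    matches_known_exploit: bool,  // correlates with a historical exploit pattern
}

fn confidence(s: &FindingSignals) -> f64 {
    let mut score = 0.5 * s.pattern_strength;
    if s.semantically_confirmed { score += 0.3; }
    if s.matches_known_exploit { score += 0.2; }
    if s.has_guard { score *= 0.4; } // mitigating context lowers confidence
    score.clamp(0.0, 1.0)
}
```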
Analysis modes. The detector suite supports four modes with different performance/depth tradeoffs:
- Fast (under 1 second): Pattern-based scanning only, using the e-graph constraint satisfaction (CST) engine
- Standard (under 10 seconds): Full detector suite including CFG, data flow, and semantic analysis
- Deep (under 5 minutes): Standard analysis plus fuzzing campaigns and exploit generation
- Exhaustive: Extended fuzzing with adaptive input generation; runtime is contract-dependent
Fast mode is suitable for CI/CD gates and high-throughput monitoring pipelines where latency matters more than depth. Deep mode is appropriate for manual security audits where the goal is to produce exploitable proof-of-concept findings.
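One way to encode the mode tradeoffs is as a plan with a time budget and the set of enabled stages. The enum and field names here are a hypothetical sketch of that mapping:

```rust
use std::time::Duration;

// Sketch mapping the four analysis modes to time budgets and enabled stages.
#[derive(Clone, Copy)]
enum Mode { Fast, Standard, Deep, Exhaustive }

struct Plan {
    budget: Option<Duration>, // None = contract-dependent runtime
    fuzzing: bool,
    exploit_gen: bool,
}

fn plan(mode: Mode) -> Plan {
    match mode {
        Mode::Fast       => Plan { budget: Some(Duration::from_secs(1)),   fuzzing: false, exploit_gen: false },
        Mode::Standard   => Plan { budget: Some(Duration::from_secs(10)),  fuzzing: false, exploit_gen: false },
        Mode::Deep       => Plan { budget: Some(Duration::from_secs(300)), fuzzing: true,  exploit_gen: true },
        Mode::Exhaustive => Plan { budget: None,                           fuzzing: true,  exploit_gen: true },
    }
}
```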
The e-graph CST engine. Most EVM analysis tools call out to an external SMT solver (Z3 or CVC5) for constraint solving. SMT solvers are powerful but introduce external binary dependencies, can run for seconds or minutes on complex constraints, and cannot run in WebAssembly environments. The Sigvex CST engine uses e-graphs and equality saturation instead. Vulnerability patterns are expressed as rewrite rules; after saturation, detection becomes a graph membership test. The engine is implemented entirely in Rust with no external dependencies and compiles to WASM for browser-based analysis. The performance difference against SMT-based tools is covered in detail in the e-graph research article.
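The saturate-then-query shape can be illustrated without a full e-graph. The sketch below is a heavily simplified stand-in: it saturates a set of facts under rewrite rules with a worklist, then answers detection as a membership test. A real e-graph additionally shares subterms via a congruence-closed union-find, which this sketch does not capture:

```rust
use std::collections::HashSet;

// Heavily simplified stand-in for equality saturation: saturate a set of
// facts under rewrite rules, then answer detection as a membership test.
fn saturate(seed: &[&str], rules: &[fn(&str) -> Option<String>]) -> HashSet<String> {
    let mut facts: HashSet<String> = seed.iter().map(|s| s.to_string()).collect();
    let mut work: Vec<String> = facts.iter().cloned().collect();
    while let Some(f) = work.pop() {
        for rule in rules {
            if let Some(derived) = rule(&f) {
                if facts.insert(derived.clone()) {
                    work.push(derived); // newly derived facts may enable more rules
                }
            }
        }
    }
    facts
}

// Hypothetical rule: an external call ordered before a state write rewrites
// to a named vulnerability pattern.
fn reentrancy_rule(f: &str) -> Option<String> {
    if f == "call(untrusted) before sstore" {
        Some("pattern:reentrancy".to_string())
    } else {
        None
    }
}
```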
Dynamic Analysis: Fuzzing and Exploit Generation
Static analysis identifies candidate vulnerability locations. Dynamic analysis validates whether those candidates are actually exploitable.
The fuzzer generates test inputs targeting the functions flagged by static detectors. Coverage-guided mutation expands the input corpus to explore branches not reached by the initial cases. For a reentrancy candidate, the fuzzer would generate a caller contract that re-enters the flagged function and attempt to trigger the state change before the initial call completes.
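The coverage-guided loop itself is simple: an input earns a place in the corpus only if executing it reaches a branch not seen before. A minimal sketch, where `execute` and `mutate` are stand-ins for running the contract and mutating calldata:

```rust
use std::collections::HashSet;

// Minimal coverage-guided loop sketch: keep a mutated input in the corpus
// only if executing it covers a previously unseen branch id.
fn fuzz_round(
    corpus: &mut Vec<Vec<u8>>,
    covered: &mut HashSet<u32>,
    execute: impl Fn(&[u8]) -> Vec<u32>, // returns covered branch ids
    mutate: impl Fn(&[u8]) -> Vec<u8>,
) {
    let seeds: Vec<Vec<u8>> = corpus.clone();
    for seed in &seeds {
        let input = mutate(seed);
        let branches = execute(&input);
        let new_coverage = branches.iter().any(|b| !covered.contains(b));
        if new_coverage {
            covered.extend(branches);
            corpus.push(input); // interesting input becomes a future seed
        }
    }
}
```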
When a fuzzing campaign produces a crashing or anomalous result, the exploit generator constructs a proof-of-concept Solidity contract. The generated exploit includes:
- The attacker contract implementing the attack sequence
- A transaction sequence with expected intermediate state
- An estimated profit or fund drain amount where quantifiable
The exploit output is for security validation — confirming that a flagged vulnerability is genuinely exploitable, not a false positive. Automated exploit generation is discussed in more detail in Automated Exploit Generation.
Storage Design
Each analyzed contract produces data across three storage directories with different mutability characteristics:
- bytecode/ — immutable data derived from on-chain bytecode: the bytecode itself, decompiled functions, storage access patterns, IR representations
- state/ — mutable on-chain state: storage slot values with block-level history, inferred storage layouts, balance and nonce snapshots
- analysis/ — derived results that can be regenerated: findings, fuzzing results, call graphs, exploit output, metrics
Separating these three categories matters for cache invalidation. If a detector is updated, only the analysis/ output needs to be regenerated. The expensive decompilation step (which writes to bytecode/) does not need to re-run. If on-chain state changes, state/ is updated independently of the static analysis results.
Storage backends are swappable: local filesystem (default), AWS S3, and Azure Blob Storage are all supported via a factory pattern. The same storage interface is used regardless of backend.
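The swappable-backend idea reduces to a trait plus a factory that selects an implementation by name. The trait, in-memory stand-in, and factory below are illustrative, not the real API:

```rust
use std::collections::HashMap;

// Sketch of the swappable-backend idea behind the factory pattern.
trait Storage {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
}

// In-memory stand-in; real backends are local filesystem, S3, Azure Blob.
struct MemoryStorage { map: HashMap<String, Vec<u8>> }

impl Storage for MemoryStorage {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.map.get(key).map(|v| v.as_slice())
    }
}

// Factory: pick a backend by name; an S3 or Azure variant would slot in here.
fn make_storage(kind: &str) -> Option<Box<dyn Storage>> {
    match kind {
        "memory" | "local" => Some(Box::new(MemoryStorage { map: HashMap::new() })),
        _ => None,
    }
}
```

Callers hold a `Box<dyn Storage>`, so the backend choice is a configuration detail rather than something the analysis layers need to know about.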
What This Article Does Not Cover
Each component of the pipeline has its own research article with technical depth that would not fit here:
- Decompilation pipeline internals: Decompilation Pipeline
- Semantic lifting from bytecode: Semantic Lifting
- E-graph constraint satisfaction: E-Graph Constraint Satisfaction
- Vulnerability detector design: Vulnerability Detection Framework
- Automated exploit generation: Automated Exploit Generation
- Coverage-guided fuzzing: Coverage-Guided Fuzzing
- Cross-contract analysis: Cross-Contract Analysis
- Solana (SVM) pipeline: SVM Analysis Pipeline
- Historical exploit pattern matching: Attack Pattern Intelligence
References
- Mossberg, M. et al. “Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts.” IEEE ASE 2019.
- Grech, N. et al. “Gigahorse: Thorough, Declarative Decompilation of Smart Contracts.” ICSE 2019.
- Brent, L. et al. “Vandal: A Scalable Security Analysis Framework for Smart Contracts.” arXiv 2018.
- Willsey, M. et al. “egg: Fast and Extensible Equality Saturation.” POPL 2021.
- EVM Opcodes Reference
- Smart Contract Weakness Classification (SWC)