Solana Program Analysis Pipeline
Security Analysis for eBPF Bytecode at Scale
How Sigvex analyzes Solana programs at the bytecode level, from ELF parsing and eBPF disassembly through HIR lifting, CPI analysis, and 192 Solana-native vulnerability detectors.
Solana programs execute as eBPF (extended Berkeley Packet Filter) bytecode, a register-based architecture fundamentally different from the EVM’s stack machine. Security analysis of Solana programs requires a purpose-built pipeline that understands eBPF binary format, Solana’s account model, Cross-Program Invocation (CPI) semantics, and Anchor framework patterns. EVM analysis tools do not transfer—the vulnerability classes are different, the execution model is different, and the binary format is different.
Solana’s Distinct Security Model
Solana’s programming model introduces security challenges that have no EVM equivalent. The design decisions that make Solana fast also create a different threat surface.
The Account Model
In Solana, accounts are passed explicitly to each instruction as a list. Programs do not have private storage in the EVM sense—they operate on data accounts passed to them. This produces a class of vulnerabilities unique to Solana:
Missing signer checks: A program must verify that the expected account has signed the transaction. Without this verification, any caller can impersonate an authority account by passing the right account address without the corresponding signature. This is among the most frequent critical vulnerabilities in Solana programs.
Missing owner checks: A program must verify that data accounts are owned by the expected program—not just that they have the right address. Passing a fabricated account with the correct structure but owned by a different program allows an attacker to provide malicious data that passes structural validation.
Account aliasing: If the same account appears multiple times in the accounts list under different expected roles, programs operating on both roles corrupt state in unexpected ways. A single mutable account passed as both the “from” and “to” of a transfer can drain or double balances.
Duplicate mutable accounts: Two accounts expected to be distinct but with the same address create dangerous state confusion during write operations.
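The checks described above can be modeled in a few lines. The following is a simplified Python sketch, not Solana SDK code: `AccountMeta`, `validate_transfer`, and the field names are hypothetical stand-ins for the account metadata a program receives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccountMeta:
    """Hypothetical model of the metadata passed with each account."""
    address: str      # account address (shortened placeholder strings here)
    owner: str        # program that owns the account
    is_signer: bool
    is_writable: bool

def validate_transfer(authority, source, dest, expected_owner):
    """Return a list of validation failures; an empty list means all checks passed."""
    errors = []
    # Signer check: the authority must have signed the transaction.
    if not authority.is_signer:
        errors.append("missing signer")
    # Owner check: data accounts must be owned by the expected program.
    for acct in (source, dest):
        if acct.owner != expected_owner:
            errors.append(f"wrong owner: {acct.address}")
    # Duplicate/alias check: the two mutable accounts must be distinct.
    if source.address == dest.address:
        errors.append("duplicate mutable account")
    return errors
```

A program that skips any one of these three checks is exposed to the corresponding vulnerability class, even if the other two are present.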
Cross-Program Invocation (CPI)
Solana programs compose by invoking other programs through CPI. CPI carries its own security surface:
Arbitrary CPI: If the target program address in a CPI call is attacker-controlled, the instruction can invoke any program on the network—including programs designed to exploit the caller’s signer authorities.
CPI signer propagation: When a program invokes another with invoke_signed, it asserts that certain PDA signers are valid. Incorrect seed construction or insufficient validation allows attacker-controlled programs to claim false signer authority.
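CPI reentrancy: Unlike EVM reentrancy, which exploits mutable storage state between external calls, Solana CPI reentrancy arises when a program is re-invoked during a CPI chain before the original invocation completes. The Solana runtime rejects indirect reentrancy (A calls B, which calls back into A) and permits only direct self-recursion, so the exposure is narrower than on the EVM and centers on state that is inconsistent while a nested invocation runs.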
Program Derived Addresses (PDAs)
PDAs are deterministic addresses derived from a program ID and a set of seeds. They are used pervasively as secure storage accounts and signer authorities. PDA-related vulnerabilities include:
Bump seed reuse: Each PDA has a canonical bump seed that ensures the address falls off the ed25519 curve. Reusing non-canonical bumps or accepting user-supplied bump values allows attackers to derive alternative addresses that pass structural validation but point to attacker-controlled data.
Seed collision: Two different seed combinations that produce the same PDA address allow an attacker to use one PDA context to authenticate operations intended for a different PDA context.
User-controlled seeds: When user-supplied input feeds directly into PDA seed construction without sufficient validation, attackers can enumerate seed values to find PDAs that provide unintended access.
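To make the canonical bump concrete, here is a minimal pure-Python model of PDA derivation. It assumes the commonly documented construction (SHA-256 over the seeds, the bump byte, the program ID, and the literal `ProgramDerivedAddress`, rejecting any result that decodes to an ed25519 point). The function names mirror the Solana SDK but this is an illustrative reimplementation, seed count and length limits are omitted, and the curve test is a simplified assumption rather than the runtime's exact code.

```python
import hashlib

P = 2**255 - 19                              # ed25519 field prime
D = (-121665 * pow(121666, P - 2, P)) % P    # ed25519 curve constant d

def is_on_curve(candidate: bytes) -> bool:
    """True if the 32 bytes decode to a valid ed25519 point (simplified check)."""
    y = int.from_bytes(candidate, "little") & ((1 << 255) - 1)  # drop the sign bit
    u = (y * y - 1) % P
    v = (D * y * y + 1) % P
    x2 = u * pow(v, P - 2, P) % P            # x^2 = (y^2 - 1) / (d*y^2 + 1)
    return x2 == 0 or pow(x2, (P - 1) // 2, P) == 1  # Euler's criterion

def create_program_address(seeds, bump, program_id):
    """Return the derived address, or None if the hash lands on the curve."""
    h = hashlib.sha256()
    for seed in seeds:
        h.update(seed)
    h.update(bytes([bump]))
    h.update(program_id)
    h.update(b"ProgramDerivedAddress")
    digest = h.digest()
    return None if is_on_curve(digest) else digest

def find_program_address(seeds, program_id):
    """Search bumps from 255 downward; the first hit is the canonical bump."""
    for bump in range(255, -1, -1):
        addr = create_program_address(seeds, bump, program_id)
        if addr is not None:
            return addr, bump
    raise ValueError("no off-curve address found")
```

Because lower bumps may also yield valid off-curve addresses, a program that accepts an arbitrary user-supplied bump can be steered to a different address than the canonical one for the same seeds.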
Pipeline Architecture
flowchart TD
classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0
classDef highlight fill:#332519,stroke:#e8a87c,stroke-width:2px,color:#f0d8c0
A["Raw Solana Program Binary - .so"]:::data
B["ELF Parsing + CFG Construction"]:::process
C["Control Flow Graph - CFG"]:::data
D["HIR Lifting - register-based SSA"]:::process
E["High-level IR - HIR"]:::data
F["Optimization Passes"]:::process
G["Optimized HIR"]:::data
H["Analysis Orchestration"]:::process
I["192 Security Detectors"]:::highlight
J["CPI Call Graph"]:::highlight
K["E-Graph Pattern Matching"]:::highlight
L["Rust Code Generation"]:::highlight
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
H --> J
H --> K
H --> L
Stage 1: ELF Parsing and Disassembly
Solana programs are distributed as ELF shared objects. The frontend stage parses these binaries, extracting:
- Code sections: The .text section containing eBPF instructions
- Data sections: Read-only data and BSS sections
- Symbol table: Function names (when present in non-stripped binaries)
- Relocations: References to external symbols and system calls
eBPF uses a RISC-style 64-bit register architecture with 11 registers (r0-r10), 87 instruction types, and fixed-width 8-byte instruction encoding (the 64-bit immediate load, lddw, occupies two consecutive 8-byte slots). The disassembler converts raw bytes into typed instruction records, classifying each into categories: arithmetic, memory, branch, call, return, and syscall.
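The 8-byte encoding can be decoded with a short routine. The sketch below assumes the standard eBPF layout (opcode byte, packed register byte, 16-bit offset, 32-bit immediate, all little-endian); `Insn`, `decode`, and `category` are hypothetical names, and the category mapping is a coarse approximation. Separating syscalls from internal calls needs relocation data and is not attempted here.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    pc: int       # position in 8-byte slots
    opcode: int
    dst: int      # destination register (low nibble of byte 1)
    src: int      # source register (high nibble of byte 1)
    off: int      # signed 16-bit offset
    imm: int      # signed 32-bit immediate (64-bit for lddw)

def decode(code: bytes) -> list:
    """Decode raw eBPF bytes into instruction records."""
    if len(code) % 8 != 0:
        raise ValueError("code length must be a multiple of 8")
    slots = len(code) // 8
    insns, pc = [], 0
    while pc < slots:
        opcode, regs, off, imm = struct.unpack_from("<BBhi", code, pc * 8)
        width = 1
        if opcode == 0x18:  # lddw: upper 32 bits live in the next slot's imm
            if pc + 1 >= slots:
                raise ValueError("truncated lddw")
            (hi,) = struct.unpack_from("<i", code, (pc + 1) * 8 + 4)
            imm = (imm & 0xFFFFFFFF) | ((hi & 0xFFFFFFFF) << 32)
            width = 2
        insns.append(Insn(pc, opcode, regs & 0x0F, regs >> 4, off, imm))
        pc += width
    return insns

def category(insn: Insn) -> str:
    """Coarse classification by opcode class (low three bits of the opcode)."""
    if insn.opcode == 0x85:
        return "call"
    if insn.opcode == 0x95:
        return "return"
    cls = insn.opcode & 0x07
    if cls in (0x04, 0x07):          # 32-bit and 64-bit ALU classes
        return "arithmetic"
    if cls in (0x00, 0x01, 0x02, 0x03):  # load/store classes
        return "memory"
    if cls == 0x05:                  # jump class
        return "branch"
    return "other"
```

For example, the bytes `b7 00 00 00 00 00 00 00` decode to a 64-bit move of the immediate 0 into r0, and `95 00 00 00 00 00 00 00` decodes to the exit instruction.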
Stage 2: Control Flow Graph Construction
From the disassembled instructions, the frontend constructs a Control Flow Graph (CFG) identifying:
Basic blocks: Maximal sequences of instructions with no branches or branch targets within them. A basic block ends at any jump instruction or call instruction (calls can fail, creating implicit edges).
Function boundaries: Identified by call instructions and BPF-to-BPF function boundaries. When symbol table entries are available, function names are recovered. Otherwise, functions are labeled by their entry offset.
Edge types: Fall-through edges (sequential execution), conditional jump edges (taken/not-taken), unconditional jump edges, call edges (to callee entry blocks), and return edges (from return instructions back to call sites).
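Basic block boundaries follow from the classic "leader" rule. The sketch below is a simplified illustration rather than the pipeline's implementation: instructions are reduced to hypothetical `(kind, off)` pairs, and jump targets are computed relative to the next instruction, as in eBPF.

```python
def split_basic_blocks(insns):
    """Split instructions into basic blocks.

    insns: list of (kind, off) pairs, where kind is one of
    "other", "jump", "cond_jump", "call", "exit" and off is the
    jump offset in instruction slots (ignored for non-jumps).
    Returns half-open (start, end) index ranges.
    """
    n = len(insns)
    leaders = {0} if n else set()
    for i, (kind, off) in enumerate(insns):
        if kind in ("jump", "cond_jump"):
            leaders.add(i + 1 + off)   # jump target starts a block
        if kind != "other":
            leaders.add(i + 1)         # instruction after a jump/call/exit starts a block
    starts = sorted(l for l in leaders if 0 <= l < n)
    return list(zip(starts, starts[1:] + [n]))
```

Calls end a block here to match the convention described above, where a failing call creates an implicit edge.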
Stage 3: HIR Lifting
The HIR for SVM is register-based rather than stack-based, following the eBPF architecture. The HIR lifting stage transforms the CFG into a typed SSA (Static Single Assignment) form through several sub-passes:
SSA construction: Phi nodes are inserted at control flow join points, ensuring each variable is defined exactly once. This property enables precise dataflow analysis without tracking multiple reaching definitions.
Dominator tree computation: Identifies which blocks dominate each other, necessary for SSA construction and loop analysis.
Liveness analysis: Tracks which variables are live at each program point, used by optimization passes to eliminate dead assignments.
Loop analysis: Detects back-edges and natural loops, computing induction variables and loop bounds. This feeds into compute exhaustion and DoS detectors.
Discriminator analysis: Solana programs dispatch to different handlers based on discriminator values at the start of instruction data. Anchor programs use an 8-byte prefix; native programs such as SPL Token and the System program use shorter tags (a 1-byte tag and a 4-byte little-endian index, respectively). The discriminator analysis pass recovers the dispatch structure, identifying which instruction variant each handler processes. Known discriminators from Anchor’s framework, the SPL token program, and the system program are recognized automatically.
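For Anchor programs, discriminator recognition can be illustrated with a lookup table. Anchor derives an instruction discriminator as the first 8 bytes of SHA-256 over `global:<instruction_name>`; the helper names and the candidate instruction list below are hypothetical, and a real analysis would draw candidates from an IDL or a dictionary of common names.

```python
import hashlib

def anchor_instruction_discriminator(name: str) -> bytes:
    """First 8 bytes of sha256("global:<name>"), per Anchor's convention."""
    return hashlib.sha256(f"global:{name}".encode()).digest()[:8]

def build_lookup(names):
    """Map discriminator bytes back to candidate instruction names."""
    return {anchor_instruction_discriminator(n): n for n in names}

def identify_handler(instruction_data: bytes, lookup):
    """Return the matching instruction name, or None if unknown or too short."""
    if len(instruction_data) < 8:
        return None
    return lookup.get(instruction_data[:8])

# Hypothetical candidate names for illustration.
LOOKUP = build_lookup(["initialize", "deposit", "withdraw"])
```

Instruction data that begins with the bytes for `deposit` resolves to that handler; anything shorter than 8 bytes or with an unrecognized prefix stays unlabeled.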
Stage 4: Anchor Framework Recognition
Anchor is the dominant Solana development framework. The Anchor recognition system detects Anchor programs and extracts their structural information:
Account constraint types: Anchor’s #[account(...)] attribute generates account validation code. The analysis recognizes constraint patterns including mut (mutable), signer, has_one, constraint, seeds, bump, init, close, and token::authority.
Account types: Anchor account types (Account<'info, T>, Signer<'info>, SystemAccount<'info>, UncheckedAccount<'info>) generate different validation patterns in bytecode. Recognizing these patterns allows the analysis to precisely model what validation is and is not present.
Data account structure: For Account<T> types, the analysis extracts field names and types from the discriminator and data layout, enabling semantic analysis of account field accesses.
This recognition is essential for reducing false positives: many apparent missing-signer-check findings are suppressed when the analysis confirms Anchor’s generated validation code is present.
Stage 5: CPI Analysis
The CPI analysis stage builds a complete picture of Cross-Program Invocations:
CPI call sites: Each location where the program invokes another program is identified, along with the target program (if determinable), the accounts passed, and any seeds used for signer derivation.
CPI type classification: CPIs are classified as direct (fixed program ID), indirect (account-referenced program ID), privileged (with signer seeds), or unprivileged. Indirect CPIs with externally-supplied program addresses are the primary source of arbitrary CPI vulnerabilities.
CPI security concerns: Specific concerns are identified at each CPI call site: missing program ID validation, unvalidated return values, potential signer authority downgrades, and possible reentrancy paths.
CPI graph: The full call graph of CPI relationships is constructed, enabling inter-procedural analysis of access control and data flow across program boundaries.
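The classification above can be sketched as a small decision function. Everything here is hypothetical: `CpiSite` and its fields stand in for whatever facts the analysis recovers at each call site, and the concern strings are illustrative labels rather than detector output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CpiSite:
    """Hypothetical summary of one CPI call site."""
    offset: int              # bytecode offset of the invoke
    target_source: str       # "constant" or "account"
    target_validated: bool   # program ID compared against an expected value
    signer_seeds: bool       # invoke_signed with PDA seeds

def classify(site: CpiSite):
    """Return (kind, privilege, concerns) for a call site."""
    kind = "direct" if site.target_source == "constant" else "indirect"
    privilege = "privileged" if site.signer_seeds else "unprivileged"
    concerns = []
    if kind == "indirect" and not site.target_validated:
        concerns.append("arbitrary CPI: missing program ID validation")
        if site.signer_seeds:
            # PDA signer authority would be handed to an unvalidated program.
            concerns.append("signer authority exposed to unvalidated program")
    return kind, privilege, concerns
```

An indirect, privileged call site with no program ID validation is the worst case in this model, because the attacker chooses the callee and receives the caller's PDA signature.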
Stage 6: PDA Analysis
Derivation sites: Each create_program_address or find_program_address call is identified with its seed components.
Seed classification: Seeds are classified as constant (hardcoded), account-derived (from account data), user-supplied (from instruction data), or computed. User-supplied seeds trigger elevated scrutiny.
PDA purpose: Common PDA patterns are recognized: authority PDAs (used as signing authorities), storage PDAs (used as data accounts), and canonical PDAs (the standard Anchor pattern using fixed seeds and canonical bump).
Collision analysis: Across all derivation sites within the program, the analysis checks for seed combinations that could produce the same address under different inputs—the signature of a seed collision vulnerability.
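One concrete source of seed collisions is that seeds are hashed as a plain concatenation with no length delimiters, so different seed splits can yield the same preimage. The sketch below checks concrete seed lists for that condition; `find_collisions` and the site names are hypothetical, and a real analysis would reason over symbolic seed components rather than fixed byte strings.

```python
from itertools import combinations

def preimage(seeds) -> bytes:
    """Seeds are concatenated without separators before hashing."""
    return b"".join(seeds)

def find_collisions(sites):
    """Return pairs of derivation sites whose distinct seed lists
    concatenate to the same bytes.

    sites: dict mapping a site name to its list of seed byte strings.
    """
    hits = []
    for (name_a, seeds_a), (name_b, seeds_b) in combinations(sorted(sites.items()), 2):
        if seeds_a != seeds_b and preimage(seeds_a) == preimage(seeds_b):
            hits.append((name_a, name_b))
    return hits

# Hypothetical derivation sites: the first two differ only in where the
# boundary between the variable-length seeds falls.
SITES = {
    "user_vault": [b"vault", b"ab", b"c"],
    "pool_vault": [b"vault", b"a", b"bc"],
    "config": [b"config"],
}
```

Fixed-length seeds or distinct constant prefixes per PDA purpose remove this ambiguity.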
Vulnerability Detector Categories
The pipeline registers 192 Solana-specific detectors organized into security-relevant categories. Detectors run in parallel, so with enough workers total analysis time approaches that of the slowest individual detector rather than the sum of all detectors.
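The parallel fan-out can be pictured as follows. This is a structural sketch only: the detector functions and the `program` dictionary are hypothetical, and Python threads are used for illustration without implying anything about how the pipeline itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor

def missing_signer_check(program):
    """Hypothetical detector: flags programs without a signer check."""
    return [] if program.get("signer_checked") else ["MissingSignerCheck"]

def arbitrary_cpi(program):
    """Hypothetical detector: flags unvalidated CPI targets."""
    return [] if program.get("cpi_target_validated") else ["ArbitraryCpi"]

def run_detectors(detectors, program):
    """Run every detector concurrently and collect findings in detector order."""
    with ThreadPoolExecutor(max_workers=len(detectors) or 1) as pool:
        futures = [pool.submit(d, program) for d in detectors]
        findings = []
        for future in futures:
            findings.extend(future.result())
    return findings
```

Each detector reads the shared analysis results without mutating them, which is what makes this kind of fan-out safe.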
Input Validation
Twelve detectors cover instruction data handling:
- InstructionDataLength: Programs must validate instruction data length before reading fields; short data with reads beyond the end causes panics or incorrect behavior
- InstructionDataParsing: Validates that deserialization is bounded and handles malformed input gracefully
- AccountIndexBounds/AccountIteratorBounds/AccountListSize: Verifies that all account array accesses are bounds-checked before use
- ImplicitInstructionOrdering: Detects programs that assume instruction execution order without explicit ordering enforcement
Access Control
Nine detectors cover access control and authorization:
- MissingSignerCheck: Instructions modifying privileged state must verify the authority account’s signature is present
- ConditionalOwnershipBypass: Some programs check ownership only on one branch of a conditional, creating bypass paths
- ConditionalValidationBypass: Similar to ownership bypass but for general constraint checks
- RoleBasedAccessControl: Validates that role-based access control implementations correctly enforce role boundaries
- TimeLockOperations: Detects time-sensitive operations without sufficient delay enforcement
Account Validation
Twenty detectors cover account validation—the largest single category because the account model creates so many distinct vulnerability surfaces:
- AccountAliasAttack: Detects when the same account could appear under multiple roles
- DuplicateMutableAccounts: Finds cases where two accounts expected to be distinct can be the same
- MissingOwnerCheck: Instructions that read account data without verifying the account is owned by the expected program
- MissingWritableCheck: State-modifying operations without checking the account’s writable flag
- AccountTypeConfusion: Using accounts as if they have a different type than their discriminator indicates
- SysvarSubstitution: Sysvar accounts (Clock, Rent, SlotHashes) passed as normal accounts can be replaced by attacker-controlled accounts if not validated by address
SPL Token Security
Twenty detectors cover SPL Token and Token-2022 security:
- MintAuthoritySecurity: Validates that mint authority transfers and revocations are properly protected
- TokenAuthorityConfusion: Distinguishes between mint authority and freeze authority to prevent confusion attacks
- TokenAccountSpoofing: Fabricated token accounts that appear valid but report incorrect balances or authorities
- Token2022Extensions: Token-2022’s transfer hooks, confidential transfers, and permanent delegate features introduce new attack surfaces
- TransferHookSecurity: Transfer hooks executed during token transfers can trigger arbitrary code; validates hook program security
- AtaValidation: Associated Token Account derivation must be validated to prevent ATA substitution attacks
CPI Security
Eighteen detectors cover Cross-Program Invocation security:
- ArbitraryCpi: User-controlled program address in CPI call, the most critical CPI vulnerability
- CpiSignerSimulation: Detecting attempts to use CPI authority for unintended operations
- CpiReentrancy/ReadOnlyReentrancy: CPI-based reentrancy where state is inconsistent during a nested invocation
- CpiDataTamperingDetector: Instruction data passed to CPI calls can be modified if sourced from attacker-controlled accounts
- UncheckedCpiReturn: CPI return values (via account data modification) must be explicitly validated; unchecked returns assume success
PDA Validation
Eleven detectors cover PDA-specific vulnerabilities:
- BumpSeedCanonicalization: Programs must use the canonical bump (returned by find_program_address) rather than accepting user-supplied bumps
- PdaBumpSeedReuse: Using the same bump seed across multiple operations without regenerating via find_program_address
- PdaSeedCollision: Two seed combinations within the program that could produce the same address
- WeakPdaEntropy: PDA seeds with insufficient entropy allow brute-force enumeration of valid addresses
- PdaUserControlledSeeds: User input feeding directly into PDA seed construction without sanitization
Arithmetic
Five arithmetic detectors are adapted for Solana’s u64 lamport arithmetic and BorshSerialize integer handling:
- IntegerOverflow/IntegerTruncation: Arithmetic overflow in lamport calculations can drain accounts or create phantom balances
- UncheckedFeeRentMath: Rent and fee calculations using unchecked arithmetic can produce incorrect amounts
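The effect of unchecked u64 arithmetic is easy to reproduce. The sketch below models 64-bit wrapping and truncation with Python integers; the helper names are hypothetical and simply echo the wrapping and checked operations found in Rust's integer API.

```python
U64_MAX = 2**64 - 1

def wrapping_add_u64(a: int, b: int) -> int:
    """Unchecked addition: the result silently wraps modulo 2^64."""
    return (a + b) & U64_MAX

def checked_add_u64(a: int, b: int):
    """Checked addition: returns None instead of wrapping."""
    total = a + b
    return total if total <= U64_MAX else None

def truncate_to_u32(value: int) -> int:
    """Narrowing cast: the upper 32 bits are discarded."""
    return value & 0xFFFFFFFF
```

Adding one lamport to a balance of `U64_MAX` wraps to zero under the unchecked version, while the checked version reports the failure so the program can abort the instruction.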
State Management
Twenty-one detectors cover account lifecycle and state consistency:
- AccountReinitialization: Detecting when initialize instructions can be called on already-initialized accounts
- AccountResurrection: Accounts closed and then reopened in the same transaction inherit stale data
- UnsafeDeserialization/UncheckedDeserialization: Zero-copy deserialization of untrusted account data without bounds checking
- MissingRentCheck: Accounts must maintain rent-exempt status; operations that withdraw below the rent-exempt threshold cause account deletion
Risk Scoring
The risk scoring system aggregates findings into a weighted risk score:
- Severity breakdown: Critical (weight 10), High (weight 5), Medium (weight 2), Low (weight 1)
- Confidence weighting: High-confidence findings contribute more to the score than low-confidence ones
- CPI exposure factor: Programs with arbitrary CPI vulnerabilities receive an elevated baseline score
- Account validation coverage: Programs with pervasive missing-signer or missing-owner checks receive severity amplification on related findings
The final risk score maps to risk levels: Critical (80+), High (50-79), Medium (20-49), Low (0-19).
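A minimal sketch of the aggregation follows, using the severity weights and level thresholds stated above. The confidence factors are assumed values chosen for illustration, and the CPI exposure factor and validation-coverage amplification are left out because their exact form is not specified here.

```python
# Severity weights and level thresholds come from the text above.
SEVERITY_WEIGHT = {"critical": 10, "high": 5, "medium": 2, "low": 1}

# Assumed confidence multipliers; the real values are not specified.
CONFIDENCE_FACTOR = {"high": 1.0, "medium": 0.5, "low": 0.25}

def risk_score(findings) -> float:
    """Sum severity weight times confidence factor over (severity, confidence) pairs."""
    return sum(SEVERITY_WEIGHT[sev] * CONFIDENCE_FACTOR[conf] for sev, conf in findings)

def risk_level(score: float) -> str:
    """Map a score to the risk levels listed above."""
    if score >= 80:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"
```

Under these assumed factors, three high-confidence critical findings, four medium-confidence high findings, and two low-confidence low findings give 30 + 10 + 0.5 = 40.5, which falls in the Medium band.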
Standards Detection
The standards detection system identifies the token standards and frameworks present in the analyzed program:
- SPL Token: Classic SPL token interface via instruction tag and CPI pattern recognition
- Token-2022: Token-2022 extension detection via discriminator and extension type analysis
- Metaplex NFT: Metadata account structure and Metaplex program ID detection
- Anchor: Framework detection via discriminator format and constraint pattern recognition
Detected standards inform the detector suite: SPL Token detection enables the full set of token-specific detectors; Anchor detection enables constraint pattern recognition; custom programs receive the full generic detector suite without standard-specific optimizations.
Rust Code Generation
Unlike the EVM pipeline, which generates Solidity or Yul, the SVM backend generates Rust code approximating the original program structure. The code generation stage produces:
- Function signatures: Recovered from discriminators and HIR structure
- Account parameters: Typed based on Anchor context analysis where available, otherwise inferred from validation patterns
- Instruction handlers: Control flow lifted to Rust-style if/else and loop constructs from the HIR
- PDA derivations: create_program_address calls reconstructed from the seed analysis
The generated Rust is not intended to be deployable—information irreversibly lost during compilation (variable names, comments, precise types) cannot be fully recovered. It serves as a human-readable representation for audit review and a structured representation for further tooling.
Audit Workflow
The SVM analysis pipeline produces findings that map directly to remediation steps. A useful ordering for audit work:
- Critical findings first: Missing signer checks, arbitrary CPI, and account reinitialization are highest priority because they typically enable complete program compromise
- CPI graph review: Examine the CPI call graph to understand trust boundaries and identify indirect authority delegation chains
- PDA seed audit: Review all PDA derivation sites for user-controlled seed components and canonicalization
- Account validation matrix: Verify that every instruction has complete signer, owner, and constraint validation for every account it touches
- State lifecycle review: Follow account creation, initialization, use, and close operations to identify resurrection and aliasing vulnerabilities
Operating directly on deployed program binaries means any deployed Solana program is analyzable—including closed-source programs where source access is not available.