Solana Program Analysis Pipeline

Solana programs execute as sBPF bytecode, Solana’s variant of eBPF (extended Berkeley Packet Filter). This is a register-based architecture fundamentally different from the EVM’s stack machine. Security analysis of Solana programs requires a purpose-built pipeline that understands the eBPF binary format, Solana’s account model, Cross-Program Invocation (CPI) semantics, and common account-validation framework patterns. EVM analysis tools do not transfer: the vulnerability classes, the execution model, and the binary format are all different.

Solana’s Distinct Security Model

Solana’s programming model introduces security challenges that have no EVM equivalent. The design decisions that make Solana fast also create a different threat surface.

The Account Model

In Solana, accounts are passed explicitly to each instruction as a list. Programs do not have private storage in the EVM sense—they operate on data accounts passed to them. This produces a class of vulnerabilities unique to Solana:

Missing signer checks: A program must verify that the expected account has signed the transaction. Without this verification, any caller can impersonate an authority account by passing the right account address without the corresponding signature. This is among the most frequent critical vulnerabilities in Solana programs.

Missing owner checks: A program must verify that data accounts are owned by the expected program—not just that they have the right address. Passing a fabricated account with the correct structure but owned by a different program allows an attacker to provide malicious data that passes structural validation.

Account aliasing: If the same account appears more than once in the accounts list under different expected roles, a program that writes through both roles can corrupt state in unexpected ways. A single mutable account passed as both the “from” and “to” of a transfer can, depending on the order of reads and writes, drain or double a balance.

Duplicate mutable accounts: Two accounts expected to be distinct but with the same address create dangerous state confusion during write operations.
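The signer, owner, and aliasing checks described above can be sketched in plain Rust. `AccountView`, `CheckError`, and `validate_transfer` are hypothetical names invented for this illustration. A real program would read the same flags from `solana_program`’s `AccountInfo`, or let a framework generate the checks, and the exact set of checks depends on the instruction.

```rust
/// Hypothetical, simplified view of an account passed to an instruction
/// (not the real `solana_program::account_info::AccountInfo`).
#[derive(Debug, Clone)]
pub struct AccountView {
    pub key: [u8; 32],
    pub owner: [u8; 32],
    pub is_signer: bool,
    pub is_writable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CheckError {
    MissingSigner,
    WrongOwner,
    NotWritable,
    Aliased,
}

/// Validate a transfer's accounts: `authority` must sign, while `from` and `to`
/// must be owned by `program_id`, writable, and distinct from each other.
pub fn validate_transfer(
    program_id: &[u8; 32],
    authority: &AccountView,
    from: &AccountView,
    to: &AccountView,
) -> Result<(), CheckError> {
    // Missing signer check: a matching address alone proves nothing.
    if !authority.is_signer {
        return Err(CheckError::MissingSigner);
    }
    for acct in [from, to].iter() {
        // Missing owner check: a look-alike account owned by another program.
        if &acct.owner != program_id {
            return Err(CheckError::WrongOwner);
        }
        if !acct.is_writable {
            return Err(CheckError::NotWritable);
        }
    }
    // Aliasing / duplicate mutable accounts: one account in two roles.
    if from.key == to.key {
        return Err(CheckError::Aliased);
    }
    Ok(())
}
```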

Cross-Program Invocation (CPI)

Solana programs compose by invoking other programs through CPI. CPI carries its own security surface:

Arbitrary CPI: If the target program address in a CPI call is attacker-controlled, the instruction can invoke any program on the network—including programs designed to exploit the caller’s signer authorities.

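CPI signer propagation: When a program calls invoke_signed, it supplies seeds from which the runtime derives PDAs belonging to the calling program, and it grants those PDAs signer status for the callee. Incorrect seed construction or insufficient validation can let an attacker obtain a signature from a PDA the program never intended to authorize. Passing signed accounts to an attacker-controlled program also extends that signer authority to the attacker’s code.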

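CPI reentrancy: In the EVM, an external call can call back into the caller. The Solana runtime is more restrictive: it permits reentrancy only as direct self-recursion (a program invoking itself) and rejects indirect cycles such as A → B → A. The remaining risk arises when a program re-invokes itself, or hands accounts to a callee, while its own state is only partially updated.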

Program Derived Addresses (PDAs)

PDAs are deterministic addresses derived from a program ID and a set of seeds. They are used pervasively as secure storage accounts and signer authorities. PDA-related vulnerabilities include:

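Bump seed reuse: A PDA is derived by appending a one-byte bump to the seeds and searching downward from 255 for a value whose resulting address falls off the ed25519 curve. The first such value, returned by find_program_address, is the canonical bump. Other bump values can also yield valid off-curve addresses. Storing a non-canonical bump or accepting a user-supplied one therefore lets an attacker derive alternative addresses for the same seeds. These addresses pass structural validation but refer to a different account than the program intended.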

Seed collision: Two different seed combinations that produce the same PDA address allow an attacker to use one PDA context to authenticate operations intended for a different PDA context.

User-controlled seeds: When user-supplied input feeds directly into PDA seed construction without sufficient validation, attackers can enumerate seed values to find PDAs that provide unintended access.
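Seed collisions usually stem from how PDA seeds are hashed. As we understand `create_program_address`, the seeds are fed to SHA-256 back to back with no length prefixes or separators, followed by the program ID and the literal marker `ProgramDerivedAddress`. The sketch below uses a hypothetical helper, `pda_preimage`, to build only that preimage. It skips the hash and the off-curve test because identical preimages necessarily yield identical addresses.

```rust
/// Marker appended after the program ID when hashing PDA seeds.
const PDA_MARKER: &[u8] = b"ProgramDerivedAddress";

/// Build the byte string that is hashed to derive a PDA (hypothetical helper).
/// Seeds are concatenated with no delimiter, which is what makes
/// ["vault", "alice1"] and ["vault", "alice", "1"] indistinguishable.
pub fn pda_preimage(seeds: &[&[u8]], program_id: &[u8; 32]) -> Vec<u8> {
    let mut buf = Vec::new();
    for seed in seeds {
        buf.extend_from_slice(seed);
    }
    buf.extend_from_slice(program_id);
    buf.extend_from_slice(PDA_MARKER);
    buf
}
```

A common mitigation is to use fixed-length seeds (such as 32-byte public keys) or a distinct constant prefix for each PDA type, so that different seed combinations cannot concatenate to the same bytes.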

Pipeline Architecture

flowchart TD
    classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
    classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0
    classDef highlight fill:#332519,stroke:#e8a87c,stroke-width:2px,color:#f0d8c0

    A["Raw Solana Program Binary - .so"]:::data
    B["ELF Parsing + CFG Construction"]:::process
    C["Control Flow Graph - CFG"]:::data
    D["HIR Lifting - register-based SSA"]:::process
    E["High-level IR - HIR"]:::data
    F["Optimization Passes"]:::process
    G["Optimized HIR"]:::data
    H["Analysis Orchestration"]:::process
    I["192 Security Detectors"]:::highlight
    J["CPI Call Graph"]:::highlight
    K["E-Graph Pattern Matching"]:::highlight
    L["Rust Code Generation"]:::highlight

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> I
    H --> J
    H --> K
    H --> L

Stage 1: ELF Parsing and Disassembly

Solana programs are distributed as ELF shared objects. The frontend stage parses these binaries, extracting:

Code sections: The .text section containing eBPF instructions
Data sections: Read-only data and BSS sections
Symbol table: Function names (when present in non-stripped binaries)
Relocations: References to external symbols and system calls

eBPF uses a RISC-style 64-bit register architecture with 11 registers (r0-r10, where r10 is the read-only frame pointer) and 87 instruction types. Instructions use a fixed 8-byte slot, with one exception: lddw (load 64-bit immediate) occupies two consecutive slots. The disassembler converts raw bytes into typed instruction records, classifying each into categories: arithmetic, memory, branch, call, return, and syscall.
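A linear-sweep decoder for this instruction format might look like the following sketch. It assumes the standard eBPF slot layout: an opcode byte, destination and source register nibbles, a 16-bit signed offset, and a 32-bit signed immediate, all little-endian. It treats opcodes 0x85 (call), 0x8d (callx), and 0x95 (exit) specially, as we understand the eBPF/sBPF encoding. The `Insn` and `Category` types are illustrative. Syscalls are folded into `Call` because separating them needs relocation or syscall-hash information that is not modeled here.

```rust
/// Coarse categories used by this sketch (syscalls are folded into `Call`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Arithmetic,
    Memory,
    Branch,
    Call,
    Return,
}

/// One decoded 8-byte instruction slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Insn {
    pub opcode: u8,
    pub dst: u8,
    pub src: u8,
    pub off: i16,
    pub imm: i32,
}

/// Decode a slot: opcode, dst (low nibble), src (high nibble), offset, immediate.
pub fn decode(slot: &[u8; 8]) -> Insn {
    Insn {
        opcode: slot[0],
        dst: slot[1] & 0x0f,
        src: slot[1] >> 4,
        off: i16::from_le_bytes([slot[2], slot[3]]),
        imm: i32::from_le_bytes([slot[4], slot[5], slot[6], slot[7]]),
    }
}

pub fn category(insn: &Insn) -> Category {
    match insn.opcode {
        0x85 | 0x8d => Category::Call, // call imm / callx
        0x95 => Category::Return,      // exit
        // Otherwise the instruction class lives in the low three bits.
        op => match op & 0x07 {
            0x00..=0x03 => Category::Memory,     // LD, LDX, ST, STX
            0x04 | 0x07 => Category::Arithmetic, // ALU, ALU64
            _ => Category::Branch,               // JMP, JMP32
        },
    }
}

/// lddw (opcode 0x18) carries a 64-bit immediate across two slots.
pub fn slot_count(insn: &Insn) -> usize {
    if insn.opcode == 0x18 { 2 } else { 1 }
}

/// Linear sweep over a .text byte slice, returning (byte offset, instruction).
pub fn disassemble(text: &[u8]) -> Vec<(usize, Insn)> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc + 8 <= text.len() {
        let mut slot = [0u8; 8];
        slot.copy_from_slice(&text[pc..pc + 8]);
        let insn = decode(&slot);
        out.push((pc, insn));
        pc += 8 * slot_count(&insn);
    }
    out
}
```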

Stage 2: Control Flow Graph Construction

From the disassembled instructions, the frontend constructs a Control Flow Graph (CFG) identifying:

Basic blocks: Maximal sequences of instructions with no branches or branch targets within them. A basic block ends at any jump instruction or call instruction (calls can fail, creating implicit edges).

Function boundaries: Identified from the program entrypoint and the targets of BPF-to-BPF call instructions. When symbol table entries are available, function names are recovered; otherwise, functions are labeled by their entry offset.

Edge types: Fall-through edges (sequential execution), conditional jump edges (taken/not-taken), unconditional jump edges, call edges (to callee entry blocks), and return edges (from return instructions back to the instruction following each call site).
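Basic-block splitting can be sketched with the classic leader algorithm. A new block starts at the entry, at every jump target, and at every instruction that follows a jump, call, or exit. The `Op` enum is a hypothetical simplification of decoded instructions in which indices are slot numbers. Jump targets follow the eBPF convention `pc + 1 + offset`.

```rust
use std::collections::BTreeSet;

/// Simplified, hypothetical instruction kinds for block splitting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Other,
    Jump { off: i16 },
    CondJump { off: i16 },
    Call,
    Exit,
}

/// eBPF jump offsets are relative to the next instruction.
fn target(pc: usize, off: i16) -> usize {
    (pc as i64 + 1 + off as i64) as usize
}

/// Return basic blocks as half-open [start, end) index ranges.
pub fn basic_blocks(code: &[Op]) -> Vec<(usize, usize)> {
    let mut leaders = BTreeSet::new();
    if !code.is_empty() {
        leaders.insert(0);
    }
    for (pc, op) in code.iter().enumerate() {
        match *op {
            Op::Jump { off } | Op::CondJump { off } => {
                leaders.insert(target(pc, off)); // jump target starts a block
                leaders.insert(pc + 1); // so does the fall-through successor
            }
            // Calls end a block too (they may fail, creating implicit edges).
            Op::Call | Op::Exit => {
                leaders.insert(pc + 1);
            }
            Op::Other => {}
        }
    }
    // Drop leaders past the end (e.g. after a final exit or bad targets).
    let starts: Vec<usize> = leaders.into_iter().filter(|&l| l < code.len()).collect();
    starts
        .iter()
        .enumerate()
        .map(|(i, &s)| (s, starts.get(i + 1).copied().unwrap_or(code.len())))
        .collect()
}
```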

Stage 3: HIR Lifting

The HIR for SVM is register-based rather than stack-based, following the eBPF architecture. The HIR lifting stage transforms the CFG into a typed SSA (Static Single Assignment) form through several sub-passes:

SSA construction: Phi nodes are inserted at control flow join points, ensuring each variable is defined exactly once. This property enables precise dataflow analysis without tracking multiple reaching definitions.

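Dominator tree computation: Identifies, for each block, the blocks that dominate it (block A dominates block B if every path from the function entry to B passes through A). This information is required for SSA construction and loop analysis.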

Liveness analysis: Tracks which variables are live at each program point, used by optimization passes to eliminate dead assignments.

Loop analysis: Detects back-edges and natural loops, computing induction variables and loop bounds. This feeds into compute exhaustion and DoS detectors.
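The dominator and loop-analysis passes can be sketched together using the simple iterative data-flow formulation. This is not the faster Cooper-Harvey-Kennedy or Lengauer-Tarjan algorithm that a production pipeline might use. A back edge n → h exists when h dominates n, and each such h heads a natural loop. Blocks are given as hypothetical successor lists with block 0 as the entry. The use of `u64` bitsets limits this sketch to 64 blocks.

```rust
/// Iterative dominator sets: dom[b] has bit d set if block d dominates b.
pub fn dominators(succs: &[Vec<usize>]) -> Vec<u64> {
    let n = succs.len();
    assert!(n <= 64, "bitset sketch handles at most 64 blocks");
    if n == 0 {
        return Vec::new();
    }
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    let all = if n == 64 { u64::MAX } else { (1u64 << n) - 1 };
    let mut dom = vec![all; n];
    dom[0] = 1; // the entry is dominated only by itself
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            // Intersect the dominator sets of all predecessors, then add b.
            let meet = preds[b].iter().fold(all, |acc, &p| acc & dom[p]);
            let new = meet | (1u64 << b);
            if new != dom[b] {
                dom[b] = new;
                changed = true;
            }
        }
    }
    dom
}

/// Back edges n -> h where h dominates n; each h is a natural-loop header.
pub fn back_edges(succs: &[Vec<usize>]) -> Vec<(usize, usize)> {
    let dom = dominators(succs);
    let mut edges = Vec::new();
    for (n, ss) in succs.iter().enumerate() {
        for &h in ss {
            if dom[n] & (1u64 << h) != 0 {
                edges.push((n, h));
            }
        }
    }
    edges
}
```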

Discriminator analysis: Solana programs dispatch to different handlers based on a discriminator at the start of the instruction data. This is an 8-byte prefix in Anchor programs, a one-byte tag in the SPL Token program, and a little-endian u32 variant index in the System Program. The discriminator analysis pass recovers the dispatch structure, identifying which instruction variant each handler processes. Known discriminators from Anchor’s framework, the SPL token program, and the system program are recognized automatically.
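Discriminator dispatch reduces to a table lookup on the leading bytes of instruction data. To our knowledge, each Anchor instruction’s discriminator is the first 8 bytes of the SHA-256 hash of `global:<instruction_name>`. The sketch below uses placeholder table values rather than computing hashes, and `dispatch` is a hypothetical helper.

```rust
/// Look up the handler for an 8-byte discriminator prefix (hypothetical helper).
/// Returns None if the data is too short or the prefix is unknown.
pub fn dispatch<'a>(data: &[u8], table: &[([u8; 8], &'a str)]) -> Option<&'a str> {
    // Instruction data shorter than 8 bytes cannot carry a discriminator.
    let prefix = data.get(..8)?;
    table
        .iter()
        .find(|(disc, _)| &disc[..] == prefix)
        .map(|(_, name)| *name)
}
```

A recovered dispatch table of this shape lets later passes attribute each finding to a named instruction instead of a raw code offset.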

Stage 4: Anchor Framework Recognition

Anchor is the dominant Solana development framework. The Anchor recognition system detects Anchor programs and extracts their structural information:

Account constraint types: Anchor’s #[account(...)] attribute generates account validation code. The analysis recognizes constraint patterns including mut (mutable), signer, has_one, constraint, seeds, bump, init, close, and token::authority.

Account types: Anchor account types (Account<'info, T>, Signer<'info>, SystemAccount<'info>, UncheckedAccount<'info>) generate different validation patterns in bytecode. Recognizing these patterns allows the analysis to precisely model what validation is and is not present.

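Data account structure: For Account<T> types, the analysis identifies the account type from its 8-byte discriminator and infers field offsets and types from the data layout, enabling semantic analysis of account field accesses. Field names are generally recoverable only when symbols or an IDL are available.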

This recognition is essential for reducing false positives: many apparent missing-signer-check findings are suppressed when the analysis confirms Anchor’s generated validation code is present.
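The suppression logic can be modeled as a mapping from each Anchor account wrapper to the checks its generated code performs:

- `Signer` checks the signature.
- `Account<'info, T>` checks the owner and discriminator.
- `SystemAccount` checks that the System Program owns the account.
- `UncheckedAccount` checks nothing.

This sketch models only the outcome of recognition; the bytecode matching itself is not shown. `AnchorAccountType`, `Checks`, and `report_missing_signer` are hypothetical names.

```rust
/// Hypothetical model of the Anchor account wrappers named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnchorAccountType {
    Account,
    Signer,
    SystemAccount,
    UncheckedAccount,
}

/// Which validations the generated code performs for one account.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Checks {
    pub signer: bool,
    pub owner: bool,
    pub discriminator: bool,
}

pub fn implied_checks(ty: AnchorAccountType, has_signer_constraint: bool) -> Checks {
    let mut c = match ty {
        AnchorAccountType::Account => Checks { signer: false, owner: true, discriminator: true },
        AnchorAccountType::Signer => Checks { signer: true, ..Checks::default() },
        AnchorAccountType::SystemAccount => Checks { owner: true, ..Checks::default() },
        AnchorAccountType::UncheckedAccount => Checks::default(),
    };
    // An explicit `signer` constraint adds a signature check on top.
    c.signer |= has_signer_constraint;
    c
}

/// Report a missing-signer finding only if no signer check is generated.
pub fn report_missing_signer(ty: AnchorAccountType, has_signer_constraint: bool) -> bool {
    !implied_checks(ty, has_signer_constraint).signer
}
```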

Stage 5: CPI Analysis

The CPI analysis stage builds a complete picture of Cross-Program Invocations:

CPI call sites: Each location where the program invokes another program is identified, along with the target program (if determinable), the accounts passed, and any seeds used for signer derivation.

CPI type classification: CPIs are classified as direct (fixed program ID), indirect (account-referenced program ID), privileged (with signer seeds), or unprivileged. Indirect CPIs with externally-supplied program addresses are the primary source of arbitrary CPI vulnerabilities.

CPI security concerns: Specific concerns are identified at each CPI call site: missing program ID validation, unvalidated return values, potential signer authority downgrades, and possible reentrancy paths.

CPI graph: The full call graph of CPI relationships is constructed, enabling inter-procedural analysis of access control and data flow across program boundaries.
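The classification can be sketched as a small decision function. `TargetSource`, `CpiSite`, and the severity strings are hypothetical. The key idea is this: an indirect CPI whose target program ID is read from an account and never compared against an expected value is an arbitrary-CPI candidate. Attaching signer seeds (a privileged CPI) makes it worse, because the callee then receives PDA signer authority.

```rust
/// Where a CPI's target program ID comes from (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetSource {
    Constant,
    Account { validated: bool },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpiSite {
    pub target: TargetSource,
    pub has_signer_seeds: bool, // privileged vs unprivileged
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Direct,
    Indirect,
}

pub fn kind(site: &CpiSite) -> Kind {
    match site.target {
        TargetSource::Constant => Kind::Direct,
        TargetSource::Account { .. } => Kind::Indirect,
    }
}

/// Flag indirect CPIs whose target was never checked against an expected ID.
pub fn arbitrary_cpi_severity(site: &CpiSite) -> Option<&'static str> {
    match site.target {
        TargetSource::Account { validated: false } if site.has_signer_seeds => Some("critical"),
        TargetSource::Account { validated: false } => Some("high"),
        _ => None,
    }
}
```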

Stage 6: PDA Analysis

Derivation sites: Each create_program_address or find_program_address call is identified with its seed components.

Seed classification: Seeds are classified as constant (hardcoded), account-derived (from account data), user-supplied (from instruction data), or computed. User-supplied seeds trigger elevated scrutiny.

PDA purpose: Common PDA patterns are recognized: authority PDAs (used as signing authorities), storage PDAs (used as data accounts), and canonical PDAs (the standard Anchor pattern using fixed seeds and canonical bump).

Collision analysis: Across all derivation sites within the program, the analysis checks for seed combinations that could produce the same address under different inputs—the signature of a seed collision vulnerability.

Vulnerability Detector Categories

The pipeline registers over 170 Solana-specific detectors organized into security-relevant categories. Detectors run in parallel, so given enough worker threads, total analysis time is bounded by the slowest individual detector rather than the sum of all of them.

Input Validation

Twelve detectors cover instruction data handling:

InstructionDataLength: Programs must validate instruction data length before reading fields; reading past the end of short instruction data causes a panic or incorrect behavior
InstructionDataParsing: Validates that deserialization is bounded and handles malformed input gracefully
AccountIndexBounds / AccountIteratorBounds / AccountListSize: Verifies that all account array accesses are bounds-checked before use
ImplicitInstructionOrdering: Detects programs that assume instruction execution order without explicit ordering enforcement
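The core pattern behind InstructionDataLength and the bounds-check detectors is reading through checked accessors rather than raw indexing. Here is a minimal sketch, with `read_u64_le` as a hypothetical helper:

```rust
/// Read a little-endian u64 at `offset`, returning None instead of panicking
/// when the instruction data is too short (or the offset would overflow).
pub fn read_u64_le(data: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let bytes = data.get(offset..end)?; // bounds-checked slice
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}
```

The same idea applies to account lists: `accounts.get(i)` returns `None` for an out-of-range index where `accounts[i]` would panic.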

Access Control

Nine detectors cover access control and authorization:

MissingSignerCheck: Instructions modifying privileged state must verify the authority account’s signature is present
ConditionalOwnershipBypass: Some programs check ownership only on one branch of a conditional, creating bypass paths
ConditionalValidationBypass: Similar to ownership bypass but for general constraint checks
RoleBasedAccessControl: Validates that role-based access control implementations correctly enforce role boundaries
TimeLockOperations: Detects time-sensitive operations without sufficient delay enforcement

Account Validation

Twenty detectors cover account validation—the largest single category because the account model creates so many distinct vulnerability surfaces:

AccountAliasAttack: Detects when the same account could appear under multiple roles
DuplicateMutableAccounts: Finds cases where two accounts expected to be distinct can be the same
MissingOwnerCheck: Instructions that read account data without verifying the account is owned by the expected program
MissingWritableCheck: State-modifying operations without checking the account’s writable flag
AccountTypeConfusion: Using accounts as if they have a different type than their discriminator indicates
SysvarSubstitution: Sysvar accounts (Clock, Rent, SlotHashes) passed as normal accounts can be replaced by attacker-controlled accounts if not validated by address

SPL Token Security

Twenty detectors cover SPL Token and Token-2022 security:

MintAuthoritySecurity: Validates that mint authority transfers and revocations are properly protected
TokenAuthorityConfusion: Distinguishes between mint authority and freeze authority to prevent confusion attacks
TokenAccountSpoofing: Fabricated token accounts that appear valid but report incorrect balances or authorities
Token2022Extensions: Token-2022’s transfer hooks, confidential transfers, and permanent delegate features introduce new attack surfaces
TransferHookSecurity: Transfer hooks executed during token transfers can trigger arbitrary code; validates hook program security
AtaValidation: Associated Token Account derivation must be validated to prevent ATA substitution attacks

CPI Security

Eighteen detectors cover Cross-Program Invocation security:

ArbitraryCpi: User-controlled program address in CPI call—the most critical CPI vulnerability
CpiSignerSimulation: Detecting attempts to use CPI authority for unintended operations
CpiReentrancy / ReadOnlyReentrancy: CPI-based reentrancy where state is inconsistent during a nested invocation
CpiDataTamperingDetector: Instruction data passed to CPI calls can be modified if sourced from attacker-controlled accounts
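UncheckedCpiReturn: Results a callee communicates back, whether through modified account data or return data, must be explicitly validated. A failed CPI aborts the entire transaction, so the risk lies in trusting the callee’s outputs (or stale copies of account data) rather than in ignored error codes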

PDA Validation

Eleven detectors cover PDA-specific vulnerabilities:

BumpSeedCanonicalization: Programs must use the canonical bump (returned by find_program_address) rather than accepting user-supplied bumps
PdaBumpSeedReuse: Reusing a bump seed across operations without having verified it as canonical via find_program_address
PdaSeedCollision: Two seed combinations within the program that could produce the same address
WeakPdaEntropy: PDA seeds with insufficient entropy allow brute-force enumeration of valid addresses
PdaUserControlledSeeds: User input feeding directly into PDA seed construction without sanitization
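Canonical-bump validation can be sketched with the real address derivation abstracted away. `canonical_bump` searches downward from 255, as `find_program_address` does, and returns the first bump for which the hypothetical predicate `is_off_curve` holds. A real implementation would hash the seeds plus the bump and test the result against the ed25519 curve.

```rust
/// First bump, searching down from 255, whose derived address is off-curve.
/// `is_off_curve` stands in for "hash seeds + [bump], test against ed25519".
pub fn canonical_bump(is_off_curve: impl Fn(u8) -> bool) -> Option<u8> {
    (0..=u8::MAX).rev().find(|&b| is_off_curve(b))
}

/// Accept a stored or user-supplied bump only if it is the canonical one.
pub fn check_bump(supplied: u8, is_off_curve: impl Fn(u8) -> bool) -> bool {
    canonical_bump(is_off_curve) == Some(supplied)
}
```

For example, with a predicate that accepts both 253 and 249, both bumps produce off-curve addresses, but only 253 passes `check_bump`. This is the gap an attacker exploits when a program accepts any bump that yields an off-curve address.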

Arithmetic

Five arithmetic detectors adapted for Solana’s u64 lamport arithmetic and BorshSerialize integer handling:

IntegerOverflow / IntegerTruncation: Arithmetic overflow in lamport calculations can drain accounts or create phantom balances
UncheckedFeeRentMath: Rent and fee calculations using unchecked arithmetic can produce incorrect amounts
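Checked arithmetic is the standard defense. In release builds, Rust integer arithmetic wraps silently unless `overflow-checks` is enabled in the build profile. The sketch below therefore uses `checked_sub` and `checked_add` to turn an underflow or overflow into an error. `transfer_lamports` and `MathError` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MathError {
    Overflow,
    InsufficientFunds,
}

/// Move `amount` lamports between two balances without wrapping.
pub fn transfer_lamports(from: u64, to: u64, amount: u64) -> Result<(u64, u64), MathError> {
    // Without the check, 10 - 11 would wrap to u64::MAX: a phantom balance.
    let new_from = from.checked_sub(amount).ok_or(MathError::InsufficientFunds)?;
    let new_to = to.checked_add(amount).ok_or(MathError::Overflow)?;
    Ok((new_from, new_to))
}
```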

State Management

Twenty-one detectors cover account lifecycle and state consistency:

AccountReinitialization: Detecting when initialize instructions can be called on already-initialized accounts
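AccountResurrection: An account “closed” by draining its lamports is only garbage-collected at the end of the transaction, so a later instruction in the same transaction can refund it and revive it with its stale data intact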
UnsafeDeserialization / UncheckedDeserialization: Zero-copy deserialization of untrusted account data without bounds checking
MissingRentCheck: Accounts must remain rent-exempt. A withdrawal that leaves a non-zero balance below the rent-exempt minimum causes the transaction to fail, and one that drains the balance to zero closes the account
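The rent-exempt minimum is a simple formula: (128 bytes of account overhead + data length) × lamports per byte-year × exemption threshold. The sketch below hard-codes what we understand to be the default parameters (3,480 lamports per byte-year and a 2-year threshold); a program should read the actual values from the Rent sysvar. With those defaults, a zero-data account needs 890,880 lamports and a 165-byte SPL token account needs 2,039,280. `rent_exempt_minimum` and `withdrawal_ok` are hypothetical names.

```rust
// Assumed default rent parameters; read the Rent sysvar in real code.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;

pub fn rent_exempt_minimum(data_len: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS
}

/// A withdrawal is acceptable if it empties the account entirely (a close)
/// or leaves at least the rent-exempt minimum behind.
pub fn withdrawal_ok(balance: u64, amount: u64, data_len: u64) -> bool {
    match balance.checked_sub(amount) {
        None => false,
        Some(0) => true,
        Some(rest) => rest >= rent_exempt_minimum(data_len),
    }
}
```

A full close should also zero the account data, so that the resurrection pattern described above cannot revive stale state.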

Risk Scoring

The risk scoring system aggregates findings into a weighted risk score:

Severity breakdown: Critical (weight 10), High (weight 5), Medium (weight 2), Low (weight 1)
Confidence weighting: High-confidence findings contribute more to the score than low-confidence ones
CPI exposure factor: Programs with arbitrary CPI vulnerabilities receive an elevated baseline score
Account validation coverage: Programs with pervasive missing-signer or missing-owner checks receive severity amplification on related findings

The final risk score maps to risk levels: Critical (80+), High (50-79), Medium (20-49), Low (0-19).
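The section does not specify how confidence enters the score, so the following is only one plausible reading. Each finding contributes its severity weight multiplied by a confidence percentage; the sum is divided by 100, capped at 100, and mapped through the thresholds above. The CPI exposure and coverage-amplification factors are omitted. `risk_score` and `risk_level` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
}

impl Severity {
    fn weight(self) -> u32 {
        match self {
            Severity::Critical => 10,
            Severity::High => 5,
            Severity::Medium => 2,
            Severity::Low => 1,
        }
    }
}

/// Hypothetical aggregation: sum of weight x confidence%, scaled and capped.
pub fn risk_score(findings: &[(Severity, u32)]) -> u32 {
    let raw: u32 = findings
        .iter()
        .map(|&(s, conf_pct)| s.weight() * conf_pct.min(100))
        .sum();
    (raw / 100).min(100)
}

/// Map a score onto the levels listed above.
pub fn risk_level(score: u32) -> &'static str {
    if score >= 80 {
        "Critical"
    } else if score >= 50 {
        "High"
    } else if score >= 20 {
        "Medium"
    } else {
        "Low"
    }
}
```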

Standards Detection

The standards detection system identifies SPL token program standards present in the analyzed program:

SPL Token: Classic SPL token interface via instruction tag and CPI pattern recognition
Token-2022: Token-2022 extension detection via discriminator and extension type analysis
NFT Metadata: Metadata account structure and standard NFT-metadata program ID detection
Account-validation framework: Framework detection via discriminator format and constraint pattern recognition

Detected standards inform the detector suite: SPL Token detection enables the full set of token-specific detectors; account-validation framework detection enables constraint pattern recognition; custom programs receive the full generic detector suite without standard-specific optimizations.

Rust Code Generation

Unlike the EVM pipeline, which generates Solidity or Yul, the SVM backend generates Rust code approximating the original program structure. The code generation stage produces:

Function signatures: Recovered from discriminators and HIR structure
Account parameters: Typed based on account-validation framework context analysis where available, otherwise inferred from validation patterns
Instruction handlers: Control flow lifted to Rust-style if/else and loop constructs from the HIR
PDA derivations: create_program_address calls reconstructed from the seed analysis

The generated Rust is not intended to be deployable—information irreversibly lost during compilation (variable names, comments, precise types) cannot be fully recovered. It serves as a human-readable representation for audit review and a structured representation for further tooling.

Audit Workflow

The SVM analysis pipeline produces findings that map directly to remediation steps. A useful ordering for audit work:

Critical findings first: Missing signer checks, arbitrary CPI, and account reinitialization are highest priority because they typically enable complete program compromise
CPI graph review: Examine the CPI call graph to understand trust boundaries and identify indirect authority delegation chains
PDA seed audit: Review all PDA derivation sites for user-controlled seed components and canonicalization
Account validation matrix: Verify that every instruction has complete signer, owner, and constraint validation for every account it touches
State lifecycle review: Follow account creation, initialization, use, and close operations to identify resurrection and aliasing vulnerabilities

Operating directly on deployed program binaries means any deployed Solana program is analyzable—including closed-source programs where source access is not available.