Bytecode-Native Security: Why Source Code Analysis Is Fighting Yesterday’s War

Most security tools analyze source code. Attackers exploit bytecode. This gap between what gets analyzed and what actually executes creates blind spots — and they tend to be in exactly the places attackers look.

The Source Code Illusion

When a developer writes Solidity, they’re writing an abstraction over bytecode. The EVM never sees variable names, comments, or code structure. It sees opcodes.

Developer writes:
    function transfer(address to, uint amount) external {
        require(balances[msg.sender] >= amount);
        balances[msg.sender] -= amount;
        balances[to] += amount;
    }

Compiler produces:
    PUSH4 0xa9059cbb    // Function selector
    EQ
    PUSH2 0x00b4
    JUMPI
    ... 47 more opcodes ...

What the EVM executes:
    0x60806040526004361061...

Source code analysis tools examine the first representation. Attackers exploit the third.
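That third representation is mechanically recoverable. As a minimal sketch (not a full disassembler, and it ignores data sections and unreachable code), a scanner can walk raw bytecode and pull out the PUSH4 immediates that compiled Solidity compares in its dispatch table:

```python
# Toy dispatch scanner: walk raw EVM bytecode and collect PUSH4
# immediates, which in compiled Solidity are usually the function
# selectors compared in the dispatch table.

def candidate_selectors(code: bytes) -> list:
    selectors = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:                     # PUSH1..PUSH32
            width = op - 0x5F                      # number of immediate bytes
            if op == 0x63 and i + 4 < len(code):   # PUSH4: 4-byte immediate
                selectors.append(code[i + 1:i + 5].hex())
            i += 1 + width                         # skip over the immediate
        else:
            i += 1                                 # every other opcode is 1 byte
    return selectors

# The dispatch fragment from above: PUSH4 0xa9059cbb; EQ; PUSH2 0x00b4; JUMPI
print(candidate_selectors(bytes.fromhex("63a9059cbb146100b457")))  # ['a9059cbb']
```

This is the easy end of bytecode analysis, but it already works on representation three directly, with no source in sight.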

Where Source Code Diverges from Bytecode

Compiler Optimizations

The Solidity compiler doesn’t just translate — it transforms. Optimization passes can change execution order, inline checks, and alter gas forwarding behavior. A source code analyzer sees the developer’s intended logic. The bytecode may be different.

This isn’t hypothetical. A Vyper compiler bug exploited in July 2023 caused losses exceeding $70M across Curve pools. The source code was correct — @nonreentrant decorators were present. But Vyper versions 0.2.15, 0.2.16, and 0.3.0 generated bytecode in which the reentrancy guard was incorrectly placed and ineffective:

# Vyper source (correct — nonreentrant decorator present)
@nonreentrant("lock")
def remove_liquidity():
    # Function logic
    pass

Source code analysis result: safe. Bytecode analysis result: reentrancy guard ineffective, vulnerable.

Only bytecode analysis could have caught this before exploitation.

Proxy Patterns

A significant share of high-value DeFi contracts use proxy patterns. When you analyze a proxy’s source code, you see something like:

fallback() external payable {
    address impl = implementation();
    assembly {
        calldatacopy(0, 0, calldatasize())
        let result := delegatecall(gas(), impl, 0, calldatasize(), 0, 0)
        returndatacopy(0, 0, returndatasize())
        switch result
        case 0 { revert(0, returndatasize()) }
        default { return(0, returndatasize()) }
    }
}

The actual logic lives in the implementation contract. Analyzing the proxy’s source tells you nothing about whether the protocol is secure — you’re analyzing a forwarding layer, not the code that handles funds.

Verification Mismatches

Etherscan’s “Verified” badge is a trust assumption, not a guarantee. Developers can verify different source than what was deployed. Different compiler versions produce different bytecode from identical source. Compiler bugs can silently change behavior. Metadata hashes can obscure bytecode differences.
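One of these mismatches is mechanically checkable. solc appends a CBOR metadata blob to runtime code, with its length encoded big-endian in the final two bytes (excluding the two length bytes themselves); stripping it before comparing a local build against deployed bytecode avoids false mismatches from metadata alone. A sketch under that assumption — real verifiers also handle immutables and linked library addresses:

```python
# Compare deployed runtime bytecode against a local build while
# ignoring the trailing CBOR metadata blob that solc appends.
# The last two bytes encode the metadata length, big-endian.

def strip_metadata(code: bytes) -> bytes:
    if len(code) < 2:
        return code
    mlen = int.from_bytes(code[-2:], "big")
    if mlen + 2 > len(code):
        return code                  # no plausible metadata trailer
    return code[:-(mlen + 2)]

def same_code(deployed: bytes, built: bytes) -> bool:
    # Equal up to metadata is necessary, not sufficient: different
    # compiler settings can still hide behind identical prefixes.
    return strip_metadata(deployed) == strip_metadata(built)
```

Any residue after stripping metadata from both sides is a real difference in executable code, and deserves bytecode-level review.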

In practice, this creates a real attack surface:

Etherscan: "Verified ✓"
Source code shown: Clean, audited functions matching known-safe patterns

Bytecode analysis reveals:
  - 3 functions not present in verified source
  - Admin backdoor at selector 0x1337beef
  - Self-destruct capability hidden in inline assembly

The green checkmark creates false confidence for source-based tools.

Inline Assembly

Critical DeFi contracts routinely use inline assembly for gas optimization or to access EVM features not exposed by high-level Solidity. Source code tools that don’t model assembly treat these blocks as opaque:

function efficientSwap(bytes calldata data) external {
    assembly {
        // 200 lines of hand-optimized assembly
    }
}

Bytecode analysis doesn’t have this problem. Every assembly block compiles to opcodes, and opcodes are all the analyzer works with. There’s no distinction between “Solidity code” and “assembly code” at the bytecode level.

Why Bytecode Analysis Is Different

Bytecode analysis starts from what actually executes:

1. Raw bytecode (0x608060405...)
2. Control flow graph (basic blocks, jumps)
3. Data flow analysis (value propagation)
4. Pattern matching (vulnerability signatures)
5. Semantic recovery (what the code does)
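The control-flow step can be sketched in a few lines, assuming the bytecode has already been disassembled into (offset, opcode) pairs: blocks open at JUMPDEST and close after any terminator.

```python
# Split a disassembled opcode stream into basic blocks.
# Terminators: JUMP, JUMPI, STOP, RETURN, REVERT, SELFDESTRUCT.
TERMINATORS = {0x56, 0x57, 0x00, 0xF3, 0xFD, 0xFF}
JUMPDEST = 0x5B

def basic_blocks(ops):
    blocks, current = [], []
    for offset, op in ops:
        if op == JUMPDEST and current:
            blocks.append(current)       # a JUMPDEST opens a new block
            current = []
        current.append(offset)
        if op in TERMINATORS:
            blocks.append(current)       # a terminator closes the block
            current = []
    if current:
        blocks.append(current)
    return blocks

# PUSH1 ..; JUMP; JUMPDEST; STOP  ->  two blocks
print(basic_blocks([(0, 0x60), (2, 0x56), (3, 0x5B), (4, 0x00)]))  # [[0, 2], [3, 4]]
```

Wiring edges between blocks is the harder half, because jump targets are stack values — more on that below under dynamic jump resolution.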

This approach doesn’t depend on source availability, compiler version, or verification status. Whether the contract was compiled with solc 0.4.24, 0.8.19, or Vyper, the bytecode is what it is. Analysis works regardless of:

  • Source language (Solidity, Vyper, Yul, Fe, hand-written assembly)
  • Compiler version
  • Optimization settings
  • Whether source has been verified anywhere

For proxy contracts, bytecode analysis can follow the delegation chain:

1. Analyze proxy bytecode → Find implementation storage slot (EIP-1967)
2. Read implementation address from on-chain storage
3. Analyze implementation bytecode → Complete security picture
4. Detect upgradeability risks from the bytecode itself

Source analysis stops at the proxy. Bytecode analysis follows the delegatecall.
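Steps 1 and 2 are a small amount of code. The EIP-1967 implementation slot is a fixed constant, keccak256("eip1967.proxy.implementation") - 1, and the implementation address occupies the low 20 bytes of the word stored there. In this sketch, `get_storage_at` is a hypothetical stand-in for whatever RPC client you use (e.g. one wrapping eth_getStorageAt):

```python
# EIP-1967 implementation slot: keccak256("eip1967.proxy.implementation") - 1.
EIP1967_IMPL_SLOT = int(
    "360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc", 16
)

def implementation_address(storage_word: bytes) -> str:
    """Extract the implementation address from the 32-byte slot value."""
    assert len(storage_word) == 32
    return "0x" + storage_word[-20:].hex()   # address = low 20 bytes

# word = get_storage_at(proxy, EIP1967_IMPL_SLOT)   # hypothetical RPC call
# impl = implementation_address(word)               # then analyze impl's bytecode
```

From there, steps 3 and 4 are ordinary bytecode analysis applied to the implementation, plus checking who can write that slot.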

Three Levels of Bytecode Representation

Effective bytecode analysis works at multiple abstraction levels simultaneously, because different analyses are best suited to different representations.

Low-Level IR. A direct representation of EVM opcodes with explicit stack operations. Used for pattern matching against known exploit signatures and gas analysis.

PUSH1 0x04
CALLDATALOAD
PUSH1 0x00
SLOAD
GT
PUSH2 0x0054
JUMPI
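Pattern matching at this level can be as simple as sliding a mnemonic signature over the instruction stream. A toy matcher, with "?" as a single-opcode wildcard — real signatures also constrain operands and stack state:

```python
# Signature matching over the low-level IR: a pattern is a list of
# mnemonics where "?" matches any single opcode.

def matches(ir, pattern):
    return any(
        all(p == "?" or p == ir[i + j] for j, p in enumerate(pattern))
        for i in range(len(ir) - len(pattern) + 1)
    )

ir = ["PUSH1", "CALLDATALOAD", "PUSH1", "SLOAD", "GT", "PUSH2", "JUMPI"]
# "a storage read compared against calldata, feeding a branch"
print(matches(ir, ["CALLDATALOAD", "?", "SLOAD", "GT", "?", "JUMPI"]))  # True
```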

Medium-Level IR. Stack-free representation with explicit control flow. Used for data flow analysis, taint tracking, and value propagation.

v0 = CALLDATALOAD(0x04)
v1 = SLOAD(0x00)
if (v0 > v1) goto block_2
else goto block_3
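This representation is what makes taint tracking cheap. A minimal sketch over hypothetical (dest, op, args) triples, treating anything derived from CALLDATALOAD as attacker-controlled:

```python
# Taint tracking over the stack-free IR: values derived from
# CALLDATALOAD are attacker-controlled; taint propagates through
# any operation that consumes a tainted value.

def tainted_values(ir):
    taint = set()
    for dest, op, args in ir:
        if op == "CALLDATALOAD" or any(a in taint for a in args):
            taint.add(dest)
    return taint

ir = [
    ("v0", "CALLDATALOAD", ["0x04"]),   # user-supplied argument
    ("v1", "SLOAD", ["0x00"]),          # storage read, not tainted
    ("v2", "GT", ["v0", "v1"]),         # consumes tainted v0 -> tainted
]
print(sorted(tainted_values(ir)))   # ['v0', 'v2']
```

A real pass also tracks taint through memory and storage, but the single-assignment form is what makes the propagation rule this small.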

High-Level IR. Recovered structure approaching source semantics. Used for human review and invariant checking.

function withdraw(uint256 amount) {
    require(balances[msg.sender] >= amount);
    balances[msg.sender] -= amount;
    msg.sender.call{value: amount}("");
}

Each level enables analyses that are awkward or impossible at the others. The combination gives coverage that neither source analysis nor raw opcode matching alone provides.

The Hard Problems in Bytecode Analysis

Bytecode-native analysis is not easy. Recovering the structure the compiler discarded (functions, types, storage layout) takes real engineering.

Dynamic jump resolution. A jump destination is just a value on the stack. It may have been pushed long before the jump, shuffled by stack operations, or computed arithmetically:

SWAP1               // destination pushed earlier, now on top of the stack
JUMP

Where does this go? The answer requires symbolic execution — tracking which values can reach the JUMP instruction and what constraints they’re under. This is the core technical challenge in CFG construction.
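A concrete stack model handles the tractable case: the value reaching the JUMP is a pushed constant that was never overwritten. Anything computed is left unresolved, which is exactly where symbolic execution takes over. A sketch that models only PUSH, JUMP, and ADD:

```python
# Resolve jump targets with a concrete stack: ints for known
# constants, None for values we can't track.

def resolve_jumps(code: bytes) -> dict:
    stack, targets, i = [], {}, 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:                     # PUSHn: known constant
            n = op - 0x5F
            stack.append(int.from_bytes(code[i + 1:i + 1 + n], "big"))
            i += 1 + n
            continue
        if op == 0x56:                             # JUMP: read target off the stack
            targets[i] = stack.pop() if stack else None
        elif op == 0x01:                           # ADD: result treated as unknown
            del stack[-2:]
            stack.append(None)
        i += 1
    return targets

# PUSH2 0x00b4; JUMP  ->  destination resolved to 0xb4
print(resolve_jumps(bytes.fromhex("6100b456")))  # {3: 180}
```

Every `None` in the result is a jump that needs the heavier machinery.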

Storage layout recovery. Solidity’s storage layout is deterministic but not stored in the bytecode. The analyzer must reconstruct it from access patterns — recognizing that CALLER being hashed against slot 3 before SLOAD indicates a mapping with address keys at base slot 3.
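That recognition step can be sketched as a pass over the IR: a SHA3 whose inputs are CALLER and a constant, feeding an SLOAD, implies an address-keyed mapping at that base slot. Using the same hypothetical (dest, op, args) triples as above:

```python
# Recognize Solidity's mapping(address => ...) access pattern:
# values live at keccak256(key . slot), so SHA3(CALLER, slot)
# flowing into SLOAD reveals an address-keyed mapping at `slot`.

def find_address_mappings(ir):
    hashed = {}                                   # hash value -> base slot
    slots = set()
    for dest, op, args in ir:
        if op == "SHA3" and args[0] == "CALLER":
            hashed[dest] = int(args[1], 16)       # remember the base slot
        elif op == "SLOAD" and args[0] in hashed:
            slots.add(hashed[args[0]])            # the hash reached a load
    return slots

ir = [
    ("v0", "SHA3", ["CALLER", "0x03"]),   # keccak(msg.sender . slot 3)
    ("v1", "SLOAD", ["v0"]),              # load the mapping value
]
print(find_address_mappings(ir))   # {3}
```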

Type recovery. All values are 256-bit words in the EVM. Distinguishing uint128 from address from bytes20 requires inference from how values are used — masking operations, comparison patterns, and call context.
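A crude version of this inference maps common AND masks to widths. Heuristic only, and it ignores high-aligned types like bytes20, which mask the upper 160 bits instead:

```python
# Infer a value's intended width from the mask it is ANDed with.
# A low-bit mask of (1 << n) - 1 truncates to n bits.
MASK_TYPES = {
    (1 << 160) - 1: "address",
    (1 << 128) - 1: "uint128",
    (1 << 8) - 1: "uint8",
}

def infer_from_mask(mask: int) -> str:
    return MASK_TYPES.get(mask, "uint256")   # default: full 256-bit word

print(infer_from_mask((1 << 160) - 1))   # address
```

Comparison patterns and call context refine this further: a masked value passed as the target of a CALL is almost certainly an address.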

Function boundary detection. Without source metadata, function boundaries must be inferred by following the dispatch logic and identifying which basic blocks are only reachable from a given selector comparison.

None of these are solved problems in the sense of having perfect solutions, but practical tools handle the common cases well enough to be useful.

Practical Implications

For anyone doing security work on EVM contracts:

Always verify bytecode matches source before trusting an audit. An audit of the source code is an audit of the source code. If the deployed bytecode differs — which happens more often than it should — the audit’s conclusions may not apply.

Analyze proxy implementations, not just the proxy interface. The security of a proxy-based protocol lives in the implementation. That’s what needs to be analyzed.

Treat inline assembly blocks as requiring bytecode-level review. Source code tools can’t reason about them fully. The opcodes they generate are what matters.

Get bytecode-level analysis for critical contracts, especially after upgrades. Implementation upgrades are a common point where new vulnerabilities are introduced. Bytecode analysis of the new implementation catches issues that a review of the diff in source code might miss.


The gap between source code and bytecode isn’t a technical detail — it’s a security boundary. Every compiler-level bug, every proxy mismatch, every hidden function in unverified bytecode is a vulnerability that source analysis can’t see by design.

Bytecode analysis doesn’t replace source code review. Both have value. But for security guarantees about what actually executes on-chain, bytecode is the ground truth. Source code is a description of intent; bytecode is what runs.