Bytecode Analysis 101: Reading Smart Contracts Without Source Code

When you deploy a Solidity contract, the compiler produces EVM bytecode—the actual instruction sequence the network executes. Source code is documentation. Bytecode is what runs.

This matters for security analysis in two ways. First, a large fraction of deployed contracts have no verified source code (Etherscan reports roughly 40% unverified as of late 2024). Second, even when source is available, the compiled bytecode can differ from what the source implies—compiler optimizations, inlining, and ABI encoding logic are all compiler-generated and invisible at the Solidity level.

This guide walks through reading EVM bytecode and how decompilers reconstruct meaning from raw opcodes.

The EVM Execution Model

The EVM is a stack-based virtual machine. Instructions operate on a stack of 256-bit words (at most 1024 items deep); contracts also have volatile memory (fresh for each call and discarded when the call returns) and persistent storage (the key-value store that holds balances, flags, and state).

Bytecode is a flat sequence of single-byte opcodes, some followed by immediate data operands. The simplest way to understand bytecode is to read it opcode by opcode.

A basic contract initialization starts like this:

PUSH1 0x80
PUSH1 0x40
MSTORE       // memory[0x40] = 0x80  (set free memory pointer)

This three-instruction sequence appears near the top of essentially every Solidity contract. Recognizing canonical patterns like this is where bytecode reading begins.

Opcode Categories Worth Knowing

Stack operations — push, duplicate, swap, discard:

PUSH1 0x01   // push 1-byte literal
DUP1         // duplicate top of stack
SWAP1        // swap top two values
POP          // discard top
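
To make the stack effects concrete, here is a minimal Python sketch of the four operations above. It models only the stack, as a list whose last element is the top. The function name run_stack_ops is hypothetical, and the sketch ignores gas, the 1024-item limit, and 256-bit wraparound.

```python
def run_stack_ops(program):
    """Apply PUSH/DUP/SWAP/POP instructions to an empty stack (top = last item)."""
    stack = []
    for op, *args in program:
        if op.startswith("PUSH"):
            stack.append(args[0])              # push the immediate operand
        elif op.startswith("DUP"):
            n = int(op[3:])
            stack.append(stack[-n])            # DUPn copies the n-th item from the top
        elif op.startswith("SWAP"):
            n = int(op[4:])
            # SWAPn exchanges the top with the item n positions below it
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif op == "POP":
            stack.pop()                        # discard the top item
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


program = [("PUSH1", 1), ("PUSH1", 2), ("SWAP1",), ("DUP2",), ("POP",)]
print(run_stack_ops(program))  # [2, 1]
```

Reading the list right to left gives the stack top to bottom, which is the order the opcode comments in this guide use for stack[0], stack[1], and so on.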

Storage — read and write persistent state:

SLOAD        // read: push storage[stack[0]]
SSTORE       // write: storage[stack[0]] = stack[1]

External calls — interact with other contracts:

CALL         // call external contract, can send ETH
STATICCALL   // read-only call, no state changes allowed
DELEGATECALL // execute external code in the current contract's context

Control flow — conditional and unconditional branches:

JUMP         // unconditional jump to JUMPDEST
JUMPI        // conditional: jump if stack[1] != 0
JUMPDEST     // valid jump target (required marker)

Two Patterns That Appear Everywhere

The function dispatcher. Every Solidity contract routes incoming calls through a dispatch table. The first four bytes of calldata identify the function being called. The bytecode extracts these bytes and jumps to the matching handler:

PUSH1 0x04
CALLDATASIZE
LT
PUSH2 0x00ef
JUMPI          // calldata < 4 bytes (no selector): jump to the fallback/revert path

PUSH1 0x00
CALLDATALOAD
PUSH1 0xe0
SHR            // shift right 224 bits: first 4 bytes = function selector

DUP1           // keep a copy of the selector for later comparisons
PUSH4 0xa9059cbb  // transfer(address,uint256) selector
EQ
PUSH2 0x0123
JUMPI          // jump to transfer handler
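
The selector extraction can be mirrored in a few lines of Python. This is a sketch, not a tool: extract_selector and the one-entry KNOWN_SELECTORS table are hypothetical names, and the calldata uses a made-up address.

```python
def extract_selector(calldata: bytes) -> int:
    """Mimic PUSH1 0x00 / CALLDATALOAD / PUSH1 0xe0 / SHR on raw calldata."""
    if len(calldata) < 4:
        raise ValueError("calldata shorter than 4 bytes: no selector")
    # CALLDATALOAD reads a 32-byte word and zero-pads past the end of calldata.
    word = calldata[:32].ljust(32, b"\x00")
    # Shifting the 256-bit word right by 224 bits leaves the top 4 bytes.
    return int.from_bytes(word, "big") >> 0xE0


# Hypothetical lookup table; a real tool would use a much larger database.
KNOWN_SELECTORS = {0xA9059CBB: "transfer(address,uint256)"}

calldata = bytes.fromhex(
    "a9059cbb"                 # selector
    + "00" * 12 + "11" * 20    # address argument, left-padded to 32 bytes (made up)
    + "00" * 31 + "05"         # uint256 amount = 5
)
selector = extract_selector(calldata)
print(hex(selector), KNOWN_SELECTORS.get(selector, "unknown"))
```

When analyzing unverified bytecode, the PUSH4 constants in the dispatcher are the list of selectors the contract answers to, so matching them against a selector database is usually the first step in naming functions.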

Mapping storage access. mapping(address => uint256) doesn’t store values in consecutive slots. Instead, Solidity computes keccak256(key ++ slot_index), with the key and the mapping’s slot index each padded to 32 bytes, to derive each entry’s location:

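CALLER         // msg.sender (the mapping key)
PUSH1 0x00
MSTORE         // memory[0x00..0x20] = key
PUSH1 0x00     // mapping's base slot
PUSH1 0x20
MSTORE         // memory[0x20..0x40] = slot
PUSH1 0x40     // hash 64 bytes...
PUSH1 0x00     // ...starting at memory offset 0
SHA3           // keccak256(abi.encode(key, slot))
SLOAD          // read value at computed slot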

Recognizing this pattern is how decompilers recover balances[msg.sender] from raw opcodes.
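
As a rough illustration of what sits in memory when SHA3 runs, the sketch below builds the 64-byte hash input for one mapping entry. The helper name mapping_slot_preimage and the address are made up. The sketch deliberately stops before hashing: Python's hashlib.sha3_256 implements standardized SHA-3, which pads differently from the Keccak-256 the EVM uses, so reproducing the actual slot number needs a Keccak-256 implementation from a third-party library.

```python
def mapping_slot_preimage(key: int, base_slot: int) -> bytes:
    """Return the 64 bytes the SHA3 opcode hashes for mapping[key].

    Layout: 32-byte big-endian key followed by the 32-byte big-endian slot index.
    """
    return key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")


sender = 0x1111111111111111111111111111111111111111  # made-up address
preimage = mapping_slot_preimage(sender, 0)

print(len(preimage))          # 64
print(preimage[:32].hex())    # key, left-padded with 12 zero bytes
print(preimage[32:].hex())    # slot index 0
```

Keccak-256 of these 64 bytes is the storage key passed to SLOAD or SSTORE.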

The Decompilation Pipeline

Decompilers convert raw bytecode to readable, analyzable form through a sequence of lifting passes.

Stage 1: Disassembly

Raw bytes become labeled opcodes with their operands:

Bytecode:  60 80 60 40 52 34 80 15 ...
Decoded:   PUSH1 0x80
           PUSH1 0x40
           MSTORE
           CALLVALUE
           DUP1
           ISZERO

Human-readable, but still structurally flat.
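
A minimal disassembler along these lines fits in a short Python function. The opcode table below is intentionally partial (only opcodes that appear in this guide), and the names OPCODES and disassemble are illustrative rather than taken from any particular tool.

```python
# Partial opcode table: just the single-byte opcodes used in this guide.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x50: "POP", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, operand_hex_or_None) tuples."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32 carry immediates
            n = op - 0x5F                       # number of operand bytes
            operand = code[pc + 1 : pc + 1 + n]
            out.append((pc, f"PUSH{n}", "0x" + operand.hex()))
            pc += 1 + n                         # skip the opcode and its operand
            continue
        if 0x80 <= op <= 0x8F:
            name = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            name = f"SWAP{op - 0x8F}"
        else:
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        out.append((pc, name, None))
        pc += 1
    return out


for offset, name, operand in disassemble(bytes.fromhex("6080604052348015")):
    print(f"{offset:04x}  {name} {operand or ''}".rstrip())
```

The one subtlety is the PUSH handling: operand bytes must be skipped rather than decoded, otherwise data bytes such as 0x5b would be misread as instructions like JUMPDEST.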

Stage 2: Control Flow Analysis

The disassembly is partitioned into basic blocks—sequences with no internal branches—and edges between blocks are computed to form a Control Flow Graph (CFG):

Function: transfer(address,uint256)  [selector 0xa9059cbb]

  Block 0x0123:                    Block 0x0156:
    SLOAD balances[sender]           SUB balances[sender], amount
    DUP amount                       SSTORE balances[sender]
    LT                               ADD balances[to]
    JUMPI → 0x0189 (fail)           SSTORE balances[to]
    JUMP → 0x0156 (pass)            RETURN

  Block 0x0189:
    PUSH "Insufficient balance"
    REVERT
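
The block-splitting step can be sketched as follows, assuming the disassembly is already available as (offset, mnemonic) pairs. The function names and the sample offsets are hypothetical. Only fall-through edges are computed here; edges for taken jumps require resolving the destination that was pushed onto the stack, which is the harder part of real CFG recovery.

```python
TERMINATORS = {"JUMP", "JUMPI", "RETURN", "REVERT", "STOP", "INVALID", "SELFDESTRUCT"}


def split_basic_blocks(instructions):
    """Split [(offset, mnemonic), ...] into basic blocks keyed by start offset."""
    blocks, current = {}, []
    for offset, name in instructions:
        if name == "JUMPDEST" and current:      # a jump target starts a new block
            blocks[current[0][0]] = current
            current = []
        current.append((offset, name))
        if name in TERMINATORS:                 # control leaves the block here
            blocks[current[0][0]] = current
            current = []
    if current:
        blocks[current[0][0]] = current
    return blocks


def fallthrough_edges(blocks):
    """Edges that need no target resolution: JUMPI's not-taken path and
    plain fall-through into the next block."""
    starts = sorted(blocks)
    edges = []
    for start, nxt in zip(starts, starts[1:]):
        last = blocks[start][-1][1]
        if last == "JUMPI" or last not in TERMINATORS:
            edges.append((start, nxt))
    return edges


sample = [
    (0, "PUSH1"), (2, "CALLDATALOAD"), (3, "PUSH2"), (6, "JUMPI"),
    (7, "PUSH1"), (9, "DUP1"), (10, "REVERT"),
    (11, "JUMPDEST"), (12, "STOP"),
]
blocks = split_basic_blocks(sample)
print(sorted(blocks))              # [0, 7, 11]
print(fallthrough_edges(blocks))   # [(0, 7)]
```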

Stage 3: Data Flow Lifting to HIR

Values are traced through the stack to recover named expressions. This produces a High-level Intermediate Representation (HIR) that resembles structured pseudocode:

function transfer(address to, uint256 amount):
    let balance = SLOAD(keccak256(msg.sender, 0))
    require(balance >= amount, "Insufficient balance")
    SSTORE(keccak256(msg.sender, 0), balance - amount)
    SSTORE(keccak256(to, 0), SLOAD(keccak256(to, 0)) + amount)
    return true

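Names are not stored in bytecode. A label like balances is a heuristic guess based on storage access patterns (here, a mapping at slot 0 keyed by an address and adjusted by a transferred amount). No source code required.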
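
One common way to trace values through the stack is symbolic execution: run the opcodes, but push expression strings instead of numbers. The sketch below handles only straight-line code and a handful of opcodes. The function name lift and the output notation are assumptions for illustration, not the format of any particular decompiler.

```python
def lift(instructions):
    """Symbolically execute a straight-line opcode list; return the final stack."""
    stack = []
    binary = {"ADD": "+", "SUB": "-", "LT": "<", "GT": ">", "EQ": "=="}
    for op, *args in instructions:
        if op.startswith("PUSH"):
            stack.append(args[0])                       # literal operand
        elif op == "CALLER":
            stack.append("msg.sender")
        elif op == "CALLDATALOAD":
            stack.append(f"calldata[{stack.pop()}]")
        elif op == "SLOAD":
            stack.append(f"storage[{stack.pop()}]")
        elif op == "ISZERO":
            stack.append(f"!{stack.pop()}")
        elif op in binary:
            a, b = stack.pop(), stack.pop()             # a was the top of the stack
            stack.append(f"({a} {binary[op]} {b})")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


check = [
    ("PUSH1", "0x04"), ("CALLDATALOAD",),   # amount argument
    ("PUSH1", "0x00"), ("SLOAD",),          # value stored at slot 0
    ("LT",), ("ISZERO",),
]
print(lift(check))  # ['!(storage[0x00] < calldata[0x04])']
```

The resulting expression is the condition a later pass would rewrite as a require(...) statement; naming storage[0x00] and calldata[0x04] is a separate, heuristic step.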

Stage 4: Vulnerability Detection

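The HIR is analyzed against known vulnerability patterns. The transfer above performs its check and then its state updates without making any external call, so there is nothing to re-enter. If a function instead made an external call before updating state, the analyzer would produce something like: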

Finding: Reentrancy
Severity: CRITICAL
Confidence: 0.97

Pattern: External call precedes state update
  SLOAD  balances[CALLER]    ; offset 0x1A3
  CALL   msg.sender          ; offset 0x1F8  ← external call
  SSTORE balances[CALLER]    ; offset 0x22C  ← state update after call

Recommendation: Update balances[CALLER] before the CALL instruction
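
The core of such a check can be sketched in Python as a scan over one execution path. This is a deliberately naive version under stated assumptions: the input is a hypothetical list of (offset, mnemonic) pairs along a single path, and the function does not verify that the SSTORE touches the same slot that was read before the call, so a real analyzer would need further filtering to avoid false positives.

```python
def find_call_before_sstore(trace):
    """Return (call_offset, sstore_offset) pairs where an SSTORE follows a CALL.

    `trace` lists (offset, mnemonic) pairs along ONE execution path.
    """
    findings = []
    last_call = None
    for offset, name in trace:
        if name in ("CALL", "DELEGATECALL"):
            last_call = offset                    # remember the most recent external call
        elif name == "SSTORE" and last_call is not None:
            findings.append((last_call, offset))  # state written after the call
    return findings


vulnerable = [(0x1A3, "SLOAD"), (0x1F8, "CALL"), (0x22C, "SSTORE")]
safe = [(0x1A3, "SLOAD"), (0x1C0, "SSTORE"), (0x1F8, "CALL")]

print(find_call_before_sstore(vulnerable))  # [(504, 556)], i.e. 0x1F8 then 0x22C
print(find_call_before_sstore(safe))        # []
```

STATICCALL is left out on purpose: the callee cannot modify state during a static call, so it is not a reentrancy vector in the same way.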

What Source-Level Tools Miss

Compiler-introduced vulnerabilities. In July 2023, Curve Finance lost approximately $70 million when attackers exploited a Vyper compiler bug that silently broke reentrancy guards in specific compiler versions. The guard was present in the source code and would have passed any source-level audit. Because the defect existed only in the compiled output, bytecode is the level at which the ineffective guard could have been observed, regardless of what the source said.

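Optimizer reordering. The Solidity optimizer can reorder operations and inline functions, so the instruction sequence that executes may not mirror the order in which the source reads. Optimizations are meant to preserve behavior, but optimizer bugs have occasionally changed it. Bytecode analysis sees the final compiled result, not the pre-optimization form.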

Proxy implementation drift. Upgradeable proxies separate logic from storage. The proxy source may look safe while the implementation it delegates to contains vulnerabilities. Following DELEGATECALL chains requires bytecode-level analysis.

Storage slot collisions. Proxies not using EIP-1967 standardized slots can have the proxy’s own storage overwritten by the implementation. Computing which slots each contract reads and writes—and comparing them—requires access to the bytecode.
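
Once the slot sets have been recovered, the comparison itself is simple. The sketch below assumes a prior analysis has already produced the set of fixed storage slots each contract touches; the function name and the example slot numbers are made up.

```python
def colliding_slots(proxy_slots, implementation_slots):
    """Return the sorted slots used by both the proxy and the implementation."""
    return sorted(set(proxy_slots) & set(implementation_slots))


# Hypothetical analysis results: the proxy keeps its admin address in slot 0,
# while the implementation uses slots 0 and 1 for its own state.
proxy_slots = {0}
implementation_slots = {0, 1}

print(colliding_slots(proxy_slots, implementation_slots))  # [0]
```

A non-empty result means code running via DELEGATECALL can overwrite state the proxy relies on. Mapping and dynamic-array entries live at hashed locations and would need separate handling.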

Practical Example: Reentrancy in Unknown Bytecode

Here is a contract with no verified source. One function’s bytecode:

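// Function: 0x2e1a7d4d (withdraw(uint256))

CALLER
PUSH1 0x00
MSTORE         // memory[0x00] = msg.sender (mapping key)
PUSH1 0x00     // mapping's base slot
PUSH1 0x20
MSTORE         // memory[0x20] = 0
PUSH1 0x40
PUSH1 0x00
SHA3           // compute balances[msg.sender] slot
SLOAD          // load balance            stack: [balance]

PUSH1 0x04
CALLDATALOAD   // load amount argument    stack: [amount, balance]
DUP1
DUP3
LT             // balance < amount?
ISZERO         // balance >= amount
PUSH2 0x01f8
JUMPI          // jump to CALL block if sufficient (fall-through reverts, omitted)

JUMPDEST       // offset 0x01f8           stack: [amount, balance]
PUSH1 0x00     // retLength
DUP1           // retOffset
DUP1           // argsLength
DUP1           // argsOffset
DUP5           // value: amount
CALLER         // target: msg.sender
GAS
CALL           // external call to msg.sender

PUSH1 0x00     // new balance = 0
CALLER
PUSH1 0x00
MSTORE
PUSH1 0x00
PUSH1 0x20
MSTORE
PUSH1 0x40
PUSH1 0x00
SHA3           // recompute balances[msg.sender] slot
SSTORE         // update balance AFTER the call: vulnerable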

The sequence: SLOAD (read balance) → CALL (send ETH to msg.sender) → SSTORE (zero balance). The state update comes after the external call. An attacker’s receive() function can call withdraw() again before SSTORE runs, draining the contract through repeated recursive calls before any balance update takes effect.
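
The drain can be reproduced with a toy Python model of the same ordering. Everything here is a simplification for illustration: VulnerableVault and run_attack are hypothetical names, balances are plain integers, and the callback stands in for the attacker contract's receive() function.

```python
class VulnerableVault:
    """Toy model of the bytecode above: read, external call, then write."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.eth = sum(balances.values())      # total ETH held by the contract

    def withdraw(self, caller, amount, on_receive):
        balance = self.balances.get(caller, 0)  # SLOAD
        if balance < amount:
            raise ValueError("Insufficient balance")
        self.eth -= amount
        on_receive(amount)                      # CALL: control passes to the caller
        self.balances[caller] = 0               # SSTORE: too late


def run_attack(vault, attacker, amount):
    """Re-enter withdraw() from the receive callback until the vault is empty."""
    stolen = 0

    def receive(value):
        nonlocal stolen
        stolen += value
        if vault.eth >= amount:                 # more to take: re-enter
            vault.withdraw(attacker, amount, receive)

    vault.withdraw(attacker, amount, receive)
    return stolen


vault = VulnerableVault({"attacker": 1, "victim": 3})
print(run_attack(vault, "attacker", 1))  # 4
print(vault.eth)                         # 0
```

The attacker deposited 1 but walks away with 4, because every nested call still sees the original balance. Moving the balance write above the on_receive call makes the second withdraw fail the balance check.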

Reference Tools

For bytecode exploration alongside decompilation output:

  • evm.codes: complete EVM opcode reference with gas costs and stack behavior
  • ethervm.io: interactive disassembler
  • etherscan.io: raw deployed bytecode for any contract address, whether or not its source is verified

Understanding what decompilers produce—and the opcodes underneath—helps evaluate findings and spot false positives. The further from bytecode your analysis sits, the more it is reasoning about intent rather than execution.

For a closer look at specific vulnerability classes detectable in bytecode, see the guides on reentrancy attacks and proxy vulnerabilities.