Bytecode Analysis 101: Reading Smart Contracts Without Source Code
When you deploy a Solidity contract, the compiler produces EVM bytecode—the actual instruction sequence the network executes. Source code is documentation. Bytecode is what runs.
This matters for security analysis in two ways. First, a large fraction of deployed contracts have no verified source code (Etherscan reports roughly 40% unverified as of late 2024). Second, even when source is available, the compiled bytecode can differ from what the source implies—compiler optimizations, inlining, and ABI encoding logic are all compiler-generated and invisible at the Solidity level.
This guide walks through reading EVM bytecode and how decompilers reconstruct meaning from raw opcodes.
The EVM Execution Model
The EVM is a stack-based virtual machine. Instructions operate on a stack of 256-bit words (at most 1024 entries deep); contracts also have volatile memory (fresh for every call and discarded when it returns) and persistent storage (the key-value store that holds balances, flags, and state).
Bytecode is a flat sequence of single-byte opcodes, some followed by immediate data operands. The simplest way to understand bytecode is to read it opcode by opcode.
A basic contract initialization starts like this:
PUSH1 0x80
PUSH1 0x40
MSTORE // memory[0x40] = 0x80 (set free memory pointer)
This three-instruction sequence appears near the top of essentially every Solidity contract. Recognizing canonical patterns like this is where bytecode reading begins.
Opcode Categories Worth Knowing
Stack operations — push, duplicate, swap, discard:
PUSH1 0x01 // push 1-byte literal
DUP1 // duplicate top of stack
SWAP1 // swap top two values
POP // discard top
Storage — read and write persistent state:
SLOAD // read: push storage[stack[0]]
SSTORE // write: storage[stack[0]] = stack[1]
External calls — interact with other contracts:
CALL // call external contract, can send ETH
STATICCALL // read-only call, no state changes allowed
DELEGATECALL // execute external code in the current contract's context
Control flow — conditional and unconditional branches:
JUMP // unconditional jump to JUMPDEST
JUMPI // conditional: jump if stack[1] != 0
JUMPDEST // valid jump target (required marker)
Two Patterns That Appear Everywhere
The function dispatcher. Every Solidity contract routes incoming calls through a dispatch table. The first four bytes of calldata identify the function being called. The bytecode extracts these bytes and jumps to the matching handler:
PUSH1 0x04
CALLDATASIZE
LT
PUSH2 0x00ef
JUMPI // calldata < 4 bytes (no selector): jump to the fallback/revert path
PUSH1 0x00
CALLDATALOAD
PUSH1 0xe0
SHR // extract first 4 bytes = function selector
DUP1 // keep a copy of the selector for the next comparison
PUSH4 0xa9059cbb // transfer(address,uint256) selector
EQ
PUSH2 0x0123
JUMPI // jump to transfer handler
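The dispatcher prologue can be mirrored in a few lines of Python. This is a minimal sketch, not how any particular tool does it: `extract_selector`, `dispatch`, and the `HANDLERS` table are hypothetical names introduced here for illustration.

```python
def extract_selector(calldata: bytes):
    """Mimic the dispatcher prologue; return the selector as an int, or None."""
    if len(calldata) < 4:
        # Same test as PUSH1 0x04 / CALLDATASIZE / LT: no room for a selector.
        return None
    # CALLDATALOAD(0) reads a 32-byte word, zero-padded on the right if needed.
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    # SHR by 0xe0 (224) bits leaves only the top 4 bytes of the word.
    return word >> 0xE0


# Hypothetical dispatch table with the single selector shown in the listing.
HANDLERS = {0xA9059CBB: "transfer(address,uint256)"}


def dispatch(calldata: bytes) -> str:
    """Return the handler name for the calldata, or 'fallback' if none matches."""
    return HANDLERS.get(extract_selector(calldata), "fallback")
```

Calling `dispatch` with calldata that starts with `a9059cbb` returns the transfer handler name; anything shorter than four bytes, or with an unknown selector, falls through to `"fallback"`.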
Mapping storage access. mapping(address => uint256) doesn’t store values at sequential slots. Instead, Solidity computes keccak256(key ++ slot_index), with both the key and the mapping's base slot padded to 32 bytes, to derive each entry’s location:
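CALLER // msg.sender (the mapping key)
PUSH1 0x00
MSTORE // memory[0x00] = key
PUSH1 0x00 // mapping's base slot
PUSH1 0x20
MSTORE // memory[0x20] = base slot
PUSH1 0x40 // hash 64 bytes: key followed by slot
PUSH1 0x00
SHA3 // keccak256(abi.encode(key, slot))
SLOAD // read value at computed slot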
Recognizing this pattern is how decompilers recover balances[msg.sender] from raw opcodes.
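To check a slot computation by hand, you need Keccak-256 itself. Python's `hashlib.sha3_256` is the NIST SHA-3 variant, which pads differently and gives different digests, so the sketch below includes a compact pure-Python Keccak-256. It is for illustration only; in practice you would use a vetted library. The helper names `function_selector` and `mapping_slot` are assumptions introduced here.

```python
def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & 0xFFFFFFFFFFFFFFFF


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to lanes[x][y]."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by the EVM (rate 136 bytes, padding 0x01 ... 0x80)."""
    rate = 136
    padded = bytearray(data)
    padded.append(0x01)
    while len(padded) % rate:
        padded.append(0x00)
    padded[-1] |= 0x80
    state = bytearray(200)
    for start in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[start + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])


def function_selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), as hex."""
    return keccak256(signature.encode("ascii"))[:4].hex()


def mapping_slot(key: int, base_slot: int) -> int:
    """Storage slot of mapping[key] for a mapping declared at base_slot."""
    preimage = key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")
    return int.from_bytes(keccak256(preimage), "big")
```

For example, `function_selector("transfer(address,uint256)")` yields `a9059cbb`, the constant in the dispatcher listing, and `mapping_slot(sender_address_as_int, 0)` gives the slot the SLOAD above reads.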
The Decompilation Pipeline
Decompilers convert raw bytecode to readable, analyzable form through a sequence of lifting passes.
Stage 1: Disassembly
Raw bytes become labeled opcodes with their operands:
Bytecode: 60 80 60 40 52 34 80 15 ...
Decoded: PUSH1 0x80
PUSH1 0x40
MSTORE
CALLVALUE
DUP1
ISZERO
Human-readable, but still structurally flat.
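A disassembler for this stage fits in a short loop. The sketch below assumes a deliberately partial opcode table (only the mnemonics used in this guide); `OPCODES` and `disassemble` are illustrative names, and unknown bytes are labeled rather than rejected.

```python
# Partial opcode table: only the instructions that appear in this guide.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x03: "SUB", 0x10: "LT", 0x14: "EQ",
    0x15: "ISZERO", 0x1C: "SHR", 0x20: "SHA3", 0x33: "CALLER",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5A: "GAS",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xF4: "DELEGATECALL",
    0xFA: "STATICCALL", 0xFD: "REVERT",
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, operand_hex_or_None) tuples."""
    result = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 bytes of immediate data.
            size = op - 0x5F
            operand = code[pc + 1:pc + 1 + size]
            result.append((pc, f"PUSH{size}", "0x" + operand.hex()))
            pc += 1 + size
            continue
        if 0x80 <= op <= 0x8F:
            name = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            name = f"SWAP{op - 0x8F}"
        else:
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        result.append((pc, name, None))
        pc += 1
    return result


for offset, name, operand in disassemble(bytes.fromhex("6080604052348015")):
    print(f"{offset:04x}  {name} {operand or ''}")
```

The one subtlety is skipping PUSH operands: a byte such as 0x5b inside a PUSH immediate is data, not a JUMPDEST, so a disassembler has to advance past the operand instead of decoding it.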
Stage 2: Control Flow Analysis
The disassembly is partitioned into basic blocks—sequences with no internal branches—and edges between blocks are computed to form a Control Flow Graph (CFG):
Function: transfer(address,uint256) [selector 0xa9059cbb]
Block 0x0123:
SLOAD balances[sender]
DUP amount
LT
JUMPI → 0x0189 (fail)
JUMP → 0x0156 (pass)
Block 0x0156:
SUB balances[sender], amount
SSTORE balances[sender]
ADD balances[to]
SSTORE balances[to]
RETURN
Block 0x0189:
PUSH "Insufficient balance"
REVERT
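Partitioning into basic blocks follows two rules: a JUMPDEST starts a new block, and a branching or halting instruction ends one. The sketch below applies only those rules to a list of (offset, mnemonic) pairs; `split_basic_blocks` is a hypothetical helper, and computing the edges (resolving jump targets from the stack) is left out.

```python
# Instructions after which control does not simply continue to the next byte.
TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}


def split_basic_blocks(instructions):
    """Split (offset, mnemonic) pairs, given in program order, into basic blocks."""
    blocks = []
    current = []
    for offset, name in instructions:
        if name == "JUMPDEST" and current:
            # A jump target always begins a new block.
            blocks.append(current)
            current = []
        current.append((offset, name))
        if name in TERMINATORS:
            # A branch or halt always ends the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


program = [
    (0x00, "PUSH1"), (0x02, "JUMPI"),
    (0x03, "PUSH1"), (0x05, "JUMP"),
    (0x06, "JUMPDEST"), (0x07, "SLOAD"),
    (0x08, "JUMPDEST"), (0x09, "STOP"),
]
for block in split_basic_blocks(program):
    print(hex(block[0][0]), [name for _, name in block])
```

A JUMPI block has two successors (the jump target and the fall-through), which is why the transfer function above branches to both a pass and a fail block.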
Stage 3: Data Flow Lifting to HIR
Values are traced through the stack to recover named expressions. This produces a High-level Intermediate Representation (HIR) that resembles structured pseudocode:
function transfer(address to, uint256 amount):
let balance = SLOAD(keccak256(msg.sender, 0))
require(balance >= amount, "Insufficient balance")
SSTORE(keccak256(msg.sender, 0), balance - amount)
SSTORE(keccak256(to, 0), SLOAD(keccak256(to, 0)) + amount)
return true
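Names like balances do not exist in the bytecode. The decompiler assigns them heuristically from storage access patterns, such as a mapping keyed by an address that a transfer function debits and credits. No source code required.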
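Tracing values through the stack amounts to running the code with expressions in place of concrete numbers. The following is a minimal sketch of that idea for a handful of opcodes, not a real lifter: `lift` is a hypothetical name, memory and control flow are ignored, and unsupported opcodes raise an error.

```python
def lift(instructions):
    """Symbolically execute (mnemonic, operand) pairs; return the stack, top first."""
    stack = []  # the top of the EVM stack is the end of this list
    binary = {"ADD": "+", "SUB": "-", "LT": "<", "EQ": "=="}
    for name, operand in instructions:
        if name.startswith("PUSH"):
            stack.append(hex(operand))
        elif name.startswith("DUP"):
            depth = int(name[3:])
            stack.append(stack[-depth])
        elif name.startswith("SWAP"):
            depth = int(name[4:])
            stack[-1], stack[-1 - depth] = stack[-1 - depth], stack[-1]
        elif name in binary:
            # For two-operand opcodes the first operand is the top of the stack.
            a, b = stack.pop(), stack.pop()
            stack.append(f"({a} {binary[name]} {b})")
        elif name == "CALLER":
            stack.append("msg.sender")
        elif name in ("SLOAD", "CALLDATALOAD"):
            stack.append(f"{name.lower()}({stack.pop()})")
        else:
            raise ValueError(f"unsupported opcode: {name}")
    return stack[::-1]


program = [
    ("PUSH1", 0x04), ("CALLDATALOAD", None),  # amount argument
    ("PUSH1", 0x00), ("SLOAD", None),         # value stored at slot 0
    ("LT", None),                             # slot 0 value < amount ?
]
print(lift(program)[0])
```

The printed expression, `(sload(0x0) < calldataload(0x4))`, is the raw form of a balance check; a later pass would rewrite it into something like the require line shown above.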
Stage 4: Vulnerability Detection
The HIR is analyzed against known vulnerability patterns. For the transfer above, the ordering is correct (checks, then effects) and no external call is involved. But if a function made an external call before updating state, the analysis would produce something like:
Finding: Reentrancy
Severity: CRITICAL
Confidence: 0.97
Pattern: External call precedes state update
SLOAD balances[CALLER] ; offset 0x1A3
CALL msg.sender ; offset 0x1F8 ← external call
SSTORE balances[CALLER] ; offset 0x22C ← state update after call
Recommendation: Update balances[CALLER] before the CALL instruction
What Source-Level Tools Miss
Compiler-introduced vulnerabilities. In July 2023, Curve Finance lost approximately $70 million when attackers exploited a Vyper compiler bug that silently broke reentrancy guards in specific compiler versions. The guard was present in the source code and would have passed any source-level audit. Bytecode analysis examines what the compiler actually emitted, so it can flag a guard that is missing or broken in the compiled output regardless of what the source says.
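Optimizer reordering. The Solidity optimizer inlines functions and rearranges operations. It is designed to preserve behavior, but optimizer bugs have shipped in past releases, and such bugs can change security-relevant execution in ways the source never shows. Bytecode analysis sees the final compiled result, not the pre-optimization form.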
Proxy implementation drift. Upgradeable proxies separate logic from storage. The proxy source may look safe while the implementation it delegates to contains vulnerabilities. Following DELEGATECALL chains requires bytecode-level analysis.
Storage slot collisions. Proxies not using EIP-1967 standardized slots can have the proxy’s own storage overwritten by the implementation. Computing which slots each contract reads and writes—and comparing them—requires access to the bytecode.
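Once the slots each contract touches are known, the comparison itself is a set intersection. The sketch below assumes an earlier analysis pass has already collected those slot sets; `find_collisions` and the example inputs are hypothetical. The constant is the EIP-1967 implementation slot, defined as keccak256("eip1967.proxy.implementation") minus 1.

```python
# EIP-1967 implementation slot: keccak256("eip1967.proxy.implementation") - 1
EIP1967_IMPL_SLOT = 0x360894A13BA1A3210667C828492DB98DCA3E2076CC3735A920A3CA505D382BBC


def find_collisions(proxy_slots, impl_slots):
    """Return the storage slots used by both the proxy and its implementation."""
    return sorted(set(proxy_slots) & set(impl_slots))


# Hypothetical slot sets, as a prior bytecode analysis pass might report them.
legacy_proxy = {0}                  # implementation address kept in slot 0
eip1967_proxy = {EIP1967_IMPL_SLOT}
implementation = {0, 1, 2}          # ordinary state variables

print(find_collisions(legacy_proxy, implementation))   # [0]
print(find_collisions(eip1967_proxy, implementation))  # []
```

Mapping and dynamic array entries live at hashed slots, so a fuller comparison would also need the base slots those hashes are derived from; this sketch covers fixed slots only.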
Practical Example: Reentrancy in Unknown Bytecode
Here is a contract with no verified source. One function’s bytecode:
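// Function: 0x2e1a7d4d (withdraw(uint256))
CALLER
PUSH1 0x00
MSTORE // memory[0x00] = msg.sender (mapping key)
PUSH1 0x00
PUSH1 0x20
MSTORE // memory[0x20] = 0 (mapping's base slot)
PUSH1 0x40
PUSH1 0x00
SHA3 // compute balances[msg.sender] slot
SLOAD // load balance
PUSH1 0x04
CALLDATALOAD // load amount argument
DUP2
LT // balance < amount
ISZERO // balance >= amount check
PUSH2 0x01f8
JUMPI // jump to CALL if sufficient
// (revert path for insufficient balance omitted)
JUMPDEST // 0x01f8
PUSH1 0x00
DUP1
DUP1
DUP1 // retSize, retOffset, argsSize, argsOffset = 0
DUP5 // value: amount
CALLER // target: msg.sender
GAS
CALL // external call to msg.sender
PUSH1 0x00 // new balance = 0
CALLER
PUSH1 0x00
MSTORE
PUSH1 0x00
PUSH1 0x20
MSTORE
PUSH1 0x40
PUSH1 0x00
SHA3 // recompute balances[msg.sender] slot
SSTORE // update balance AFTER the call (vulnerable)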
The sequence: SLOAD (read balance) → CALL (send ETH to msg.sender) → SSTORE (zero balance). The state update comes after the external call. An attacker’s receive() function can call withdraw() again before SSTORE runs, draining the contract through repeated recursive calls before any balance update takes effect.
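The ordering check can be sketched as a linear scan for an SSTORE that follows a CALL. This is a deliberately naive illustration under strong assumptions: it looks at one straight-line instruction sequence, ignores control flow and reentrancy guards, and does not confirm that the SSTORE writes the slot read earlier. Real detectors work on CFG paths. `find_call_before_sstore` is a hypothetical name, and the offsets are reused from the sample finding in Stage 4.

```python
def find_call_before_sstore(instructions):
    """Return (call_offset, sstore_offset) pairs where an SSTORE follows a CALL."""
    findings = []
    first_call = None
    for offset, name in instructions:
        if name == "CALL" and first_call is None:
            # Remember the earliest external call on this path.
            first_call = offset
        elif name == "SSTORE" and first_call is not None:
            # State is written after control was handed to another contract.
            findings.append((first_call, offset))
    return findings


vulnerable = [(0x1A3, "SLOAD"), (0x1F8, "CALL"), (0x22C, "SSTORE")]
safe = [(0x1A3, "SLOAD"), (0x1C0, "SSTORE"), (0x1F8, "CALL")]

for call_at, store_at in find_call_before_sstore(vulnerable):
    print(f"CALL at {call_at:#x} precedes SSTORE at {store_at:#x}")
```

The `safe` sequence produces no findings because its state update happens before the call, which is the checks-effects-interactions ordering.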
Reference Tools
For bytecode exploration alongside decompilation output:
- evm.codes: complete EVM opcode reference with gas costs and stack behavior
- ethervm.io: interactive disassembler
- etherscan.io: raw bytecode and deployed code hash for any contract address, verified or not
Understanding what decompilers produce—and the opcodes underneath—helps evaluate findings and spot false positives. The further from bytecode your analysis sits, the more it is reasoning about intent rather than execution.
For a closer look at specific vulnerability classes detectable in bytecode, see the guides on reentrancy attacks and proxy vulnerabilities.