EVM Bytecode Analysis: From Raw Bytes to Security Insights
Every smart contract vulnerability ultimately exists in bytecode. Reentrancy, integer overflow, access control failure — the bug lives in the sequence of opcodes the EVM executes. Understanding how to analyze that bytecode is the foundation of smart contract security work.
The EVM Execution Model
The Ethereum Virtual Machine is a stack-based, 256-bit word machine with four main components:
- Stack — max 1024 items, each 256 bits wide
- Memory — byte-addressable, ephemeral, expands as needed
- Storage — persistent key-value store, survives across transactions
- Call context — msg.sender, msg.value, calldata for the current call
Bytecode is a flat sequence of opcodes that manipulate these components. There’s no type system, no variables, no functions at the bytecode level — just offsets and stack values.
Opcode Categories
The security-relevant categories:
Storage: SLOAD / SSTORE (persistent state reads/writes)
Calls: CALL, DELEGATECALL, STATICCALL, CREATE, CREATE2
Control: JUMP, JUMPI, JUMPDEST
Memory: MLOAD, MSTORE
Stack: PUSH1-PUSH32, POP, DUP1-DUP16, SWAP1-SWAP16
Arithmetic: ADD, SUB, MUL, DIV, MOD, EXP
Comparison: LT, GT, EQ, ISZERO
Bitwise: AND, OR, XOR, SHL, SHR, SAR
Termination: RETURN, REVERT, SELFDESTRUCT, STOP
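To make the "flat sequence of opcodes" concrete, here is a minimal linear-sweep disassembler sketch. The opcode table is deliberately partial, and the names `OPCODES`, `disassemble`, and `listing` are illustrative rather than taken from any particular tool.

```python
# Partial opcode table: only the mnemonics used in this article's examples.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x10: "LT", 0x11: "GT",
    0x14: "EQ", 0x15: "ISZERO", 0x20: "SHA3", 0x33: "CALLER",
    0x35: "CALLDATALOAD", 0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xF4: "DELEGATECALL",
    0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, immediate_or_None) triples."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 immediate bytes that are data, not opcodes.
            n = op - 0x5F
            raw = code[pc + 1 : pc + 1 + n].ljust(n, b"\x00")  # zero-pad a truncated PUSH
            out.append((pc, f"PUSH{n}", int.from_bytes(raw, "big")))
            pc += 1 + n
            continue
        if 0x80 <= op <= 0x8F:
            name = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            name = f"SWAP{op - 0x8F}"
        else:
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        out.append((pc, name, None))
        pc += 1
    return out


# An 18-byte example program, written by hand for this sketch.
listing = disassemble(bytes.fromhex("600a60005411600a57005b60016000555b00"))
for offset, name, imm in listing:
    print(f"0x{offset:02x}: {name}" + (f" 0x{imm:02x}" if imm is not None else ""))
```

Skipping PUSH immediates is the one step that cannot be omitted: a data byte such as 0x5b inside a PUSH would otherwise be misread as a JUMPDEST.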
Reading Bytecode
Take a simple Solidity function:
function getBalance(address user) external view returns (uint256) {
return balances[user];
}
At the opcode level this becomes:
PUSH4 0xf8b2cb4f ; Function selector for getBalance(address)
EQ ; Compare with calldata selector
PUSH2 0x0040 ; Jump destination if match
JUMPI ; Jump if equal
; At 0x0040 (getBalance implementation):
JUMPDEST ; Valid jump target
PUSH1 0x04 ; Offset 4 (skip selector bytes)
CALLDATALOAD ; Load address argument from calldata
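PUSH1 0x00 ; Memory offset 0x00
MSTORE ; memory[0x00..0x20] = address (the mapping key)
PUSH1 0x00 ; Slot 0 = base slot for balances mapping
PUSH1 0x20 ; Memory offset 0x20
MSTORE ; memory[0x20..0x40] = base slot
PUSH1 0x40 ; 64 bytes of input (key followed by slot)
PUSH1 0x00 ; Memory offset
SHA3 ; Hash to derive storage slot: keccak256(addr . slot)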
SLOAD ; Load balance from that storage slot
PUSH1 0x00
MSTORE ; Store result in memory
PUSH1 0x20
PUSH1 0x00
RETURN
The ABI encoding and storage layout conventions are entirely implicit. The analyzer has to reconstruct them from the access patterns.
Control Flow Analysis
Basic Block Identification
A basic block is a maximal sequence of instructions with a single entry (the first instruction or a JUMPDEST) and a single exit (JUMP, JUMPI, RETURN, REVERT, STOP, or fall-through to the next block). Given this bytecode:
0x00: PUSH1 0x0a
0x02: PUSH1 0x00
0x04: SLOAD
0x05: GT
0x06: PUSH1 0x0a
0x08: JUMPI
0x09: STOP
0x0a: JUMPDEST
0x0b: PUSH1 0x01
0x0d: PUSH1 0x00
0x0f: SSTORE
0x10: JUMPDEST
0x11: STOP
Block 0 (0x00-0x08): Entry block, exits via JUMPI
Block 1 (0x09): STOP, reached when the condition (storage[0] > 10) is false
Block 2 (0x0a-0x0f): Jump target, reached when the condition is true
Block 3 (0x10-0x11): Begins at a JUMPDEST; in this listing it is entered only by fall-through from Block 2
CFG Edges:
Block 0 → Block 2 (jump taken, condition true)
Block 0 → Block 1 (fall-through, condition false)
Block 2 → Block 3 (fall-through)
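The block-splitting rule and the edge construction can be sketched in a few lines. This is a simplified illustration, not a complete CFG builder: it assumes the input is already disassembled into (offset, mnemonic, immediate) triples, it resolves only jumps whose target is pushed immediately before the JUMP or JUMPI, and the names `PROGRAM`, `basic_blocks`, and `cfg_edges` are hypothetical. The hand-written program pushes 0x0a as its jump target.

```python
# (offset, mnemonic, immediate) triples for a short hand-written program.
PROGRAM = [
    (0x00, "PUSH1", 0x0A), (0x02, "PUSH1", 0x00), (0x04, "SLOAD", None),
    (0x05, "GT", None), (0x06, "PUSH1", 0x0A), (0x08, "JUMPI", None),
    (0x09, "STOP", None), (0x0A, "JUMPDEST", None), (0x0B, "PUSH1", 0x01),
    (0x0D, "PUSH1", 0x00), (0x0F, "SSTORE", None), (0x10, "JUMPDEST", None),
    (0x11, "STOP", None),
]
TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT", "SELFDESTRUCT", "INVALID"}


def basic_blocks(program):
    """Start a new block at every JUMPDEST and after every terminator."""
    blocks, current = [], []
    for ins in program:
        if ins[1] == "JUMPDEST" and current:
            blocks.append(current)
            current = []
        current.append(ins)
        if ins[1] in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def cfg_edges(blocks):
    """Return edges as (source_block_start, target_block_start) pairs."""
    jumpdests = {b[0][0] for b in blocks if b[0][1] == "JUMPDEST"}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1][1]
        # Static jump: the target literal sits directly before JUMP/JUMPI.
        if last in ("JUMP", "JUMPI") and len(block) >= 2 and block[-2][1].startswith("PUSH"):
            target = block[-2][2]
            if target in jumpdests:  # only a JUMPDEST is a valid target
                edges.add((block[0][0], target))
        # JUMPI and non-terminating instructions continue into the next block.
        if (last == "JUMPI" or last not in TERMINATORS) and i + 1 < len(blocks):
            edges.add((block[0][0], blocks[i + 1][0][0]))
    return edges


blocks = basic_blocks(PROGRAM)
edges = cfg_edges(blocks)
```

For this program the sketch yields four blocks starting at 0x00, 0x09, 0x0a, and 0x10, with the three edges 0x00 → 0x0a, 0x00 → 0x09, and 0x0a → 0x10.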
Jump Resolution
Static jumps are straightforward — the destination literal is embedded in the bytecode immediately before the JUMP:
PUSH2 0x0040
JUMP ; Always jumps to 0x0040
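Dynamic jumps are harder: the destination is not a literal next to the JUMP but a value that reached the stack earlier, for example the return address an internal function jumps back to. Resolving them means tracking stack contents across blocks. Function entry points are recovered separately, from the dispatcher: it extracts a 4-byte selector via CALLDATALOAD + SHR 0xe0, then compares it against known selectors in a chain of EQ/JUMPI pairs. Each of those jumps is static, so the recovery algorithm traces the comparison chain and maps each selector constant to its jump target. Each target is a function entry point.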
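A sketch of the selector-recovery step, assuming the dispatcher is a linear chain of `PUSH4 selector; EQ; PUSHn target; JUMPI` runs. Compilers may lay out large dispatchers differently, so treat this as a first-pass heuristic. The listing and the names `DISPATCHER` and `recover_selectors` are made up for illustration.

```python
# Hand-written dispatcher fragment as (offset, mnemonic, immediate) triples.
DISPATCHER = [
    (0x00, "DUP1", None),
    (0x01, "PUSH4", 0x70A08231),  # balanceOf(address)
    (0x06, "EQ", None),
    (0x07, "PUSH2", 0x0040),
    (0x0A, "JUMPI", None),
    (0x0B, "DUP1", None),
    (0x0C, "PUSH4", 0xA9059CBB),  # transfer(address,uint256)
    (0x11, "EQ", None),
    (0x12, "PUSH2", 0x0080),
    (0x15, "JUMPI", None),
    (0x16, "PUSH1", 0x00),
    (0x18, "DUP1", None),
    (0x19, "REVERT", None),
]


def recover_selectors(program):
    """Map each 4-byte selector to the jump target of its EQ/JUMPI pair."""
    table = {}
    for i in range(len(program) - 3):
        a, b, c, d = program[i : i + 4]
        if (a[1] == "PUSH4" and b[1] == "EQ"
                and c[1].startswith("PUSH") and d[1] == "JUMPI"):
            table[a[2]] = c[2]
    return table


table = recover_selectors(DISPATCHER)
for selector, target in table.items():
    print(f"selector 0x{selector:08x} -> entry point 0x{target:04x}")
```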
Data Flow Analysis
Stack Tracking
Track values as they flow through instructions:
Instruction     Stack (top first)
-----------     ------------------
PUSH1 0x04      [0x04]
CALLDATALOAD    [calldata[4:36]]
PUSH1 0x00      [0x00, calldata[4:36]]
SLOAD           [storage[0], calldata[4:36]]
GT              [storage[0] > calldata[4:36]]
This tells us: the check compares a storage value against a user-supplied input.
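The same trace can be produced mechanically by running the instructions over a stack of expression strings instead of numbers. The sketch below covers only the four opcodes in the trace and assumes literal offsets and keys; `track_stack` is a hypothetical helper name.

```python
def track_stack(instructions):
    """Return [(mnemonic, stack_after)] with the top of the stack at index 0."""
    stack, trace = [], []
    for name, imm in instructions:
        if name.startswith("PUSH"):
            stack.insert(0, f"0x{imm:02x}")
        elif name == "CALLDATALOAD":
            offset = int(stack.pop(0), 16)  # sketch: assumes a literal offset
            stack.insert(0, f"calldata[{offset}:{offset + 32}]")
        elif name == "SLOAD":
            key = int(stack.pop(0), 16)  # sketch: assumes a literal slot
            stack.insert(0, f"storage[{key}]")
        elif name == "GT":
            a, b = stack.pop(0), stack.pop(0)  # GT tests top > second
            stack.insert(0, f"({a} > {b})")
        else:
            raise NotImplementedError(name)
        trace.append((name, list(stack)))
    return trace


trace = track_stack([
    ("PUSH1", 0x04), ("CALLDATALOAD", None),
    ("PUSH1", 0x00), ("SLOAD", None), ("GT", None),
])
for name, stack in trace:
    print(f"{name:<13} {stack}")
```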
Taint Analysis
Taint tracks which values in the program the caller can influence:
Taint Sources:
CALLDATALOAD User-controlled input
CALLER Immediate caller of the current call (msg.sender)
ORIGIN Account that signed the transaction (tx.origin; differs from CALLER whenever the call arrives through another contract)
CALLVALUE ETH amount sent
Propagation Rules:
Arithmetic Taint flows through — ADD of tainted + constant = tainted
Memory Taint follows the value to/from memory
Logic ops Result is tainted if any operand is tainted
Sinks (where tainted data is worth flagging for review):
SSTORE Writing a user-controlled key or value to storage
CALL/value Sending ETH based on user-controlled amount
JUMPI Branch condition controlled by user input
A concrete example:
CALLDATALOAD ; Tainted: user_input
PUSH1 0x02
MUL ; Tainted: user_input * 2
DUP1
PUSH1 0x00
SSTORE ; SINK: user-controlled value written to storage
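The propagation rules reduce to running the program over a stack of booleans, where True means "caller-influenced". The following is a minimal sketch of that idea. It models only a handful of opcodes, ignores memory, and uses the hypothetical names `SOURCES`, `BINARY_OPS`, and `tainted_stores`.

```python
SOURCES = {"CALLDATALOAD", "CALLER", "ORIGIN", "CALLVALUE"}
BINARY_OPS = {"ADD", "SUB", "MUL", "DIV", "AND", "OR", "XOR", "LT", "GT", "EQ"}


def tainted_stores(instructions):
    """Return indices of SSTOREs whose key or value is tainted."""
    stack, findings = [], []  # stack of booleans, top at index 0
    for i, name in enumerate(instructions):
        if name in SOURCES:
            if name == "CALLDATALOAD":
                stack.pop(0)  # consumes its offset operand
            stack.insert(0, True)
        elif name.startswith("PUSH"):
            stack.insert(0, False)  # constants are untainted
        elif name.startswith("DUP"):
            stack.insert(0, stack[int(name[3:]) - 1])
        elif name in BINARY_OPS:
            a, b = stack.pop(0), stack.pop(0)
            stack.insert(0, a or b)  # tainted if any operand is tainted
        elif name == "SSTORE":
            key, value = stack.pop(0), stack.pop(0)
            if key or value:
                findings.append(i)
        else:
            raise NotImplementedError(name)
    return findings


user_write = ["PUSH1", "CALLDATALOAD", "PUSH1", "MUL", "DUP1", "PUSH1", "SSTORE"]
constant_write = ["PUSH1", "PUSH1", "SSTORE"]
print(tainted_stores(user_write), tainted_stores(constant_write))
```

The first program mirrors the example above, with an explicit PUSH1 for the calldata offset, and is flagged at its SSTORE; the second writes only constants and produces no finding.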
Value Range Analysis
At conditional jumps, the branch condition constrains values on each path. If a JUMPI branches on input < 100, the taken path knows input < 100 and the fall-through knows input >= 100. This constraint propagation prunes unreachable paths and tightens the value space at each instruction.
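As a rough illustration of constraint propagation, the sketch below splits an inclusive unsigned interval on a condition of the form `x < bound`. Real analyzers handle many more comparison shapes; `refine_lt` is a hypothetical helper.

```python
UINT256_MAX = 2**256 - 1


def refine_lt(interval, bound):
    """Split an inclusive (lo, hi) interval on `x < bound`.

    Returns (taken, fallthrough); a side is None when it is unreachable.
    """
    lo, hi = interval
    taken = (lo, min(hi, bound - 1)) if lo < bound else None
    fallthrough = (max(lo, bound), hi) if hi >= bound else None
    return taken, fallthrough


taken, fallthrough = refine_lt((0, UINT256_MAX), 100)
# A later check `x < 200` on the taken path can never fail, so its
# fall-through side is pruned as unreachable.
again_taken, again_fallthrough = refine_lt(taken, 200)
```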
Storage Analysis
Slot Calculation
Solidity’s storage layout is deterministic but not stored anywhere — it must be reconstructed from access patterns:
Simple variables: sequential slots
uint256 a; → slot 0
uint256 b; → slot 1
address c; → slot 2
Mappings: keccak256-based slots
mapping(address => uint256) balances; → base slot 3
balances[addr] is at: keccak256(addr . 3)
Nested mappings:
mapping(address => mapping(address => uint256)) allowances; → base slot 4
allowances[owner][spender] is at: keccak256(spender . keccak256(owner . 4))
Dynamic arrays:
uint256[] data; → slot 5 holds the length
data[i] is at: keccak256(5) + i
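The slot formulas above can be written out as follows. The Python standard library has no Keccak-256 (`hashlib.sha3_256` is the NIST SHA-3 variant and produces different digests), so the hash is passed in as a parameter. The demo uses SHA-256 purely as a stand-in so the sketch runs without third-party packages; it does not produce real Ethereum slots. All function names are illustrative.

```python
import hashlib


def word(value: int) -> bytes:
    """Encode an integer as a 32-byte big-endian EVM word."""
    return value.to_bytes(32, "big")


def mapping_preimage(key: int, base_slot: int) -> bytes:
    """The 64 bytes that get hashed: key word followed by slot word."""
    return word(key) + word(base_slot)


def mapping_slot(hash_fn, key, base_slot):
    return int.from_bytes(hash_fn(mapping_preimage(key, base_slot)), "big")


def nested_mapping_slot(hash_fn, outer_key, inner_key, base_slot):
    # hash(inner_key . hash(outer_key . base_slot))
    return mapping_slot(hash_fn, inner_key, mapping_slot(hash_fn, outer_key, base_slot))


def array_element_slot(hash_fn, base_slot, index):
    # hash(base_slot) + index, wrapping at 2**256
    return (int.from_bytes(hash_fn(word(base_slot)), "big") + index) % 2**256


def stand_in_hash(data: bytes) -> bytes:
    """NOT keccak256: a placeholder so the example is runnable as-is."""
    return hashlib.sha256(data).digest()


owner, spender = 0xAAAA, 0xBBBB
preimage = mapping_preimage(owner, 3)
allowance_slot = nested_mapping_slot(stand_in_hash, owner, spender, 4)
first = array_element_slot(stand_in_hash, 5, 0)
second = array_element_slot(stand_in_hash, 5, 1)
```

To compute real slots, substitute a genuine Keccak-256 implementation for `stand_in_hash`.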
Recognizing Mapping Access in Bytecode
The pattern for balances[msg.sender] is recognizable once you know what to look for:
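CALLER ; Push msg.sender
PUSH1 0x00 ; Memory offset 0x00
MSTORE ; memory[0x00..0x20] = msg.sender (the key)
PUSH1 0x00 ; Mapping's base slot
PUSH1 0x20 ; Memory offset 0x20
MSTORE ; memory[0x20..0x40] = base slot
PUSH1 0x40 ; Hash 64 bytes
PUSH1 0x00 ; Starting at memory offset 0x00
SHA3 ; keccak256(sender . slot)
SLOAD ; Load from computed slot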
Recognized pattern: mapping_read(slot=0, key=CALLER)
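A recognizer for this pattern can be sketched as a fixed-shape match over the instruction stream. It assumes the compiler writes the key to memory offset 0x00 and the base slot to offset 0x20 before hashing 64 bytes. Real compiler output often interleaves DUP and SWAP instructions, so a production matcher would work on tracked stack values rather than on raw adjacency. `match_mapping_read` is a hypothetical name.

```python
SHAPE = ["*", "PUSH", "MSTORE", "PUSH", "PUSH", "MSTORE", "PUSH", "PUSH", "SHA3", "SLOAD"]


def match_mapping_read(program, start):
    """Return (base_slot, key_source) if a mapping read begins at `start`, else None.

    `program` is a list of (mnemonic, immediate) pairs.
    """
    window = program[start : start + len(SHAPE)]
    if len(window) < len(SHAPE):
        return None
    for (name, _), want in zip(window, SHAPE):
        if want == "*":
            continue  # any single instruction that produces the key
        if want == "PUSH":
            if not name.startswith("PUSH"):
                return None
        elif name != want:
            return None
    imms = [imm for _, imm in window]
    # key stored at 0x00, slot stored at 0x20, hash 0x40 bytes from 0x00
    if (imms[1], imms[4], imms[6], imms[7]) != (0x00, 0x20, 0x40, 0x00):
        return None
    return imms[3], window[0][0]


READ = [
    ("CALLER", None), ("PUSH1", 0x00), ("MSTORE", None),
    ("PUSH1", 0x00), ("PUSH1", 0x20), ("MSTORE", None),
    ("PUSH1", 0x40), ("PUSH1", 0x00), ("SHA3", None), ("SLOAD", None),
]
result = match_mapping_read(READ, 0)
```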
Vulnerability Patterns at the Opcode Level
Reentrancy
The pattern: external call followed by storage write to a slot that was read before the call.
SLOAD ; Read balance (slot X)
...
CALL ; Send ETH — control leaves this contract
...
SSTORE ; Update balance (slot X) AFTER call returns
Detection:
1. Find all CALL/DELEGATECALL sites
2. For each, check if any SSTORE follows in the same execution path
3. Check whether the written slot was read before the CALL
4. If yes → potential reentrancy
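The four detection steps can be sketched over a single execution path, represented here as a list of (opcode, slot) events with slots already resolved. That representation and the name `reentrancy_candidates` are assumptions made for illustration; a real detector would enumerate paths from the CFG and resolve slots through data flow analysis.

```python
def reentrancy_candidates(path):
    """Return (call_index, sstore_index, slot) for each read-call-write pattern."""
    findings = []
    reads = set()        # slots read so far on this path
    calls_seen = []      # (call index, slots read before that call)
    for i, (op, slot) in enumerate(path):
        if op == "SLOAD":
            reads.add(slot)
        elif op in ("CALL", "DELEGATECALL"):
            calls_seen.append((i, frozenset(reads)))
        elif op == "SSTORE":
            for call_index, read_before in calls_seen:
                if slot in read_before:
                    findings.append((call_index, i, slot))
    return findings


# Read balance, send ETH, then update balance: flagged.
vulnerable = [("SLOAD", 1), ("CALL", None), ("SSTORE", 1)]
# Update balance before the external call: not flagged.
reordered = [("SLOAD", 1), ("SSTORE", 1), ("CALL", None)]
print(reentrancy_candidates(vulnerable), reentrancy_candidates(reordered))
```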
Integer Overflow (Pre-Solidity 0.8)
CALLDATALOAD ; User input
ADD ; Add to existing value — no bounds check
SSTORE ; Store result
Solidity 0.8+ inserts overflow checks automatically:
DUP2
DUP2
ADD ; Compute sum
DUP1
DUP3
GT ; Check: operand > sum? (overflow indicator)
PUSH2 0x...
JUMPI ; Revert if overflow
When analyzing pre-0.8 contracts, the absence of this check pattern after ADD/MUL on user-controlled values is a signal worth investigating.
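The difference between the two behaviors is easy to reproduce with 256-bit wrapping arithmetic. This sketch models the comparison "an operand is greater than the sum" that the check performs. The function names are illustrative, and the Python exception stands in for the revert.

```python
MASK = 2**256 - 1


def unchecked_add(a: int, b: int) -> int:
    """Pre-0.8 behavior: ADD silently wraps modulo 2**256."""
    return (a + b) & MASK


def checked_add(a: int, b: int) -> int:
    """0.8+ behavior: revert when the wrapped sum is smaller than an operand."""
    total = (a + b) & MASK
    if a > total:  # only possible when the addition wrapped
        raise OverflowError("arithmetic overflow")
    return total


wrapped = unchecked_add(MASK, 1)
try:
    checked_add(MASK, 1)
    reverted = False
except OverflowError:
    reverted = True
```

Here `wrapped` is 0, which is how a balance just below the maximum can silently reset, while the checked version refuses the operation.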
Unchecked External Call Return Value
Vulnerable:
...
CALL ; Returns success boolean on stack
POP ; Discards it — failure is ignored
Safe:
...
CALL
ISZERO ; Test if call failed
PUSH2 0x...
JUMPI ; Branch to error handling if failed
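A first-pass detector for the vulnerable shape only needs to look for a CALL whose result is popped immediately. This sketch is deliberately narrow: it misses a result that is discarded later or stored without ever being tested. `unchecked_calls` is a hypothetical name.

```python
def unchecked_calls(mnemonics):
    """Return indices of CALL instructions immediately followed by POP."""
    return [
        i for i in range(len(mnemonics) - 1)
        if mnemonics[i] == "CALL" and mnemonics[i + 1] == "POP"
    ]


ignored = ["GAS", "CALL", "POP", "STOP"]
checked = ["GAS", "CALL", "ISZERO", "PUSH2", "JUMPI", "STOP"]
print(unchecked_calls(ignored), unchecked_calls(checked))
```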
Access Control
Privileged operation:
SELFDESTRUCT
What should precede it:
CALLER
PUSH20 <owner_address>
EQ
PUSH2 0x...
JUMPI ; Proceed only if caller == owner
What a missing check looks like:
No CALLER comparison on any path to SELFDESTRUCT
→ Any address can destroy the contract
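One way to sketch the missing-check test is to enumerate paths that end in SELFDESTRUCT and ask whether each contains a CALLER followed by EQ and then JUMPI. This heuristic confirms only that some caller comparison exists on the path, not that it compares against the right address or guards the right branch. The path representation and the name `unguarded_selfdestruct` are assumptions.

```python
def unguarded_selfdestruct(paths):
    """Return indices of paths that reach SELFDESTRUCT without a CALLER check.

    Each path is a list of mnemonics in execution order.
    """
    flagged = []
    for index, path in enumerate(paths):
        if "SELFDESTRUCT" not in path:
            continue
        prefix = path[: path.index("SELFDESTRUCT")]
        guarded = False
        if "CALLER" in prefix:
            after_caller = prefix[prefix.index("CALLER"):]
            if "EQ" in after_caller:
                after_eq = after_caller[after_caller.index("EQ"):]
                guarded = "JUMPI" in after_eq
        if not guarded:
            flagged.append(index)
    return flagged


paths = [
    ["CALLER", "PUSH20", "EQ", "PUSH2", "JUMPI", "JUMPDEST", "SELFDESTRUCT"],
    ["PUSH1", "SELFDESTRUCT"],
    ["PUSH1", "STOP"],
]
flagged = unguarded_selfdestruct(paths)
```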
Advanced Techniques
Symbolic Execution
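Instead of running with concrete values, track symbolic expressions. Where concrete execution of PUSH1 0x05; PUSH1 0x03; ADD produces [8], symbolic execution of CALLDATALOAD; PUSH1 0x03; ADD produces [input_0 + 3]. At a branch on input_0 + 3 > 10, the engine splits: the taken path adds the constraint input_0 + 3 > 10 and the fall-through adds input_0 + 3 <= 10. Because EVM arithmetic wraps modulo 2^256, this simplifies to input_0 > 7 only for inputs that do not overflow the addition; the three largest input values wrap around to 0, 1, and 2 and land on the fall-through path. Collecting these constraints along each path yields the conditions under which any given instruction is reached.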
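The path split can be illustrated with a toy executor that keeps expressions as strings and forks its state at JUMPI. It does no constraint solving, which a real engine would delegate to an SMT solver. `INPUT` is a made-up pseudo-instruction standing in for a calldata load, and `symbolic_paths` is a hypothetical name.

```python
def symbolic_paths(instructions):
    """Return one (constraints, stack) pair per path; top of stack is index 0."""
    states = [([], [])]
    for name, imm in instructions:
        next_states = []
        for constraints, old_stack in states:
            stack = list(old_stack)
            if name == "INPUT":  # pseudo-instruction: push a fresh symbol
                stack.insert(0, "input_0")
            elif name.startswith("PUSH"):
                stack.insert(0, str(imm))
            elif name == "ADD":
                a, b = stack.pop(0), stack.pop(0)
                stack.insert(0, f"({a} + {b}) mod 2^256")
            elif name == "GT":
                a, b = stack.pop(0), stack.pop(0)
                stack.insert(0, f"{a} > {b}")
            elif name == "JUMPI":
                _target, condition = stack.pop(0), stack.pop(0)
                # Fork: one state per branch, each with its own constraint.
                next_states.append((constraints + [condition], stack))
                next_states.append((constraints + [f"not ({condition})"], list(stack)))
                continue
            else:
                raise NotImplementedError(name)
            next_states.append((constraints, stack))
        states = next_states
    return states


paths = symbolic_paths([
    ("PUSH1", 10), ("PUSH1", 3), ("INPUT", None), ("ADD", None),
    ("GT", None), ("PUSH1", 0x20), ("JUMPI", None),
])
for constraints, _ in paths:
    print(constraints)
```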
Abstract Interpretation
When symbolic execution hits state explosion, abstract interpretation approximates using abstract domains (sign domain, interval domain) that trade precision for tractability.
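As a small illustration of the interval domain, the sketch below gives a transfer function for ADD and a join for merge points. It makes the conservative choice of returning the full range whenever the addition could wrap, which trades precision for soundness. The names `TOP`, `interval_add`, and `join` are illustrative.

```python
TOP = (0, 2**256 - 1)  # "any 256-bit value"


def interval_add(x, y):
    """Abstract ADD on inclusive intervals; gives up when wraparound is possible."""
    lo, hi = x[0] + y[0], x[1] + y[1]
    return (lo, hi) if hi <= TOP[1] else TOP


def join(x, y):
    """Smallest interval covering both inputs, used where paths merge."""
    return (min(x[0], y[0]), max(x[1], y[1]))


bounded = interval_add((0, 10), (5, 5))
unbounded = interval_add(TOP, (1, 1))
merged = join((0, 5), (10, 20))
```

The join shows the loss of precision: `merged` covers 6 through 9 even though neither incoming path can produce those values.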
E-Graph Pattern Matching
E-graphs represent semantically equivalent expressions in a single structure, letting vulnerability detection match patterns regardless of compiled form:
Reentrancy pattern:
storage_read(slot X) ... external_call ... storage_write(slot X)
Matches all of:
SLOAD slot_x ... CALL ... SSTORE slot_x
SLOAD slot_x ... DELEGATECALL ... SSTORE slot_x
and algebraically equivalent orderings
These techniques — CFG construction, taint analysis, symbolic execution, pattern matching — compose into a pipeline where each layer builds on the previous one. Accurate basic blocks enable data flow analysis, which enables taint tracking, which separates true positives from noise.
The EVM’s simplicity (no types, no functions, just opcodes and a stack) is what makes this tractable. Every compiler targets the same instruction set, so patterns defined over opcode semantics generalize across compilers and language versions, even though the exact instruction sequences a compiler emits can differ.