EVM Bytecode Analysis: From Raw Bytes to Security Insights

Every smart contract vulnerability ultimately exists in bytecode. Reentrancy, integer overflow, access control failure — the bug lives in the sequence of opcodes the EVM executes. Understanding how to analyze that bytecode is the foundation of smart contract security work.

The EVM Execution Model

The Ethereum Virtual Machine is a stack-based, 256-bit word machine with four main components:

  • Stack — max 1024 items, each 256 bits wide
  • Memory — byte-addressable, ephemeral, expands as needed
  • Storage — persistent key-value store, survives across transactions
  • Call context — msg.sender, msg.value, calldata for the current call

Bytecode is a flat sequence of opcodes that manipulate these components. There’s no type system, no variables, no functions at the bytecode level — just offsets and stack values.

Opcode Categories

The security-relevant categories:

Storage:     SLOAD / SSTORE (persistent state reads/writes)
Calls:       CALL, CALLCODE, DELEGATECALL, STATICCALL, CREATE, CREATE2
Control:     JUMP, JUMPI, JUMPDEST
Memory:      MLOAD, MSTORE
Stack:       PUSH1-PUSH32, POP, DUP1-DUP16, SWAP1-SWAP16
Arithmetic:  ADD, SUB, MUL, DIV, MOD, EXP
Comparison:  LT, GT, EQ, ISZERO
Bitwise:     AND, OR, XOR, SHL, SHR, SAR
Hashing:     SHA3 (named KECCAK256 in newer tooling)
Environment: CALLER, ORIGIN, CALLVALUE, CALLDATALOAD
Termination: RETURN, REVERT, SELFDESTRUCT, STOP
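
Every analysis starts by turning raw bytes into opcodes. Below is a minimal Python sketch of a linear-sweep disassembler. Its opcode table is deliberately partial and covers only the opcodes used in this article. The `disassemble` name and the `UNKNOWN_0xNN` label for unlisted bytes are illustrative choices, not the output format of any particular tool.

```python
from __future__ import annotations

# Partial byte -> mnemonic table: only opcodes used in this article.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x06: "MOD", 0x0A: "EXP", 0x10: "LT", 0x11: "GT", 0x14: "EQ",
    0x15: "ISZERO", 0x16: "AND", 0x17: "OR", 0x18: "XOR", 0x1B: "SHL",
    0x1C: "SHR", 0x1D: "SAR", 0x20: "SHA3", 0x32: "ORIGIN", 0x33: "CALLER",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x50: "POP", 0x51: "MLOAD",
    0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP",
    0x57: "JUMPI", 0x5B: "JUMPDEST", 0xF0: "CREATE", 0xF1: "CALL",
    0xF2: "CALLCODE", 0xF3: "RETURN", 0xF4: "DELEGATECALL", 0xF5: "CREATE2",
    0xFA: "STATICCALL", 0xFD: "REVERT", 0xFE: "INVALID", 0xFF: "SELFDESTRUCT",
}
# PUSH1..PUSH32, DUP1..DUP16 and SWAP1..SWAP16 occupy contiguous byte ranges.
OPCODES.update({0x60 + i: f"PUSH{i + 1}" for i in range(32)})
OPCODES.update({0x80 + i: f"DUP{i + 1}" for i in range(16)})
OPCODES.update({0x90 + i: f"SWAP{i + 1}" for i in range(16)})


def disassemble(code: bytes) -> list[tuple[int, str, int | None]]:
    """Linear sweep: return (offset, mnemonic, immediate) for each instruction."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")  # not in this partial table
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F  # PUSHn is followed by n immediate bytes
            # Bytes past the end of the code read as zero, so pad on the right.
            data = code[pc + 1 : pc + 1 + size].ljust(size, b"\x00")
            out.append((pc, name, int.from_bytes(data, "big")))
            pc += 1 + size
        else:
            out.append((pc, name, None))
            pc += 1
    return out
```

The sweep skips PUSH immediates explicitly, so it does not misread them as instructions. The metadata blob that solc appends after the runtime code will still decode as junk instructions.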

Reading Bytecode

Take a simple Solidity function:

function balanceOf(address user) external view returns (uint256) {
    return balances[user];
}

At the opcode level this compiles to something like the following (simplified; real compiler output contains additional checks and stack shuffling):

DUP1                ; Copy the selector extracted from calldata earlier
PUSH4 0x70a08231    ; Function selector for balanceOf(address)
EQ                  ; Compare with calldata selector
PUSH2 0x0040        ; Jump destination if match
JUMPI               ; Jump if equal

; At 0x0040 (balanceOf implementation):
JUMPDEST            ; Valid jump target
PUSH1 0x04          ; Offset 4 (skip selector bytes)
CALLDATALOAD        ; Load address argument from calldata
PUSH1 0x00          ; Memory offset 0x00
MSTORE              ; memory[0x00:0x20] = address (the mapping key)
PUSH1 0x00          ; Slot 0 = base slot for balances mapping
PUSH1 0x20          ; Memory offset 0x20
MSTORE              ; memory[0x20:0x40] = base slot
PUSH1 0x40          ; 64 bytes of input
PUSH1 0x00          ; Memory offset
SHA3                ; Hash to derive storage slot: keccak256(addr . slot)
SLOAD               ; Load balance from that storage slot
PUSH1 0x00
MSTORE              ; Store result in memory
PUSH1 0x20
PUSH1 0x00
RETURN              ; Return memory[0x00:0x20]

The ABI encoding and storage layout conventions are entirely implicit. The analyzer has to reconstruct them from the access patterns.

Control Flow Analysis

Basic Block Identification

A basic block is a maximal sequence of instructions with a single entry and a single exit. A new block starts at the first instruction, at every JUMPDEST, and immediately after every JUMPI. A block ends at JUMP, JUMPI, RETURN, REVERT, STOP, INVALID, or SELFDESTRUCT, or falls through into the next block. Given this bytecode:

0x00: PUSH1 0x0a
0x02: PUSH1 0x00
0x04: SLOAD
0x05: GT
0x06: PUSH1 0x0a
0x08: JUMPI
0x09: STOP
0x0a: JUMPDEST
0x0b: PUSH1 0x01
0x0d: PUSH1 0x00
0x0f: SSTORE
0x10: JUMPDEST
0x11: STOP

Block 0 (0x00-0x08): Entry block, computes storage[0] > 10 and exits via JUMPI
Block 1 (0x09):      STOP, reached if the condition is false
Block 2 (0x0a-0x0f): Jump target, reached if the condition is true
Block 3 (0x10-0x11): New block because 0x10 is a JUMPDEST; entered by fall-through from Block 2

CFG Edges:
Block 0 → Block 2 (jump taken, condition true)
Block 0 → Block 1 (fall-through, condition false)
Block 2 → Block 3 (fall-through)
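
Block splitting and edge construction fit in a few lines of Python. This sketch assumes the bytecode has already been disassembled into `(offset, mnemonic, immediate)` tuples, and the function names are hypothetical. It resolves only static jumps, where a PUSH sits immediately before the JUMP/JUMPI. Any other jump is left for the stack tracking discussed under data flow analysis.

```python
from __future__ import annotations

# Instructions after which control does not simply continue to the next one.
TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}


def basic_blocks(instrs):
    """Group (offset, mnemonic, immediate) tuples into blocks keyed by start offset."""
    blocks, current = {}, []
    for ins in instrs:
        name = ins[1]
        if name == "JUMPDEST" and current:  # a jump target always opens a new block
            blocks[current[0][0]] = current
            current = []
        current.append(ins)
        if name in TERMINATORS:  # the next instruction (if any) opens a new block
            blocks[current[0][0]] = current
            current = []
    if current:
        blocks[current[0][0]] = current
    return blocks


def cfg_edges(blocks):
    """Return (from_block, to_block) edges; only PUSH-then-JUMP(I) targets are resolved."""
    edges = []
    starts = sorted(blocks)
    for i, start in enumerate(starts):
        block = blocks[start]
        last = block[-1][1]
        following = starts[i + 1] if i + 1 < len(starts) else None
        if last in ("JUMP", "JUMPI") and len(block) >= 2 and block[-2][1].startswith("PUSH"):
            target = block[-2][2]
            if target in blocks and blocks[target][0][1] == "JUMPDEST":
                edges.append((start, target))  # static jump edge
        if (last == "JUMPI" or last not in TERMINATORS) and following is not None:
            edges.append((start, following))  # fall-through edge
    return edges
```

Edges are reported as pairs of block start offsets. A JUMPI contributes both its jump edge and its fall-through edge.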

Jump Resolution

Static jumps are straightforward — the destination literal is embedded in the bytecode immediately before the JUMP:

PUSH2 0x0040
JUMP          ; Always jumps to 0x0040

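Dynamic jumps are harder: the destination comes from the stack rather than from an adjacent PUSH. The canonical example is returning from an internal function. The caller pushes a return address and jumps into the shared function body, and the body ends with a JUMP whose target depends on which call site entered it. Resolving such jumps requires tracking stack values across blocks. The function dispatcher, by contrast, uses static JUMPIs, but recovering it is the first step in structuring a contract. It extracts the 4-byte selector via CALLDATALOAD followed by a right shift of 0xe0 (224) bits, then compares it against known selectors in a chain of EQ/JUMPI pairs. The recovery algorithm traces that comparison chain and maps each selector constant to its jump target. Each target is a function entry point.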
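
Here is a minimal Python sketch of selector-to-entry recovery. It assumes a disassembly in `(offset, mnemonic, immediate)` form and a dispatcher laid out as a linear chain of `PUSH4 <selector>; EQ; PUSH2 <target>; JUMPI` groups. Compilers can emit other shapes, for example splitting large dispatchers with GT comparisons, and the hypothetical `recover_dispatcher` does not follow those.

```python
def recover_dispatcher(instrs):
    """Map selector -> entry offset from PUSH4 <selector>; EQ; PUSHn <target>; JUMPI groups."""
    table = {}
    for i in range(len(instrs) - 3):
        (_, a, selector), (_, b, _), (_, c, target), (_, d, _) = instrs[i : i + 4]
        if a == "PUSH4" and b == "EQ" and c.startswith("PUSH") and d == "JUMPI":
            table[selector] = target  # each target is a function entry point
    return table
```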

Data Flow Analysis

Stack Tracking

Track values as they flow through instructions:

Instruction          Stack (top first)
-----------          ------------------
PUSH1 0x04           [0x04]
CALLDATALOAD         [calldata[4:36]]
PUSH1 0x00           [0x00, calldata[4:36]]
SLOAD                [storage[0], calldata[4:36]]
GT                   [storage[0] > calldata[4:36]]

This tells us: the check compares a storage value against a user-supplied input.
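
The table above can be reproduced mechanically. The hypothetical `track_stack` below replays a straight-line snippet, holding constants as integers and everything else as readable expression strings. It supports only the opcodes in the table and assumes CALLDATALOAD offsets are constants.

```python
# Binary opcodes and the operator used to render them; the EVM takes the top item as the left operand.
BINOPS = {"ADD": "+", "SUB": "-", "MUL": "*", "LT": "<", "GT": ">", "EQ": "=="}


def track_stack(instrs):
    """Replay (mnemonic, immediate) pairs; return (mnemonic, stack) after each step, top first."""
    stack = []  # index 0 is the top; ints are constants, strs are symbolic
    trace = []
    for name, imm in instrs:
        if name.startswith("PUSH"):
            stack.insert(0, imm)
        elif name == "CALLDATALOAD":
            off = stack.pop(0)  # assumed constant, so the slice bounds are known
            stack.insert(0, f"calldata[{off}:{off + 32}]")
        elif name == "SLOAD":
            stack.insert(0, f"storage[{stack.pop(0)}]")
        elif name in BINOPS:
            a = stack.pop(0)  # top of stack: left operand
            b = stack.pop(0)
            stack.insert(0, f"{a} {BINOPS[name]} {b}")  # no parentheses; fine for short snippets
        else:
            raise NotImplementedError(name)
        trace.append((name, list(stack)))
    return trace
```

Running it on the five instructions from the table ends with the single stack entry `storage[0] > calldata[4:36]`.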

Taint Analysis

Taint tracks which values in the program the caller can influence:

Taint Sources:
  CALLDATALOAD   User-controlled input
  CALLER         Immediate caller (msg.sender)
  ORIGIN         Account that signed the transaction (tx.origin); differs from CALLER whenever the call passes through another contract
  CALLVALUE      ETH amount sent

Propagation Rules:
  Arithmetic     Taint flows through — ADD of tainted + constant = tainted
  Memory         Taint follows the value to/from memory
  Logic ops      Result is tainted if any operand is tainted

Sinks (where taint reaching here warrants review):
  SSTORE         User-controlled value or slot written to storage (a tainted slot allows arbitrary writes)
  CALL/value     Sending ETH based on user-controlled amount
  JUMPI          Conditional branch controlled by user input

A concrete example:

PUSH1 0x04      ; Calldata offset of the first argument
CALLDATALOAD    ; Tainted: user_input
PUSH1 0x02
MUL             ; Tainted: user_input * 2
DUP1
PUSH1 0x00
SSTORE          ; SINK: user-controlled value written to storage
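
These rules fit in a short Python sketch that keeps one taint bit per stack slot. It models CALLDATALOAD as consuming its offset operand, so a replay of the snippet includes a PUSH1 before it. It does not track memory or storage. The function name and the `(offset, slot_tainted, value_tainted)` report format are illustrative.

```python
SOURCES = {"CALLER", "ORIGIN", "CALLVALUE"}  # push a caller-controlled value, no inputs
BINARY = {"ADD", "SUB", "MUL", "DIV", "AND", "OR", "XOR", "LT", "GT", "EQ"}


def tainted_sstores(instrs):
    """Replay (offset, mnemonic) pairs with one taint bit per stack slot.

    Returns (offset, slot_tainted, value_tainted) for every SSTORE that touches taint.
    Memory and storage propagation are not modelled in this sketch.
    """
    stack = []  # taint bits, index 0 is the top of the stack
    findings = []
    for pc, name in instrs:
        if name.startswith("PUSH"):
            stack.insert(0, False)  # constants are untainted
        elif name == "CALLDATALOAD":
            stack.pop(0)  # consume the offset operand
            stack.insert(0, True)  # calldata is user-controlled
        elif name in SOURCES:
            stack.insert(0, True)
        elif name in BINARY:
            a, b = stack.pop(0), stack.pop(0)
            stack.insert(0, a or b)  # tainted if any operand is tainted
        elif name.startswith("DUP"):
            stack.insert(0, stack[int(name[3:]) - 1])  # DUPn copies the n-th item
        elif name == "SSTORE":
            slot, value = stack.pop(0), stack.pop(0)  # SSTORE pops the slot, then the value
            if slot or value:
                findings.append((pc, slot, value))
        else:
            raise NotImplementedError(name)
    return findings
```

For the snippet above, it reports one SSTORE whose value is tainted but whose slot is not.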

Value Range Analysis

At conditional jumps, the branch condition constrains values on each path. If a JUMPI branches on input < 100, the taken path knows input < 100 and the fall-through knows input >= 100. This constraint propagation prunes unreachable paths and tightens the value space at each instruction.
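
The interval bookkeeping at a single branch can be sketched in Python. `split_on_lt` is a hypothetical helper for one unsigned `x < bound` condition; a real analyzer would need a similar rule for each comparison opcode.

```python
UINT256_MAX = 2**256 - 1


def split_on_lt(interval, bound):
    """Refine an unsigned interval (lo, hi) at a JUMPI on `x < bound`.

    Returns (taken, fall_through); None marks a path the constraint makes unreachable.
    """
    lo, hi = interval
    taken = (lo, min(hi, bound - 1)) if lo < bound else None         # x < bound holds
    fall_through = (max(lo, bound), hi) if hi >= bound else None     # x >= bound holds
    return taken, fall_through
```

Splitting the full 256-bit range at 100 gives (0, 99) on the taken path and (100, 2**256 - 1) on the fall-through. If the incoming interval were (0, 50), the fall-through would come back as None and could be pruned.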

Storage Analysis

Slot Calculation

Solidity’s storage layout is deterministic but not stored anywhere — it must be reconstructed from access patterns:

Simple variables: sequential slots in declaration order (types smaller than 32 bytes are packed together when they fit)
  uint256 a;     → slot 0
  uint256 b;     → slot 1
  address c;     → slot 2

Mappings: keccak256-based slots
  mapping(address => uint256) balances;  → base slot 3
  balances[addr] is at: keccak256(addr . 3)   (key and slot each padded to 32 bytes)

Nested mappings:
  mapping(address => mapping(address => uint256)) allowances;  → base slot 4
  allowances[owner][spender] is at: keccak256(spender . keccak256(owner . 4))

Dynamic arrays:
  uint256[] data;  → slot 5 holds the length
  data[i] is at: keccak256(5) + i   (for 32-byte elements; smaller elements are packed)
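
These formulas translate directly into code. Python's standard library has no Keccak-256: `hashlib.sha3_256` implements the final NIST SHA-3, whose padding differs from the Keccak-256 that Ethereum uses, so the digests do not match. This sketch therefore takes the hash function as a parameter, and a real analyzer would pass in a Keccak-256 implementation. It handles value-type keys only (integers and addresses), and the helper names are hypothetical.

```python
from typing import Callable

Keccak = Callable[[bytes], bytes]  # caller must supply a real Keccak-256 for real slots


def word(x: int) -> bytes:
    """32-byte big-endian word; addresses and small integers are left-padded with zeros."""
    return x.to_bytes(32, "big")


def mapping_slot(key: int, base_slot: int, keccak: Keccak) -> int:
    """Slot of m[key] for a mapping declared at base_slot: keccak256(key . base_slot)."""
    return int.from_bytes(keccak(word(key) + word(base_slot)), "big")


def nested_mapping_slot(outer: int, inner: int, base_slot: int, keccak: Keccak) -> int:
    """Slot of m[outer][inner]: keccak256(inner . keccak256(outer . base_slot))."""
    return mapping_slot(inner, mapping_slot(outer, base_slot, keccak), keccak)


def array_element_slot(index: int, length_slot: int, keccak: Keccak) -> int:
    """Slot of a[index] for a dynamic array of 32-byte elements: keccak256(length_slot) + index."""
    start = int.from_bytes(keccak(word(length_slot)), "big")
    return (start + index) % 2**256  # slot arithmetic wraps modulo 2**256
```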

Recognizing Mapping Access in Bytecode

The pattern for balances[msg.sender] is recognizable once you know what to look for:

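CALLER              ; Push msg.sender
PUSH1 0x00          ; Memory offset 0x00
MSTORE              ; memory[0x00:0x20] = msg.sender
PUSH1 0x00          ; Mapping's base slot
PUSH1 0x20          ; Memory offset 0x20
MSTORE              ; memory[0x20:0x40] = base slot
PUSH1 0x40          ; 64 bytes of input
PUSH1 0x00          ; Memory offset
SHA3                ; keccak256(sender . slot)
SLOAD               ; Load from computed slot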

Recognized pattern: mapping_read(slot=0, key=CALLER)
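
Matching this shape can be sketched as a sliding-window scan over `(mnemonic, immediate)` pairs. The hypothetical `find_mapping_reads` accepts only the canonical layout: the key stored at memory 0x00, the base slot at 0x20, and 64 bytes hashed from offset 0. Real compiler output varies, so a robust matcher would follow data flow rather than exact instruction sequences.

```python
KEY_SOURCES = {"CALLER", "ORIGIN", "CALLVALUE"}
SHAPE = ["PUSH1", "MSTORE", "PUSH1", "PUSH1", "MSTORE", "PUSH1", "PUSH1", "SHA3", "SLOAD"]


def find_mapping_reads(instrs):
    """Scan (mnemonic, immediate) pairs for mapping_read(slot, key) patterns."""
    found = []
    for i in range(len(instrs) - len(SHAPE)):
        key_op, *rest = instrs[i : i + 1 + len(SHAPE)]
        if key_op[0] not in KEY_SOURCES or [n for n, _ in rest] != SHAPE:
            continue
        imms = [v for _, v in rest]
        # key at memory 0x00, slot at memory 0x20, hash of 64 bytes starting at 0x00
        if imms[0] == 0x00 and imms[3] == 0x20 and imms[5] == 0x40 and imms[6] == 0x00:
            found.append(("mapping_read", imms[2], key_op[0]))
    return found
```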

Vulnerability Patterns at the Opcode Level

Reentrancy

The pattern: external call followed by storage write to a slot that was read before the call.

SLOAD         ; Read balance (slot X)
...
CALL          ; Send ETH — control leaves this contract
...
SSTORE        ; Update balance (slot X) AFTER call returns

Detection:
1. Find all CALL/DELEGATECALL sites
2. For each, check if any SSTORE follows in the same execution path
3. Check whether the written slot was read before the CALL
4. If yes → potential reentrancy
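
The four detection steps map onto a short Python sketch over a single execution path. It assumes earlier stack tracking has already resolved each SLOAD/SSTORE slot to a comparable, hashable value (a constant or a symbolic expression). The tuple format and function name are illustrative.

```python
CALL_OPS = {"CALL", "CALLCODE", "DELEGATECALL"}


def reentrancy_candidates(path):
    """Flag (call_pc, sstore_pc, slot) where a slot read before an external call is written after it.

    path: (offset, mnemonic, slot) tuples along ONE execution path; slot is the resolved
    storage key for SLOAD/SSTORE and None otherwise.
    """
    read_so_far = set()
    findings = []
    for i, (pc, name, slot) in enumerate(path):
        if name == "SLOAD":
            read_so_far.add(slot)  # step 3 needs reads that precede the call
        elif name in CALL_OPS:  # step 1: an external call site
            for later_pc, later_name, later_slot in path[i + 1 :]:
                # steps 2 and 3: a later SSTORE to a slot that was read before this call
                if later_name == "SSTORE" and later_slot in read_so_far:
                    findings.append((pc, later_pc, later_slot))
    return findings
```

A path consisting of SLOAD slot 7, CALL, SSTORE slot 7 is flagged. Moving the SSTORE before the CALL (checks-effects-interactions) clears it.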

Integer Overflow (Pre-Solidity 0.8)

CALLDATALOAD  ; User input
ADD           ; Add to existing value — no bounds check
SSTORE        ; Store result

Solidity 0.8+ inserts overflow checks automatically (outside unchecked blocks). A representative sequence:
  DUP2
  DUP2
  ADD           ; Compute sum
  DUP1
  DUP3
  GT            ; Check: operand > sum? (sum wrapped, so overflow)
  PUSH2 0x...
  JUMPI         ; Jump to a Panic(0x11) revert if overflow

When analyzing pre-0.8 contracts, the absence of a comparable overflow check after ADD/MUL on user-controlled values is a signal worth investigating. Such a check may be compiler-generated or come from a library such as SafeMath.
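
The arithmetic behind the check is easy to confirm. ADD wraps modulo 2**256. For unsigned operands, the wrapped sum is smaller than an operand exactly when overflow occurred, because without overflow the sum is at least as large as each operand. Here is a Python sketch with hypothetical helper names:

```python
MOD = 2**256


def evm_add(x: int, y: int) -> int:
    """Raw ADD opcode: the result silently wraps modulo 2**256."""
    return (x + y) % MOD


def checked_add(x: int, y: int) -> int:
    """Checked addition: overflow happened iff an operand exceeds the wrapped sum."""
    s = evm_add(x, y)
    if x > s:  # operand > sum signals wraparound
        raise OverflowError("overflow: compiled 0.8+ code would revert with Panic(0x11)")
    return s
```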

Unchecked External Call Return Value

Vulnerable:
  ...
  CALL          ; Returns success boolean on stack
  POP           ; Discards it — failure is ignored

Safe:
  ...
  CALL
  ISZERO        ; Test if call failed
  PUSH2 0x...
  JUMPI         ; Branch to error handling if failed
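
The vulnerable shape is simple enough for a peephole check. This hypothetical Python helper flags any call opcode whose success flag is popped by the very next instruction. It misses cases where the flag is kept and then ignored later, which need data flow tracking.

```python
CALL_OPS = {"CALL", "CALLCODE", "DELEGATECALL", "STATICCALL"}


def unchecked_calls(instrs):
    """Return offsets of call opcodes whose success flag is immediately POPped.

    instrs is a list of (offset, mnemonic) pairs in program order.
    """
    return [
        pc
        for (pc, name), (_, next_name) in zip(instrs, instrs[1:])
        if name in CALL_OPS and next_name == "POP"
    ]
```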

Access Control

Privileged operation:
  SELFDESTRUCT

What should precede it:
  CALLER
  PUSH20 <owner_address>
  EQ
  PUSH2 0x...
  JUMPI         ; Proceed only if caller == owner

What a missing check looks like:
  No CALLER comparison on any path to SELFDESTRUCT
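  → Any address can trigger it: before EIP-6780 (Cancun) this destroyed the contract; since then it still sends the ETH balance to the beneficiary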
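
A coarse version of this check can be sketched over enumerated execution paths, each given as a list of mnemonics. The hypothetical helper below only looks for a CALLER, EQ, JUMPI sequence before SELFDESTRUCT. It does not verify that the JUMPI condition actually depends on CALLER, which a real detector would establish with taint tracking.

```python
def unguarded_selfdestructs(paths):
    """Return indices of paths that reach SELFDESTRUCT without CALLER -> EQ -> JUMPI first.

    paths: list of mnemonic lists, one per enumerated execution path (a coarse heuristic).
    """
    flagged = []
    for idx, path in enumerate(paths):
        stage = 0  # 1: saw CALLER, 2: then EQ, 3: then JUMPI (treated as a guard)
        for name in path:
            if (stage, name) in ((0, "CALLER"), (1, "EQ"), (2, "JUMPI")):
                stage += 1
            elif name == "SELFDESTRUCT":
                if stage < 3:
                    flagged.append(idx)
                break
    return flagged
```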

Advanced Techniques

Symbolic Execution

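Instead of running with concrete values, track symbolic expressions. Where concrete execution of PUSH1 0x05; PUSH1 0x03; ADD produces [8], symbolic execution of PUSH1 0x00; CALLDATALOAD; PUSH1 0x03; ADD produces [input_0 + 3]. At a branch on input_0 + 3 > 10, the engine splits. The taken path records that condition as a constraint, and the fall-through path records its negation. Ignoring 256-bit wraparound, these simplify to input_0 > 7 and input_0 <= 7. With wraparound, values of input_0 within 3 of 2^256 make the sum small again and take the fall-through. Collecting these path constraints shows which conditions reach any instruction, and an SMT solver can check each path's feasibility or produce a concrete input for it.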
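
A toy symbolic executor makes the fork concrete. In this Python sketch, expressions are nested tuples and each JUMPI forks the worklist. As a simplification, jump targets are instruction indices rather than byte offsets. Instead of calling an SMT solver, `satisfies` only checks whether given concrete inputs follow a path, which is enough to show the wraparound case. All names are hypothetical.

```python
MOD = 2**256

# Example program; jump targets are instruction indices here (a simplification).
PROGRAM = [
    ("PUSH1", 10), ("PUSH1", 0), ("CALLDATALOAD", None), ("PUSH1", 3), ("ADD", None),
    ("GT", None),                        # (input_0 + 3) > 10
    ("PUSH1", 9), ("JUMPI", None),
    ("STOP", None),                      # index 8: fall-through path
    ("JUMPDEST", None), ("STOP", None),  # index 10: taken path
]


def explore(program):
    """Depth-first symbolic execution; returns (stop_index, path_constraints) per path.

    Expressions are tuples: ("const", v), ("input", n) or (op, lhs, rhs).
    A path constraint is (condition_expr, branch_taken).
    """
    results = []
    worklist = [(0, [], [], 0)]  # (index, stack, constraints, next fresh input id)
    while worklist:
        i, stack, path, n_inputs = worklist.pop()
        while True:
            name, imm = program[i]
            if name.startswith("PUSH"):
                stack = [("const", imm)] + stack
            elif name == "CALLDATALOAD":
                # Each load becomes a fresh symbol; the offset operand is discarded.
                stack = [("input", n_inputs)] + stack[1:]
                n_inputs += 1
            elif name in ("ADD", "GT"):
                stack = [(name, stack[0], stack[1])] + stack[2:]  # top is the left operand
            elif name == "JUMPI":
                dest, cond, rest = stack[0], stack[1], stack[2:]
                # Fork: queue the taken branch (assumes a constant target) ...
                worklist.append((dest[1], rest, path + [(cond, True)], n_inputs))
                # ... and continue down the fall-through branch.
                stack, path = rest, path + [(cond, False)]
            elif name == "STOP":
                results.append((i, path))
                break
            elif name != "JUMPDEST":
                raise NotImplementedError(name)
            i += 1
    return results


def evaluate(expr, inputs):
    """Concretely evaluate an expression with EVM 256-bit wraparound."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "input":
        return inputs[expr[1]]
    lhs, rhs = evaluate(expr[1], inputs), evaluate(expr[2], inputs)
    return (lhs + rhs) % MOD if kind == "ADD" else int(lhs > rhs)


def satisfies(path, inputs):
    """True if concrete inputs follow this path (a stand-in for an SMT feasibility query)."""
    return all(bool(evaluate(cond, inputs)) == taken for cond, taken in path)
```

`dict(explore(PROGRAM))` maps index 8 (fall-through) and index 10 (taken) to their constraints. Input 8 satisfies the taken path, while input 2**256 - 1 satisfies the fall-through because the sum wraps to 2.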

Abstract Interpretation

When symbolic execution hits state explosion, abstract interpretation approximates using abstract domains (sign domain, interval domain) that trade precision for tractability.
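
In the interval domain, for instance, each stack value is a single (lo, hi) range rather than one expression per path. The Python sketch below shows two transfer rules under that assumption; the helper names are hypothetical. ADD falls back to the full range when the sum might wrap. At join points, paths are merged by taking the smallest covering interval, which is where precision is lost.

```python
MAX = 2**256 - 1
TOP = (0, MAX)  # "could be any uint256"


def add(a, b):
    """Interval ADD. If the upper bound can exceed MAX the real sum may wrap, so return TOP."""
    lo, hi = a[0] + b[0], a[1] + b[1]
    return (lo, hi) if hi <= MAX else TOP


def join(a, b):
    """Merge two paths' intervals at a CFG join point: the smallest interval covering both."""
    return (min(a[0], b[0]), max(a[1], b[1]))
```

Joining (0, 99) and (100, 200) gives (0, 200). The analysis no longer knows which path produced the value, but it never has to enumerate the paths.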

E-Graph Pattern Matching

E-graphs represent semantically equivalent expressions in a single structure, letting vulnerability detection match patterns regardless of compiled form:

Reentrancy pattern:
  storage_read(slot X) ... external_call ... storage_write(slot X)

Matches all of:
  SLOAD slot_x ... CALL ... SSTORE slot_x
  SLOAD slot_x ... DELEGATECALL ... SSTORE slot_x
  and forms where slot_x is built from different but equivalent expressions (e.g. swapped ADD operands)

These techniques — CFG construction, taint analysis, symbolic execution, pattern matching — compose into a pipeline where each layer builds on the previous one. Accurate basic blocks enable data flow analysis, which enables taint tracking, which separates true positives from noise.

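The EVM’s simplicity (no types, no functions, just opcodes and a stack) is what makes this tractable. Every EVM compiler, whether solc or Vyper, targets the same instruction set. Patterns defined on opcode semantics therefore carry across compilers and language versions, even when the exact instruction sequences differ.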