Storage Layout Reconstruction as Type Inference
Hindley-Milner-Style Type Recovery for EVM Bytecode
Formal framework for recovering Solidity storage layouts from EVM bytecode using constraint-based type inference, with extensions for packed storage and proxy patterns.
Storage Layout Reconstruction as Type Inference
Abstract
The EVM provides 2²⁵⁶ addressable 256-bit storage slots with no runtime type information. Recovering the original high-level storage layout from compiled bytecode is a prerequisite for meaningful security analysis, yet most existing tools rely on heuristic pattern matching that breaks on non-standard layouts. This paper reformulates the storage reconstruction problem as a type inference problem. We define a type system for storage slots, give formal rules for generating type constraints from SSTORE/SLOAD access patterns, and show how Hindley-Milner-style unification recovers the original layout. Extensions handle packed storage via mask analysis and polymorphic slots arising from proxy patterns. The framework provides a principled foundation for storage analysis that degrades gracefully: when full reconstruction is impossible, the constraint system reports precisely which slots remain ambiguous and why.
Introduction
Security analysis of EVM smart contracts depends on understanding what storage slots represent. A reentrancy detector that knows slot 3 holds a mapping(address => uint256) of balances can recognize the checks-effects-interactions pattern. Without that knowledge, it sees only raw 256-bit reads and writes to opaque locations.
Source-verified contracts sidestep this problem — the compiler emits storage layout metadata that maps slots to variable names and types. But a significant fraction of deployed contracts are unverified. Even for verified contracts, proxy patterns introduce layouts that differ from what the source implies: the proxy’s storage and the implementation’s storage coexist at the same address, governed by different compilation units.
Existing decompilers reconstruct storage layouts through ad-hoc pattern recognition: if a slot is accessed via keccak256(key . slot), label it a mapping; if via keccak256(slot), label it a dynamic array. These heuristics work for straightforward cases but cannot compose. When a mapping contains structs that contain dynamic arrays, the nesting of hash computations must be tracked precisely, and each level constrains the types of the others.
Type inference provides the right abstraction. Each storage access in the bytecode generates a constraint on the type of the accessed slot. Unification combines constraints, and the resulting type environment is the reconstructed layout. This is the same structure as Hindley-Milner type inference — adapted from lambda terms to storage access patterns.
The EVM Storage Model
EVM persistent storage is a mapping from 256-bit keys to 256-bit values:
σ : uint256 → uint256
The SLOAD instruction reads σ[k] and the SSTORE instruction writes σ[k] := v. At the EVM level, keys and values are untyped 256-bit words. All type structure is a compile-time artifact that the Solidity compiler erases.
Solidity’s Storage Layout Rules
The Solidity compiler assigns storage according to deterministic rules specified in the Solidity documentation:
Value types. Variables of types uint256, int256, address, bool, bytes32, and similar fixed-size types occupy slots sequentially starting from slot 0. Multiple variables smaller than 256 bits may be packed into a single slot, right-aligned.
Mappings. A mapping(K => V) declared at slot p stores the value for key k at slot keccak256(k . p), where . denotes concatenation and k is left-padded to 32 bytes.
Dynamic arrays. A T[] declared at slot p stores its length at slot p and element i at slot keccak256(p) + i * ceil(sizeof(T) / 32).
Structs. Struct members are laid out sequentially starting from the struct’s base slot, following the same packing rules as top-level variables.
Nested compositions. These rules compose. A mapping(address => uint256[]) at slot p stores the length of the array for key k at keccak256(k . p), and element i of that array at keccak256(keccak256(k . p)) + i.
Storage Types
We define a type language for storage slots:
τ ::= α -- type variable
| uint256 | int256 | address
| bool | bytesN
| mapping(τ, τ) -- mapping from key type to value type
| array(τ) -- dynamic array of element type
| struct(τ₁, τ₂, ..., τₙ) -- struct with ordered field types
| packed(⟨τ₁, b₁, w₁⟩, ..., -- packed slot: type, bit-offset, bit-width
⟨τₙ, bₙ, wₙ⟩)
| ref(τ) -- reference to a derived storage region
A type environment Γ maps slot expressions to types:
Γ : SlotExpr → τ
where SlotExpr is the grammar of slot computations:
s ::= n -- literal slot number
| keccak256(e . s) -- mapping access
| keccak256(s) -- array base
| keccak256(s) + e -- array element
| s + n -- struct field offset
The reconstruction problem is: given a set of observed SLOAD/SSTORE operations with their computed slot expressions, infer a type environment Γ that is consistent with all observations.
Constraint Generation
Each opcode pattern in the bytecode generates one or more type constraints. We write constraints as judgments of the form s : τ, meaning “slot expression s has type τ.” A constraint set C is a collection of such judgments accumulated during analysis.
Inference Rules
We present the constraint generation rules using standard inference rule notation. In each rule, the premises (above the line) describe the conditions observed in the bytecode, and the conclusion (below the line) states the generated constraint.
Direct slot access. A plain SLOAD or SSTORE on a literal slot number with no surrounding arithmetic generates a simple value type constraint:
SLOAD(n) no mask operations on result
─────────────────────────────────────────────
C ← C ∪ { n : α }
where α is a fresh type variable. The variable will be refined by subsequent rules.
Mapping access. When the slot is computed as keccak256(key . base), the base slot is constrained to a mapping type:
slot = keccak256(k . p) SLOAD(slot) or SSTORE(slot, v)
─────────────────────────────────────────────────────────────
C ← C ∪ { p : mapping(τ_k, τ_v) }
where τ_k = typeOf(k), τ_v = fresh α
The key type τ_k is inferred from the key operand: if k is masked to 20 bytes, τ_k = address; if k results from a comparison or is a small integer, other type rules apply.
Dynamic array length. When slot p is read and used as a loop bound or comparison target, and keccak256(p) appears as a base for indexed access:
SLOAD(p) used as loop bound
keccak256(p) + i·stride appears as slot in SLOAD/SSTORE
──────────────────────────────────────────────────────────────
C ← C ∪ { p : array(α) }
where α is fresh, stride = sizeof(α) / 32
Struct field offset. When multiple slots p, p+1, p+2, … are accessed with different type constraints:
p : τ₁ ∈ C p+1 : τ₂ ∈ C ... p+n : τₙ ∈ C
τ₁, ..., τₙ are distinct types or access patterns
──────────────────────────────────────────────────────────────
C ← C ∪ { p : struct(τ₁, τ₂, ..., τₙ) }
Address type. When the value loaded from a slot is used as a CALL target or masked with the address bitmask:
v = SLOAD(s) v AND 0xffffffffffffffffffffffffffffffffffffffff
──────────────────────────────────────────────────────────────────
C ← C ∪ { s : address }
Boolean type. When a stored value is constrained to {0, 1}:
v = SLOAD(s) ISZERO(ISZERO(v)) or v used as JUMPI condition
──────────────────────────────────────────────────────────────────
C ← C ∪ { s : bool }
The full constraint generation traverses the control flow graph, applying these rules at each SLOAD and SSTORE site. The result is a constraint set C containing all observed type information.
flowchart TD
classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0
classDef success fill:#1a331a,stroke:#a8c686,stroke-width:2px,color:#c8e8b0
A["EVM Bytecode"]:::data
B["CFG Construction"]:::process
C["SLOAD/SSTORE Identification"]:::process
D["Slot Expression Recovery"]:::process
E["Constraint Generation"]:::process
F["Constraint Set C"]:::data
G["Unification"]:::process
H["Type Environment Gamma"]:::success
A --> B --> C --> D --> E --> F --> G --> H
Type Unification
Given a constraint set C, unification produces the most general type environment consistent with all constraints. The algorithm follows the structure of Robinson’s unification, adapted for storage types.
The Unification Algorithm
A substitution θ is a mapping from type variables to types. Applying θ to a type replaces each variable with its binding. Two types τ₁ and τ₂ unify if there exists a substitution θ such that θ(τ₁) = θ(τ₂).
The core unification rules:
unify(α, τ) = [α ↦ τ] if α ∉ FV(τ)
unify(τ, α) = [α ↦ τ] if α ∉ FV(τ)
unify(uint256, uint256) = ∅
unify(address, address) = ∅
... (identity for all base types)
unify(mapping(K₁,V₁), mapping(K₂,V₂))
= let θ₁ = unify(K₁, K₂)
in let θ₂ = unify(θ₁(V₁), θ₁(V₂))
in θ₂ ∘ θ₁
unify(array(T₁), array(T₂))
= unify(T₁, T₂)
unify(packed(fields₁), packed(fields₂))
= unify each field pairwise where bit-ranges overlap
unify(τ₁, τ₂) = ⊥ (error) otherwise
The occurs check (α ∉ FV(τ)) prevents infinite types. In storage reconstruction, an occurs check failure signals a pathological access pattern — for example, a slot whose value is used as the base slot for computing its own address.
Constraint Solving
The full algorithm iterates over the constraint set:
SOLVE(C):
θ ← ∅
for each (s : τ) ∈ C:
if s ∈ dom(θ):
θ ← θ ∘ unify(θ(Γ(s)), θ(τ))
else:
Γ(s) ← τ
return θ(Γ)
When unification fails for a particular slot — when two constraints require incompatible types — the algorithm records the conflict rather than aborting. This is a key difference from classical HM inference in programming languages: in storage reconstruction, type errors are information. A slot that must be both mapping(address, uint256) and uint256 may indicate a proxy pattern, an optimizer artifact, or a misidentified slot expression.
Subtyping Considerations
The storage type system admits a limited form of subtyping. A uint160 value stored in a 256-bit slot is compatible with address (since addresses are 160-bit values). Rather than full subtyping, we handle this through type widening: when uint160 and address constraints appear on the same slot, unification produces address as the more specific type.
unify(uintN, address) = address if N ≤ 160
unify(uintN, bool) = bool if N ≤ 8
Packed Storage Recovery
When the Solidity compiler packs multiple variables into a single 256-bit slot, the bytecode accesses the slot through AND, OR, and SHIFT operations that extract or insert bit-ranges. Recovering the packed layout requires mask analysis.
Mask Patterns
The compiler generates characteristic instruction sequences for packed access:
Read a packed field at bit-offset b with width w:
SLOAD(slot)
SHR(b) -- shift right by bit-offset
AND(2^w - 1) -- mask to field width
Write a packed field:
SLOAD(slot)
AND(NOT(((2^w - 1) << b))) -- clear the field bits
OR(value << b) -- insert new value
SSTORE(slot)
Inference Rule for Packed Fields
Each observed mask pattern generates a sub-slot constraint:
v = SLOAD(s) v' = SHR(b, v) v'' = AND(2^w - 1, v')
─────────────────────────────────────────────────────────────
C ← C ∪ { s : packed(⟨α, b, w⟩, ...) }
Multiple mask operations on the same slot accumulate field entries in the packed type. Unification merges these entries by bit-range:
unify(packed(F₁), packed(F₂))
= packed(merge(F₁, F₂))
where merge combines field lists, unifying types
for overlapping bit-ranges and concatenating disjoint ranges
Type Recovery from Field Width
The bit-width constrains the field type:
| Bit-width | Candidate Types |
|---|---|
| 1 | bool |
| 8 | uint8, int8, bool (context-dependent) |
| 160 | address, uint160 |
| 256 | uint256, int256, bytes32 |
| Other N | uintN, intN, bytesM where M = N/8 |
The constraint system uses bit-width as a hard constraint and refines through usage-based rules (address mask, sign extension, comparison patterns) to select among candidates.
flowchart LR
classDef process fill:#1a2233,stroke:#7ea8d4,stroke-width:2px,color:#c0d8f0
classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0
classDef highlight fill:#332519,stroke:#e8a87c,stroke-width:2px,color:#f0d8c0
A["SLOAD slot 0"]:::data
B["SHR 160"]:::process
C["AND 0xFF"]:::process
D["bool at bits 160-167"]:::highlight
E["SLOAD slot 0"]:::data
F["AND addr_mask"]:::process
G["address at bits 0-159"]:::highlight
A --> B --> C --> D
E --> F --> G
Proxy and Diamond Patterns
Proxy contracts (EIP-1967, EIP-2535) introduce polymorphic storage: the same contract address executes different bytecode depending on which implementation is active. Each implementation may impose a different type structure on the same storage slots.
The Problem
Consider a transparent proxy where:
- Implementation A treats slot 5 as
mapping(address => uint256)(a balance mapping) - Implementation B treats slot 5 as
mapping(address => bool)(an allowlist)
Standard unification would produce a type error. In a proxy context, this is expected behavior — the slot’s type depends on which implementation is executing.
Polymorphic Type Contexts
We extend the type environment with an implementation index:
Γ : (ImplAddress, SlotExpr) → τ
Constraint generation tags each constraint with the implementation context in which it was observed. Unification operates within each implementation context independently. Cross-context conflicts are reported as layout transitions rather than type errors.
For diamond proxies (EIP-2535), each facet provides a separate implementation context. The storage layout is the union of all facet layouts, and conflicts between facets indicate storage collision vulnerabilities — a direct security finding.
EIP-1967 Slot Detection
The well-known storage slots defined by EIP-1967 serve as anchors for proxy identification:
implementation_slot = keccak256("eip1967.proxy.implementation") - 1
admin_slot = keccak256("eip1967.proxy.admin") - 1
beacon_slot = keccak256("eip1967.proxy.beacon") - 1
When these slots appear in the constraint set, the analysis identifies the contract as a proxy and switches to polymorphic mode. The implementation address read from the implementation slot determines which context to use for subsequent DELEGATECALL targets.
flowchart TD
classDef system fill:#1a3333,stroke:#5ba8a8,stroke-width:2px,color:#c0e8e8
classDef data fill:#332a1a,stroke:#d4b870,stroke-width:2px,color:#f0e0c0
classDef highlight fill:#332519,stroke:#e8a87c,stroke-width:2px,color:#f0d8c0
classDef attack fill:#331a1a,stroke:#d97777,stroke-width:2px,color:#f0c0c0
P["Proxy Contract"]:::system
I1["Implementation A"]:::data
I2["Implementation B"]:::data
S5a["slot 5 : mapping addr to uint256"]:::highlight
S5b["slot 5 : mapping addr to bool"]:::highlight
C["Layout Transition Detected"]:::attack
P -->|"DELEGATECALL"| I1
P -->|"DELEGATECALL"| I2
I1 --> S5a
I2 --> S5b
S5a --- C
S5b --- C
Evaluation
What the Framework Recovers
The constraint-based approach reconstructs the following layout elements with high confidence:
Mappings. The keccak256(key . slot) pattern is unambiguous — it appears in no other context in Solidity-compiled bytecode. Key types are recovered from masking and usage context. Nested mappings (e.g., mapping(address => mapping(address => uint256)) for ERC-20 allowances) produce nested hash computations that the constraint rules decompose.
Dynamic arrays. The keccak256(slot) base with arithmetic indexing is distinctive. Element types are inferred from access patterns on the computed slots. The length stored at the base slot provides a consistency check.
Packed slots. Mask analysis recovers bit-ranges precisely. Type assignment within a packed slot depends on usage context — a 160-bit field might be address or uint160, resolved by downstream operations (CALL targets, comparisons with msg.sender).
Structs. Sequential slot access with heterogeneous types implies struct layout. The boundary between “adjacent variables” and “struct fields” is ambiguous at the bytecode level — both compile to the same slot arithmetic. The constraint system reports the types; whether they form a struct or separate variables is a presentation choice.
Limitations
Unaccessed slots. Slots that are never read or written in the analyzed bytecode generate no constraints. Dead storage variables are invisible to any dynamic or static approach that relies on access patterns.
Compiler variations. The rules assume Solidity’s storage layout. Vyper uses a different layout strategy (with slot hashing for all variables as of Vyper 0.4), and contracts compiled with solc --storage-layout optimizations may differ from the default. The constraint system is parameterizable — different rule sets can be plugged in for different compilers — but the default rules target Solidity.
Ambiguous nesting depth. A slot accessed as keccak256(keccak256(k₁ . p) . q) could be a mapping(K => mapping(K2 => V)) or a mapping(K => V) where V is itself stored via a secondary mapping. Without observing accesses at intermediate nesting levels, the depth is underdetermined. The framework assigns the minimal nesting depth consistent with observations.
Inline assembly. Contracts that compute storage slots through arbitrary arithmetic (common in assembly-heavy code like Solady or Yul-optimized contracts) may produce slot expressions outside the grammar. The constraint system flags these as unresolved, preserving soundness at the cost of completeness.
Precision Characterization
The framework provides a three-valued confidence for each slot:
| Confidence | Meaning |
|---|---|
| Definite | All constraints unified without conflict; single most-general type |
| Ambiguous | Multiple compatible types remain (e.g., uint160 vs address) |
| Conflicting | Incompatible constraints detected; likely proxy pattern or analysis error |
This three-valued output is more informative than a binary “recovered / not recovered” classification. Downstream analyses can use definite types directly, treat ambiguous types conservatively, and flag conflicting types for manual review or proxy-aware analysis.
Related Work
Gigahorse (Grech et al., 2019) performs storage analysis as part of its Datalog-based decompilation framework, recovering mappings and arrays through declarative pattern rules. The constraint-based approach described here generalizes these patterns into a unified type system with formal unification semantics.
The Ethereum Yellow Paper (Wood, 2014) defines the storage model but not the layout conventions, which are specified in the Solidity documentation.
Hindley-Milner type inference (Hindley, 1969; Milner, 1978; Damas and Milner, 1982) provides the theoretical foundation. The adaptation to storage types is analogous to type inference for record types (Wand, 1987), where field access constrains the record structure. Row polymorphism, used in extensible record systems, offers a natural model for struct types where not all fields have been observed.
EIP-1967 (Standard Proxy Storage Slots) and EIP-2535 (Diamond Standard) define the proxy patterns that motivate polymorphic type contexts.
Abstract interpretation (Cousot and Cousot, 1977) provides an alternative framing: storage reconstruction as computing a fixed point over an abstract domain of storage types. The constraint-based and abstract interpretation perspectives are complementary — constraints can be viewed as the collecting semantics of an abstract domain where the abstraction function maps concrete 256-bit values to storage types.
References
-
Grech, N., Brent, L., Scholz, B., and Smaragdakis, Y. “Gigahorse: Thorough, Declarative Decompilation of Smart Contract Bytecodes.” Proceedings of the 41st International Conference on Software Engineering (ICSE), 2019. doi:10.1109/ICSE.2019.00120
-
Wood, G. “Ethereum: A Secure Decentralised Generalised Transaction Ledger.” Ethereum Yellow Paper, 2014. https://ethereum.github.io/yellowpaper/paper.pdf
-
Solidity Team. “Layout of State Variables in Storage.” Solidity Documentation. https://docs.soliditylang.org/en/latest/internals/layout_in_storage.html
-
Hindley, R. “The Principal Type-Scheme of an Object in Combinatory Logic.” Transactions of the American Mathematical Society, 146:29-60, 1969.
-
Milner, R. “A Theory of Type Polymorphism in Programming.” Journal of Computer and System Sciences, 17(3):348-375, 1978.
-
Damas, L. and Milner, R. “Principal Type-Schemes for Functional Programs.” Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 1982. doi:10.1145/582153.582176
-
Wand, M. “Complete Type Inference for Simple Objects.” Proceedings of the 2nd IEEE Symposium on Logic in Computer Science (LICS), 1987.
-
Cousot, P. and Cousot, R. “Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints.” Proceedings of the 4th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 1977. doi:10.1145/512950.512973
-
EIP-1967: Standard Proxy Storage Slots. https://eips.ethereum.org/EIPS/eip-1967
-
EIP-2535: Diamonds, Multi-Facet Proxy. https://eips.ethereum.org/EIPS/eip-2535