RTLO Unicode
Detects right-to-left override Unicode characters used in Trojan Source attacks to disguise malicious code as benign.
Overview
The RTLO Unicode detector identifies the right-to-left override character (U+202E) and other bidirectional control characters in contract source code or metadata. These invisible characters change the display order of the text that follows them, so malicious code can appear benign when viewed in editors and code review tools. This class of attack is known as “Trojan Source.”
Why This Is an Issue
An attacker can insert RTLO characters into comments or string literals so that a line of code is displayed as a different operation from the one it actually performs. For example, a transfer to an attacker’s address can be displayed as a transfer to the legitimate recipient. Code reviewers and auditors see the benign rendering, while the compiler processes the actual character sequence, including the malicious logic.
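To make the hidden characters concrete, the following minimal Python sketch lists every bidirectional control in a string together with its Unicode bidi class, using the standard `unicodedata.bidirectional` function. The helper name `hidden_bidi_controls` and the sample comment are hypothetical illustrations, not part of the detector.

```python
import unicodedata

# Bidi classes of the embedding, override, and isolate controls
# (U+202A-U+202E and U+2066-U+2069).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}


def hidden_bidi_controls(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, bidi class) for every bidi control in text."""
    hits = []
    for i, ch in enumerate(text):
        cls = unicodedata.bidirectional(ch)
        if cls in BIDI_CONTROL_CLASSES:
            hits.append((i, f"U+{ord(ch):04X}", cls))
    return hits


# Hypothetical comment: an RLO (U+202E) followed later by a PDF (U+202C).
# A bidi-aware viewer renders "resu" reversed, as "user", and shows no trace
# of the two control characters, yet both are in the text the compiler reads.
comment = "// owner check \u202eresu\u202c"
print(len(comment))                    # 21
print(hidden_bidi_controls(comment))   # [(15, 'U+202E', 'RLO'), (20, 'U+202C', 'PDF')]
```

The string looks like 19 visible characters on screen but is 21 characters long; the difference is exactly the invisible controls a reviewer cannot see.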
How to Resolve
// Schematic example: a hidden RTLO character (U+202E) reverses how the rest of the line is displayed
// What reviewers see: require(msg.sender == admin);
// What compiles:      require(msg.sender == attacker);
// Fix: strip all bidirectional Unicode control characters from source files
// Use editors and review tools that highlight or reject bidirectional characters
// Configure CI/CD to reject files containing U+202A-U+202E or U+2066-U+2069
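Compilers and linters should reject source files that contain bidirectional control characters. Do not rely on the Solidity compiler alone: whether it diagnoses these characters depends on the compiler version, and its check targets unbalanced directional override markers in comments and string literals rather than every occurrence.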
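A CI/CD gate of this kind can be sketched in a few lines of Python. This is a minimal sketch, assuming the policy is to reject U+202A through U+202E and U+2066 through U+2069 and that source files are UTF-8; the script name and the functions `find_bidi`, `strip_bidi`, and `main` are hypothetical, not an existing tool.

```python
import sys
from pathlib import Path

# Assumed policy: reject U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = frozenset(chr(c) for c in [*range(0x202A, 0x202F), *range(0x2066, 0x206A)])


def find_bidi(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each bidi control, 1-indexed."""
    findings = []
    # Split on "\n" only, so line numbers match what most editors show.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((lineno, col, f"U+{ord(ch):04X}"))
    return findings


def strip_bidi(text: str) -> str:
    """Remove every bidi control character (the 'strip' remediation)."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def main(paths: list[str]) -> int:
    """Print one finding per character; return 1 if any file has findings."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, col, cp in find_bidi(text):
            print(f"{path}:{lineno}:{col}: bidirectional control character {cp}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        # Exit non-zero only when findings exist, so the CI job fails.
        sys.exit(status)
```

Invoked as, for example, `python check_bidi.py contracts/*.sol`, it prints `path:line:column` findings and fails the job. Rejecting files is usually preferable to silently calling `strip_bidi` in CI, because stripping changes what reviewers and the compiler see and should be a deliberate, reviewed change. A file that is not valid UTF-8 raises `UnicodeDecodeError`, which also fails the job.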
Detection Methodology
- Unicode scanning: Scans contract source code and metadata strings for bidirectional control characters (U+202A through U+202E, U+2066 through U+2069).
- String literal analysis: Checks string constants embedded in bytecode for hidden directional characters.
- Metadata inspection: Examines contract metadata (CBOR-encoded at the end of bytecode) for embedded source references containing these characters.
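For the bytecode side, string constants are stored as UTF-8 bytes, so each control character appears as a fixed three-byte sequence: `E2 80 AA` through `E2 80 AE` for U+202A to U+202E, and `E2 81 A6` through `E2 81 A9` for U+2066 to U+2069. The Python sketch below searches hex-encoded bytecode for those sequences. It is an illustration of the idea under that assumption, not the detector's actual implementation, and the function name `scan_bytecode` is hypothetical.

```python
import re

# UTF-8 encodings of U+202A-U+202E (E2 80 AA..AE) and U+2066-U+2069 (E2 81 A6..A9).
BIDI_UTF8 = re.compile(rb"\xe2\x80[\xaa-\xae]|\xe2\x81[\xa6-\xa9]")


def scan_bytecode(hex_code: str) -> list[tuple[int, str]]:
    """Return (byte offset, code point) for each UTF-8-encoded bidi control."""
    code = bytes.fromhex(hex_code.removeprefix("0x"))
    hits = []
    for m in BIDI_UTF8.finditer(code):
        ch = m.group().decode("utf-8")  # three bytes decode to one character
        hits.append((m.start(), f"U+{ord(ch):04X}"))
    return hits


# Hypothetical bytecode fragment: two opcode bytes followed by the UTF-8 bytes of "admin\u2069".
sample = "6080" + "admin".encode().hex() + "e281a9"
print(scan_bytecode(sample))  # [(7, 'U+2069')]
```

A byte match is a lead, not proof: the same three bytes can occur by coincidence inside non-string PUSH data such as numeric constants or addresses, so hits should be checked against the surrounding bytes.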
Limitations
False positives: Contracts that legitimately handle Arabic or Hebrew text may contain bidirectional characters in string constants.
False negatives: If the source code is not available and the bidirectional characters do not appear in the compiled bytecode or metadata, they cannot be detected from bytecode alone. Characters hidden in ordinary comments are a common case, because comments are not compiled into bytecode.
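One way to prioritize the false-positive case is to check whether a flagged string also contains right-to-left letters. The Python sketch below is a hypothetical triage heuristic, not the detector's documented behavior: it labels a literal "review" when bidi controls appear alongside Hebrew (bidi class `R`) or Arabic (bidi class `AL`) letters, and "suspicious" when the controls appear with no right-to-left text that would justify them.

```python
import unicodedata

BIDI_CONTROL_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}
# Bidi classes of strong right-to-left letters: Hebrew is "R", Arabic is "AL".
STRONG_RTL_CLASSES = {"R", "AL"}


def triage(literal: str) -> str:
    """Hypothetical heuristic: rank a string literal for manual review."""
    classes = {unicodedata.bidirectional(ch) for ch in literal}
    if not classes & BIDI_CONTROL_CLASSES:
        return "clean"
    if classes & STRONG_RTL_CLASSES:
        return "review"      # controls next to real RTL text: possibly legitimate
    return "suspicious"      # controls with no RTL text to justify them


print(triage("\u2067שלום\u2069"))   # review
print(triage("\u202eresu\u202c"))   # suspicious
```

This only orders findings for a human reviewer. An attacker can add a few Hebrew or Arabic letters to push a malicious string into the "review" bucket, so no finding should be suppressed on this basis alone.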
Related Detectors
- Access Control — detects missing authorization checks
- Business Logic Error — detects logic issues in contract behavior