Incorporate robust Rust symbol demangling #79
Open: r0ny123 wants to merge 38 commits into danielplohmann:master from r0ny123:feature-rust-demangling-ghidra-robust-16431331290841888268
Vendors the `rust_demangler` library into `smda/common/labelprovider/rust_demangler` with robustness improvements ported from Ghidra:
- Implements strict hash detection and hiding for legacy Rust symbols.
- Adds recursion limits to the v0 demangler.
- Adds `remove_bad_spaces` utility for cleaner output.

Updates `ElfSymbolProvider`, `PeSymbolProvider`, and `PdbSymbolProvider` to use `rust_demangler` for symbols that appear to be Rust-mangled. Adds a new `RustSymbolProvider` class to handle Rust symbols in ELF and PE files, with detection heuristics (`RUST_BACKTRACE`, etc.). Registers `RustSymbolProvider` in `IntelDisassembler`. Adds unit tests in `tests/testRustSymbolProvider.py`. Fixes linting issues, including bare excepts. Fixes formatting issues (re-ran ruff format). Fixes an UnboundLocalError in the legacy demangler logic.
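For readers unfamiliar with legacy Rust mangling, here is a minimal sketch of what "hash detection and hiding" means. It assumes a legacy symbol is `_ZN` (or `__ZN`), a sequence of `<decimal length><component>` pairs, and a closing `E`, and it treats a final component of `h` plus exactly 16 lowercase hex digits as the compiler-generated hash. The function names and the exact strictness rule are assumptions for illustration, not the vendored `rust_demangler` API or Ghidra's precise heuristic; escape sequences such as `$LT$` are not decoded here.

```python
from typing import Optional

_HEX = set("0123456789abcdef")


def is_legacy_hash(component: str) -> bool:
    # Assumed strict rule: 'h' followed by exactly 16 lowercase hex digits.
    return (
        len(component) == 17
        and component[0] == "h"
        and all(c in _HEX for c in component[1:])
    )


def demangle_legacy(symbol: str, hide_hash: bool = True) -> Optional[str]:
    """Turn '_ZN<len><name>...E' into 'a::b::c'; return None if it does not parse."""
    for prefix in ("_ZN", "__ZN"):
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            break
    else:
        return None

    parts = []
    pos = 0
    while pos < len(rest) and rest[pos] != "E":
        start = pos
        # Each path component is <decimal length><that many characters>.
        while pos < len(rest) and "0" <= rest[pos] <= "9":
            pos += 1
        if pos == start:
            return None  # expected a length prefix
        length = int(rest[start:pos])
        component = rest[pos:pos + length]
        if len(component) != length:
            return None  # truncated component
        parts.append(component)
        pos += length

    if pos >= len(rest) or not parts:
        return None  # missing terminating 'E' or empty path
    trailer = rest[pos + 1:]
    if trailer and not trailer.startswith("."):
        return None  # e.g. C++ parameter types after 'E'; a '.suffix' is dropped
    if hide_hash and len(parts) > 1 and is_legacy_hash(parts[-1]):
        parts = parts[:-1]
    return "::".join(parts)
```

Note that `_ZN3foo3barE` is also a valid Itanium C++ name, so a `_ZN` prefix alone is only a weak signal; a recognizable trailing hash is a much stronger hint that the symbol came from rustc.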
- Fix ElfSymbolProvider early return bug that prevented symbol parsing when file_path was provided
- Remove duplicate Rust demangling from individual providers (Elf, PE, PDB) to avoid redundancy with RustSymbolProvider
- Reorder label providers so language-specific providers (Rust, Go, Delphi) are checked before generic format providers
- Tighten prefix detection by removing overly broad 'R' and 'ZN' prefixes, keeping only '_ZN', '__ZN', '_R', and '__R'
- Add missing isApiProvider() and getApi() methods to RustSymbolProvider
- Add proper debug logging for exception handling
- Update and expand test suite with better coverage and documentation
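The tightened prefix check can be sketched as below. The `ManglingScheme` enum and `detect_scheme()` helper are hypothetical names for this example. A bare `R` prefix would, for instance, match ordinary imports such as `RegisterClassW`, which is why only the underscore-prefixed forms are kept; the `__` variants cover toolchains that add an extra leading underscore to symbol names (Mach-O, for example).

```python
from enum import Enum
from typing import Optional


class ManglingScheme(Enum):  # hypothetical name for this sketch
    LEGACY = "legacy"
    V0 = "v0"


# Only the prefixes kept by the commit; str.startswith accepts a tuple.
_LEGACY_PREFIXES = ("_ZN", "__ZN")
_V0_PREFIXES = ("_R", "__R")


def detect_scheme(name: str) -> Optional[ManglingScheme]:
    """Classify a symbol by prefix only; a match is a candidate, not proof."""
    if name.startswith(_LEGACY_PREFIXES):
        return ManglingScheme.LEGACY
    if name.startswith(_V0_PREFIXES):
        return ManglingScheme.V0
    return None
```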
- Convert global variables (out, out_len) to instance variables in Ident class to fix NameError at runtime (lines 62-63, 80-81)
- Fix type hint for Parser.eat() method: bytes -> str
- Remove debug print() statement from Printer.invalid() method
- Simplify redundant import alias in __init__.py
- Revert __init__.py to use explicit re-export syntax for ruff compliance
- Apply ruff formatting to rust_v0.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… digit_10() to return None instead of "Error" string
- Change punycode_decode() to return None instead of "Error" string
- Update ident() to check for None with 'is None' instead of == "Error"
- Update try_small_punycode_decode() to check for None

This makes error handling more Pythonic: callers can use 'if x is None' or 'try...except' blocks instead of checking for magic string values.
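A minimal sketch of the `None`-instead-of-`"Error"` convention, assuming `digit_10()` decodes a single ASCII decimal digit. `read_decimal()` is a hypothetical caller (Rust identifiers are length-prefixed with decimal numbers) showing the `is None` check that replaces a string comparison.

```python
from typing import Optional, Tuple


def digit_10(ch: str) -> Optional[int]:
    """Value of an ASCII decimal digit, or None instead of a magic "Error" string."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    return None


def read_decimal(s: str, pos: int) -> Optional[Tuple[int, int]]:
    """Parse a decimal number at s[pos:]; return (value, next_pos), or None if absent."""
    value = 0
    start = pos
    while pos < len(s):
        d = digit_10(s[pos])
        if d is None:  # identity check; no comparison against "Error"
            break
        value = value * 10 + d
        pos += 1
    if pos == start:
        return None
    return value, pos
```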
…acro
Replace fragile string-based method dispatch (parser_macro) with direct
method invocations for better readability, safety, and maintainability:
- Remove parser_macro method that used getattr with string parsing
- Replace all parser_macro("method") calls with self.parser_mut().method()
- Add check_recursion_limit() to key entry points (print_path, print_type,
print_const) instead of calling it indirectly through parser_macro
- Remove broad except Exception clauses that swallowed important errors
This addresses code review feedback about the parser_macro function being
unclear, fragile, and hard to debug.
- Move _UNESCAPED dict to a class-level constant in LegacyDemangler to avoid recreating it on every loop iteration
- Refactor V0Demangler.demangle() to reuse the Parser instance by resetting its position after validation instead of creating a redundant second Parser object
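The sketch below holds the legacy escape table as a class-level constant, so it is built once when the class is defined rather than on every loop iteration. The table is assumed to match the common legacy Rust escapes (`$LT$` for `<`, `$uXX$` for a hex-coded character, `..` for `::`); the class name `LegacyComponentDecoder` and its `decode()` method are hypothetical. Unknown escapes return `None` through a membership test rather than raising `KeyError`.

```python
import string
from typing import Optional


class LegacyComponentDecoder:  # hypothetical name for this sketch
    # Class-level constant: created once, shared by all instances and calls.
    _UNESCAPED = {
        "SP": "@", "BP": "*", "RF": "&", "LT": "<",
        "GT": ">", "LP": "(", "RP": ")", "C": ",",
    }

    def decode(self, component: str) -> Optional[str]:
        out = []
        i = 0
        while i < len(component):
            if component[i] == "$":
                end = component.find("$", i + 1)
                if end == -1:
                    return None  # unterminated escape
                code = component[i + 1:end]
                if code in self._UNESCAPED:  # membership test avoids KeyError
                    out.append(self._UNESCAPED[code])
                elif (
                    len(code) > 1
                    and code[0] == "u"
                    and all(c in string.hexdigits for c in code[1:])
                ):
                    try:
                        out.append(chr(int(code[1:], 16)))  # '$u20$' -> ' '
                    except (ValueError, OverflowError):
                        return None  # not a valid code point
                else:
                    return None  # unknown escape
                i = end + 1
            elif component.startswith("..", i):
                out.append("::")
                i += 2
            else:
                out.append(component[i])
                i += 1
        return "".join(out)
```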
The _update_pe method now handles both raw_data and file_path inputs, matching the behavior of _update_elf. This provides better flexibility when binary data is available in memory without a file path.
- Replace integer constants (LEGACYTYPE, V0TYPE) with a ManglingType Enum for better type safety and code clarity in rust.py
- Extract a _get_binary_data helper method in RustSymbolProvider.py to reduce code duplication and ensure consistent error handling, with try/except OSError for both ELF and PE file reading
- Remove debug print statement from invalid() method in rust_v0.py, as it was a leftover that could interfere with structured output
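A minimal sketch of a `_get_binary_data`-style helper, assuming in-memory `raw_data` takes precedence over `file_path`, in line with the commit that lets `_update_pe` accept either input. The function name `get_binary_data` and its signature are hypothetical.

```python
from typing import Optional


def get_binary_data(
    raw_data: Optional[bytes] = None, file_path: Optional[str] = None
) -> Optional[bytes]:
    """Prefer in-memory bytes; fall back to reading file_path; None if neither works."""
    if raw_data:  # empty bytes fall through to the file path
        return raw_data
    if file_path:
        try:
            with open(file_path, "rb") as fh:
                return fh.read()
        except OSError:
            # Missing file, permission error, etc.: report "no data" instead of crashing.
            return None
    return None
```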
Bug fixes:
- Fix lifetime printing: add early return when lt == 0 to prevent incorrect depth calculation
- Fix in_binder(): change second 'if' to 'elif' for proper branching
- Validate keys in rust_legacy.py before accessing _UNESCAPED, to prevent KeyError
- Use specific exceptions (ValueError, OverflowError, IndexError) instead of broad Exception catches in punycode_decode()

Improvements:
- Add __all__ to __init__.py for explicit public API
- Add type hints to public methods (_is_rust_symbol, demangle, basic_type)
- Add docstring to main demangle() function documenting exceptions
- Add @lru_cache to basic_type() for minor performance optimization
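For context on what `basic_type()` resolves: the Rust v0 scheme encodes primitive types as single lowercase letters. The sketch below assumes a dictionary-based lookup; with a dict, `@lru_cache` adds little, so the decorator mainly pays off if the real function is a long `if`/`elif` chain. The table is written from the v0 mangling specification and should be checked against the vendored code.

```python
from functools import lru_cache
from typing import Optional

# Single-letter basic-type tags from the Rust v0 mangling scheme.
_BASIC_TYPES = {
    "a": "i8", "b": "bool", "c": "char", "d": "f64", "e": "str",
    "f": "f32", "h": "u8", "i": "isize", "j": "usize", "l": "i32",
    "m": "u32", "n": "i128", "o": "u128", "p": "_", "s": "i16",
    "t": "u16", "u": "()", "v": "...", "x": "i64", "y": "u64", "z": "!",
}


@lru_cache(maxsize=None)
def basic_type(tag: str) -> Optional[str]:
    """Return the Rust spelling of a v0 basic-type tag, or None if unknown."""
    return _BASIC_TYPES.get(tag)
```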
Bug: The global _demangler singleton in main.py reuses the same V0Demangler instance. The self.suffix field was only set when a symbol contained a dot, but was never reset, so subsequent symbols without dots had stale suffixes appended.

Fix: Reset self.suffix and self.disp at the start of each demangle() call so each symbol is demangled independently.

Added regression test: test_v0_suffix_not_retained_between_calls
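The per-call reset can be illustrated with a reduced class. `SuffixAwareDemangler` is hypothetical, and `demangle_core` is a stand-in for the actual v0 path printing; only the suffix handling (everything from the first `.`, such as `.llvm.1234`) is modeled.

```python
from typing import Callable


class SuffixAwareDemangler:  # hypothetical; models only the suffix state
    def __init__(self, demangle_core: Callable[[str], str]):
        self._demangle_core = demangle_core
        self.suffix = ""

    def demangle(self, symbol: str) -> str:
        # Reset per-call state first: a shared (singleton) instance would
        # otherwise carry the previous symbol's suffix into this call.
        self.suffix = ""
        core = symbol
        dot = symbol.find(".")
        if dot != -1:
            core, self.suffix = symbol[:dot], symbol[dot:]
        return self._demangle_core(core) + self.suffix
```

Usage: with `d = SuffixAwareDemangler(str.upper)`, `d.demangle("abc.llvm.42")` yields `"ABC.llvm.42"`, and a following `d.demangle("xyz")` yields `"XYZ"` with no leftover suffix.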
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Refactor abi handling to join parts with a hyphen.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
P2: Add IndexError handling in punycode_decode inner loop
- Line 124 could raise an uncaught IndexError if punycode data is truncated
- Now returns None gracefully on malformed input

P3: Add else clause in in_binder to prevent UnboundLocalError
- If val is neither 1 nor 2, 'r' was undefined
- Now defaults to an empty string for safety
Improvements to rust_v0.py:
- Add type hints to Parser class methods (peek, eat, next_func, hex_nibbles, digit_10, digit_62, integer_62, etc.)
- Add type hints to Ident class methods
- Add bounds checking to Parser.peek() and Parser.next_func() to raise UnableTov0Demangle instead of IndexError on malformed input
- Fix Parser.eat() to check bounds before accessing the string

Improvements to RustSymbolProvider.py:
- Import specific demangling exceptions (TypeNotFoundError, UnableTov0Demangle, UnableToLegacyDemangle)
- Replace broad 'except Exception' with specific '_DEMANGLE_ERRORS' tuple for better error handling and debugging
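A simplified parser showing the bounds-checked `peek()`/`next_func()` pattern, applied to the v0 base-62 number encoding (`_` alone is 0; otherwise `[0-9a-zA-Z]*` followed by `_` encodes value + 1, with digits ordered 0-9, a-z, A-Z). This `Parser` is a sketch, not the vendored class, and `UnableTov0Demangle` is defined locally as a stand-in for the exception of the same name.

```python
class UnableTov0Demangle(Exception):
    """Local stand-in for the demangler's exception of the same name."""


class Parser:  # simplified sketch, not the vendored class
    def __init__(self, sym: str):
        self.sym = sym
        self.next = 0

    def peek(self) -> str:
        # Raise a demangling error instead of letting IndexError escape.
        if self.next >= len(self.sym):
            raise UnableTov0Demangle("unexpected end of symbol")
        return self.sym[self.next]

    def eat(self, ch: str) -> bool:
        # Bounds check before indexing; consume ch only if it is next.
        if self.next < len(self.sym) and self.sym[self.next] == ch:
            self.next += 1
            return True
        return False

    def next_func(self) -> str:
        ch = self.peek()  # raises at end of input
        self.next += 1
        return ch

    def integer_62(self) -> int:
        if self.eat("_"):
            return 0
        value = 0
        while not self.eat("_"):
            c = self.next_func()
            if "0" <= c <= "9":
                d = ord(c) - ord("0")
            elif "a" <= c <= "z":
                d = 10 + ord(c) - ord("a")
            elif "A" <= c <= "Z":
                d = 36 + ord(c) - ord("A")
            else:
                raise UnableTov0Demangle(f"invalid base-62 digit {c!r}")
            value = value * 62 + d
        return value + 1
```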
Refactor to avoid code duplication:
- Move LIEF import, binary data loading, and parsing to update() method
- Use isinstance() to dispatch to _update_elf() or _update_pe()
- Simplify _update_elf() and _update_pe() to accept lief_binary directly
- Remove redundant file I/O and parsing operations

This improves code reuse and efficiency by parsing the binary only once.
The insert method can raise an IndexError if the decoded punycode string exceeds the buffer size of small_punycode_len (128). This could crash the demangler on crafted or malformed inputs.

Fix:
- Add bounds check in insert() to return False if the buffer is full
- Update punycode_decode() to check the insert() return value and return None on failure (which try_small_punycode_decode handles)
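A self-contained sketch of a bounded RFC 3492 Punycode decoder that returns `None` instead of raising on truncated digits, invalid characters, out-of-range code points, or a full output buffer. The `max_len=128` default mirrors `small_punycode_len`; the configurable `delimiter` exists because Rust v0 separates the basic and encoded parts with `_` rather than `-`. The names and signature are hypothetical.

```python
from typing import Optional

BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(ch: str) -> Optional[int]:
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    return None


def _adapt(delta: int, numpoints: int, first: bool) -> int:
    # Bias adaptation as specified in RFC 3492, section 6.1.
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_decode(encoded: str, delimiter: str = "-", max_len: int = 128) -> Optional[str]:
    """Decode Punycode; return None on malformed input or buffer overflow."""
    split = encoded.rfind(delimiter)
    output = list(encoded[:split]) if split >= 0 else []
    rest = encoded[split + 1:]  # whole string when there is no delimiter
    if len(output) > max_len:
        return None
    n, i, bias, pos = INITIAL_N, 0, INITIAL_BIAS, 0
    while pos < len(rest):
        old_i, w, k = i, 1, BASE
        while True:
            if pos >= len(rest):
                return None  # truncated delta (the old IndexError case)
            d = _digit(rest[pos])
            pos += 1
            if d is None:
                return None  # not a Punycode digit
            i += d * w
            t = TMIN if k <= bias else (TMAX if k >= bias + TMAX else k - bias)
            if d < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        if n > 0x10FFFF or len(output) >= max_len:
            return None  # invalid code point, or the fixed-size buffer is full
        output.insert(i, chr(n))
        i += 1
    return "".join(output)
```

A round trip through Python's built-in `punycode` codec is a cheap way to cross-check a decoder like this one.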
The recursion counter was checked but never incremented during recursive calls within the same Printer instance. The counter therefore stayed at 0, and check_recursion_limit() would never trip on deeply nested structures.

Fix:
- check_recursion_limit() now increments self.recursion after checking
- print_path(), print_type(), and print_const() now use try/finally to ensure the counter is decremented even on early returns or exceptions

This prevents potential stack overflows on malformed inputs with deeply nested recursive structures.
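A toy version of the check-then-increment counter with `try`/`finally`. The grammar is loosely modeled on v0 type tags (`R<type>` for a shared reference, `l` for `i32`), and `TypePrinter`, `RecursionLimitExceeded`, and `max_depth` are hypothetical names. Because the decrement lives in `finally`, the counter returns to zero after both normal returns and a raised limit error.

```python
class RecursionLimitExceeded(Exception):  # hypothetical exception name
    pass


class TypePrinter:  # toy grammar: "R<type>" = shared reference, "l" = i32
    def __init__(self, sym: str, max_depth: int = 100):
        self.sym = sym
        self.pos = 0
        self.depth = 0
        self.max_depth = max_depth

    def check_recursion_limit(self) -> None:
        # Check, then increment; the caller decrements in its finally block.
        if self.depth >= self.max_depth:
            raise RecursionLimitExceeded(f"nesting deeper than {self.max_depth}")
        self.depth += 1

    def print_type(self) -> str:
        self.check_recursion_limit()  # outside try: no decrement if it raises
        try:
            if self.pos >= len(self.sym):
                raise ValueError("unexpected end of input")
            tag = self.sym[self.pos]
            self.pos += 1
            if tag == "l":
                return "i32"  # early return; finally still decrements
            if tag == "R":
                return "&" + self.print_type()
            raise ValueError(f"unknown type tag {tag!r}")
        finally:
            self.depth -= 1
```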
Simplify symbol name resolution in ElfSymbolProvider using getattr to handle optional demangled_name attributes more idiomatically. This replaces the verbose try-except block with a concise one-liner.
- Fix rust_v0.py: Use print_path instead of print_type for backreferences in path context to avoid incorrect symbol rendering.
- Fix RustSymbolProvider.py: Add error handling when accessing function.name to prevent crashes from UnicodeDecodeError or AttributeError on malformed binaries.
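A minimal sketch of defensive name access, assuming `function` is any object exposing a `.name` attribute that may raise while decoding bytes from a malformed binary. `safe_function_name()` is a hypothetical helper; a caller would skip demangling when it returns `None`.

```python
from typing import Optional


def safe_function_name(function) -> Optional[str]:
    """Return function.name, or None when it is missing, undecodable, or empty."""
    try:
        name = function.name
    except (UnicodeDecodeError, AttributeError):
        return None
    return name if isinstance(name, str) and name else None
```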
…codable symbol names before Rust demangling.