Conversation

@r0ny123
Contributor

@r0ny123 r0ny123 commented Dec 18, 2025

No description provided.

google-labs-jules bot and others added 30 commits December 11, 2025 04:27
Vendors the `rust_demangler` library into `smda/common/labelprovider/rust_demangler` with robustness improvements ported from Ghidra:
- Implements strict hash detection and hiding for legacy Rust symbols.
- Adds recursion limits to the v0 demangler.
- Adds `remove_bad_spaces` utility for cleaner output.

Updates `ElfSymbolProvider`, `PeSymbolProvider`, and `PdbSymbolProvider` to use `rust_demangler` for symbols that appear to be Rust-mangled.
Adds a new `RustSymbolProvider` class to handle Rust symbols in ELF and PE files, with detection heuristics (`RUST_BACKTRACE`, etc.).
Registers `RustSymbolProvider` in `IntelDisassembler`.
Adds unit tests in `tests/testRustSymbolProvider.py`.
Fixes linting issues including bare excepts.
Fixes UnboundLocalError in legacy demangler logic.
Fixes formatting issues (re-ran ruff format).
- Fix ElfSymbolProvider early return bug that prevented symbol parsing
  when file_path was provided
- Remove duplicate Rust demangling from individual providers (Elf, PE, PDB)
  to avoid redundancy with RustSymbolProvider
- Reorder label providers so language-specific providers (Rust, Go, Delphi)
  are checked before generic format providers
- Tighten prefix detection by removing overly broad 'R' and 'ZN' prefixes,
  keeping only '_ZN', '__ZN', '_R', and '__R'
- Add missing isApiProvider() and getApi() methods to RustSymbolProvider
- Add proper debug logging for exception handling
- Update and expand test suite with better coverage and documentation
- Convert global variables (out, out_len) to instance variables in Ident class
  to fix NameError at runtime (lines 62-63, 80-81)
- Fix type hint for Parser.eat() method: bytes -> str
- Remove debug print() statement from Printer.invalid() method
- Simplify redundant import alias in __init__.py
- Revert __init__.py to use explicit re-export syntax for ruff compliance
- Apply ruff formatting to rust_v0.py
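
The tightened prefix detection from the list above can be illustrated with a short sketch. Only the four prefixes come from the commit message; `RUST_PREFIXES` and `looks_rust_mangled` are hypothetical names, and the PR's actual check may do more than this.

```python
# Hypothetical names; only the four prefixes are taken from the commit message.
RUST_PREFIXES = ("_ZN", "__ZN", "_R", "__R")


def looks_rust_mangled(name: str) -> bool:
    """Return True if the symbol starts with one of the accepted prefixes."""
    # str.startswith accepts a tuple and tests each prefix in turn.
    return name.startswith(RUST_PREFIXES)
```

A prefix match is only a cheap first filter: a name such as `_Release` still passes it, so the demangler itself has to make the final decision.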
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… digit_10() to return None instead of "Error" string
- Change punycode_decode() to return None instead of "Error" string
- Update ident() to check for None with 'is None' instead of == "Error"
- Update try_small_punycode_decode() to check for None

This makes error handling more Pythonic: callers can use 'if x is None' or 'try...except' blocks instead of checking for magic string values.
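
As a rough illustration of the `None` sentinel pattern described above, here is a simplified, free-standing sketch. In the vendored code `digit_10()` is a parser method; the function form and the `parse_length` helper are assumptions made for this example.

```python
from typing import Optional


def digit_10(ch: str) -> Optional[int]:
    """Return the decimal value of ch, or None if ch is not a digit 0-9."""
    if len(ch) == 1 and "0" <= ch <= "9":
        return ord(ch) - ord("0")
    return None


def parse_length(text: str) -> Optional[int]:
    """Parse a leading run of decimal digits; None if text has no leading digit."""
    value = None
    for ch in text:
        digit = digit_10(ch)
        if digit is None:  # explicit sentinel check instead of == "Error"
            break
        value = (value or 0) * 10 + digit
    return value
```

Because `None` is not a valid digit or string result, a caller cannot mistake the failure value for real output, which was possible with the `"Error"` string.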
…acro

Replace fragile string-based method dispatch (parser_macro) with direct
method invocations for better readability, safety, and maintainability:

- Remove parser_macro method that used getattr with string parsing
- Replace all parser_macro("method") calls with self.parser_mut().method()
- Add check_recursion_limit() to key entry points (print_path, print_type,
  print_const) instead of calling it indirectly through parser_macro
- Remove broad except Exception clauses that swallowed important errors

This addresses code review feedback about the parser_macro function being
unclear, fragile, and hard to debug.
- Move _UNESCAPED dict to class-level constant in LegacyDemangler
  to avoid recreating it on every loop iteration

- Refactor V0Demangler.demangle() to reuse Parser instance by resetting
  its position after validation instead of creating a redundant second
  Parser object
The _update_pe method now handles both raw_data and file_path inputs,
matching the behavior of _update_elf. This provides better flexibility
when binary data is available in memory without a file path.
- Replace integer constants (LEGACYTYPE, V0TYPE) with ManglingType Enum
  for better type safety and code clarity in rust.py

- Extract _get_binary_data helper method in RustSymbolProvider.py to
  reduce code duplication and ensure consistent error handling with
  try/except OSError for both ELF and PE file reading

- Remove debug print statement from invalid() method in rust_v0.py
  as it was a leftover that could interfere with structured output
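
The switch from integer constants to an Enum might look roughly like the sketch below. Only the name `ManglingType` comes from the commit message; the member names, their values, and `detect_mangling` are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class ManglingType(Enum):
    # Member names and values are assumed; the commit only names the Enum.
    LEGACY = "legacy"
    V0 = "v0"


def detect_mangling(name: str) -> Optional[ManglingType]:
    """Classify a symbol by prefix; None means it does not look Rust-mangled."""
    if name.startswith(("_R", "__R")):
        return ManglingType.V0
    if name.startswith(("_ZN", "__ZN")):
        return ManglingType.LEGACY
    return None
```

Unlike bare integers, Enum members do not compare equal to unrelated numbers, so an accidental `== 1` style comparison fails visibly instead of silently matching.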
Bug fixes:
- Fix lifetime printing: add early return when lt == 0 to prevent
  incorrect depth calculation
- Fix in_binder(): change second 'if' to 'elif' for proper branching
- Add KeyError validation in rust_legacy.py before accessing _UNESCAPED
- Use specific exceptions (ValueError, OverflowError, IndexError)
  instead of broad Exception catches in punycode_decode()

Improvements:
- Add __all__ to __init__.py for explicit public API
- Add type hints to public methods (_is_rust_symbol, demangle, basic_type)
- Add docstring to main demangle() function documenting exceptions
- Add @lru_cache to basic_type() for minor performance optimization
Bug: The global _demangler singleton in main.py reuses the same
V0Demangler instance. The self.suffix field was only set when
a symbol contained a dot, but never reset. This caused subsequent
symbols without dots to have stale suffixes appended.

Fix: Reset self.suffix and self.disp at the start of each demangle()
call to ensure independent demangling.

Added regression test: test_v0_suffix_not_retained_between_calls
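
The stale-suffix bug and its fix can be reproduced in miniature. `SuffixDemangler` below is a toy stand-in, not the PR's `V0Demangler`; it skips real demangling and only tracks the suffix.

```python
class SuffixDemangler:
    """Toy stand-in for a demangler instance that is reused across calls."""

    def __init__(self) -> None:
        self.suffix = ""

    def demangle(self, symbol: str) -> str:
        # Reset per-call state first so nothing leaks from the previous call.
        self.suffix = ""
        base, dot, rest = symbol.partition(".")
        if dot:
            self.suffix = dot + rest
        # Real demangling of `base` is omitted; only the suffix is tracked.
        return base + self.suffix


shared = SuffixDemangler()  # plays the role of the module-level singleton
first = shared.demangle("foo.llvm.123")
second = shared.demangle("bar")
```

Without the reset at the top of `demangle()`, `second` would come back with `.llvm.123` appended, which is the behavior the regression test guards against.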
Refactor abi handling to join parts with a hyphen.
P2: Add IndexError handling in punycode_decode inner loop
- Line 124 could raise uncaught IndexError if punycode data is truncated
- Now returns None gracefully on malformed input

P3: Add else clause in in_binder to prevent UnboundLocalError
- If val is neither 1 nor 2, 'r' was undefined
- Now defaults to empty string for safety
Improvements to rust_v0.py:
- Add type hints to Parser class methods (peek, eat, next_func,
  hex_nibbles, digit_10, digit_62, integer_62, etc.)
- Add type hints to Ident class methods
- Add bounds checking to Parser.peek() and Parser.next_func()
  to raise UnableTov0Demangle instead of IndexError on malformed input
- Fix Parser.eat() to check bounds before accessing string
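
A minimal sketch of the bounds checks listed above, assuming a string-backed parser. The exception class here is a local stand-in for the vendored one, and the attribute names `inn` and `next_val` are hypothetical.

```python
class UnableTov0Demangle(Exception):
    """Local stand-in for the vendored exception of the same name."""


class Parser:
    def __init__(self, inn: str) -> None:
        self.inn = inn      # mangled input (attribute name assumed)
        self.next_val = 0   # current read position (attribute name assumed)

    def peek(self) -> str:
        # Raise the domain error instead of letting IndexError escape.
        if self.next_val >= len(self.inn):
            raise UnableTov0Demangle(self.inn)
        return self.inn[self.next_val]

    def eat(self, ch: str) -> bool:
        # Check bounds before indexing; end of input simply does not match.
        if self.next_val < len(self.inn) and self.inn[self.next_val] == ch:
            self.next_val += 1
            return True
        return False
```

Raising one domain-specific exception lets callers catch malformed input without also catching unrelated `IndexError` bugs.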

Improvements to RustSymbolProvider.py:
- Import specific demangling exceptions (TypeNotFoundError,
  UnableTov0Demangle, UnableToLegacyDemangle)
- Replace broad 'except Exception' with specific '_DEMANGLE_ERRORS'
  tuple for better error handling and debugging
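
The narrowed exception handling could be sketched like this. The three exception classes are local stand-ins for the vendored ones, and `safe_demangle` with its fall-back to the original name is an assumption, not the provider's actual behavior.

```python
import logging
from typing import Callable

LOGGER = logging.getLogger(__name__)


class TypeNotFoundError(Exception):
    """Stand-in for the vendored exception."""


class UnableTov0Demangle(Exception):
    """Stand-in for the vendored exception."""


class UnableToLegacyDemangle(Exception):
    """Stand-in for the vendored exception."""


_DEMANGLE_ERRORS = (TypeNotFoundError, UnableTov0Demangle, UnableToLegacyDemangle)


def safe_demangle(name: str, demangle: Callable[[str], str]) -> str:
    """Demangle name, keeping the mangled form if demangling fails (assumed policy)."""
    try:
        return demangle(name)
    except _DEMANGLE_ERRORS as exc:
        LOGGER.debug("could not demangle %r: %r", name, exc)
        return name
```

Anything outside the tuple, such as a `KeyError` from a genuine bug, still propagates, which is the debugging benefit the commit message refers to.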
Refactor to avoid code duplication:
- Move LIEF import, binary data loading, and parsing to update() method
- Use isinstance() to dispatch to _update_elf() or _update_pe()
- Simplify _update_elf() and _update_pe() to accept lief_binary directly
- Remove redundant file I/O and parsing operations

This improves code reuse and efficiency by parsing the binary only once.
The insert method can cause an IndexError if the decoded punycode string
exceeds the buffer size of small_punycode_len (128). This could crash
the demangler on crafted/malformed inputs.

Fix:
- Add bounds check in insert() to return False if buffer is full
- Update punycode_decode() to check insert() return value and
  return None on failure (which try_small_punycode_decode handles)
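
The bounds check can be illustrated with a small fixed-capacity buffer. The limit of 128 comes from the commit message; `SmallBuffer` and its list-backed storage are assumptions made for this sketch.

```python
SMALL_PUNYCODE_LEN = 128  # buffer size named in the commit message


class SmallBuffer:
    """Fixed-capacity character buffer (illustrative, not the PR's class)."""

    def __init__(self, capacity: int = SMALL_PUNYCODE_LEN) -> None:
        self.capacity = capacity
        self.chars = []

    def insert(self, index: int, ch: str) -> bool:
        # Refuse the insert rather than overflow or index out of range.
        if len(self.chars) >= self.capacity or not 0 <= index <= len(self.chars):
            return False
        self.chars.insert(index, ch)
        return True
```

A decoder built on this would check each `insert()` result and return `None` as soon as one fails, so oversized input is rejected instead of crashing.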
The recursion counter was checked but never incremented during recursive
calls within the same Printer instance. This meant the counter stayed at 0
and check_recursion_limit() would never trip on deeply nested structures.

Fix:
- check_recursion_limit() now increments self.recursion after checking
- print_path(), print_type(), and print_const() now use try/finally to
  ensure the counter is decremented even on early returns or exceptions

This prevents potential stack overflows on malformed inputs with deeply
nested recursive structures.
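
The increment-then-`try/finally` pattern described above can be shown on a toy printer that renders nested lists. The `Printer` here is not the PR's class; `MAX_DEPTH = 64` and `RecursionLimitExceeded` are assumed for the example.

```python
class RecursionLimitExceeded(Exception):
    """Hypothetical error raised when nesting is too deep."""


class Printer:
    MAX_DEPTH = 64  # assumed limit for this sketch

    def __init__(self) -> None:
        self.recursion = 0

    def check_recursion_limit(self) -> None:
        if self.recursion >= self.MAX_DEPTH:
            raise RecursionLimitExceeded(self.recursion)
        self.recursion += 1  # the increment the original code was missing

    def print_type(self, node) -> str:
        self.check_recursion_limit()
        try:
            if isinstance(node, list):
                return "(" + ", ".join(self.print_type(c) for c in node) + ")"
            return str(node)
        finally:
            # Runs on normal return and on exceptions, keeping the count balanced.
            self.recursion -= 1
```

Because the check raises before incrementing, a failed check needs no matching decrement, and the counter returns to zero once the exception has unwound.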
@r0ny123 r0ny123 marked this pull request as draft December 26, 2025 05:26
Simplify symbol name resolution in ElfSymbolProvider using getattr to handle optional demangled_name attributes more idiomatically. This replaces the verbose try-except block with a concise one-liner.
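
A sketch of the `getattr` one-liner, using `SimpleNamespace` objects in place of real symbol objects. The helper name `resolve_symbol_name` and the fall-back to `symbol.name` for empty values are assumptions.

```python
from types import SimpleNamespace


def resolve_symbol_name(symbol) -> str:
    """Prefer demangled_name when present and non-empty, else use name."""
    # getattr's default covers a missing attribute (AttributeError only).
    return getattr(symbol, "demangled_name", None) or symbol.name


with_demangled = SimpleNamespace(name="_ZN3fooE", demangled_name="foo")
without_demangled = SimpleNamespace(name="main")
```

Note that `getattr` with a default only suppresses `AttributeError`; any other exception raised by a property still propagates to the caller.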
- Fix rust_v0.py: Use print_path instead of print_type for backreferences in path context to avoid incorrect symbol rendering.

- Fix RustSymbolProvider.py: Add error handling when accessing function.name to prevent crashes from UnicodeDecodeError or AttributeError on malformed binaries.
@r0ny123 r0ny123 marked this pull request as ready for review December 26, 2025 07:10