feat: improve dependency tracking and base class extraction #1166

KRRT7 · 2026-01-24T06:30:09Z

Summary

Include dependency classes in read-writable optimization context
Track attribute access base names as dependencies in DependencyCollector
Use PicklePatcher for instance state to handle unpicklable async objects
Extract __init__ from external library base classes for test context
Handle module-level function definitions in add_global_assignments
Handle annotated assignments in GlobalAssignmentCollector
Handle repr failures for Mock objects in test result comparison
Handle instance state capture for classes without __dict__
Recursively extract base classes for imported dataclasses
Handle decorators & annotations properly

Test plan

Unit tests added for base class extraction in imported dataclasses
Unit tests extended for code context extractor
Unit tests extended for codeflash capture
Unit tests added for remove unused definitions

When extracting imported class definitions for testgen context, also extract base classes from the same module. This ensures the full inheritance chain is available for understanding constructor signatures. For example, when LLMConfig inherits from LLMConfigBase, both classes are now included in the context so the LLM can see all required positional arguments from parent classes.

- Update test_get_imported_class_definitions_includes_dataclass_decorators to expect both base class and derived class to be extracted - Add test_get_imported_class_definitions_extracts_multilevel_inheritance to verify multi-level inheritance chains are fully extracted

Support classes using __slots__ and C extension types (like Playwright's Page) that don't have a __dict__ attribute. Previously raised ValueError, now captures slot values or public non-callable attributes as fallback.

Mock objects from unittest.mock can have corrupted internal state after pickling, causing __repr__ to raise AttributeError. Added safe_repr wrapper to gracefully handle these failures during test result comparison.

GlobalAssignmentCollector only handled cst.Assign but not cst.AnnAssign (annotated assignments like `X: int = 1`). When the LLM generated optimizations with annotated module-level variables, these weren't copied to the target file, causing NameError at runtime. - Add visit_AnnAssign to GlobalAssignmentCollector - Add leave_AnnAssign to GlobalAssignmentTransformer - Update type hints to include cst.AnnAssign - Add test for annotated assignment handling

Add GlobalFunctionCollector and GlobalFunctionTransformer to collect and insert module-level function definitions introduced by LLM optimizations. This fixes NameError when optimized code introduces new helper functions like @lru_cache decorated functions that are used by the optimized method.

…ntext Add get_external_base_class_inits to extract __init__ methods from external library base classes (e.g., collections.UserDict) when project classes inherit from them. This helps the LLM understand constructor signatures for mocking.

… objects When instrumenting classes that inherit from Playwright's Page (like SkyvernPage), the instance state contains async event loop references including asyncgen_hooks which cannot be pickled. PicklePatcher gracefully handles these by replacing unpicklable objects with placeholders.

…ollector Fix issue where enum/class attribute access like `MessageKind.VALUE` was not tracking `MessageKind` as a dependency. The original code skipped all Names inside Attribute nodes within classes, but this incorrectly filtered out legitimate references. Now properly distinguishes between: - `.attr` part (e.g., `x` in `self.x`) - not tracked (attribute names) - `.value` part (e.g., `MessageKind` in `MessageKind.VALUE`) - tracked

Classes used as dependencies (enums, dataclasses, types) were being excluded from the optimization context even when marked as used by the target function. This caused NameError when the LLM used these types in generated optimizations.

…eError When LLM-generated optimizations include module-level code like `_REIFIERS = {MessageKind.XXX: ...}`, the global assignment was being inserted right after imports, BEFORE the class definition it referenced, causing NameError at module load time. Changes: - GlobalAssignmentTransformer now inserts assignments after all class/function definitions instead of right after imports - GlobalStatementCollector now skips AnnAssign (annotated assignments) so they are handled by GlobalAssignmentCollector instead

…ameError When LLM-generated optimizations include module-level function calls like `_register(MessageKind.ASK, ...)`, they were being inserted right after imports, BEFORE the function definition they reference, causing NameError at module load time. Changes: - Add GlobalStatementTransformer to append global statements at module end - Reorder transformations: functions → assignments → statements - Remove unused ImportInserter class - Update test expectations to reflect new placement behavior

codeflash-ai · 2026-01-24T07:51:47Z

⚡️ Codeflash found optimizations for this PR

📄 23% (0.23x) speedup for `add_global_assignments` in `codeflash/code_utils/code_extractor.py`

⏱️ Runtime : 573 milliseconds → 467 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function add_global_assignments by 23% in PR #1166 (skyvern-grace) #1167

If you approve, it will be merged into this PR (branch skyvern-grace).

codeflash-ai · 2026-01-24T09:06:34Z

⚡️ Codeflash found optimizations for this PR

📄 396% (3.96x) speedup for `extract_imports_for_class` in `codeflash/context/code_context_extractor.py`

⏱️ Runtime : 3.23 milliseconds → 650 microseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function extract_imports_for_class by 396% in PR #1166 (skyvern-grace) #1168

If you approve, it will be merged into this PR (branch skyvern-grace).

codeflash-ai · 2026-01-24T09:22:23Z

⚡️ Codeflash found optimizations for this PR

📄 43% (0.43x) speedup for `UsedNameCollector.get_external_names` in `codeflash/context/code_context_extractor.py`

⏱️ Runtime : 177 microseconds → 124 microseconds (best of 250 runs)

A new Optimization Review has been created.

🔗 Review here

Update test_no_targets_found to expect outer class to be kept when targeting nested class methods, and add test for nonexistent targets. Update test_multi_file_replcement01 to expect global assignments at module end rather than after imports.

Assignments that don't reference module-level definitions are now placed right after imports. Only assignments that reference classes/functions are placed after those definitions to prevent NameError.

…ized function Consolidated prune_cst_for_read_only_code and prune_cst_for_testgen_code into prune_cst_for_context with include_target_in_output and include_init_dunder flags.

…lash into skyvern-grace

…finition_remover Remove unused print_definitions debug function and simplify detect_unused_helper_functions to use try/except/else pattern.

…moving dead code - Extract build_testgen_context helper to reduce duplication in testgen token limit handling (~50 lines to ~20 lines) - Remove unused extract_code_string_context_from_files function (~100 lines) - Import get_section_names from unused_definition_remover instead of duplicating

This optimization achieves a **14% runtime improvement** (4.80ms → 4.19ms) through several targeted micro-optimizations that reduce overhead in hot code paths: ## Key Performance Improvements ### 1. **Eliminated Redundant Dictionary Lookups via Caching** In `CodeStringsMarkdown` properties (`flat`, `file_to_path`), the original code called `self._cache.get("key")` twice per invocation. The optimized version caches the result in a local variable: ```python # Before: two lookups if self._cache.get("flat") is not None: return self._cache["flat"] # After: one lookup cached = self._cache.get("flat") if cached is not None: return cached ``` This eliminates redundant hash table lookups in frequently accessed properties. ### 2. **Replaced `dict.setdefault()` for Atomic List Operations** In `_analyze_imports_in_optimized_code`, the original code used an if-check followed by assignment for the helpers dictionary: ```python # Before: check + assign (two operations) if func_name in file_entry: file_entry[func_name].append(helper) else: file_entry[func_name] = [helper] # After: single atomic operation helpers_by_file_and_func[module_name].setdefault(func_name, []).append(helper) ``` The `setdefault()` approach reduces the operation to a single dictionary call, eliminating the membership test. ### 3. **Hoisted `as_posix()` Calls Outside String Formatting** In the `markdown` property, path conversion was moved outside the f-string: ```python # Before: as_posix() called inside f-string f"```python:{code_string.file_path.as_posix()}\n..." # After: precomputed in conditional branch if code_string.file_path: file_path_str = code_string.file_path.as_posix() result.append(f"```python:{file_path_str}\n...") ``` This avoids repeated method calls during string formatting. ### 4. **Optimized Set Membership Tests with Early Exit** The most impactful change replaced `set.intersection()` with short-circuit boolean checks: ```python # Before: creates intermediate set via intersection is_called = bool(possible_call_names.intersection(called_function_names)) # After: early-exit on first match if (helper_qualified_name in called_function_names or helper_simple_name in called_function_names or helper_fully_qualified_name in called_function_names): is_called = True ``` With ~200 helpers in large-scale tests, this avoids creating temporary sets for every comparison, showing **50% speedup** in the large helper test (1.31ms → 868μs). ### 5. **Minimized Repeated Attribute Access** Variables like `entrypoint_file_path`, `attr_name`, and `value_id` are now cached before use, reducing attribute lookups in the AST traversal loop. ## Impact Based on Test Results - **Small workloads** (10-50 helpers): 10-16% speedup from reduced dict lookups - **Large workloads** (200 helpers): 50% speedup due to eliminated set operations in the helper-checking loop - **Edge cases** (syntax errors, missing functions): Minimal overhead, consistent 2-3% improvement This optimization is particularly valuable when `detect_unused_helper_functions` is called repeatedly during code analysis pipelines, as the cumulative effect of these micro-optimizations scales with the number of helper functions and code blocks analyzed.

codeflash-ai · 2026-01-24T15:53:42Z

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for `detect_unused_helper_functions` in `codeflash/context/unused_definition_remover.py`

⏱️ Runtime : 4.80 milliseconds → 4.19 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function detect_unused_helper_functions by 15% in PR #1166 (skyvern-grace) #1169

If you approve, it will be merged into this PR (branch skyvern-grace).

…2026-01-24T15.53.33 ⚡️ Speed up function `detect_unused_helper_functions` by 15% in PR #1166 (`skyvern-grace`)

KRRT7 and others added 14 commits January 23, 2026 06:52

decorators & annotations

722d053

fix: handle instance state capture for classes without __dict__

9cefc03

Support classes using __slots__ and C extension types (like Playwright's Page) that don't have a __dict__ attribute. Previously raised ValueError, now captures slot values or public non-callable attributes as fallback.

fix: handle repr failures for Mock objects in test result comparison

412779d

Mock objects from unittest.mock can have corrupted internal state after pickling, causing __repr__ to raise AttributeError. Added safe_repr wrapper to gracefully handle these failures during test result comparison.

Merge branch 'main' into skyvern-grace

1bb9d14

codeflash-ai bot mentioned this pull request Jan 24, 2026

⚡️ Speed up function add_global_assignments by 23% in PR #1166 (skyvern-grace) #1167

Closed

codeflash-ai bot mentioned this pull request Jan 24, 2026

⚡️ Speed up function extract_imports_for_class by 396% in PR #1166 (skyvern-grace) #1168

Closed

KRRT7 added 6 commits January 24, 2026 06:19

refactor: smarter placement of global assignments based on dependencies

7b33e8b

Assignments that don't reference module-level definitions are now placed right after imports. Only assignments that reference classes/functions are placed after those definitions to prevent NameError.

refactor: merge duplicate CST pruning functions into single parameter…

48b5ff3

…ized function Consolidated prune_cst_for_read_only_code and prune_cst_for_testgen_code into prune_cst_for_context with include_target_in_output and include_init_dunder flags.

refactor: merge duplicate CST pruning functions into single parameter…

9c9593a

…ized function Consolidated prune_cst_for_read_only_code and prune_cst_for_testgen_code into prune_cst_for_context with include_target_in_output and include_init_dunder flags.

Update mypy.yml

1b92d11

Merge branch 'skyvern-grace' of https://github.com/codeflash-ai/codef…

aba3e79

…lash into skyvern-grace

github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Jan 24, 2026

KRRT7 added 4 commits January 24, 2026 07:05

skip for 3.9

571b2ba

refactor: remove unused code and simplify return pattern in unused_de…

47b5235

…finition_remover Remove unused print_definitions debug function and simplify detect_unused_helper_functions to use try/except/else pattern.

add tests

65ff392

codeflash-ai bot mentioned this pull request Jan 24, 2026

⚡️ Speed up function detect_unused_helper_functions by 15% in PR #1166 (skyvern-grace) #1169

Merged

KRRT7 added 2 commits January 24, 2026 19:39

Merge pull request #1169 from codeflash-ai/codeflash/optimize-pr1166-…

7e3d6ec

…2026-01-24T15.53.33 ⚡️ Speed up function `detect_unused_helper_functions` by 15% in PR #1166 (`skyvern-grace`)

Merge branch 'main' into skyvern-grace

214d891

KRRT7 merged commit 5e61a8b into main Jan 25, 2026
23 of 24 checks passed

KRRT7 deleted the skyvern-grace branch January 25, 2026 07:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve dependency tracking and base class extraction #1166

feat: improve dependency tracking and base class extraction #1166

Uh oh!

KRRT7 commented Jan 24, 2026

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Speed up function `add_global_assignments` by 23% in PR #1166 (`skyvern-grace`) #1167

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Speed up function `extract_imports_for_class` by 396% in PR #1166 (`skyvern-grace`) #1168

Uh oh!

codeflash-ai bot commented Jan 24, 2026

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Speed up function `detect_unused_helper_functions` by 15% in PR #1166 (`skyvern-grace`) #1169

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: improve dependency tracking and base class extraction #1166

feat: improve dependency tracking and base class extraction #1166

Uh oh!

Conversation

KRRT7 commented Jan 24, 2026

Summary

Test plan

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 23% (0.23x) speedup for add_global_assignments in codeflash/code_utils/code_extractor.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function add_global_assignments by 23% in PR #1166 (skyvern-grace) #1167

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 396% (3.96x) speedup for extract_imports_for_class in codeflash/context/code_context_extractor.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function extract_imports_for_class by 396% in PR #1166 (skyvern-grace) #1168

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 43% (0.43x) speedup for UsedNameCollector.get_external_names in codeflash/context/code_context_extractor.py

A new Optimization Review has been created.

Uh oh!

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for detect_unused_helper_functions in codeflash/context/unused_definition_remover.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function detect_unused_helper_functions by 15% in PR #1166 (skyvern-grace) #1169

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

📄 23% (0.23x) speedup for `add_global_assignments` in `codeflash/code_utils/code_extractor.py`

⚡️ Speed up function `add_global_assignments` by 23% in PR #1166 (`skyvern-grace`) #1167

📄 396% (3.96x) speedup for `extract_imports_for_class` in `codeflash/context/code_context_extractor.py`

⚡️ Speed up function `extract_imports_for_class` by 396% in PR #1166 (`skyvern-grace`) #1168

📄 43% (0.43x) speedup for `UsedNameCollector.get_external_names` in `codeflash/context/code_context_extractor.py`

📄 15% (0.15x) speedup for `detect_unused_helper_functions` in `codeflash/context/unused_definition_remover.py`

⚡️ Speed up function `detect_unused_helper_functions` by 15% in PR #1166 (`skyvern-grace`) #1169