Skip to content

Conversation

@KRRT7
Copy link
Collaborator

@KRRT7 KRRT7 commented Jan 24, 2026

Summary

  • Include dependency classes in read-writable optimization context
  • Track attribute access base names as dependencies in DependencyCollector
  • Use PicklePatcher for instance state to handle unpicklable async objects
  • Extract __init__ from external library base classes for test context
  • Handle module-level function definitions in add_global_assignments
  • Handle annotated assignments in GlobalAssignmentCollector
  • Handle repr failures for Mock objects in test result comparison
  • Handle instance state capture for classes without __dict__
  • Recursively extract base classes for imported dataclasses
  • Handle decorators & annotations properly

Test plan

  • Unit tests added for base class extraction in imported dataclasses
  • Unit tests extended for code context extractor
  • Unit tests extended for codeflash capture
  • Unit tests added for remove unused definitions

KRRT7 and others added 14 commits January 23, 2026 06:52
When extracting imported class definitions for testgen context, also
extract base classes from the same module. This ensures the full
inheritance chain is available for understanding constructor signatures.

For example, when LLMConfig inherits from LLMConfigBase, both classes
are now included in the context so the LLM can see all required
positional arguments from parent classes.
- Update test_get_imported_class_definitions_includes_dataclass_decorators
  to expect both base class and derived class to be extracted
- Add test_get_imported_class_definitions_extracts_multilevel_inheritance
  to verify multi-level inheritance chains are fully extracted
Support classes using __slots__ and C extension types (like Playwright's
Page) that don't have a __dict__ attribute. Previously raised ValueError,
now captures slot values or public non-callable attributes as fallback.
Mock objects from unittest.mock can have corrupted internal state after
pickling, causing __repr__ to raise AttributeError. Added safe_repr
wrapper to gracefully handle these failures during test result comparison.
GlobalAssignmentCollector only handled cst.Assign but not cst.AnnAssign
(annotated assignments like `X: int = 1`). When the LLM generated
optimizations with annotated module-level variables, these weren't
copied to the target file, causing NameError at runtime.

- Add visit_AnnAssign to GlobalAssignmentCollector
- Add leave_AnnAssign to GlobalAssignmentTransformer
- Update type hints to include cst.AnnAssign
- Add test for annotated assignment handling
Add GlobalFunctionCollector and GlobalFunctionTransformer to collect and
insert module-level function definitions introduced by LLM optimizations.
This fixes NameError when optimized code introduces new helper functions
like @lru_cache decorated functions that are used by the optimized method.
…ntext

Add get_external_base_class_inits to extract __init__ methods from external
library base classes (e.g., collections.UserDict) when project classes inherit
from them. This helps the LLM understand constructor signatures for mocking.
… objects

When instrumenting classes that inherit from Playwright's Page (like
SkyvernPage), the instance state contains async event loop references
including asyncgen_hooks which cannot be pickled. PicklePatcher gracefully
handles these by replacing unpicklable objects with placeholders.
…ollector

Fix issue where enum/class attribute access like `MessageKind.VALUE` was not
tracking `MessageKind` as a dependency. The original code skipped all Names
inside Attribute nodes within classes, but this incorrectly filtered out
legitimate references.

Now properly distinguishes between:
- `.attr` part (e.g., `x` in `self.x`) - not tracked (attribute names)
- `.value` part (e.g., `MessageKind` in `MessageKind.VALUE`) - tracked
Classes used as dependencies (enums, dataclasses, types) were being
excluded from the optimization context even when marked as used by
the target function. This caused NameError when the LLM used these
types in generated optimizations.
…eError

When LLM-generated optimizations include module-level code like
`_REIFIERS = {MessageKind.XXX: ...}`, the global assignment was being
inserted right after imports, BEFORE the class definition it referenced,
causing NameError at module load time.

Changes:
- GlobalAssignmentTransformer now inserts assignments after all
  class/function definitions instead of right after imports
- GlobalStatementCollector now skips AnnAssign (annotated assignments)
  so they are handled by GlobalAssignmentCollector instead
…ameError

When LLM-generated optimizations include module-level function calls like
`_register(MessageKind.ASK, ...)`, they were being inserted right after
imports, BEFORE the function definition they reference, causing NameError
at module load time.

Changes:
- Add GlobalStatementTransformer to append global statements at module end
- Reorder transformations: functions → assignments → statements
- Remove unused ImportInserter class
- Update test expectations to reflect new placement behavior
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 23% (0.23x) speedup for add_global_assignments in codeflash/code_utils/code_extractor.py

⏱️ Runtime : 573 milliseconds 467 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch skyvern-grace).

Static Badge

@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 396% (3.96x) speedup for extract_imports_for_class in codeflash/context/code_context_extractor.py

⏱️ Runtime : 3.23 milliseconds 650 microseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch skyvern-grace).

Static Badge

@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 43% (0.43x) speedup for UsedNameCollector.get_external_names in codeflash/context/code_context_extractor.py

⏱️ Runtime : 177 microseconds 124 microseconds (best of 250 runs)

A new Optimization Review has been created.

🔗 Review here

Static Badge

KRRT7 added 6 commits January 24, 2026 06:19
Update test_no_targets_found to expect outer class to be kept when
targeting nested class methods, and add test for nonexistent targets.
Update test_multi_file_replcement01 to expect global assignments at
module end rather than after imports.
Assignments that don't reference module-level definitions are now placed
right after imports. Only assignments that reference classes/functions
are placed after those definitions to prevent NameError.
…ized function

Consolidated prune_cst_for_read_only_code and prune_cst_for_testgen_code into
prune_cst_for_context with include_target_in_output and include_init_dunder flags.
…ized function

Consolidated prune_cst_for_read_only_code and prune_cst_for_testgen_code into
prune_cst_for_context with include_target_in_output and include_init_dunder flags.
@github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Jan 24, 2026
KRRT7 added 4 commits January 24, 2026 07:05
…finition_remover

Remove unused print_definitions debug function and simplify
detect_unused_helper_functions to use try/except/else pattern.
…moving dead code

- Extract build_testgen_context helper to reduce duplication in testgen
  token limit handling (~50 lines to ~20 lines)
- Remove unused extract_code_string_context_from_files function (~100 lines)
- Import get_section_names from unused_definition_remover instead of duplicating
This optimization achieves a **14% runtime improvement** (4.80ms → 4.19ms) through several targeted micro-optimizations that reduce overhead in hot code paths:

## Key Performance Improvements

### 1. **Eliminated Redundant Dictionary Lookups via Caching**
In `CodeStringsMarkdown` properties (`flat`, `file_to_path`), the original code called `self._cache.get("key")` twice per invocation. The optimized version caches the result in a local variable:
```python
# Before: two lookups
if self._cache.get("flat") is not None:
    return self._cache["flat"]

# After: one lookup
cached = self._cache.get("flat")
if cached is not None:
    return cached
```
This eliminates redundant hash table lookups in frequently accessed properties.

### 2. **Replaced `dict.setdefault()` for Atomic List Operations**
In `_analyze_imports_in_optimized_code`, the original code used an if-check followed by assignment for the helpers dictionary:
```python
# Before: check + assign (two operations)
if func_name in file_entry:
    file_entry[func_name].append(helper)
else:
    file_entry[func_name] = [helper]

# After: single atomic operation
helpers_by_file_and_func[module_name].setdefault(func_name, []).append(helper)
```
The `setdefault()` approach reduces the operation to a single dictionary call, eliminating the membership test.

### 3. **Hoisted `as_posix()` Calls Outside String Formatting**
In the `markdown` property, path conversion was moved outside the f-string:
```python
# Before: as_posix() called inside f-string
f"```python:{code_string.file_path.as_posix()}\n..."

# After: precomputed in conditional branch
if code_string.file_path:
    file_path_str = code_string.file_path.as_posix()
    result.append(f"```python:{file_path_str}\n...")
```
This avoids repeated method calls during string formatting.

### 4. **Optimized Set Membership Tests with Early Exit**
The most impactful change replaced `set.intersection()` with short-circuit boolean checks:
```python
# Before: creates intermediate set via intersection
is_called = bool(possible_call_names.intersection(called_function_names))

# After: early-exit on first match
if (helper_qualified_name in called_function_names or 
    helper_simple_name in called_function_names or 
    helper_fully_qualified_name in called_function_names):
    is_called = True
```
With ~200 helpers in large-scale tests, this avoids creating temporary sets for every comparison, showing **50% speedup** in the large helper test (1.31ms → 868μs).

### 5. **Minimized Repeated Attribute Access**
Variables like `entrypoint_file_path`, `attr_name`, and `value_id` are now cached before use, reducing attribute lookups in the AST traversal loop.

## Impact Based on Test Results

- **Small workloads** (10-50 helpers): 10-16% speedup from reduced dict lookups
- **Large workloads** (200 helpers): 50% speedup due to eliminated set operations in the helper-checking loop
- **Edge cases** (syntax errors, missing functions): Minimal overhead, consistent 2-3% improvement

This optimization is particularly valuable when `detect_unused_helper_functions` is called repeatedly during code analysis pipelines, as the cumulative effect of these micro-optimizations scales with the number of helper functions and code blocks analyzed.
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Jan 24, 2026

⚡️ Codeflash found optimizations for this PR

📄 15% (0.15x) speedup for detect_unused_helper_functions in codeflash/context/unused_definition_remover.py

⏱️ Runtime : 4.80 milliseconds 4.19 milliseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch skyvern-grace).

Static Badge

KRRT7 added 2 commits January 24, 2026 19:39
…2026-01-24T15.53.33

⚡️ Speed up function `detect_unused_helper_functions` by 15% in PR #1166 (`skyvern-grace`)
@KRRT7 KRRT7 merged commit 5e61a8b into main Jan 25, 2026
23 of 24 checks passed
@KRRT7 KRRT7 deleted the skyvern-grace branch January 25, 2026 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

workflow-modified This PR modifies GitHub Actions workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants