
Conversation


@galagam galagam commented Dec 28, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix AutoCast ReferenceRunner to handle large models.
Models larger than 2 GB cannot be serialized to a string, which is what Polygraphy does under the hood. Use a temporary file instead to save the modified ONNX model with all tensors marked as outputs.
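The constraint behind the fix is protobuf's hard 2 GB limit on a single serialized message. A minimal stdlib-only sketch of the resulting decision (the constant and function names here are illustrative assumptions, not the PR's actual code):

```python
import tempfile
from typing import Optional

# Protobuf cannot serialize a message at or above 2**31 - 1 bytes, so an
# in-memory string cannot hold a >2GB ONNX model.
TWO_GB_PROTOBUF_LIMIT = 2**31 - 1

def model_path_for_inference(model_size_bytes: int) -> Optional[str]:
    """Return a temp .onnx path for oversized models, or None to stay in memory."""
    if model_size_bytes < TWO_GB_PROTOBUF_LIMIT:
        return None  # small enough to serialize to a string and load directly
    # Too large for protobuf string serialization: write the model (with
    # external data) to disk and let the runtime load it from the file path.
    tmp = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
    tmp.close()
    return tmp.name
```

Models that already carry external data need the same file-based path even when the proto itself is small, since the in-memory bytes would not include the externally stored tensors.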

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Improvements
    • Enhanced model processing to better support large ONNX models during validation and runtime execution
    • Added diagnostic logging of model sizes at key processing stages for improved debugging and performance monitoring


copy-pr-bot bot commented Dec 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


Comment on lines 131 to 134
has_external_data = any(
    init.HasField("data_location") and init.data_location == onnx.TensorProto.EXTERNAL
    for init in self.model.graph.initializer
)
Contributor


Can we reuse this function to check for external data?

def has_external_data(onnx_model_path: str):

Contributor Author


@ajrasane I'm not sure I want to introduce dependencies from modelopt.torch here just for this.
Since modelopt/torch/_deploy/utils/torch_onnx.py is already importing quite a few utils from modelopt.onnx.utils, how about I move this function to modelopt.onnx.utils and import it in modelopt.torch?

Contributor Author


Done

Contributor Author


@ajrasane please revisit - since I edited torch utils, I now need a review from modelopt-torch-deploy-codeowners 🙏

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 1a0711c to 7a2d91a on January 19, 2026 19:22
Contributor

coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

The changes enhance ONNX model handling with external data support by introducing detection and branching logic in ReferenceRunner, adding model size logging to PrecisionConverter, and updating utils to support external data workflows with modified validation and return signatures.

Changes

Model Size Logging: modelopt/onnx/autocast/precisionconverter.py
Added a print_byte_size(label: str) helper method to serialize and log model byte sizes. Updated _sanity_check to log sizes before and after ONNX validation. Changed model validation in convert to an in-place check instead of an assignment.

External Data Detection & Handling: modelopt/onnx/autocast/referencerunner.py
Introduced a _get_ort_runner(model) method that detects external-data requirements (via data_location checks and size thresholds) and branches into either temporary-file-based loading with external data or the in-memory path. Updated the run method to delegate session construction to the new helper. Added imports: tempfile, onnx_utils.

Model Validation & Utility Updates: modelopt/onnx/utils.py
Changed the check_model signature from returning onnx.ModelProto to returning None; validation now happens in place without returning a modified model. Updated external-data handling to force the external-data flag in the save workflow. Upgraded the logging level from debug to warning for size/external-data reporting. Added a copy import.
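The size-logging helper added to PrecisionConverter can be sketched with the module logger instead of bare prints (the name log_model_byte_size is an assumption for illustration; any object exposing SerializeToString() works):

```python
import logging

logger = logging.getLogger(__name__)

def log_model_byte_size(model, label: str) -> int:
    """Serialize ``model`` once, log its byte size, and return it."""
    size = len(model.SerializeToString())
    logger.debug("%s: model size is %d bytes", label, size)
    return size
```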

Sequence Diagram(s)

sequenceDiagram
    participant Runner as ReferenceRunner
    participant GetRunner as _get_ort_runner()
    participant Model as ONNX Model
    participant FS as File System
    participant ORT as ONNXRuntime

    Runner->>GetRunner: run(model)
    GetRunner->>Model: Check initializers for external data
    alt External Data Detected or Size > 2GB
        GetRunner->>FS: Create temp .onnx file
        GetRunner->>Model: Save with external data enabled
        GetRunner->>FS: Write temp ONNX file
        GetRunner->>ORT: Create InferenceSession from file
        ORT-->>GetRunner: Session ready
    else In-Memory Path
        GetRunner->>ORT: Create Session via BytesFromOnnx
        ORT-->>GetRunner: Session ready
    end
    GetRunner-->>Runner: Return OnnxrtRunner(session)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 72.73%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title '[5725362] AutoCast Fixes for models with external data' directly and clearly describes the main change: fixing AutoCast to handle models with external data. This aligns with all three commits and the file changes.
  • Description check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.




@galagam galagam marked this pull request as ready for review January 19, 2026 19:30
@galagam galagam requested review from a team as code owners January 19, 2026 19:31
@galagam galagam requested a review from gcunhase January 19, 2026 19:31
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/precisionconverter.py`:
- Around line 85-88: Remove the temporary debug method print_byte_size: it uses
print() with a "GAGAM" prefix and lacks a docstring; either delete this method
from the class or replace it with a properly documented utility that uses the
module logger (e.g., logging.getLogger(__name__)) and a D102-compliant
docstring, and if keeping functionality, rename to something descriptive (e.g.,
log_model_byte_size) and call self.model.SerializeToString() to compute size
before logging the result instead of printing.
- Around line 1298-1303: Remove the debug instrumentation by deleting the calls
to self.print_byte_size(...) inside the _sanity_check method and remove the
print_byte_size method definition entirely; ensure _sanity_check only performs
onnx_utils.check_model(self.model) (and any existing sanity_ok handling) without
emitting byte-size prints, and run tests to confirm no remaining references to
print_byte_size remain in the class.

In `@modelopt/onnx/autocast/referencerunner.py`:
- Around line 160-165: The temp ONNX file created with
tempfile.NamedTemporaryFile(delete=False) and passed to
onnx_utils.save_onnx/InferenceSession is never removed; store the path (e.g.,
self._temp_model_path) and ensure cleanup once the session is no longer needed
by either creating the temp inside a TemporaryDirectory context (mirroring
utils.check_model) that you keep open for the session lifetime or add a
dedicated cleanup method (e.g., close()/cleanup_temp_model) that deletes the
file and call it when the runner is disposed; optionally register that cleanup
with atexit as a fallback to avoid accumulating files.

In `@modelopt/onnx/utils.py`:
- Line 661: The line force-setting save_as_external_data = True is a temporary
debug override that bypasses the prior size-based logic; remove this hardcoded
assignment (the "save_as_external_data = True # GAGAM: for debug" statement) so
the function/method that computes save_as_external_data earlier in the scope can
determine the value normally, and ensure no other debug-only overrides remain in
the same function or surrounding block.
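The temp-file cleanup suggested above can be realized with stdlib only; a hedged sketch (the class name, attribute, and method names are assumptions, not the PR's code):

```python
import atexit
import os
import tempfile

class TempModelFile:
    """Owns a temporary .onnx path; removes it on cleanup or at interpreter exit."""

    def __init__(self) -> None:
        tmp = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
        tmp.close()
        self.path = tmp.name
        # Fallback: if nobody calls cleanup(), remove the file at exit.
        atexit.register(self.cleanup)

    def cleanup(self) -> None:
        # Guarded so the atexit fallback is a no-op after an explicit cleanup.
        if os.path.exists(self.path):
            os.remove(self.path)
```

The runner would hold a TempModelFile for the session lifetime and call cleanup() when disposed, with the atexit registration catching any path where disposal never happens.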
🧹 Nitpick comments (3)
modelopt/onnx/utils.py (1)

647-649: Consider reverting the log level from warning to debug.

This message logs model size and external data usage for every save operation. Using warning level will generate noise in production logs for what is essentially informational/diagnostic output. The original debug level was more appropriate unless there's a specific reason to always surface this information.

Suggested change
-        logger.warning(
+        logger.debug(
             f"Model size: {model_size} bytes, using external data: {save_as_external_data}"
         )
modelopt/onnx/autocast/referencerunner.py (2)

131-152: Consider extracting external data detection to a shared utility.

This logic duplicates functionality that exists in modelopt/torch/_deploy/utils/torch_onnx.py (has_external_data and check_model_uses_external_data). Per a past review comment, consider moving the detection logic to modelopt/onnx/utils.py so it can be reused across the codebase.

The current implementation is correct but consolidating would improve maintainability.


166-166: Lambda closure captures session correctly, but consider clarity.

The lambda `lambda: session` works because `session` is captured by reference, but the pattern can be confusing: OnnxrtRunner expects a callable that returns a session, and passing an existing session via a lambda is acceptable but slightly unusual.
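The pattern in question, reduced to its essentials (the factory name is illustrative; OnnxrtRunner stands in for any API that takes a zero-argument session provider):

```python
# A zero-arg callable satisfies a "session factory" parameter even when the
# session already exists; the lambda closes over `session` by reference, so
# every call hands back the same pre-built object.
def make_session_provider(session):
    return lambda: session

provider = make_session_provider(object())  # placeholder for an ORT session
assert provider() is provider()
```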

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.11%. Comparing base (391f6cb) to head (caafcbd).
⚠️ Report is 19 commits behind head on main.

Files with missing lines:
  • modelopt/onnx/autocast/referencerunner.py: patch 61.90%, 8 lines missing ⚠️
  • modelopt/onnx/utils.py: patch 80.00%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   74.19%   74.11%   -0.08%     
==========================================
  Files         192      191       -1     
  Lines       19238    19271      +33     
==========================================
+ Hits        14273    14283      +10     
- Misses       4965     4988      +23     

☔ View full report in Codecov by Sentry.
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Contributor

@gcunhase gcunhase left a comment


LGTM, thanks

@galagam galagam requested a review from a team as a code owner January 20, 2026 15:01
@galagam galagam requested a review from cjluo-nv January 20, 2026 15:01
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 2ae99d0 to 00ea80c on January 20, 2026 15:14
@galagam galagam requested a review from ajrasane January 20, 2026 15:23
@galagam galagam requested a review from a team January 22, 2026 06:39
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 13f2eb8 to caafcbd on January 25, 2026 11:53
@galagam galagam requested a review from gcunhase January 25, 2026 11:55
tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
tmp_file.close()
tmp_file_path = tmp_file.name
onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)


FYI I think the external data will not be cleaned up here - that's the reason I use TemporaryDirectory instead in Polygraphy.

Contributor Author


@pranavm-nvidia Thanks, I'm aware, but it's not an easy fix, because we need the file in a broader context.
This is actually something we have to fix in a few other places.
Filed https://jirasw.nvidia.com/browse/OMNIML-3276 to track this
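The TemporaryDirectory approach mentioned for Polygraphy can be sketched as follows (stdlib only; the helper name and byte payload are illustrative). Keeping the directory handle alive ties the lifetimes of the .onnx file and any external-data sidecar files to a single object:

```python
import os
import tempfile

def write_model_to_tmpdir(model_bytes: bytes):
    """Write model bytes into a TemporaryDirectory the caller must keep alive."""
    tmpdir = tempfile.TemporaryDirectory()
    path = os.path.join(tmpdir.name, "model.onnx")
    with open(path, "wb") as f:
        f.write(model_bytes)
    # Any external data files written next to model.onnx are removed together
    # with the directory when tmpdir.cleanup() runs (or tmpdir is finalized).
    return tmpdir, path
```

This is what makes it a non-trivial fix here: the directory object must outlive the inference session, so ownership has to be threaded through the broader context the runner lives in.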

Contributor

@gcunhase gcunhase left a comment


LGTM, thanks!

@galagam galagam merged commit 3840309 into NVIDIA:main Jan 26, 2026
36 checks passed