[rollout] fix: use model_dump() for proper Pydantic serialization in token2text #4706
Conversation
Fixes volcengine#4457

The `add_token2text` function was using `vars(result)`, which returns a reference to the Pydantic model's internal `__dict__`. When this dict is modified (to add `prompt_text` and `response_text`), it also modifies the original model object. This can cause serialization issues with MLflow because the model now has extra fields that aren't in its schema.

This fix:
- Uses `model_dump()` for Pydantic BaseModel objects to get a proper copy
- Falls back to `dict(vars(result))` for other objects to ensure a copy

This ensures the original result object remains unchanged and the modified dict passed to MLflow's `span.set_outputs()` is a clean copy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
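The aliasing can be reproduced in isolation. A minimal sketch, assuming Pydantic v2 and a hypothetical stand-in model (`FakeOutput` below is not the real rollout output type):

```python
# Minimal reproduction of the aliasing problem, assuming Pydantic v2.
# `FakeOutput` is a hypothetical stand-in, not the actual rollout output type.
from pydantic import BaseModel


class FakeOutput(BaseModel):
    prompt_ids: list[int]
    response_ids: list[int]


result = FakeOutput(prompt_ids=[1, 2, 3], response_ids=[4, 5, 6])

_result = vars(result)                # reference to result.__dict__, not a copy
_result["prompt_text"] = "decoded"    # mutates the original model
print("prompt_text" in vars(result))  # True: the model now carries an out-of-schema field
```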
Code Review
This pull request correctly addresses a serialization issue with Pydantic models by using model_dump() instead of vars(). This prevents unintended modification of the original model object. My review includes a suggestion to make the fallback logic for non-Pydantic objects more robust to prevent potential crashes with certain object types.
```python
if isinstance(result, BaseModel):
    _result = result.model_dump()
else:
    _result = dict(vars(result))
```
The fallback logic `dict(vars(result))` is a good improvement for creating a copy, but it's not robust enough for a general utility function. It will raise a `TypeError` if `result` is an object that uses `__slots__` and does not have a `__dict__` attribute. This would cause a crash.

To make this function more robust and prevent potential runtime errors, I suggest explicitly handling objects with `__dict__` and `__slots__` before falling back to `vars()`.
```diff
 if isinstance(result, BaseModel):
     _result = result.model_dump()
-else:
-    _result = dict(vars(result))
+elif hasattr(result, "__dict__"):
+    _result = result.__dict__.copy()
+elif hasattr(result, "__slots__"):
+    _result = {slot: getattr(result, slot) for slot in result.__slots__}
+else:
+    # Fallback for unknown types, which might fail.
+    _result = dict(vars(result))
```
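A quick, self-contained sketch of the failure mode behind this suggestion (the `SlottedResult` class is a hypothetical stand-in, not a type that actually passes through `token2text`):

```python
# Hypothetical slotted class to reproduce the failure mode the reviewer
# describes; it is not a type that actually flows through token2text.
class SlottedResult:
    __slots__ = ("prompt_ids", "response_ids")

    def __init__(self, prompt_ids, response_ids):
        self.prompt_ids = prompt_ids
        self.response_ids = response_ids


obj = SlottedResult([1, 2], [3, 4])

try:
    dict(vars(obj))  # vars() needs __dict__, which __slots__ suppresses
except TypeError as exc:
    print(f"current fallback would crash: {exc}")

# The suggested __slots__ branch copies slot values without touching __dict__.
copied = {slot: getattr(obj, slot) for slot in obj.__slots__}
print(copied)  # {'prompt_ids': [1, 2], 'response_ids': [3, 4]}
```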
…token2text (volcengine#4706)

## Summary
- Fixes volcengine#4457 (token2text in MLflow does not work)
- The `add_token2text` function was using `vars(result)` which returns a reference to the Pydantic model's internal `__dict__`
- When this dict is modified to add `prompt_text` and `response_text`, it also modifies the original model object
- This can cause serialization issues with MLflow because the model now has extra fields that aren't in its schema

## Changes
- Use `model_dump()` for Pydantic BaseModel objects to get a proper copy
- Fall back to `dict(vars(result))` for other objects to ensure a copy

## Technical Details
The bug was that `vars()` on a Pydantic v2 model returns a reference to the model's internal `__dict__`:

```python
result = AgentLoopOutput(prompt_ids=[1,2,3], response_ids=[4,5,6])
_result = vars(result)
_result['prompt_text'] = 'decoded text'
# Now result.__dict__ also has 'prompt_text'!
```

This modifies the original model, which can cause MLflow serialization issues. The fix uses `model_dump()` which returns a proper copy.

## Test plan
- [x] Verified syntax is correct
- [x] Tested that `model_dump()` creates a proper copy that doesn't modify the original
- [x] Tested that the old `vars()` behavior was indeed problematic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
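As a counterpart to the bug reproduction in the technical details above, a sketch of the fixed path under a stand-in-model assumption (the enrichment keys mirror what `add_token2text` adds; the decode and tracing calls are omitted):

```python
# Sketch of the fixed path with a hypothetical stand-in model; the real code
# operates on AgentLoopOutput inside add_token2text.
from pydantic import BaseModel


class FakeOutput(BaseModel):
    prompt_ids: list[int]
    response_ids: list[int]


result = FakeOutput(prompt_ids=[1, 2, 3], response_ids=[4, 5, 6])

_result = result.model_dump()  # independent copy, not a __dict__ reference
_result["prompt_text"] = "decoded prompt"
_result["response_text"] = "decoded response"

assert "prompt_text" not in vars(result)  # original model keeps its schema
# The clean copy is what would be handed to MLflow via span.set_outputs(_result).
```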