Conversation

@yurekami
Contributor

Summary

  • Fixes #4457 (Verl token2text in MLflow not working)
  • The `add_token2text` function was using `vars(result)`, which returns a reference to the Pydantic model's internal `__dict__`
  • When this dict is modified to add `prompt_text` and `response_text`, it also modifies the original model object
  • This can cause serialization issues with MLflow because the model now has extra fields that aren't in its schema

Changes

  • Use `model_dump()` for Pydantic `BaseModel` objects to get a proper copy
  • Fall back to `dict(vars(result))` for other objects to ensure a copy

Technical Details

The bug was that `vars()` on a Pydantic v2 model returns a reference to the model's internal `__dict__`:

```python
result = AgentLoopOutput(prompt_ids=[1, 2, 3], response_ids=[4, 5, 6])
_result = vars(result)
_result['prompt_text'] = 'decoded text'
# Now result.__dict__ also has 'prompt_text'!
```

This modifies the original model, which can cause MLflow serialization issues. The fix uses `model_dump()`, which returns a proper copy.
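
The aliasing is ordinary Python behavior rather than anything Pydantic-specific: `vars(obj)` returns `obj.__dict__` itself, not a copy. A minimal stdlib-only sketch of the bug and the copy-based fix, using a plain class as a stand-in for `AgentLoopOutput` (the `FakeOutput` name is hypothetical, for illustration only):

```python
class FakeOutput:
    """Stand-in for the Pydantic model; plain attributes instead of fields."""
    def __init__(self, prompt_ids, response_ids):
        self.prompt_ids = prompt_ids
        self.response_ids = response_ids

result = FakeOutput([1, 2, 3], [4, 5, 6])

# vars() hands back the object's own __dict__, so mutations leak through.
aliased = vars(result)
aliased["prompt_text"] = "decoded text"
assert hasattr(result, "prompt_text")  # the original object was modified

# A shallow copy (what model_dump() effectively provides for a flat model)
# keeps the original object untouched.
result2 = FakeOutput([1, 2, 3], [4, 5, 6])
copied = dict(vars(result2))
copied["prompt_text"] = "decoded text"
assert not hasattr(result2, "prompt_text")  # original unchanged
```

The same aliasing holds for Pydantic v2 models because their field values live in the instance `__dict__` that `vars()` exposes.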

Test plan

  • Verified syntax is correct
  • Tested that `model_dump()` creates a proper copy that doesn't modify the original
  • Tested that the old `vars()` behavior was indeed problematic

🤖 Generated with Claude Code

Fixes volcengine#4457

The `add_token2text` function was using `vars(result)` which returns a
reference to the Pydantic model's internal `__dict__`. When this dict
is modified (to add prompt_text and response_text), it also modifies
the original model object. This can cause serialization issues with
MLflow because the model now has extra fields that aren't in its schema.

This fix:
- Uses `model_dump()` for Pydantic BaseModel objects to get a proper copy
- Falls back to `dict(vars(result))` for other objects to ensure a copy

This ensures the original result object remains unchanged and the
modified dict passed to MLflow's `span.set_outputs()` is a clean copy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a serialization issue with Pydantic models by using `model_dump()` instead of `vars()`. This prevents unintended modification of the original model object. My review includes a suggestion to make the fallback logic for non-Pydantic objects more robust to prevent potential crashes with certain object types.

Comment on lines +204 to +207

```python
if isinstance(result, BaseModel):
    _result = result.model_dump()
else:
    _result = dict(vars(result))
```

critical

The fallback logic `dict(vars(result))` is a good improvement for creating a copy, but it's not robust enough for a general utility function. It will raise a `TypeError` if `result` is an object that uses `__slots__` and does not have a `__dict__` attribute, which would cause a crash.

To make this function more robust and prevent potential runtime errors, I suggest explicitly handling objects with `__dict__` and `__slots__` before falling back to `vars()`.

Suggested change

```python
if isinstance(result, BaseModel):
    _result = result.model_dump()
else:
    _result = dict(vars(result))
```

```python
if isinstance(result, BaseModel):
    _result = result.model_dump()
elif hasattr(result, "__dict__"):
    _result = result.__dict__.copy()
elif hasattr(result, "__slots__"):
    _result = {slot: getattr(result, slot) for slot in result.__slots__}
else:
    # Fallback for unknown types, which might fail.
    _result = dict(vars(result))
```
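
As a sanity check on the reviewer's point: `vars()` does raise `TypeError` on instances of `__slots__`-only classes, and the suggested `getattr` comprehension handles them. A stdlib-only sketch (the `extract_fields` helper name and the `Slotted` class are hypothetical; the `BaseModel` branch is omitted so the snippet needs no third-party dependency):

```python
def extract_fields(result):
    # Mirrors the suggested fallback chain, minus the BaseModel branch.
    if hasattr(result, "__dict__"):
        return result.__dict__.copy()
    if hasattr(result, "__slots__"):
        return {slot: getattr(result, slot) for slot in result.__slots__}
    return dict(vars(result))  # may still fail for exotic types

class Slotted:
    __slots__ = ("prompt_ids", "response_ids")
    def __init__(self):
        self.prompt_ids = [1, 2, 3]
        self.response_ids = [4, 5, 6]

s = Slotted()
try:
    vars(s)  # no __dict__ on a __slots__-only instance
except TypeError:
    pass  # this is the crash the suggestion guards against

fields = extract_fields(s)
fields["prompt_text"] = "decoded text"  # safe: mutates a fresh dict
assert not hasattr(s, "prompt_text")    # original object untouched
```

Note that the comprehension reads only the class's own `__slots__`; slots inherited from base classes would need a walk over the MRO, which the suggestion leaves out.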

@wuxibin89 wuxibin89 changed the title fix: use model_dump() for proper Pydantic serialization in token2text [rollout] fix: use model_dump() for proper Pydantic serialization in token2text Jan 5, 2026
@wuxibin89 wuxibin89 merged commit 0f61557 into volcengine:main Jan 5, 2026
41 of 56 checks passed
jsfanfanfan pushed a commit to meituan-search/verl that referenced this pull request Jan 9, 2026
…token2text (volcengine#4706)
