Conversation

rostan-t (Collaborator) commented Jan 2, 2026

Category: Bug fix (non-breaking change which fixes an issue)

Description:

The TFRecord reader returns a dictionary keyed by feature name, which dynamic mode didn't handle properly. This PR fixes that.
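For context, the TFRecord reader is dictionary-shaped by design: its outputs are keyed by the feature names passed to it. A minimal sketch using the classic graph API (paths and feature names are placeholders; the dynamic-mode call that this PR fixes looks different):

from nvidia.dali import pipeline_def, fn
import nvidia.dali.tfrecord as tfrec

@pipeline_def(batch_size=8, num_threads=2, device_id=0)
def tfrecord_pipe():
    # Placeholder paths; the features dict determines the keys of the result.
    inputs = fn.readers.tfrecord(
        path="data.tfrecord",
        index_path="data.idx",
        features={
            "image/encoded": tfrec.FixedLenFeature((), tfrec.string, ""),
            "image/class/label": tfrec.FixedLenFeature([1], tfrec.int64, -1),
        },
    )
    # inputs is a dict keyed by feature name, not a tuple of outputs
    return inputs["image/encoded"], inputs["image/class/label"]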

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
      • test_reader_decoder.py: test_tfrecord
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
greptile-apps bot commented Jan 2, 2026

Greptile Summary

This PR adds support for dictionary return values in DALI's dynamic mode, specifically fixing TFRecord reader compatibility. The changes propagate dictionary handling through the entire invocation pipeline:

  • Core Changes: Modified Invocation, Operator, and Reader classes to detect when operators return dictionaries (via _output_names) and preserve the key-value structure throughout the execution path
  • Type Signatures: Updated TFRecord's next_epoch return type annotations to reflect dict[str, Tensor] or dict[str, Batch] instead of tuples
  • Testing: Added comprehensive test coverage for TFRecord in both single-sample and batch modes
  • Bug Fix: Corrected an f-string formatting error in _tensor.py (line 70)

The implementation is clean and follows the existing architecture patterns. The changes are well-contained and don't affect operators that return tuples.
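For readers following along, the dict-vs-tuple dispatch described above can be pictured roughly like this. The helper below is an illustrative sketch, not the actual DALI code; only the _output_names name is taken from the summary:

def _wrap_outputs(results, output_names=None):
    # Normalize whatever the backend produced into a tuple first.
    if isinstance(results, dict):
        results = tuple(results.values())
    elif not isinstance(results, (tuple, list)):
        results = (results,)
    # Operators with named outputs (e.g. TFRecord features) get a dict back;
    # everything else keeps the existing tuple behaviour.
    if output_names is not None:
        return dict(zip(output_names, results))
    return tuple(results)

For example, _wrap_outputs((img, lbl), ["image/encoded", "image/class/label"]) yields a dict keyed by feature name, while _wrap_outputs((img, lbl)) stays a tuple.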

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured, maintain backward compatibility for tuple-returning operators, include comprehensive tests, and fix a clear bug. The implementation correctly handles both dictionary and tuple return types throughout the pipeline.
  • No files require special attention

Important Files Changed

Filename | Overview
dali/python/nvidia/dali/experimental/dynamic/_invocation.py | Added __iter__ method and dict handling in _run_impl to support dictionary return values from operators
dali/python/nvidia/dali/experimental/dynamic/_op_builder.py | Modified build_call_function to return dicts when the operator has _output_names, unifying batch/tensor handling
dali/python/nvidia/dali/experimental/dynamic/_ops.py | Added _output_names tracking and dict return handling throughout the Operator and Reader classes
dali/python/nvidia/dali/ops/_signatures.py | Updated type signatures for TFRecord's next_epoch to return dict[str, Tensor/Batch]
dali/test/python/experimental_mode/test_reader_decoder.py | Added a comprehensive test for the TFRecord reader covering both single-sample and batch modes

Sequence Diagram

sequenceDiagram
    participant User
    participant TFRecord as TFRecord Reader
    participant Operator as Operator._run()
    participant Invocation as Invocation._run_impl()
    participant OpBuilder as build_call_function()
    participant Reader as Reader._samples()/_batches()

    User->>TFRecord: next_epoch(batch_size)
    TFRecord->>Reader: _samples() or _batches()
    Reader->>Operator: _run(ctx, batch_size)
    Operator->>Invocation: Create Invocation
    Invocation->>Invocation: _run_impl(ctx)
    
    Note over Invocation: Execute operator backend
    Invocation->>Invocation: Check result type
    alt Result is dict
        Invocation->>Invocation: Convert to tuple(r.values())
    else Result is tuple/list
        Invocation->>Invocation: Keep as tuple
    else Result is single value
        Invocation->>Invocation: Wrap in tuple
    end
    
    Invocation-->>Operator: Return tuple results
    
    Note over Operator: Check _output_names
    alt _output_names is set
        Operator->>Operator: zip(names, results) → dict
    else _output_names is None
        Operator->>Operator: Return tuple as-is
    end
    
    Operator-->>Reader: dict or tuple
    
    alt Result is dict
        Reader->>Reader: Iterate over dict.values()
        Reader->>Reader: yield dict(zip(names, tensors))
    else Result is tuple
        Reader->>Reader: Iterate over tuple
        Reader->>Reader: yield tuple(tensors)
    end
    
    Reader-->>User: dict[str, Tensor/Batch]
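Purely as an illustration of the diagram's final alt block, the per-sample regrouping could look like the sketch below (a hypothetical helper; the real Reader._samples in _ops.py differs):

def _samples(results, output_names=None):
    # results holds one batch per output; regroup them sample by sample.
    if output_names is not None:
        batches = [results[name] for name in output_names]
        for tensors in zip(*batches):
            yield dict(zip(output_names, tensors))   # keep the key/value structure
    else:
        for tensors in zip(*results):
            yield tuple(tensors)                     # tuple-returning operators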

greptile-apps bot commented Jan 2, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

rostan-t (Collaborator, Author) commented Jan 2, 2026

!build

dali-automaton (Collaborator)

CI MESSAGE: [41073266]: BUILD STARTED

dali-automaton (Collaborator)

CI MESSAGE: [41073266]: BUILD PASSED

Review thread on dali/python/nvidia/dali/experimental/dynamic/_invocation.py:

        self.run(self._eval_context)
        return self._results[result_index].layout()

    def __iter__(self):
Contributor:

Out of curiosity: what is this needed for? You can iterate an object based on __len__/__getitem__.

rostan-t (Collaborator, Author):

It's true that __getitem__ is enough to make an object iterable, but that adds extra overhead, and this can be called in a hot path.
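For background: without __iter__, Python falls back to the legacy protocol and calls __getitem__ with 0, 1, 2, ... until IndexError, which costs one indexed lookup per element. A toy sketch (not the actual Invocation class):

class Results:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]       # legacy iteration calls this once per index

    def __iter__(self):
        return iter(self._items)    # direct iteration, no repeated index lookups

print(list(Results(["a", "b", "c"])))   # ['a', 'b', 'c'], via __iter__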

Comment on lines 137 to 142:

        self._num_outputs = self._operator._infer_num_outputs(*self._inputs, **self._args)
        assert self._num_outputs is not None

mzient (Contributor) commented Jan 5, 2026:

Suggested change (drop the assert):

        self._num_outputs = self._operator._infer_num_outputs(*self._inputs, **self._args)

No need to run the assert all the time, especially right after we've checked that the requested condition is met.

rostan-t (Collaborator, Author):

True. I did it because type checkers infer the return type as Any | int | None, and they really hate that __len__ can return None, but there's actually a less expensive way to fix it.
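One cheaper pattern, shown only as an example and not necessarily what 155b391 does, is to resolve the value once and annotate the attribute as int, so __len__ needs neither an assert nor a cast:

from typing import Optional

class Invocation:
    def __init__(self, declared_outputs: Optional[int]) -> None:
        num_outputs = declared_outputs
        if num_outputs is None:
            num_outputs = 2              # placeholder for the real inference
        # annotated as int, so type checkers accept __len__ without an assert
        self._num_outputs: int = num_outputs

    def __len__(self) -> int:
        return self._num_outputs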

rostan-t (Collaborator, Author):

Fixed in 155b391.

return tuple(
    Batch(invocation_result=invocation[i]) for i in range(len(invocation))
)
cls = Batch if is_batch else Tensor
Contributor:

I'd recommend something more precise than cls, which might be confused for the operator class.

Suggested change:
- cls = Batch if is_batch else Tensor
+ ResultType = Batch if is_batch else Tensor

rostan-t (Collaborator, Author):

Fixed in 64d0c4a.

mzient self-assigned this Jan 5, 2026
if is_batch:

if self._output_names is not None:
    return dict(zip(self._output_names, tuple(out)))
Contributor:

Does it work with non-batch outputs?
Shouldn't you rather convert out to tensors based on is_batch and make a dictionary afterwards?
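What the question suggests, sketched with placeholder types (not the actual fix in ce374de): wrap each raw output according to is_batch first, then build the dictionary, so named outputs work in both sample and batch mode:

class Tensor:                       # stand-in for the dynamic-mode sample wrapper
    def __init__(self, data): self.data = data

class Batch:                        # stand-in for the dynamic-mode batch wrapper
    def __init__(self, data): self.data = data

def wrap_outputs(out, output_names, is_batch):
    ResultType = Batch if is_batch else Tensor
    wrapped = tuple(ResultType(o) for o in out)     # convert first...
    if output_names is not None:
        return dict(zip(output_names, wrapped))      # ...then key by name
    return wrapped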

rostan-t (Collaborator, Author) commented Jan 5, 2026:

Fixed in ce374de.

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
rostan-t (Collaborator, Author) commented Jan 5, 2026

!build

dali-automaton (Collaborator)

CI MESSAGE: [41161266]: BUILD STARTED

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
rostan-t (Collaborator, Author) commented Jan 5, 2026

!build

dali-automaton (Collaborator)

CI MESSAGE: [41163669]: BUILD STARTED

dali-automaton (Collaborator)

CI MESSAGE: [41163669]: BUILD FAILED

rostan-t assigned banasraf and unassigned stiepan Jan 5, 2026
dali-automaton (Collaborator)

CI MESSAGE: [41163669]: BUILD PASSED

rostan-t merged commit 4aeb8a2 into NVIDIA:main Jan 5, 2026
8 checks passed
rostan-t deleted the ndd-tfrecord branch January 5, 2026 16:24