
[BUG] Batch API InlinedResponses are not returned in order #1909

@boasbakker

Description

Summary

When using the Batch API with inlinedRequests, the resulting inlinedResponses are returned in the wrong order. This directly contradicts the official documentation, which states:

"The responses will be in the same order as the input requests."

This issue, combined with #1886, renders the inlinedRequests feature unusable for batch processing, as there is currently no way to correlate an output response with its specific input request.

Environment details

  • Programming language: Python
  • OS: Windows 11
  • Language runtime version: 3.13.3
  • Package version: google-genai 1.56.0

Steps to reproduce

  1. Initialize the google.genai client.
  2. Create a list of inline requests (NUM_REQUESTS = 200) where each prompt requests a specific, predictable output (e.g., "Respond with YES" or "Respond with NO") based on the index.
  3. Submit the batch using client.batches.create.
  4. Wait for the batch to succeed.
  5. Iterate through batch_job.dest.inlined_responses.
  6. Compare the content of response i with the expected output for request i.

Expected Behavior:
The response at index i in inlined_responses corresponds to the request at index i in inlined_requests.

Actual Behavior:
The responses are not in the correct order. In my test of 200 random boolean questions, 96 responses did not match their index position. A second test suggested there is some pattern to the shuffling, but I have not been able to work out exactly how the responses are reordered. So far I have run this script 5 times; in 4 of the 5 runs exactly 96 responses did not match, which suggests the shuffling happens the same way each time. In the remaining run no shuffling occurred.

Reproduction Script

"""
Minimal reproduction script for Gemini Batch API response ordering bug.

This script sends batch requests with predictable expected responses
and checks if they come back in the correct order.

Expected behavior: Response at index i should match request at index i
Actual behavior: Responses may be shuffled/out of order
"""
import random
import time

from google import genai

# Configuration
MODEL = "gemini-3-flash-preview"
NUM_REQUESTS = 200  # Number of requests to test ordering
SEED = 0  # Fixed RNG seed for reproducibility

client = genai.Client()  # Uses GOOGLE_API_KEY env var

# Fixed RNG for reproducible runs
random.seed(SEED)
print(f"Using random seed: {SEED}")
# Create random pattern of YES/NO
inline_requests = []
random_pattern = []

for i in range(NUM_REQUESTS):
    expected = random.choice(["YES", "NO"])
    random_pattern.append(expected)
    prompt = f"Request #{i}: Respond with exactly one word: {expected}"
    inline_requests.append({
        'contents': [{
            'parts': [{'text': prompt}],
            'role': 'user'
        }],
        'config': {
            'temperature': 0,
            'max_output_tokens': 5,
            'thinking_config': {'thinking_level': 'minimal'}
        }
    })
batch_job = client.batches.create(
    model=MODEL,
    src=inline_requests,
    config={'display_name': 'order-bug-reproduction'}
)
print(f"Batch of {NUM_REQUESTS} requests created: {batch_job.name}")

# Poll for completion
completed_states = {'JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED', 'JOB_STATE_EXPIRED'}
while True:
    batch_job = client.batches.get(name=batch_job.name)
    state = batch_job.state.name
    print(f"Status: {state}")
    if state in completed_states:
        break
    time.sleep(3)

if batch_job.state.name != 'JOB_STATE_SUCCEEDED':
    print(f"Batch failed with state: {batch_job.state.name}")
else:
    num_mismatches = 0
    responses = batch_job.dest.inlined_responses

    for i, resp in enumerate(responses):
        expected = random_pattern[i]
        if resp.response:
            actual = resp.response.text.strip().upper()
            match = "YES" in actual if expected == "YES" else "NO" in actual
            if not match:
                num_mismatches += 1

    print(f"\nTotal mismatches: {num_mismatches} out of {NUM_REQUESTS} requests")

    if num_mismatches:
        print("\nBUG CONFIRMED: Responses are NOT in the same order as requests!")
    else:
        print("\n✓ All responses matched expected order (bug not reproduced in this run)")

Impact & Priority

This should be treated as high priority: it effectively breaks the inlinedRequests functionality of the Batch API.

Developers rely on one of two methods to map inputs to outputs in batch processing:

  1. Strict ordering: guaranteed by the documentation, but broken by this bug.
  2. Metadata/IDs: Attaching a custom_id or metadata field to the request and reading it in the response.

However, strict ordering is currently broken (this issue), and metadata passthrough is also broken (see related issue #1886).

Because ordering is unreliable and metadata is inaccessible, it is impossible to determine which output belongs to which input when using inlinedRequests. This forces developers to abandon inlinedRequests entirely and fall back to GCS bucket (file) I/O, which significantly increases implementation complexity for smaller batch jobs.
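
As a stop-gap until either issue is fixed, the only client-side workaround I can see (short of switching to file I/O) is to embed a correlation tag in each prompt and ask the model to echo it back, then re-correlate the responses by parsing that tag. This is a rough sketch of my own, not part of the SDK, and it depends on the model reliably echoing the tag (so it would not work with the tight max_output_tokens used in the reproduction script):

import re

# Hypothetical workaround: tag each prompt with its index and ask the model
# to echo the tag, then rebuild the index -> response mapping from the echo.
def tag_prompt(index, prompt):
    return f"[REQ-{index}] {prompt}\nBegin your answer with the tag [REQ-{index}]."

def correlate(inlined_responses):
    mapping = {}  # original request index -> response text
    for resp in inlined_responses:
        text = resp.response.text if resp.response and resp.response.text else ""
        m = re.search(r"\[REQ-(\d+)\]", text)
        if m:
            mapping[int(m.group(1))] = text
    return mapping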

Labels

  • api: batch — Issues related to the Batch API.
  • priority: p2 — Moderately-important priority. Fix may not be included in next release.
  • type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
