Description
I have noticed a significant discrepancy in token usage calculation for the same image when using the google-genai SDK, depending on whether the backend is Vertex AI or the Gemini API (AI Studio).
Using the same model (gemini-2.5-flash) and the same input image (700x1003 pixels), I observe:
- Gemini API: ~258 tokens.
- Vertex AI: ~1806 tokens.
Environment details
- SDK version: google-genai 1.12.1
- Python version: 3.10.16
Reproduction Code
import json
import os

import rich
from google import genai
from google.oauth2 import service_account
from PIL import Image


def get_gemini_client(vertexai: bool = False):
    if vertexai:
        # Assumes a service-account credentials.json is present
        credentials = json.loads(open("credentials.json").read())
        return genai.Client(
            vertexai=True,
            project=os.getenv("GOOGLE_CLOUD_PROJECT"),
            location="us-central1",  # or "global"
            credentials=service_account.Credentials.from_service_account_info(
                credentials,
                scopes=["https://www.googleapis.com/auth/cloud-platform"],
            ),
        )
    else:
        return genai.Client(
            api_key=os.getenv("GEMINI_API_KEY"),
        )


if __name__ == "__main__":
    # Test image size: (700, 1003)
    image = Image.open("test.png")
    print(f"Image size: {image.size}")
    model = "gemini-2.5-flash"

    print("-" * 100)
    print("Vertex AI:")
    vertex_client = get_gemini_client(vertexai=True)
    vertex_count = vertex_client.models.count_tokens(model=model, contents=[image])
    print(f"Token count (Vertex): {vertex_count}")
    vertex_response = vertex_client.models.generate_content(
        model=model,
        contents=[image, "Describe the image in detail."],
    )
    print("Vertex AI usage metadata:")
    rich.print(vertex_response.usage_metadata.prompt_tokens_details)

    print("-" * 100)
    print("Gemini API:")
    gemini_client = get_gemini_client(vertexai=False)
    gemini_count = gemini_client.models.count_tokens(model=model, contents=[image])
    print(f"Token count (Gemini API): {gemini_count}")
    gemini_response = gemini_client.models.generate_content(
        model=model,
        contents=[image, "Describe the image in detail."],
    )
    print("Gemini API usage metadata:")
    rich.print(gemini_response.usage_metadata.prompt_tokens_details)
    print("-" * 100)

Logs / Output
Image size: (700, 1003)
----------------------------------------------------------------------------------------------------
Vertex AI:
Token count (Vertex): total_tokens=1806 cached_content_token_count=None
Vertex AI usage metadata:
[
ModalityTokenCount(modality=<MediaModality.IMAGE: 'IMAGE'>, token_count=1806),
ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=6)
]
----------------------------------------------------------------------------------------------------
Gemini API:
Token count (Gemini API): total_tokens=259 cached_content_token_count=None
Gemini API usage metadata:
[
ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=7),
ModalityTokenCount(modality=<MediaModality.IMAGE: 'IMAGE'>, token_count=258)
]
----------------------------------------------------------------------------------------------------
Analysis
According to the official documentation for token calculation (Gemini 2.0/2.5), an image costs 258 tokens if both dimensions are <= 384 pixels; larger images are tiled into 768x768-pixel tiles, each costing 258 tokens.
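For reference, here is a minimal sketch of the most literal reading of that rule. The documentation does not spell out the crop/scale step, so naive_image_tokens and its ceiling-based tile count are my own assumption, not the SDK's actual algorithm:

import math

def naive_image_tokens(width: int, height: int,
                       tile: int = 768, per_tile: int = 258) -> int:
    # Small images count as a single 258-token unit.
    if width <= 384 and height <= 384:
        return per_tile
    # Assumed: one tile per 768x768 cell, with no rescaling.
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * per_tile

print(naive_image_tokens(700, 1003))  # 516 (2 tiles)

Under this naive reading, a 700x1003 image would cost 516 tokens (2 tiles), matching neither backend, so whatever cropping/scaling each backend applies before tiling is clearly what drives the difference.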
For an image of size 700x1003:
- Vertex AI returns 1806 tokens. Since $1806 / 258 = 7$, Vertex appears to split the image into 7 tiles.
- Gemini API returns 258 tokens, which implies it treats the image as a single tile, possibly after heavy downscaling or with a different calculation (see the quick check below).
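The arithmetic behind both readings, as a quick plain-Python sanity check (no SDK involved):

tokens_per_tile = 258
print(1806 / tokens_per_tile)  # 7.0 -> the Vertex count is exactly 7 tiles
print(258 / tokens_per_tile)   # 1.0 -> the Gemini API count is exactly 1 tile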
Expected Behavior: I expect consistent token accounting between the two backends for the same model and image. If the tiling logic is standard, both should return similar token counts. As it stands, the exact same inference request is ~7x more expensive on Vertex AI.
Could you clarify if this is intended behavior, a bug in the Vertex implementation of the tokenizer, or an issue with how the SDK handles images for different clients?