
Unexpectedly high promptTokenCount for simple greetings in Gemini Live API (ai.google.dev) #1917

@VedantRajp3907

Description


What I am trying to do

I am building a voice-based conversational agent using the Gemini Developer Live API (ai.google.dev) with the Python SDK (googleapis/python-genai).
The agent streams microphone audio, enables input/output transcription, and tracks token usage via usage_metadata.

I am trying to understand why promptTokenCount is very high (300–500 tokens) even when the user input is only a simple greeting such as “hello”.


What I expected

For a very short user input (e.g., “hello”), I expected:

  • promptTokenCount to be relatively small
  • Growth across turns to roughly correlate with visible conversation history

What actually happens

Even with a minimal input like “hello”, promptTokenCount is already several hundred tokens on the first turn, and it continues to grow across turns. The growth is presumably due to history, but why does a single “hello” already cost 334 tokens?

Example output from my session:

┌──────────────────────────────────────────────────────────────┐
│ 📊 Turn # 1                                                   │
├──────────────────────────────────────────────────────────────┤
│   API UsageMetadata (raw values):                             │
│     promptTokenCount:          334                            │
│     responseTokenCount:         56                            │
│     thoughtsTokenCount:         44                            │
│     totalTokenCount:           390                            │
└──────────────────────────────────────────────────────────────┘

🎤 You:  Hello
🤖 Gemini: Hello how are you?

Later in the same session I said "My name is Vedant" and the count rose to 432 tokens. Part of the increase is presumably history, but does this single short text message really account for 432 prompt tokens, or is audio also included? The per-modality breakdown:

prompt_token_count=432
response_token_count=78
thoughts_token_count=47
total_token_count=510

prompt_tokens_details=[
  TEXT: 428 tokens
  AUDIO: 4 tokens
]
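The breakdown above can be cross-checked against promptTokenCount by summing the per-modality entries. A minimal sketch, where the `ModalityTokenCount` dataclass is a hypothetical stand-in for the SDK's entry type (which exposes modality and token-count fields), not the real class:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's per-modality entry, used so the
# sketch runs without a live session.
@dataclass
class ModalityTokenCount:
    modality: str
    token_count: int

def modality_breakdown(details):
    """Map modality name -> token count, e.g. {'TEXT': 428, 'AUDIO': 4}."""
    return {d.modality: d.token_count for d in details}

def matches_prompt_count(details, prompt_token_count):
    """True if the per-modality counts sum to the reported promptTokenCount."""
    return sum(d.token_count for d in details) == prompt_token_count

# Numbers from the session above.
details = [ModalityTokenCount("TEXT", 428), ModalityTokenCount("AUDIO", 4)]
print(modality_breakdown(details))        # {'TEXT': 428, 'AUDIO': 4}
print(matches_prompt_count(details, 432)) # True
```

Here the TEXT entry (428) dominates, which suggests the prompt is mostly accumulated text context rather than the raw audio itself (4 tokens).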

The same values as reported in the turn summary:

┌──────────────────────────────────────────────────────────────┐
│ 📊 Turn # 2                                                   │
├──────────────────────────────────────────────────────────────┤
│   API UsageMetadata (raw values):                             │
│     promptTokenCount:          432                            │
│     responseTokenCount:         78                            │
│     thoughtsTokenCount:         47                            │
│     totalTokenCount:           510                            │
└──────────────────────────────────────────────────────────────┘

This happens even though the user-visible input is just a short greeting.
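One way to separate per-session baseline from history growth is to diff promptTokenCount across turns: the first turn's count approximates the fixed overhead, and each delta approximates what the previous turn added. A small sketch using the numbers above (`prompt_token_deltas` is a hypothetical helper, not an SDK function):

```python
def prompt_token_deltas(prompt_counts):
    """Per-turn growth of promptTokenCount between consecutive turns."""
    return [later - earlier for earlier, later in zip(prompt_counts, prompt_counts[1:])]

counts = [334, 432]                 # promptTokenCount from turns 1 and 2 above
print(counts[0])                    # baseline on the very first turn: 334
print(prompt_token_deltas(counts))  # [98] tokens added going into turn 2
```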


NOTE: The system prompt in gemini_live_audio.py is empty — nothing was passed.

What I understand so far

  • promptTokenCount is per request / per turn, not cumulative.

  • In Live API sessions, the prompt appears to include:

    • Prior conversation context
    • Internal session/state wrappers
    • Role formatting and safety framing
    • Audio/transcription-related metadata
  • These internal components are not visible, but still count toward promptTokenCount. Is this true?
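To roughly quantify that invisible portion, one can compare the reported promptTokenCount against a crude estimate of the visible transcript. This is only a sanity-check sketch: the ~4-characters-per-token rule of thumb is an approximation, not the model's actual tokenizer, and both helper names are hypothetical:

```python
def estimate_text_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def invisible_overhead(visible_turns, reported_prompt_tokens):
    """Reported prompt tokens minus a rough estimate of the visible history."""
    visible = sum(estimate_text_tokens(turn) for turn in visible_turns)
    return reported_prompt_tokens - visible

# Turn 1 from the session above: the only visible input is "Hello",
# yet the API reported 334 prompt tokens.
print(invisible_overhead(["Hello"], 334))  # ~333 tokens unaccounted for
```

If the gap stays large even as the visible history grows, that would support the idea that most of the prompt is non-user-visible session framing rather than conversation text.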


Questions

  1. Is promptTokenCount a cumulative count over all previous conversation turns, or does it cover only the current turn?
  2. Is this level of prompt overhead expected behavior for Live (bidiGenerateContent) sessions?
  3. Is there any way to inspect or estimate what contributes to the non-user-visible prompt tokens?
  4. Are there recommended configurations to reduce prompt token usage for simple conversational turns (e.g., greetings)?

I have also attached the code files below:

Metadata

Labels: priority: p3 (Desirable enhancement or fix; may not be included in next release) · status: awaiting user response · type: question (Request for information or clarification; not an issue)
