Description
What I am trying to do
I am building a voice-based conversational agent using the Gemini Developer Live API (ai.google.dev) with the Python SDK (googleapis/python-genai).
The agent streams microphone audio, enables input/output transcription, and tracks token usage via usage_metadata.
I am trying to understand why promptTokenCount is very high (300–500 tokens) even when the user input is only a simple greeting such as “hello”.
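For reference, here is a trimmed-down sketch of my setup (the model name and audio parameters are placeholders; the full code is in the attached files):

```python
import asyncio

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

MODEL = "gemini-2.0-flash-live-001"  # placeholder; real model name is in the attached files

CONFIG = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Transcribe both the user's microphone audio and the model's spoken reply.
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    # No system_instruction is set (empty system prompt, as noted below).
)


async def main() -> None:
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Stand-in for a real microphone chunk: 100 ms of 16-bit PCM silence.
        pcm_chunk = b"\x00" * 3200
        await session.send_realtime_input(
            audio=types.Blob(data=pcm_chunk, mime_type="audio/pcm;rate=16000")
        )
        async for message in session.receive():
            # usage_metadata arrives on server messages at turn boundaries.
            if message.usage_metadata:
                u = message.usage_metadata
                print("promptTokenCount:", u.prompt_token_count)
                print("responseTokenCount:", u.response_token_count)
                print("totalTokenCount:", u.total_token_count)


asyncio.run(main())
```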
What I expected
For a very short user input (e.g., “hello”), I expected:
- promptTokenCount to be relatively small
- Growth across turns to roughly correlate with the visible conversation history
What actually happens
Even with a minimal input like “hello”, promptTokenCount is already several hundred tokens on the first turn and continues to grow across turns (the growth may be due to history, but why does a single “hello” cost 334 tokens?).
Example output from my session:
┌──────────────────────────────────────────────────────────────┐
│ 📊 Turn # 1 │
├──────────────────────────────────────────────────────────────┤
│ API UsageMetadata (raw values): │
│ promptTokenCount: 334 │
│ responseTokenCount: 56 │
│ thoughtsTokenCount: 44 │
│ totalTokenCount: 390 │
└──────────────────────────────────────────────────────────────┘
🎤 You: Hello
🤖 Gemini: Hello how are you?
Later in the same session, I said "My name is Vedant" and promptTokenCount rose to 432. Part of the increase is history, but 432 tokens for this single short utterance still seems high. Is audio also counted here? The modality breakdown shows it is almost entirely text:
prompt_token_count=432
response_token_count=78
thoughts_token_count=47
total_token_count=510
prompt_tokens_details=[
TEXT: 428 tokens
AUDIO: 4 tokens
]
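(The breakdown above is printed by iterating usage_metadata.prompt_tokens_details; a minimal sketch, assuming the ModalityTokenCount fields from google.genai.types:)

```python
def print_prompt_breakdown(usage) -> None:
    """usage is a types.UsageMetadata from a LiveServerMessage."""
    print(f"prompt_token_count={usage.prompt_token_count}")
    for detail in usage.prompt_tokens_details or []:
        # Each entry is a types.ModalityTokenCount with .modality and .token_count
        print(f"  {detail.modality}: {detail.token_count} tokens")
```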
And the next turn:
┌──────────────────────────────────────────────────────────────┐
│ 📊 Turn # 2 │
├──────────────────────────────────────────────────────────────┤
│ API UsageMetadata (raw values): │
│ promptTokenCount: 432 │
│ responseTokenCount: 78 │
│ thoughtsTokenCount: 47 │
│ totalTokenCount: 510 │
└──────────────────────────────────────────────────────────────┘
This happens even though the user-visible input is just a short greeting.
NOTE: I passed an empty system prompt; nothing else was supplied.
What I understand so far
- promptTokenCount is per request / per turn, not cumulative (I verify this with the delta logging sketched below).
- In Live API sessions, the prompt appears to include:
  - Prior conversation context
  - Internal session/state wrappers
  - Role formatting and safety framing
  - Audio/transcription-related metadata
- These internal components are not visible, but still count toward promptTokenCount. Is this true?
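To sanity-check the first point, I log the turn-over-turn growth rather than the raw count; a minimal sketch (the helper class is mine, not from the SDK):

```python
class TokenDeltaTracker:
    """Hypothetical helper: track how much promptTokenCount grows each turn.

    If the count were cumulative, the delta would roughly re-add the whole
    history every turn; if it is per-request, the delta should track only
    the new content added that turn.
    """

    def __init__(self) -> None:
        self.last_prompt_tokens = 0

    def record(self, usage) -> int:
        """usage is a types.UsageMetadata; returns growth since the last turn."""
        current = usage.prompt_token_count or 0
        delta = current - self.last_prompt_tokens
        self.last_prompt_tokens = current
        return delta
```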
Questions
- Is promptTokenCount a cumulative count over the whole previous conversation, or does it cover only the single current turn?
- Is this level of prompt overhead expected behavior for Live (bidiGenerateContent) sessions?
- Is there any way to inspect or estimate what contributes to the non-user-visible prompt tokens? (My rough attempt is sketched below.)
- Are there recommended configurations to reduce prompt token usage for simple conversational turns (e.g., greetings)?
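For comparison, I tokenized just the user-visible transcript offline with count_tokens (a non-Live endpoint, so this is only a rough lower bound; the model name here is a placeholder):

```python
from google import genai

client = genai.Client()

# Rough lower bound: count tokens for only the user-visible transcript text.
# The Live session's real prompt evidently contains much more than this.
transcript = "Hello\nHello how are you?\nMy name is Vedant"
resp = client.models.count_tokens(
    model="gemini-2.0-flash",  # placeholder text model used just for estimation
    contents=transcript,
)
print(resp.total_tokens)  # a handful of tokens, vs. promptTokenCount=432
```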
I have also attached the code files below: