Skip to content

Add Otel Integrations#45343

Open
bambriz wants to merge 4 commits intoAzure:mainfrom
bambriz:bambriz-opentelemetry-cosmos
Open

Add Otel Integrations#45343
bambriz wants to merge 4 commits intoAzure:mainfrom
bambriz:bambriz-opentelemetry-cosmos

Conversation

@bambriz
Copy link
Member

@bambriz bambriz commented Feb 24, 2026

Description

This PR adds new integrations for Open Telemetry spans by adding Cosmos Specific attributes to operation level spans as described by the Cosmos Semantics here: https://github.com/devopsleague/opentelemetry-semantic-conventions/blob/main/docs/database/cosmosdb.md

This includes query sanitation. On top of that the otel samples have been updated to show how to use head sampling to reduce the noise level of spans captured. It can range from 0% to 100% capture rate. Using head sampling would significantly reduce the overhead of capturing spans.

Here is a sample of how captured spans look and what information they can hold:

[
  {
    "name": "ContainerProxy.create_item",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "create_item",
      "db.cosmosdb.operation_type": "create",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 6.86,
      "db.cosmosdb.request_diagnostics_id": "c4670c67-d8fe-498a-9312-888025763e16"
    }
  },
  {
    "name": "ContainerProxy.read_item",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "read_item",
      "db.cosmosdb.operation_type": "read",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 1.0,
      "db.cosmosdb.request_diagnostics_id": "a6f42dca-fc8d-4a81-8c0c-47cc9691ebf1"
    }
  },
  {
    "name": "ContainerProxy.query_items",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "query_items",
      "db.cosmosdb.operation_type": "query",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.query.text": "SELECT * FROM c WHERE c.price > @minPrice AND c.category = @category",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 2.83,
      "db.cosmosdb.request_diagnostics_id": "f9d42a8c-3e71-4d9a-bd56-1a8c67e45f23"
    },
    "_comment": "Parameterized query - NOT sanitized. Parameter values (500, 'electronics') are NOT logged."
  },
  {
    "name": "ContainerProxy.query_items",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "query_items",
      "db.cosmosdb.operation_type": "query",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.query.text": "SELECT * FROM c WHERE c.price = ? AND c.name = '?'",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 2.83,
      "db.cosmosdb.request_diagnostics_id": "a2b78f9d-1c45-4e67-9abc-d3f45e67a890"
    },
    "_comment": "Non-parameterized query - SANITIZED. Original values (999.99, 'Laptop') replaced with ? for security."
  },
  {
    "name": "ContainerProxy.upsert_item",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "upsert_item",
      "db.cosmosdb.operation_type": "upsert",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 6.67,
      "db.cosmosdb.request_diagnostics_id": "e15a333f-987f-4182-948f-87ec5ad5059c"
    }
  },
  {
    "name": "ContainerProxy.replace_item",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "replace_item",
      "db.cosmosdb.operation_type": "replace",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 10.67,
      "db.cosmosdb.request_diagnostics_id": "3512dd7c-4d72-4471-8267-48f35df73aec"
    }
  },
  {
    "name": "ContainerProxy.delete_item",
    "attributes": {
      "db.system": "cosmosdb",
      "db.operation.name": "delete_item",
      "db.cosmosdb.operation_type": "delete",
      "db.namespace": "MyDatabase",
      "db.collection.name": "MyContainer",
      "db.cosmosdb.connection_mode": "gateway",
      "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991",
      "db.cosmosdb.request_charge": 4.29
    }
  },
  {
    "name": "ContainerProxy.read_item",
    "status": "ERROR",
    "attributes": {
      "db.system": "cosmosdb",
      "db.cosmosdb.status_code": 404,
      "db.cosmosdb.request_charge": 1.0,
      "db.cosmosdb.request_diagnostics_id": "068762c2-16d3-4bd8-9b27-e9570ed29324",
      "error.type": "azure.cosmos.exceptions.CosmosResourceNotFoundError"
    },
    "_comment": "Error case - db.system attribute is present even when operation fails (404 Not Found)."
  }
]

Adding new decorators and tests to allow cosmos specific attributes to be added into otel spans.
Copilot AI review requested due to automatic review settings February 24, 2026 22:47
@bambriz bambriz requested a review from a team as a code owner February 24, 2026 22:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Cosmos DB OpenTelemetry operation-level span enrichment following proposed Cosmos semantic conventions, including query text sanitization and updated tracing samples.

Changes:

  • Introduces sync + async decorators to add Cosmos-specific OTel attributes to spans, and adds new OTel attribute/operation-type constants.
  • Applies the new decorators across core Cosmos client/proxy operations (sync and async).
  • Adds unit/integration tests for query sanitization + span attributes, and updates the OpenTelemetry tracing sample to demonstrate head sampling.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_span_attributes.py New sync decorator + query sanitization + attribute extraction from request/response.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_span_attributes_async.py New async decorator that reuses sync telemetry helpers.
sdk/cosmos/azure-cosmos/azure/cosmos/_constants.py Adds Cosmos OpenTelemetry attribute keys and operation type constants.
sdk/cosmos/azure-cosmos/azure/cosmos/container.py Adds telemetry decorator to container operations (create/read/query/etc).
sdk/cosmos/azure-cosmos/azure/cosmos/database.py Adds telemetry decorator to database operations (containers/users/throughput/etc).
sdk/cosmos/azure-cosmos/azure/cosmos/user.py Adds telemetry decorator to user/permission operations.
sdk/cosmos/azure-cosmos/azure/cosmos/cosmos_client.py Adds telemetry decorator to client create_database.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py Adds telemetry decorator to async container operations.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py Adds telemetry decorator to async database operations.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_user.py Adds telemetry decorator to async user/permission operations.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client.py Adds telemetry decorator to async client operations.
sdk/cosmos/azure-cosmos/samples/tracing_open_telemetry.py Updates sample to show basic tracing + head sampling configuration.
sdk/cosmos/azure-cosmos/tests/test_query_sanitization.py New unit tests for sanitization + attribute injection helpers (sync).
sdk/cosmos/azure-cosmos/tests/test_query_sanitization_async.py New unit tests intended for async sanitization behavior.
sdk/cosmos/azure-cosmos/tests/test_telemetry_integration.py New live integration tests validating key span attributes (sync).
sdk/cosmos/azure-cosmos/tests/test_telemetry_integration_async.py New live integration tests validating key span attributes (async).

Comment on lines +106 to +109
# Extract query and parameters BEFORE the function runs (for telemetry)
query_for_telemetry = func_kwargs.get('query')
parameters_for_telemetry = func_kwargs.get('parameters')

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the async version, this wrapper only captures query/parameters from **func_kwargs. If callers pass query positionally (supported by methods like ContainerProxy.query_items), the query won’t be included/sanitized in telemetry because the wrapper doesn’t see the callee’s internal positional-to-kwarg normalization. Consider also deriving query/parameters from args when applicable.

Copilot uses AI. Check for mistakes.
...

@distributed_trace_async
@cosmos_span_attributes_async()
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_database is decorated with @cosmos_span_attributes_async() without an operation_type, so spans for this operation won’t include db.cosmosdb.operation_type. This looks inconsistent with the sync client (create_database uses operation_type=...CREATE) and with other async methods in this PR. Consider setting operation_type=Constants.OpenTelemetryOperationTypes.CREATE here.

Suggested change
@cosmos_span_attributes_async()
@cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.CREATE)

Copilot uses AI. Check for mistakes.
return self.client_connection.ReadItem(document_link=doc_link, options=request_options, **kwargs)

@distributed_trace
@cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.READ)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_items is documented as a batched point-read (“read-many”) operation, but it’s tagged with operation_type=...READ. In the async client, the same method is tagged as BATCH, so telemetry will differ between sync/async. Consider aligning both sides (and likely using BATCH for read-many) to keep db.cosmosdb.operation_type consistent.

Suggested change
@cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.READ)
@cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.BATCH)

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +20
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult, SimpleSpanProcessor
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test module hard-depends on opentelemetry-sdk at import time. sdk/cosmos/azure-cosmos/dev_requirements.txt currently doesn’t include OpenTelemetry packages, so these tests may fail to import in CI. Consider adding OpenTelemetry to the package’s test/dev requirements or skipping the tests when the dependency isn’t installed.

Copilot uses AI. Check for mistakes.
client = CosmosClient(test_config.TestConfig.host, test_config.TestConfig.masterKey)
try:
await client.delete_database(cls.test_db_name)
except:
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid a bare except: in the async cleanup; it can hide unexpected errors and also catches KeyboardInterrupt/SystemExit. Prefer catching Exception or the specific Cosmos exception(s) you expect when the database doesn’t exist.

Suggested change
except:
except exceptions.CosmosResourceNotFoundError:

Copilot uses AI. Check for mistakes.
return items

@distributed_trace_async
@cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.BATCH)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async read_items is tagged with operation_type=...BATCH, while sync ContainerProxy.read_items is tagged as READ. If db.cosmosdb.operation_type is intended to be consistent across sync/async APIs, one side should be updated so the same logical operation reports the same operation type.

Suggested change
@cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.BATCH)
@cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.READ)

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +26
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult, SimpleSpanProcessor
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests import opentelemetry-sdk/opentelemetry modules, but sdk/cosmos/azure-cosmos/dev_requirements.txt doesn’t list OpenTelemetry dependencies. In CI, this will likely fail at import time unless OpenTelemetry is installed transitively. Consider adding the needed packages to test/dev requirements or skipping these tests when OpenTelemetry isn’t installed (e.g., conditional import + unittest.skipIf).

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +63
settings.tracing_implementation = OpenTelemetrySpan

# Set up in-memory exporter
cls.exporter = InMemorySpanExporter()
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(cls.exporter))
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings.tracing_implementation and the global OpenTelemetry tracer provider are modified in setUpClass, but they aren’t restored in tearDownClass. This can leak tracing configuration into other tests in the same process and cause hard-to-debug failures. Consider saving the previous settings.tracing_implementation/tracer provider in setUpClass and restoring them in tearDownClass.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +58
# Enable OpenTelemetry tracing
settings.tracing_implementation = OpenTelemetrySpan

# Set up in-memory exporter
cls.exporter = InMemorySpanExporter()
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(cls.exporter))

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings.tracing_implementation and the global OpenTelemetry tracer provider are set in setUpClass but never restored. This can affect other tests running in the same process. Consider capturing previous values in setUpClass and restoring them in tearDownClass (and/or using trace.get_tracer_provider() safely if another provider is already configured).

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +93
# Create tracer provider with the sampler
trace.set_tracer_provider(TracerProvider(sampler=sampler))

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace.set_tracer_provider(...) is called in both basic_tracing_example and head_sampling_example. In OpenTelemetry Python, the tracer provider is effectively global and typically only set once; subsequent calls may be ignored, meaning the sampler in head_sampling_example might not take effect if basic_tracing_example ran first. Consider configuring the provider once (or explicitly documenting that these examples should be run separately).

Copilot uses AI. Check for mistakes.
)

@distributed_trace_async
@cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.DELETE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be BATCH maybe?

name_of_span: Optional[str] = None,
kind: Optional[Any] = None,
tracing_attributes: Optional[Mapping[str, Any]] = None,
operation_type: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be made non-optional? (I'm not very familiar with Python decorators).

CREATE: Literal["create"] = "create"
DELETE: Literal["delete"] = "delete"
EXECUTE: Literal["execute"] = "execute"
EXECUTE_JAVASCRIPT: Literal["execute_javascript"] = "execute_javascript"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't referenced?

OpenTelemetry setup with head sampling (probabilistic sampling).

Head sampling decides at the start of a trace whether to sample it.
This reduces overhead by only tracing a percentage of requests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the code in the attribute doesn't run at all, or the resulting trace is not output?

## Release History

### 4.15.1 (Unreleased)
### 4.15.0b3 (Unreleased)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong merge to this file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants