Conversation
Adding new decorators and tests to allow cosmos specific attributes to be added into otel spans.
There was a problem hiding this comment.
Pull request overview
Adds Cosmos DB OpenTelemetry operation-level span enrichment following proposed Cosmos semantic conventions, including query text sanitization and updated tracing samples.
Changes:
- Introduces sync + async decorators to add Cosmos-specific OTel attributes to spans, and adds new OTel attribute/operation-type constants.
- Applies the new decorators across core Cosmos client/proxy operations (sync and async).
- Adds unit/integration tests for query sanitization + span attributes, and updates the OpenTelemetry tracing sample to demonstrate head sampling.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_span_attributes.py | New sync decorator + query sanitization + attribute extraction from request/response. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_span_attributes_async.py | New async decorator that reuses sync telemetry helpers. |
| sdk/cosmos/azure-cosmos/azure/cosmos/_constants.py | Adds Cosmos OpenTelemetry attribute keys and operation type constants. |
| sdk/cosmos/azure-cosmos/azure/cosmos/container.py | Adds telemetry decorator to container operations (create/read/query/etc). |
| sdk/cosmos/azure-cosmos/azure/cosmos/database.py | Adds telemetry decorator to database operations (containers/users/throughput/etc). |
| sdk/cosmos/azure-cosmos/azure/cosmos/user.py | Adds telemetry decorator to user/permission operations. |
| sdk/cosmos/azure-cosmos/azure/cosmos/cosmos_client.py | Adds telemetry decorator to client create_database. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py | Adds telemetry decorator to async container operations. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py | Adds telemetry decorator to async database operations. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_user.py | Adds telemetry decorator to async user/permission operations. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client.py | Adds telemetry decorator to async client operations. |
| sdk/cosmos/azure-cosmos/samples/tracing_open_telemetry.py | Updates sample to show basic tracing + head sampling configuration. |
| sdk/cosmos/azure-cosmos/tests/test_query_sanitization.py | New unit tests for sanitization + attribute injection helpers (sync). |
| sdk/cosmos/azure-cosmos/tests/test_query_sanitization_async.py | New unit tests intended for async sanitization behavior. |
| sdk/cosmos/azure-cosmos/tests/test_telemetry_integration.py | New live integration tests validating key span attributes (sync). |
| sdk/cosmos/azure-cosmos/tests/test_telemetry_integration_async.py | New live integration tests validating key span attributes (async). |
| # Extract query and parameters BEFORE the function runs (for telemetry) | ||
| query_for_telemetry = func_kwargs.get('query') | ||
| parameters_for_telemetry = func_kwargs.get('parameters') | ||
|
|
There was a problem hiding this comment.
Like the async version, this wrapper only captures query/parameters from **func_kwargs. If callers pass query positionally (supported by methods like ContainerProxy.query_items), the query won’t be included/sanitized in telemetry because the wrapper doesn’t see the callee’s internal positional-to-kwarg normalization. Consider also deriving query/parameters from args when applicable.
| ... | ||
|
|
||
| @distributed_trace_async | ||
| @cosmos_span_attributes_async() |
There was a problem hiding this comment.
create_database is decorated with @cosmos_span_attributes_async() without an operation_type, so spans for this operation won’t include db.cosmosdb.operation_type. This looks inconsistent with the sync client (create_database uses operation_type=...CREATE) and with other async methods in this PR. Consider setting operation_type=Constants.OpenTelemetryOperationTypes.CREATE here.
| @cosmos_span_attributes_async() | |
| @cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.CREATE) |
| return self.client_connection.ReadItem(document_link=doc_link, options=request_options, **kwargs) | ||
|
|
||
| @distributed_trace | ||
| @cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.READ) |
There was a problem hiding this comment.
read_items is documented as a batched point-read (“read-many”) operation, but it’s tagged with operation_type=...READ. In the async client, the same method is tagged as BATCH, so telemetry will differ between sync/async. Consider aligning both sides (and likely using BATCH for read-many) to keep db.cosmosdb.operation_type consistent.
| @cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.READ) | |
| @cosmos_span_attributes(operation_type=Constants.OpenTelemetryOperationTypes.BATCH) |
| from opentelemetry import trace | ||
| from opentelemetry.sdk.trace import TracerProvider | ||
| from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult, SimpleSpanProcessor |
There was a problem hiding this comment.
This test module hard-depends on opentelemetry-sdk at import time. sdk/cosmos/azure-cosmos/dev_requirements.txt currently doesn’t include OpenTelemetry packages, so these tests may fail to import in CI. Consider adding OpenTelemetry to the package’s test/dev requirements or skipping the tests when the dependency isn’t installed.
| client = CosmosClient(test_config.TestConfig.host, test_config.TestConfig.masterKey) | ||
| try: | ||
| await client.delete_database(cls.test_db_name) | ||
| except: |
There was a problem hiding this comment.
Avoid a bare except: in the async cleanup; it can hide unexpected errors and also catches KeyboardInterrupt/SystemExit. Prefer catching Exception or the specific Cosmos exception(s) you expect when the database doesn’t exist.
| except: | |
| except exceptions.CosmosResourceNotFoundError: |
| return items | ||
|
|
||
| @distributed_trace_async | ||
| @cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.BATCH) |
There was a problem hiding this comment.
Async read_items is tagged with operation_type=...BATCH, while sync ContainerProxy.read_items is tagged as READ. If db.cosmosdb.operation_type is intended to be consistent across sync/async APIs, one side should be updated so the same logical operation reports the same operation type.
| @cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.BATCH) | |
| @cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.READ) |
| from opentelemetry import trace | ||
| from opentelemetry.sdk.trace import TracerProvider | ||
| from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult, SimpleSpanProcessor |
There was a problem hiding this comment.
These tests import opentelemetry-sdk/opentelemetry modules, but sdk/cosmos/azure-cosmos/dev_requirements.txt doesn’t list OpenTelemetry dependencies. In CI, this will likely fail at import time unless OpenTelemetry is installed transitively. Consider adding the needed packages to test/dev requirements or skipping these tests when OpenTelemetry isn’t installed (e.g., conditional import + unittest.skipIf).
| settings.tracing_implementation = OpenTelemetrySpan | ||
|
|
||
| # Set up in-memory exporter | ||
| cls.exporter = InMemorySpanExporter() | ||
| trace.set_tracer_provider(TracerProvider()) | ||
| trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(cls.exporter)) |
There was a problem hiding this comment.
settings.tracing_implementation and the global OpenTelemetry tracer provider are modified in setUpClass, but they aren’t restored in tearDownClass. This can leak tracing configuration into other tests in the same process and cause hard-to-debug failures. Consider saving the previous settings.tracing_implementation/tracer provider in setUpClass and restoring them in tearDownClass.
| # Enable OpenTelemetry tracing | ||
| settings.tracing_implementation = OpenTelemetrySpan | ||
|
|
||
| # Set up in-memory exporter | ||
| cls.exporter = InMemorySpanExporter() | ||
| trace.set_tracer_provider(TracerProvider()) | ||
| trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(cls.exporter)) | ||
|
|
There was a problem hiding this comment.
settings.tracing_implementation and the global OpenTelemetry tracer provider are set in setUpClass but never restored. This can affect other tests running in the same process. Consider capturing previous values in setUpClass and restoring them in tearDownClass (and/or using trace.get_tracer_provider() safely if another provider is already configured).
| # Create tracer provider with the sampler | ||
| trace.set_tracer_provider(TracerProvider(sampler=sampler)) | ||
|
|
There was a problem hiding this comment.
trace.set_tracer_provider(...) is called in both basic_tracing_example and head_sampling_example. In OpenTelemetry Python, the tracer provider is effectively global and typically only set once; subsequent calls may be ignored, meaning the sampler in head_sampling_example might not take effect if basic_tracing_example ran first. Consider configuring the provider once (or explicitly documenting that these examples should be run separately).
…briz/azure-sdk-for-python into bambriz-opentelemetry-cosmos
| ) | ||
|
|
||
| @distributed_trace_async | ||
| @cosmos_span_attributes_async(operation_type=Constants.OpenTelemetryOperationTypes.DELETE) |
| name_of_span: Optional[str] = None, | ||
| kind: Optional[Any] = None, | ||
| tracing_attributes: Optional[Mapping[str, Any]] = None, | ||
| operation_type: Optional[str] = None, |
There was a problem hiding this comment.
Can this be made non-optional? (I'm not very familiar with Python decorators).
| CREATE: Literal["create"] = "create" | ||
| DELETE: Literal["delete"] = "delete" | ||
| EXECUTE: Literal["execute"] = "execute" | ||
| EXECUTE_JAVASCRIPT: Literal["execute_javascript"] = "execute_javascript" |
| OpenTelemetry setup with head sampling (probabilistic sampling). | ||
|
|
||
| Head sampling decides at the start of a trace whether to sample it. | ||
| This reduces overhead by only tracing a percentage of requests. |
There was a problem hiding this comment.
So the code in the attribute doesn't run at all, or the resulting trace is not output?
| ## Release History | ||
|
|
||
| ### 4.15.1 (Unreleased) | ||
| ### 4.15.0b3 (Unreleased) |
Description
This PR adds new integrations for Open Telemetry spans by adding Cosmos Specific attributes to operation level spans as described by the Cosmos Semantics here: https://github.com/devopsleague/opentelemetry-semantic-conventions/blob/main/docs/database/cosmosdb.md
This includes query sanitation. On top of that the otel samples have been updated to show how to use head sampling to reduce the noise level of spans captured. It can range from 0% to 100% capture rate. Using head sampling would significantly reduce the overhead of capturing spans.
Here is a sample of how captured spans look and what information they can hold:
[ { "name": "ContainerProxy.create_item", "attributes": { "db.system": "cosmosdb", "db.operation.name": "create_item", "db.cosmosdb.operation_type": "create", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 6.86, "db.cosmosdb.request_diagnostics_id": "c4670c67-d8fe-498a-9312-888025763e16" } }, { "name": "ContainerProxy.read_item", "attributes": { "db.system": "cosmosdb", "db.operation.name": "read_item", "db.cosmosdb.operation_type": "read", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 1.0, "db.cosmosdb.request_diagnostics_id": "a6f42dca-fc8d-4a81-8c0c-47cc9691ebf1" } }, { "name": "ContainerProxy.query_items", "attributes": { "db.system": "cosmosdb", "db.operation.name": "query_items", "db.cosmosdb.operation_type": "query", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.query.text": "SELECT * FROM c WHERE c.price > @minPrice AND c.category = @category", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 2.83, "db.cosmosdb.request_diagnostics_id": "f9d42a8c-3e71-4d9a-bd56-1a8c67e45f23" }, "_comment": "Parameterized query - NOT sanitized. Parameter values (500, 'electronics') are NOT logged." }, { "name": "ContainerProxy.query_items", "attributes": { "db.system": "cosmosdb", "db.operation.name": "query_items", "db.cosmosdb.operation_type": "query", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.query.text": "SELECT * FROM c WHERE c.price = ? AND c.name = '?'", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 2.83, "db.cosmosdb.request_diagnostics_id": "a2b78f9d-1c45-4e67-9abc-d3f45e67a890" }, "_comment": "Non-parameterized query - SANITIZED. Original values (999.99, 'Laptop') replaced with ? for security." }, { "name": "ContainerProxy.upsert_item", "attributes": { "db.system": "cosmosdb", "db.operation.name": "upsert_item", "db.cosmosdb.operation_type": "upsert", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 6.67, "db.cosmosdb.request_diagnostics_id": "e15a333f-987f-4182-948f-87ec5ad5059c" } }, { "name": "ContainerProxy.replace_item", "attributes": { "db.system": "cosmosdb", "db.operation.name": "replace_item", "db.cosmosdb.operation_type": "replace", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 10.67, "db.cosmosdb.request_diagnostics_id": "3512dd7c-4d72-4471-8267-48f35df73aec" } }, { "name": "ContainerProxy.delete_item", "attributes": { "db.system": "cosmosdb", "db.operation.name": "delete_item", "db.cosmosdb.operation_type": "delete", "db.namespace": "MyDatabase", "db.collection.name": "MyContainer", "db.cosmosdb.connection_mode": "gateway", "db.cosmosdb.client_id": "b244ce26-c517-47e3-9837-ed7f73309991", "db.cosmosdb.request_charge": 4.29 } }, { "name": "ContainerProxy.read_item", "status": "ERROR", "attributes": { "db.system": "cosmosdb", "db.cosmosdb.status_code": 404, "db.cosmosdb.request_charge": 1.0, "db.cosmosdb.request_diagnostics_id": "068762c2-16d3-4bd8-9b27-e9570ed29324", "error.type": "azure.cosmos.exceptions.CosmosResourceNotFoundError" }, "_comment": "Error case - db.system attribute is present even when operation fails (404 Not Found)." } ]