
feat: OTEL metrics pipeline for task workers #3061

Draft

ericallam wants to merge 13 commits into main from feat/otel-metrics

Conversation

ericallam (Member) commented Feb 14, 2026

  • Adds an end-to-end OTEL metrics pipeline: task workers collect and export metrics via OpenTelemetry, the webapp ingests them into ClickHouse, and they're queryable through the existing dashboard query engine
  • Workers emit process CPU/memory metrics (via @opentelemetry/host-metrics) and Node.js runtime metrics (event loop utilization, event loop delay, heap usage)
  • Users can create custom metrics in their tasks via otel.metrics.getMeter() from @trigger.dev/sdk (see the sketch after this list)
  • Metrics are automatically tagged with run context (run ID, task slug, machine, worker version) so they can be sliced per-run, per-task, or per-machine
  • The TSQL query engine gains metrics table support with typed attribute columns, prettyFormat() for human-readable values, and per-schema time bucket thresholds
  • Includes reference tasks (references/hello-world/src/trigger/metrics.ts) demonstrating CPU-intensive, memory-ramp, bursty workload, and custom metrics patterns
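To make the custom-metrics surface concrete, here is a minimal sketch of a task recording a counter via otel.metrics.getMeter(); the meter name, metric name, and task id are hypothetical, and the meter is assumed to expose the standard OpenTelemetry Meter API:

import { task, otel } from "@trigger.dev/sdk";

// Assumed: otel.metrics.getMeter() returns a standard OTEL Meter
const meter = otel.metrics.getMeter("my-app"); // hypothetical meter name

const itemsProcessed = meter.createCounter("items_processed", {
  description: "Items handled per run",
});

export const customMetricsExample = task({
  id: "custom-metrics-example", // hypothetical task id
  run: async (payload: { items: string[] }) => {
    for (const item of payload.items) {
      // ... process the item ...
      itemsProcessed.add(1);
    }
  },
});

Because run context is attached automatically (see the tagging bullet above), the counter needs no per-run attributes of its own.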

What changed

Metrics collection (packages/core, packages/cli-v3)

  • Metrics export pipeline — TracingSDK now sets up a MeterProvider with a PeriodicExportingMetricReader that chains through TaskContextMetricExporter (adds run context attributes) and BufferingMetricExporter (batches exports to reduce overhead)
  • Host metrics — Enabled @opentelemetry/host-metrics for process CPU, memory, and system-level metrics
  • Node.js runtime metrics — New nodejsRuntimeMetrics.ts module using performance.eventLoopUtilization(), monitorEventLoopDelay(), and process.memoryUsage() to emit 6 observable gauges (see the sketch after this list)
  • Custom metrics — Exposed otel.metrics from @trigger.dev/sdk so users can create counters, histograms, and gauges in their tasks
  • Machine ID — Stable per-worker machine identifier for grouping metrics
  • Dev worker — Drops system.* metrics to reduce noise, keeps sending metrics between runs in warm workers
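To sketch how those gauges can be wired up (metric and meter names here are illustrative; the real nodejsRuntimeMetrics.ts likely differs in detail):

import { metrics } from "@opentelemetry/api";
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

const meter = metrics.getMeter("nodejs-runtime"); // illustrative meter name

const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

let lastElu = performance.eventLoopUtilization();

meter.createObservableGauge("nodejs.event_loop.utilization").addCallback((result) => {
  // Utilization since the previous observation, in the range [0, 1]
  const elu = performance.eventLoopUtilization(lastElu);
  lastElu = performance.eventLoopUtilization();
  result.observe(elu.utilization);
});

meter.createObservableGauge("nodejs.event_loop.delay.p99", { unit: "ms" }).addCallback((result) => {
  // monitorEventLoopDelay reports nanoseconds; convert to milliseconds
  result.observe(loopDelay.percentile(99) / 1e6);
  loopDelay.reset();
});

meter.createObservableGauge("nodejs.memory.heap_used", { unit: "By" }).addCallback((result) => {
  result.observe(process.memoryUsage().heapUsed);
});

Observable gauges are pulled by the PeriodicExportingMetricReader on its collection interval, so no extra timers are needed beyond the delay histogram itself.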

Metrics ingestion (apps/webapp)

  • OTEL endpoint — otel.v1.metrics.ts accepts OTEL metric export requests (JSON and protobuf) and converts them to ClickHouse rows
  • ClickHouse schema — 016_create_metrics_v1.sql with 10-second aggregation buckets (see the bucketing sketch below), a JSON attributes column, a 30-day TTL, and materialized views for 1m/5m rollups
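As a hedged sketch of the 10-second bucketing (the review below references a floorToTenSecondBucket helper; its actual signature may differ):

// Floor an OTLP timestamp (unix nanoseconds) to a 10-second bucket boundary
function floorToTenSecondBucket(unixNanos: bigint): Date {
  const ms = Number(unixNanos / 1_000_000n);
  return new Date(Math.floor(ms / 10_000) * 10_000);
}
// e.g. a data point at :12.5s lands in the :10s bucket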

Query engine (internal-packages/tsql, apps/webapp)

  • Metrics query schema — Typed columns for metric attributes (task_identifier, run_id, machine_name, worker_version, etc.) extracted from the JSON attributes column
  • prettyFormat() — TSQL function that annotates columns with format hints (bytes, percent, durationSeconds) for frontend rendering without changing the underlying data
  • Per-schema time buckets — Different tables can define their own time bucket thresholds (metrics uses tighter intervals than runs)
  • AI query integration — The AI query service knows about the metrics table and can generate metric queries
  • Chart improvements — Better formatting for byte values, percentages, and durations in charts and tables

Reference project

  • references/hello-world/src/trigger/metrics.ts — 6 example tasks: cpu-intensive, memory-ramp, bursty-workload, sustained-workload, concurrent-load, custom-metrics

Test plan

  • Build all packages and webapp
  • Start dev worker with hello-world reference project
  • Run cpu-intensive, memory-ramp, and custom-metrics tasks
  • Verify metrics in ClickHouse: SELECT DISTINCT metric_name FROM metrics_v1
  • Query via dashboard AI: "show me CPU utilization over time"
  • Verify prettyFormat renders correctly in chart tooltips and table cells
  • Confirm dev worker drops system.* metrics but keeps process.* and nodejs.*

changeset-bot commented Feb 14, 2026

⚠️ No Changeset found

Latest commit: aa450a6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai bot commented Feb 14, 2026

Walkthrough

Adds format-aware rendering for numeric columns (ColumnFormatType) with value-formatting utilities and plumbing into charts, legends, tooltips, tables, and big-number cards. Introduces per-column format metadata propagation in the tsql printer and schema, per-table timeBucketThresholds, and time-bucketing behavior. Adds a ClickHouse metrics table, ClickHouse metrics insert helpers, an OTLP metrics POST route, and OTLPImporter/OTLPExporter support to convert and store ResourceMetrics. Expands OTEL integration across SDK/CLI/worker (metric readers, exporters, host/node runtime metrics, buffering/task-context exporters), env vars, machine ID propagation, and related API surfaces.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Merge Conflict Detection — ⚠️ Warning: merge conflicts detected (40 files):

⚔️ apps/webapp/app/components/code/QueryResultsChart.tsx (content)
⚔️ apps/webapp/app/components/code/TSQLResultsTable.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/BigNumberCard.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/Chart.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/ChartBar.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/ChartLegendCompound.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/ChartLine.tsx (content)
⚔️ apps/webapp/app/components/primitives/charts/ChartRoot.tsx (content)
⚔️ apps/webapp/app/components/runs/v3/SharedFilters.tsx (content)
⚔️ apps/webapp/app/env.server.ts (content)
⚔️ apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.concurrency/route.tsx (content)
⚔️ apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.query/AITabContent.tsx (content)
⚔️ apps/webapp/app/services/queryService.server.ts (content)
⚔️ apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts (content)
⚔️ apps/webapp/app/v3/otlpExporter.server.ts (content)
⚔️ apps/webapp/app/v3/querySchemas.ts (content)
⚔️ apps/webapp/app/v3/services/aiQueryService.server.ts (content)
⚔️ internal-packages/clickhouse/src/index.ts (content)
⚔️ internal-packages/otlp-importer/src/index.ts (content)
⚔️ internal-packages/tsql/src/index.ts (content)
⚔️ internal-packages/tsql/src/query/printer.test.ts (content)
⚔️ internal-packages/tsql/src/query/printer.ts (content)
⚔️ internal-packages/tsql/src/query/schema.ts (content)
⚔️ internal-packages/tsql/src/query/time_buckets.ts (content)
⚔️ packages/cli-v3/src/dev/taskRunProcessPool.ts (content)
⚔️ packages/cli-v3/src/entryPoints/dev-run-worker.ts (content)
⚔️ packages/cli-v3/src/entryPoints/managed-run-worker.ts (content)
⚔️ packages/cli-v3/src/entryPoints/managed/taskRunProcessProvider.ts (content)
⚔️ packages/cli-v3/src/executions/taskRunProcess.ts (content)
⚔️ packages/core/package.json (content)
⚔️ packages/core/src/v3/config.ts (content)
⚔️ packages/core/src/v3/index.ts (content)
⚔️ packages/core/src/v3/otel/tracingSDK.ts (content)
⚔️ packages/core/src/v3/schemas/messages.ts (content)
⚔️ packages/core/src/v3/semanticInternalAttributes.ts (content)
⚔️ packages/core/src/v3/taskContext/index.ts (content)
⚔️ packages/core/src/v3/taskContext/otelProcessors.ts (content)
⚔️ packages/core/src/v3/workers/index.ts (content)
⚔️ packages/trigger-sdk/src/v3/otel.ts (content)
⚔️ pnpm-lock.yaml (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the PR title 'feat: OTEL metrics pipeline for task workers' accurately and concisely summarizes the main change: adding an end-to-end OpenTelemetry metrics pipeline for collecting, exporting, and querying metrics from task workers.
  • Description check — ✅ Passed: the PR description is comprehensive and well-structured, covering what was changed (metrics collection, ingestion, query engine, reference project) with detailed subsections and implementation specifics. However, it does not report checklist completion or explicit test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
packages/core/src/v3/otel/tracingSDK.ts (2)

282-315: Misleading comment and confusing interval semantics.

Line 300 states PeriodicReader(10s) but with defaults (collectionIntervalMs=10000, exportTimeoutMillis=30000), Math.max(10000, 30000) yields 30s, not 10s. The comment is inaccurate.

Additionally, the variable collectionIntervalMs (sourced from TRIGGER_OTEL_METRICS_COLLECTION_INTERVAL_MILLIS) is semantically misleading — PeriodicExportingMetricReader.exportIntervalMillis controls both collection and export. Meanwhile, exportIntervalMs (from TRIGGER_OTEL_METRICS_EXPORT_INTERVAL_MILLIS) is only used for the BufferingMetricExporter, not the reader itself. This naming inversion will confuse future maintainers.

Suggested clarification
-    // Chain: PeriodicReader(10s) → TaskContextMetricExporter → BufferingMetricExporter(30s) → OTLP
+    // Chain: PeriodicReader → TaskContextMetricExporter → BufferingMetricExporter → OTLP
+    // PeriodicReader interval = max(collectionInterval, exportTimeout) to satisfy SDK constraints

Consider renaming collectionIntervalMs → readerIntervalMs and exportIntervalMs → bufferFlushIntervalMs to reflect actual usage.


342-356: Promise.all swallows sibling flush/shutdown errors on first rejection.

If any one of the three providers rejects, the other two results are discarded. Consider Promise.allSettled to ensure all providers attempt to flush/shutdown, then aggregate errors.

Example with allSettled
   public async flush() {
-    await Promise.all([
+    const results = await Promise.allSettled([
       this._traceProvider.forceFlush(),
       this._logProvider.forceFlush(),
       this._meterProvider.forceFlush(),
     ]);
+    const errors = results.filter((r) => r.status === "rejected").map((r) => r.reason);
+    if (errors.length) throw new AggregateError(errors, "flush failed");
   }
📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82a6a23 and aa450a6.

📒 Files selected for processing (1)
  • packages/core/src/v3/otel/tracingSDK.ts
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
{packages,integrations}/**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Add a changeset when modifying any public package in packages/* or integrations/* using pnpm run changeset:add

Files:

  • packages/core/src/v3/otel/tracingSDK.ts
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Configure OpenTelemetry instrumentations and exporters in trigger.config.ts for enhanced logging
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Configure OpenTelemetry instrumentations and exporters in trigger.config.ts for enhanced logging

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2026-01-08T15:57:09.323Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2026-01-08T15:57:09.323Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/otel-metrics.mdc:0-0
Timestamp: 2026-01-08T15:57:09.323Z
Learning: Applies to **/*.ts : Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use logger methods (debug, log, info, warn, error) from `trigger.dev/sdk/v3` for structured logging in tasks

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Export tasks with unique IDs within the project to enable proper task discovery and execution

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Specify machine resources using the `machine` property with preset options like 'large-1x'

Applied to files:

  • packages/core/src/v3/otel/tracingSDK.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (4)
packages/core/src/v3/otel/tracingSDK.ts (4)

552-552: LGTM — correctly reads both the legacy "http.url" and the newer "url.full" attribute with a fallback chain.


77-92: Config type looks good.

New optional fields for metrics are well-typed, documented with JSDoc where helpful, and use type per guidelines.


120-134: Resource attributes updated to string literals — appropriate for semantic-conventions 1.36.0+.

Note that MACHINE_ID is added at the resource level (not as a per-datapoint metric attribute), which is the correct OTEL pattern for identifying telemetry sources. Based on learnings: "Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs" — this applies to metric attributes, not resource attributes, so this is compliant.


317-324: No issue found. The { type: AggregationType.DROP } syntax is the correct and officially documented API for configuring views in @opentelemetry/sdk-metrics@2.0.1. The code is valid.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal-packages/tsql/src/query/schema.ts (1)

306-330: ⚠️ Potential issue | 🟡 Minor

Prefer a type alias for OutputColumnMetadata.
Since this interface is being extended, converting it to a type keeps the file aligned with the TS guideline.

Suggested change
-export interface OutputColumnMetadata {
+export type OutputColumnMetadata = {
   /** Column name in the result set (after AS aliasing) */
   name: string;
   /** ClickHouse data type (from schema or inferred for computed expressions) */
   type: ClickHouseType;
   ...
-}
+};
As per coding guidelines, `**/*.{ts,tsx}`: Use types over interfaces for TypeScript.
🤖 Fix all issues with AI agents
In `@apps/webapp/app/components/code/QueryResultsChart.tsx`:
- Around line 1111-1127: The bytes formatter block in QueryResultsChart.tsx can
mis-handle maxVal when it's -Infinity; update the formatter to guard against
non-finite maxVal by computing a safeMax (e.g., const safeMax =
Number.isFinite(maxVal) ? Math.abs(maxVal) : 1) and use safeMax in the
log/division math instead of Math.abs(maxVal || 1), keeping the existing value
=== 0 early return and the rest of the scaling logic (divisor, units, i
calculation, scaled, return string) unchanged.

In `@apps/webapp/app/components/code/TSQLResultsTable.tsx`:
- Around line 518-542: The current format-hint block is skipped whenever
column.customRenderType is set and also only handles
bytes/decimalBytes/percent/quantity, causing prettyFormat hints like "duration"
or "cost" (and any overrides on columns with customRenderType) to be ignored;
update the conditional to always check column.format regardless of
column.customRenderType and extend the switch in the format handling (the block
that checks column.format) to include "duration" and "cost" cases (using the
existing formatBytes/formatDecimalBytes/formatQuantity helpers as a model—call
the appropriate formatDuration/formatCost utilities or create them if missing)
so that prettyFormat hints are applied consistently.

In `@apps/webapp/app/routes/otel.v1.metrics.ts`:
- Around line 27-29: The response returned from
ExportMetricsServiceResponse.encode(exportResponse).finish() is missing the
Content-Type header; update the Response construction in the export handler (the
return new Response(...) line) to include headers: { "Content-Type":
"application/x-protobuf" } so clients can correctly interpret the protobuf
payload.
- Around line 13-17: The route currently casts the request body unsafely with
"body as ExportMetricsServiceRequest" before calling otlpExporter.exportMetrics,
which bypasses validation; replace this with a zod schema for the
ExportMetricsServiceRequest shape, parse/validate the raw JSON from
request.json() (throwing or returning a 4xx on parse failure), and pass the
validated/typed result to otlpExporter.exportMetrics so that
ExportMetricsServiceRequest, otlpExporter.exportMetrics, and request.json are
used with proper runtime validation rather than an unsafe cast.

In `@apps/webapp/app/services/queryService.server.ts`:
- Around line 155-160: The code currently picks a single time column using a
regex search over options.query which breaks for multi-table queries; update the
logic that computes timeColumn to (1) scan options.query for all referenced
table names from querySchemas (use word-boundary matches and avoid matching
inside string literals/comments by stripping strings/comments or, better, using
a lightweight SQL tokenizer/parser) and map each matched schema to its
timeConstraint, then (2) if the matched set of timeConstraint values contains
more than one distinct column, either reject the query with a clear error
(preferred) or implement per-table constraint enforcement in executeTSQL;
replace the single matchedSchema/timeColumn assignment with this multi-table
detection and add an explicit error path referencing querySchemas,
timeConstraint, options.query, timeColumn, and executeTSQL so callers get a
clear message when conflicting time columns are present.

In `@apps/webapp/app/utils/columnFormat.ts`:
- Around line 8-28: The bug is that for 0 < |bytes| < 1 the computed index i
becomes negative and selects units[-1]; in both formatBytes and
formatDecimalBytes clamp the computed index to be at least 0 and at most
units.length - 1 (e.g., compute rawIndex = Math.floor(Math.log(Math.abs(bytes))
/ Math.log(1024 or 1000)) then set i = Math.max(0, Math.min(rawIndex,
units.length - 1))). Update the calculations in the functions formatBytes and
formatDecimalBytes to use the clamped i before dividing and formatting.
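A minimal sketch of that clamped-index fix (rounding rules here are assumptions; the real formatBytes may differ):

const BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"];

function formatBytes(bytes: number): string {
  if (bytes === 0) return "0 B";
  const rawIndex = Math.floor(Math.log(Math.abs(bytes)) / Math.log(1024));
  // Clamp so 0 < |bytes| < 1 maps to "B" instead of units[-1]
  const i = Math.max(0, Math.min(rawIndex, BINARY_UNITS.length - 1));
  const scaled = bytes / Math.pow(1024, i);
  return `${scaled.toFixed(scaled < 10 ? 1 : 0)} ${BINARY_UNITS[i]}`;
}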

In `@apps/webapp/app/v3/otlpExporter.server.ts`:
- Around line 490-512: The current value selection conflates a real 0 with
protobuf3 defaults by comparing dp.asDouble !== 0 and dp.asInt !== BigInt(0);
instead, detect which value field is actually set (use the protobuf oneof
discriminator such as dp.valueCase / dp.value?.case if available, or check
presence via hasOwnProperty/undefined checks for dp.asDouble vs dp.asInt) and
then choose that field (prefer dp.asDouble when present, even if 0, otherwise
use Number(dp.asInt)). Update the gauge handling where metric.gauge is iterated
(the dp value extraction before rows.push, referenced by metric.gauge,
dp.asDouble, dp.asInt, resolveDataPointContext, floorToTenSecondBucket) to use
the presence/oneof check and proper BigInt -> Number conversion.
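A sketch of that presence-based selection, assuming the generated OTLP types mark asDouble/asInt as optional oneof members:

// Prefer whichever oneof member is actually set, so a real 0 is preserved
const value =
  dp.asDouble !== undefined
    ? dp.asDouble
    : dp.asInt !== undefined
      ? Number(dp.asInt) // explicit BigInt -> Number conversion
      : undefined;

if (value !== undefined) {
  // ...push the ClickHouse row using this value...
}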

In `@apps/webapp/app/v3/querySchemas.ts`:
- Around line 557-563: The attempt_number column is incorrectly defined as a
String which prevents numeric operations; update the attempt_number definition
to use a numeric ClickHouse type (e.g., change column("String", {...}) to
column("Int64", {...})), update the example to a numeric value (e.g., 1) and
keep the expression "attributes.trigger.attempt_number" and description
unchanged so the schema matches the metrics_v1 JSON typing and supports numeric
sorting/operations.

In `@apps/webapp/app/v3/services/aiQueryService.server.ts`:
- Line 58: Update the model string used to initialize the LanguageModelV1
instance to a currently supported Codex mini model: replace the literal
"codex-mini-latest" in the class field declaration (private readonly model:
LanguageModelV1 = openai("codex-mini-latest")) with a supported model name such
as "gpt-5-codex-mini" or "gpt-5.1-codex-mini" so the openai(...) call uses an
available model.

In `@internal-packages/tsql/src/query/time_buckets.ts`:
- Around line 20-27: Replace the exported interface BucketThreshold with a type
alias: change the declaration "export interface BucketThreshold {
maxRangeSeconds: number; interval: TimeBucketInterval; }" to "export type
BucketThreshold = { maxRangeSeconds: number; interval: TimeBucketInterval; }",
preserving the JSDoc comments and export so all usages of BucketThreshold
continue to work unchanged; no other logic changes are necessary.

In `@packages/core/package.json`:
- Around line 179-185: The package.json change updates public package
`@trigger.dev/core` but lacks a changeset; run pnpm run changeset:add to create a
changeset documenting the dependency updates to
`@opentelemetry/exporter-metrics-otlp-http`,
`@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/instrumentation`,
`@opentelemetry/sdk-logs`, `@opentelemetry/sdk-metrics` (and any related
`@opentelemetry` entries), fill in the change type/summary, save and commit the
generated changeset file alongside your package.json changes so the release
tooling will pick up the modification to `@trigger.dev/core`.
- Around line 179-185: Update the `@opentelemetry/host-metrics` dependency to a
0.200.x (unstable 2.0 line) release pinned to an exact version to match the
`@opentelemetry/sdk-metrics`@2.0.1 compatibility; specifically replace
"@opentelemetry/host-metrics": "^0.36.0" with an exact 0.200.x version (e.g.,
"0.200.0") in package.json so the host-metrics package line aligns with the SDK
2.0.x packages, or alternatively revert `@opentelemetry/sdk-metrics` back to the
1.10.x line if you prefer to keep host-metrics@0.36.0.

In `@packages/core/src/v3/index.ts`:
- Line 52: You added new public exports generateFriendlyId and fromFriendlyId
(export line in index.ts) which expands the package API; create and commit a
changeset so release tooling and changelogs capture this public change by
running the repository's changeset flow (e.g., run pnpm run changeset:add),
select the appropriate version bump and provide a brief summary mentioning
generateFriendlyId and fromFriendlyId, then commit the generated changeset file
alongside your export change.

In `@packages/core/src/v3/taskContext/index.ts`:
- Around line 105-112: The apiClient header injection currently checks only
taskContext.isInsideTask and therefore still injects worker headers when a run
is disabled; update the conditional in the api client (around the header
injection logic that references taskContext.isInsideTask) to also require
taskContext.isRunDisabled === false (e.g., if (taskContext.isInsideTask &&
!taskContext.isRunDisabled) { ... }), ensuring disable() (which sets
_runDisabled via disable()) prevents injection; verify related symbols:
disable(), setGlobalTaskContext(), isInsideTask, isRunDisabled, and the header
keys "x-trigger-worker"/"x-trigger-engine-version" are guarded by the new
condition.

In `@packages/core/src/v3/taskContext/otelProcessors.ts`:
- Around line 143-173: The code is injecting high-cardinality attributes into
metric data via contextAttrs (in the block using taskContext.isRunDisabled and
the else branch) which adds IDs and free-form values; remove ENVIRONMENT_ID,
ORGANIZATION_ID, PROJECT_ID, RUN_ID, WORKER_ID, TASK_SLUG, ATTEMPT_NUMBER, and
RUN_TAGS from the per-data-point metric attributes set on contextAttrs and
instead either: 1) move any needed identifiers to the Resource once (use the
Resource API when initializing telemetry) for correlation, or 2) drop them
entirely and only keep low-cardinality attributes (ENVIRONMENT_TYPE and
MACHINE_PRESET_NAME) in contextAttrs; update the logic that conditionally adds
RUN_TAGS and worker fields (the if (!taskContext.isRunDisabled &&
ctx.run.tags?.length) and the if (taskContext.worker) blocks) to avoid attaching
high-cardinality values to metrics.
🧹 Nitpick comments (14)
packages/core/src/v3/taskContext/otelProcessors.ts (3)

224-235: Buffer grows without an upper bound between flush intervals.

If the flush interval is long or export calls are frequent, _buffer accumulates indefinitely until the next time-based flush. Consider adding a max buffer size cap that triggers an early flush to prevent unbounded memory growth.

💡 Proposed safeguard
+  private static readonly MAX_BUFFER_SIZE = 100;
+
   export(metrics: ResourceMetrics, resultCallback: (result: ExportResult) => void): void {
     this._buffer.push(metrics);
 
     const now = Date.now();
-    if (now - this._lastFlushTime >= this._flushIntervalMs) {
+    if (now - this._lastFlushTime >= this._flushIntervalMs || this._buffer.length >= BufferingMetricExporter.MAX_BUFFER_SIZE) {
       this._lastFlushTime = now;
       const merged = this._mergeBuffer();
       this._innerExporter.export(merged, resultCallback);

258-306: Merge uses only the first batch's resource and keys scopes by name alone.

  1. If different ResourceMetrics entries carry different resource objects (e.g., after a resource attribute update), all but the first are silently dropped (Line 300).
  2. Scopes are keyed solely by scope.name (Line 273), so two scopes with the same name but different version or schemaUrl would be merged incorrectly.

These are edge cases but worth a defensive check or at least a comment documenting the assumption.

💡 Use composite scope key
-        const scopeKey = sm.scope.name;
+        const scopeKey = `${sm.scope.name}@${sm.scope.version ?? ""}`;

118-130: Duplicated delegation boilerplate across both exporters.

Both TaskContextMetricExporter and BufferingMetricExporter have identical constructor logic for forwarding selectAggregationTemporality and selectAggregation. Consider extracting a small helper or base class to DRY this up if more decorating exporters are planned.

Also applies to: 204-222

apps/webapp/app/v3/services/aiQueryService.server.ts (2)

377-382: Consider extracting the shared prompt sections into helper methods to reduce duplication.

The "Choosing the Right Table" block, syntax guide, and common patterns are nearly identical between buildSystemPrompt and buildEditSystemPrompt. Extracting shared sections (e.g., buildTableSelectionGuide(), buildSyntaxGuide()) would make future prompt updates less error-prone.

Also applies to: 525-529


589-594: Edit-mode prompt has fewer metrics examples than new-mode.

The new-mode prompt includes detailed example queries for CPU and memory metrics (lines 465–483) and the full list of prettyFormat types (line 461), while the edit-mode prompt omits these. If the user asks to edit a query to add metrics columns or switch tables, the AI will have less guidance. Consider whether this asymmetry is intentional.

packages/cli-v3/src/dev/taskRunProcessPool.ts (2)

44-44: Silence the Biome warning by using a block body in the forEach callback.

The arrow shorthand implicitly returns the result of clearIdleTimer, which Biome flags as useIterableCallbackReturn. Use a block body to make the intent explicit.

♻️ Proposed fix
-    processesToKill.forEach((process) => this.clearIdleTimer(process));
+    processesToKill.forEach((process) => {
+      this.clearIdleTimer(process);
+    });

226-243: Unhandled promise from killProcess in the idle timeout callback.

this.killProcess(process) returns a Promise that is neither awaited nor .catch()-ed. While killProcess has internal error handling, any unexpected rejection would become an unhandled promise rejection. Consider adding a .catch():

🛡️ Proposed fix
-      this.killProcess(process);
+      this.killProcess(process).catch((err) => {
+        logger.debug("[TaskRunProcessPool] Error killing idle process", { err });
+      });
apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts (1)

966-976: TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS reuses the export interval value.

Both in dev (here) and prod (line 1124), the export timeout is set to the same value as the export interval. This means a slow export could consume the entire interval window before the next export fires. If this is intentional as a "timeout should never exceed interval" heuristic, a brief comment would help future readers. Otherwise, consider exposing a separate DEV_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS env var.

internal-packages/clickhouse/schema/016_create_metrics_v1.sql (1)

1-37: Confirm whether this should follow the raw_ input-table naming convention.
If trigger_dev.metrics_v1 is the ingest/raw metrics table, consider renaming to trigger_dev.raw_metrics_v1 (or adding a raw table + MV) to match the repo’s naming rule.
As per coding guidelines, internal-packages/clickhouse/schema/**/*.sql: Follow ClickHouse naming conventions: raw_ prefix for input tables, _v1, _v2 suffixes for versioning, _mv_v1 suffix for materialized views.

internal-packages/tsql/src/query/printer.ts (1)

944-988: Consider centralizing the allowed prettyFormat types.
The hard‑coded validFormats list can drift from ColumnFormatType/formatters; a shared exported const would keep things in sync.
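One possible shape for that shared const (names are illustrative and the format list is abbreviated):

// Single source of truth for format hints, shared by the tsql printer
// and the webapp renderer
export const COLUMN_FORMAT_TYPES = [
  "bytes",
  "decimalBytes",
  "percent",
  "quantity",
  "duration",
  "durationSeconds",
] as const;

export type ColumnFormatType = (typeof COLUMN_FORMAT_TYPES)[number];

// The printer's validation can then reference the same list:
// const validFormats: readonly string[] = COLUMN_FORMAT_TYPES;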

apps/webapp/app/components/primitives/charts/ChartBar.tsx (1)

170-188: tooltipValueFormatter is silently ignored when no tooltipLabelFormatter is provided.

In the bar chart, value formatting only takes effect when tooltipLabelFormatter is supplied (line 175), because otherwise the ZoomTooltip component is rendered. This differs from ChartLine.tsx where the formatter is always passed to ChartTooltipContent. If this is intentional (zoom tooltip has its own format), no action needed — but worth confirming this asymmetry is desired.

apps/webapp/app/routes/otel.v1.metrics.ts (1)

36-40: Consider using a structured logger instead of console.error.

Other parts of the webapp use a structured logger for better observability. Also, logging the raw error object might inadvertently leak sensitive data in production logs.

apps/webapp/app/v3/otlpExporter.server.ts (1)

575-582: Use type instead of interface per coding guidelines.

As per coding guidelines: "Use types over interfaces for TypeScript."

Suggested fix
-interface ResourceContext {
-  taskSlug: string | undefined;
-  runId: string | undefined;
-  attemptNumber: number | undefined;
-  machineId: string | undefined;
-  workerId: string | undefined;
-  workerVersion: string | undefined;
-}
+type ResourceContext = {
+  taskSlug: string | undefined;
+  runId: string | undefined;
+  attemptNumber: number | undefined;
+  machineId: string | undefined;
+  workerId: string | undefined;
+  workerVersion: string | undefined;
+};
packages/core/src/v3/otel/tracingSDK.ts (1)

340-354: flush() and shutdown() correctly include the meter provider.

Using Promise.all for parallel shutdown/flush of all three providers is appropriate. One consideration: if one provider's shutdown rejects, Promise.all will reject immediately, potentially leaving other providers in an inconsistent state. Consider Promise.allSettled for shutdown() to ensure all providers get a chance to clean up.

Optional: use allSettled for more resilient shutdown
  public async shutdown() {
-    await Promise.all([
+    await Promise.allSettled([
       this._traceProvider.shutdown(),
       this._logProvider.shutdown(),
       this._meterProvider.shutdown(),
     ]);
   }

Comment on lines +1111 to +1127
if (format === "bytes" || format === "decimalBytes") {
const divisor = format === "bytes" ? 1024 : 1000;
const units =
format === "bytes"
? ["B", "KiB", "MiB", "GiB", "TiB"]
: ["B", "KB", "MB", "GB", "TB"];
return (value: number): string => {
if (value === 0) return "0 B";
// Use consistent unit for all ticks based on max value
const i = Math.min(
Math.floor(Math.log(Math.abs(maxVal || 1)) / Math.log(divisor)),
units.length - 1
);
const scaled = value / Math.pow(divisor, i);
return `${scaled.toFixed(scaled < 10 ? 1 : 0)} ${units[i]}`;
};
}

⚠️ Potential issue | 🟡 Minor

Edge case: maxVal could be -Infinity when the formatter is invoked with bytes format.

If maxVal remains -Infinity (e.g., all values are null/non-numeric), then Math.abs(maxVal || 1) evaluates to Math.abs(-Infinity) = Infinity because -Infinity is truthy. This makes i clamp to the last unit index, so a non-zero value would be formatted as e.g. "0.0 TiB" instead of "0 B". In practice, the early returns at lines 977–983 prevent rendering an empty chart, so this is unlikely to trigger. But a small guard would make the formatter more robust.

Proposed defensive fix
     return (value: number): string => {
       if (value === 0) return "0 B";
       // Use consistent unit for all ticks based on max value
+      const refValue = isFinite(maxVal) ? Math.abs(maxVal) : Math.abs(value);
       const i = Math.min(
-        Math.floor(Math.log(Math.abs(maxVal || 1)) / Math.log(divisor)),
+        Math.floor(Math.log(refValue || 1) / Math.log(divisor)),
         units.length - 1
       );
🤖 Prompt for AI Agents
In `@apps/webapp/app/components/code/QueryResultsChart.tsx` around lines 1111 -
1127, The bytes formatter block in QueryResultsChart.tsx can mis-handle maxVal
when it's -Infinity; update the formatter to guard against non-finite maxVal by
computing a safeMax (e.g., const safeMax = Number.isFinite(maxVal) ?
Math.abs(maxVal) : 1) and use safeMax in the log/division math instead of
Math.abs(maxVal || 1), keeping the existing value === 0 early return and the
rest of the scaling logic (divisor, units, i calculation, scaled, return string)
unchanged.

Comment on lines +518 to +542
// Check format hint for new format types (from prettyFormat())
if (column.format && !column.customRenderType) {
switch (column.format) {
case "bytes":
if (typeof value === "number") {
return <span className="tabular-nums">{formatBytes(value)}</span>;
}
break;
case "decimalBytes":
if (typeof value === "number") {
return <span className="tabular-nums">{formatDecimalBytes(value)}</span>;
}
break;
case "percent":
if (typeof value === "number") {
return <span className="tabular-nums">{value.toFixed(2)}%</span>;
}
break;
case "quantity":
if (typeof value === "number") {
return <span className="tabular-nums">{formatQuantity(value)}</span>;
}
break;
}
}

⚠️ Potential issue | 🟡 Minor

prettyFormat hints are skipped when customRenderType exists (and duration/cost formats are ignored).
The format-hint block only runs when !column.customRenderType and only handles bytes/decimal/percent/quantity. That means prettyFormat(..., "duration" | "cost") or overrides on columns with customRenderType won’t render as intended. Prefer column.format and handle all supported formats.

Suggested change
-  if (column.format && !column.customRenderType) {
+  if (column.format) {
     switch (column.format) {
       case "bytes":
         if (typeof value === "number") {
           return <span className="tabular-nums">{formatBytes(value)}</span>;
         }
         break;
       case "decimalBytes":
         if (typeof value === "number") {
           return <span className="tabular-nums">{formatDecimalBytes(value)}</span>;
         }
         break;
       case "percent":
         if (typeof value === "number") {
           return <span className="tabular-nums">{value.toFixed(2)}%</span>;
         }
         break;
       case "quantity":
         if (typeof value === "number") {
           return <span className="tabular-nums">{formatQuantity(value)}</span>;
         }
         break;
+      case "duration":
+        if (typeof value === "number") {
+          return (
+            <span className="tabular-nums">
+              {formatDurationMilliseconds(value, { style: "short" })}
+            </span>
+          );
+        }
+        break;
+      case "durationSeconds":
+        if (typeof value === "number") {
+          return (
+            <span className="tabular-nums">
+              {formatDurationMilliseconds(value * 1000, { style: "short" })}
+            </span>
+          );
+        }
+        break;
+      case "costInDollars":
+        if (typeof value === "number") {
+          return <span className="tabular-nums">{formatCurrencyAccurate(value)}</span>;
+        }
+        break;
+      case "cost":
+        if (typeof value === "number") {
+          return <span className="tabular-nums">{formatCurrencyAccurate(value / 100)}</span>;
+        }
+        break;
     }
   }
🤖 Prompt for AI Agents
In `@apps/webapp/app/components/code/TSQLResultsTable.tsx` around lines 518 - 542,
The current format-hint block is skipped whenever column.customRenderType is set
and also only handles bytes/decimalBytes/percent/quantity, causing prettyFormat
hints like "duration" or "cost" (and any overrides on columns with
customRenderType) to be ignored; update the conditional to always check
column.format regardless of column.customRenderType and extend the switch in the
format handling (the block that checks column.format) to include "duration" and
"cost" cases (using the existing formatBytes/formatDecimalBytes/formatQuantity
helpers as a model—call the appropriate formatDuration/formatCost utilities or
create them if missing) so that prettyFormat hints are applied consistently.

Comment on lines +13 to +17
const body = await request.json();

const exportResponse = await otlpExporter.exportMetrics(
body as ExportMetricsServiceRequest
);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual file
head -50 apps/webapp/app/routes/otel.v1.metrics.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1426


🏁 Script executed:

# Search for ExportMetricsServiceRequest to understand where it comes from
rg 'ExportMetricsServiceRequest' -n --type ts apps/webapp/app/routes/otel.v1.metrics.ts

Repository: triggerdotdev/trigger.dev

Length of output: 239


🏁 Script executed:

# Check what ExportMetricsServiceRequest is imported from
rg 'import.*ExportMetricsServiceRequest' -n --type ts

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check if ExportMetricsServiceRequest has a fromJSON method in the codebase
rg -B5 -A10 'fromJSON' --type ts | head -100

Repository: triggerdotdev/trigger.dev

Length of output: 11593


🏁 Script executed:

# Search for ExportMetricsServiceRequest definition and its fromJSON method
fd 'metrics' --type f -e ts internal-packages/otlp-importer/src/generated/ | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 262


🏁 Script executed:

# Look specifically for ExportMetricsServiceRequest
rg 'ExportMetricsServiceRequest' internal-packages/otlp-importer/src/generated -A 30 | grep -E '(fromJSON|class|interface|export)'

Repository: triggerdotdev/trigger.dev

Length of output: 2544


🏁 Script executed:

# Check how zod is used in the webapp for validation of API inputs
rg 'zod|z\.parse|z\.validate' apps/webapp/app/routes -l --type ts | head -10

Repository: triggerdotdev/trigger.dev

Length of output: 758


🏁 Script executed:

# Look at a specific example of zod validation in webapp routes
head -80 apps/webapp/app/routes/admin.api.v1.environments.\$environmentId.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2326


🏁 Script executed:

# Check if there's a pattern for API body validation using zod in the file or nearby
rg -A5 'z\.parse|request\.json()' apps/webapp/app/routes/admin.api.v1.environments.\$environmentId.ts --type ts

Repository: triggerdotdev/trigger.dev

Length of output: 272


🏁 Script executed:

# Let's examine the full ExportMetricsServiceRequest.fromJSON implementation
rg -A 20 'fromJSON\(object: any\): ExportMetricsServiceRequest' internal-packages/otlp-importer/src/generated/opentelemetry/proto/collector/metrics/v1/metrics_service.ts

Repository: triggerdotdev/trigger.dev

Length of output: 928


Use zod validation for the JSON body instead of an unsafe cast.

The unvalidated cast body as ExportMetricsServiceRequest bypasses type safety. Per the coding guidelines, zod should be used for validation in apps/webapp. Define a validation schema for the request and parse the body against it:

      const body = await request.json();
+     const exportRequest = ExportMetricsServiceRequest.fromJSON(body);
+
      const exportResponse = await otlpExporter.exportMetrics(
-       body as ExportMetricsServiceRequest
+       exportRequest
      );

Note: While ExportMetricsServiceRequest.fromJSON() is available in the generated protobuf code, consider using a zod schema instead to align with project validation standards in the webapp.

🤖 Prompt for AI Agents
In `@apps/webapp/app/routes/otel.v1.metrics.ts` around lines 13 - 17, The route
currently casts the request body unsafely with "body as
ExportMetricsServiceRequest" before calling otlpExporter.exportMetrics, which
bypasses validation; replace this with a zod schema for the
ExportMetricsServiceRequest shape, parse/validate the raw JSON from
request.json() (throwing or returning a 4xx on parse failure), and pass the
validated/typed result to otlpExporter.exportMetrics so that
ExportMetricsServiceRequest, otlpExporter.exportMetrics, and request.json are
used with proper runtime validation rather than an unsafe cast.

Comment on lines +27 to +29
return new Response(ExportMetricsServiceResponse.encode(exportResponse).finish(), {
status: 200,
});

⚠️ Potential issue | 🟡 Minor

Missing Content-Type header on protobuf response.

The protobuf response should include Content-Type: application/x-protobuf so clients can correctly interpret the response format.

Suggested fix
       return new Response(ExportMetricsServiceResponse.encode(exportResponse).finish(), {
         status: 200,
+        headers: { "Content-Type": "application/x-protobuf" },
       });
🤖 Prompt for AI Agents
In `@apps/webapp/app/routes/otel.v1.metrics.ts` around lines 27 - 29, The response
returned from ExportMetricsServiceResponse.encode(exportResponse).finish() is
missing the Content-Type header; update the Response construction in the export
handler (the return new Response(...) line) to include headers: {
"Content-Type": "application/x-protobuf" } so clients can correctly interpret
the protobuf payload.

Comment on lines +155 to +160
// Detect which table the query targets to determine the time column
// Each table schema declares its primary time column via timeConstraint
const matchedSchema = querySchemas.find((s) =>
new RegExp(`\\bFROM\\s+${s.name}\\b`, "i").test(options.query)
);
const timeColumn = matchedSchema?.timeConstraint ?? "triggered_at";

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the querySchemas definition and schema declarations
rg -n "querySchemas" apps/webapp/app/services/queryService.server.ts -A 5 -B 5

Repository: triggerdotdev/trigger.dev

Length of output: 1435


🏁 Script executed:

#!/bin/bash
# Find all timeConstraint declarations across the codebase
rg -n "timeConstraint" apps/webapp/ --type=ts -C 2

Repository: triggerdotdev/trigger.dev

Length of output: 1739


🏁 Script executed:

#!/bin/bash
# Check the full context around lines 155-160 and surrounding code
sed -n '140,260p' apps/webapp/app/services/queryService.server.ts | cat -n

Repository: triggerdotdev/trigger.dev

Length of output: 5691


🏁 Script executed:

#!/bin/bash
# Search for JOIN queries in tests or examples
rg -n "JOIN" apps/webapp/ --type=ts -C 2 | head -100

Repository: triggerdotdev/trigger.dev

Length of output: 8142


🏁 Script executed:

#!/bin/bash
# Find the executeTSQL function to understand how it handles enforcedWhereClause
rg -n "executeTSQL" apps/webapp/ --type=ts -l

Repository: triggerdotdev/trigger.dev

Length of output: 116


🏁 Script executed:

#!/bin/bash
# Check if there are any documented multi-table queries or comments about JOIN support
rg -n "JOIN\|multi.table\|multiple.table" apps/webapp/app/services/queryService.server.ts -i -C 3

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Look at the full querySchemas to see all available tables
head -450 apps/webapp/app/v3/querySchemas.ts | tail -50

Repository: triggerdotdev/trigger.dev

Length of output: 1780


🏁 Script executed:

#!/bin/bash
# Find the executeTSQL implementation
find apps/webapp -name "*.ts" -type f | xargs grep -l "executeTSQL" | head -10

Repository: triggerdotdev/trigger.dev

Length of output: 116


🏁 Script executed:

#!/bin/bash
# Look for the actual executeTSQL function definition
rg -n "^(export\s+)?(async\s+)?function executeTSQL|const executeTSQL" apps/webapp --type=ts -A 20

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Check if executeTSQL is imported from an external package
rg -n "executeTSQL" apps/webapp/app/services/queryService.server.ts -B 30 | grep -E "import|from"

Repository: triggerdotdev/trigger.dev

Length of output: 279


🏁 Script executed:

#!/bin/bash
# Search more carefully for where executeTSQL is defined or imported
rg "executeTSQL" apps/webapp/app/services/queryService.server.ts -B 50 | head -100

Repository: triggerdotdev/trigger.dev

Length of output: 2161


🏁 Script executed:

#!/bin/bash
# Search for executeTSQL in the entire webapp to find where it's defined
fd -e ts -e tsx apps/webapp | xargs grep -l "executeTSQL"

Repository: triggerdotdev/trigger.dev

Length of output: 420


🏁 Script executed:

#!/bin/bash
# Look at imports at the top of queryService.server.ts
head -50 apps/webapp/app/services/queryService.server.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1866


🏁 Script executed:

#!/bin/bash
# Check if there are any query examples or integration tests that show multi-table queries being used
rg -n "FROM.*JOIN|JOIN.*metrics|JOIN.*runs" apps/webapp/app --type=ts -C 2

Repository: triggerdotdev/trigger.dev

Length of output: 4566


🏁 Script executed:

#!/bin/bash
# Verify the test example to see how queryService is tested with queries
rg -n "executeQuery\|queryService" apps/webapp/test --type=ts -l 2>/dev/null | head -5

Repository: triggerdotdev/trigger.dev

Length of output: 51


Multi-table queries with different time columns will fail or apply incorrect constraints.

When a query JOINs tables with different timeConstraint values (e.g., FROM runs JOIN metrics_v1), the regex-based detection at line 157–158 matches only the first table name, causing timeColumn to be set incorrectly for the other table. For instance, runs uses "triggered_at" while metrics_v1 uses "bucket_start". The enforced WHERE clause at line 206 and fallback at line 248 will then reference the wrong column name for the non-primary table, leading to query failures or silently incorrect filtering.

The regex can also match table names inside string literals or comments, though this is a lower-risk edge case.

Verify whether the downstream executeTSQL layer handles per-table constraint enforcement, or whether multi-table queries should be explicitly rejected if they involve tables with conflicting time columns.

🧰 Tools
🪛 ast-grep (0.40.5)

[warning] 157-157: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\bFROM\\s+${s.name}\\b, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
In `@apps/webapp/app/services/queryService.server.ts` around lines 155 - 160, The
code currently picks a single time column using a regex search over
options.query which breaks for multi-table queries; update the logic that
computes timeColumn to (1) scan options.query for all referenced table names
from querySchemas (use word-boundary matches and avoid matching inside string
literals/comments by stripping strings/comments or, better, using a lightweight
SQL tokenizer/parser) and map each matched schema to its timeConstraint, then
(2) if the matched set of timeConstraint values contains more than one distinct
column, either reject the query with a clear error (preferred) or implement
per-table constraint enforcement in executeTSQL; replace the single
matchedSchema/timeColumn assignment with this multi-table detection and add an
explicit error path referencing querySchemas, timeConstraint, options.query,
timeColumn, and executeTSQL so callers get a clear message when conflicting time
columns are present.

Comment on lines +20 to +27
 * A threshold mapping a maximum time range duration to a bucket interval.
 */
export interface BucketThreshold {
  /** Maximum range duration in seconds for this threshold to apply */
  maxRangeSeconds: number;
  /** The bucket interval to use when the range is under maxRangeSeconds */
  interval: TimeBucketInterval;
}

⚠️ Potential issue | 🟡 Minor

Use a type alias for BucketThreshold.
The TS guideline prefers type aliases over interface for new definitions.

Suggested change
-export interface BucketThreshold {
+export type BucketThreshold = {
   /** Maximum range duration in seconds for this threshold to apply */
   maxRangeSeconds: number;
   /** The bucket interval to use when the range is under maxRangeSeconds */
   interval: TimeBucketInterval;
-}
+};
As per coding guidelines, `**/*.{ts,tsx}`: Use types over interfaces for TypeScript.
🤖 Prompt for AI Agents
In `@internal-packages/tsql/src/query/time_buckets.ts` around lines 20 - 27,
Replace the exported interface BucketThreshold with a type alias: change the
declaration "export interface BucketThreshold { maxRangeSeconds: number;
interval: TimeBucketInterval; }" to "export type BucketThreshold = {
maxRangeSeconds: number; interval: TimeBucketInterval; }", preserving the JSDoc
comments and export so all usages of BucketThreshold continue to work unchanged;
no other logic changes are necessary.
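
For context, here is a sketch of how such a threshold list might be consulted when resolving a bucket interval; the threshold values and helper name are illustrative, not the PR's implementation:

type TimeBucketInterval = "10 seconds" | "1 minute" | "5 minutes";

type BucketThreshold = {
  maxRangeSeconds: number;
  interval: TimeBucketInterval;
};

// Hypothetical thresholds for a metrics schema: tighter intervals for shorter ranges.
const metricsThresholds: BucketThreshold[] = [
  { maxRangeSeconds: 60 * 60, interval: "10 seconds" }, // ranges up to 1 hour
  { maxRangeSeconds: 24 * 60 * 60, interval: "1 minute" }, // ranges up to 1 day
];

// The first threshold whose maxRangeSeconds covers the requested range wins,
// with a coarse fallback for anything longer.
function pickInterval(rangeSeconds: number): TimeBucketInterval {
  return (
    metricsThresholds.find((t) => rangeSeconds <= t.maxRangeSeconds)?.interval ?? "5 minutes"
  );
}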

Comment on lines +179 to +185
"@opentelemetry/exporter-metrics-otlp-http": "0.203.0",
"@opentelemetry/host-metrics": "^0.36.0",
"@opentelemetry/exporter-trace-otlp-http": "0.203.0",
"@opentelemetry/instrumentation": "0.203.0",
"@opentelemetry/resources": "2.0.1",
"@opentelemetry/sdk-logs": "0.203.0",
"@opentelemetry/sdk-metrics": "2.0.1",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if a changeset was added in this PR for `@trigger.dev/core`
fd --type f '.' '.changeset' --exec grep -l '@trigger.dev/core' {} 2>/dev/null || echo 'No changeset found referencing @trigger.dev/core'

Repository: triggerdotdev/trigger.dev

Length of output: 225


🏁 Script executed:

# Also check the actual package.json to verify it's a public package
cat packages/core/package.json | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 503


🏁 Script executed:

# List changesets to see what exists
ls -la .changeset/ 2>/dev/null | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 914


🏁 Script executed:

# Check if these are new dependencies or version updates by examining git diff
git diff HEAD -- packages/core/package.json | head -50

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Also check the specific lines mentioned in the review
sed -n '179,185p' packages/core/package.json

Repository: triggerdotdev/trigger.dev

Length of output: 407


Add a changeset for modifications to this public package.

@trigger.dev/core is a public package, and modifications require a changeset. Run pnpm run changeset:add to document these dependency changes.
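
For reference, a changeset is a small markdown file under .changeset/; a minimal one for this change might look like the following (bump type and summary are illustrative):

---
"@trigger.dev/core": patch
---

Add OpenTelemetry metrics dependencies to support the metrics export pipeline.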

🤖 Prompt for AI Agents
In `@packages/core/package.json` around lines 179 - 185, The package.json change
updates public package `@trigger.dev/core` but lacks a changeset; run pnpm run
changeset:add to create a changeset documenting the dependency updates to
`@opentelemetry/exporter-metrics-otlp-http`,
`@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/instrumentation`,
`@opentelemetry/sdk-logs`, `@opentelemetry/sdk-metrics` (and any related
`@opentelemetry` entries), fill in the change type/summary, save and commit the
generated changeset file alongside your package.json changes so the release
tooling will pick up the modification to `@trigger.dev/core`.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check the actual state of the file
cat -n packages/core/package.json | sed -n '175,190p'

Repository: triggerdotdev/trigger.dev

Length of output: 895


🏁 Script executed:

# Search for changesets in the repository
fd -t f 'changeset' packages/core/

Repository: triggerdotdev/trigger.dev

Length of output: 51


🌐 Web query:

@opentelemetry/host-metrics 0.36.0 peer dependencies npm

💡 Result:

For @opentelemetry/host-metrics@0.36.0, the npm peer dependency is:

  • @opentelemetry/api: ^1.3.0 [1]

🌐 Web query:

@opentelemetry/host-metrics 0.36 sdk-metrics 2.0 compatibility support

💡 Result:

@opentelemetry/host-metrics@0.36.0 is an experimental (0.x) package from opentelemetry-js-contrib and is expected to pair with the stable SDK 1.10.x line (compatibility matrix: stable 1.10.x ↔ experimental 0.36.x). It is not in the expected-compatible set for @opentelemetry/sdk-metrics@2.0.x, which pairs with experimental 0.200.x packages. [1]

Also, as of the npm package metadata, @opentelemetry/host-metrics’ latest published version is still 0.36.0, not a 0.200.x-aligned release. [2]

So: No—@opentelemetry/host-metrics@0.36.0 is not considered compatible with @opentelemetry/sdk-metrics@2.0.x. Use a 1.10.x-era @opentelemetry/sdk-metrics with host-metrics 0.36.x, or wait for/upgrade to a 0.200.x-era host-metrics if/when it’s released. [1][3]

Sources:
[1] OpenTelemetry JS “Package Version Compatibility” matrix (opentelemetry-js repo)
[2] npm package page for @opentelemetry/host-metrics (shows 0.36.0 as current)
[3] OpenTelemetry blog: JS SDK 2.0 uses stable >=2.0.0 with unstable >=0.200.0 packages


🏁 Script executed:

# Also check if there's a CHANGELOG or package.json for host-metrics to understand sdk-metrics dependency
# And verify the full state of the changeset requirement
git log --oneline --all -S "@opentelemetry/host-metrics" | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 188


Replace @opentelemetry/host-metrics@^0.36.0 with version 0.200.x or later to match SDK 2.0.x compatibility.

@opentelemetry/host-metrics@0.36.0 is incompatible with @opentelemetry/sdk-metrics@2.0.1. The 0.36.x line is designed for SDK 1.10.x; SDK 2.0.x requires the 0.200.x unstable package line. Either upgrade host-metrics to 0.200.x (once released) or downgrade the SDK to 1.10.x. Additionally, pin the host-metrics version to an exact number (e.g., 0.200.0 or the compatible 0.36.0 if reverting the SDK) for consistency with other OTel dependencies, which are all pinned exact.
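
If the SDK line is reverted rather than host-metrics upgraded, the exact-pin style would look like this sketch (the sdk-metrics version shown is illustrative — take the actual pair from the OpenTelemetry JS compatibility matrix):

"@opentelemetry/host-metrics": "0.36.0",
"@opentelemetry/sdk-metrics": "1.10.1",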

🤖 Prompt for AI Agents
In `@packages/core/package.json` around lines 179 - 185, Update the
`@opentelemetry/host-metrics` dependency to a 0.200.x (unstable 2.0 line) release
pinned to an exact version to match the `@opentelemetry/sdk-metrics`@2.0.1
compatibility; specifically replace "@opentelemetry/host-metrics": "^0.36.0"
with an exact 0.200.x version (e.g., "0.200.0") in package.json so the
host-metrics package line aligns with the SDK 2.0.x packages, or alternatively
revert `@opentelemetry/sdk-metrics` back to the 1.10.x line if you prefer to keep
host-metrics@0.36.0.

NULL_SENTINEL,
} from "./utils/flattenAttributes.js";
export { omit } from "./utils/omit.js";
export { generateFriendlyId, fromFriendlyId } from "./isomorphic/friendlyId.js";

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for changesets in the .changeset directory
echo "=== Checking for changesets ==="
find .changeset -name "*.md" -type f 2>/dev/null | head -20
echo ""
echo "=== Looking for changesets mentioning core or friendlyId ==="
fd -e md . .changeset/ 2>/dev/null --exec grep -l -i "friendlyId\|core\|generateFriendlyId" {} \; 2>/dev/null || echo "No matching changesets found"

Repository: triggerdotdev/trigger.dev

Length of output: 658


🏁 Script executed:

#!/bin/bash
# Verify the friendlyId.js file exists and contains the exports
echo "=== Checking friendlyId.js file ==="
cat packages/core/src/v3/isomorphic/friendlyId.js 2>/dev/null | head -50

Repository: triggerdotdev/trigger.dev

Length of output: 104


🏁 Script executed:

#!/bin/bash
# Check if these functions were previously exported from v3/index.ts
echo "=== Checking git history for this line ==="
cd packages/core/src/v3 && git log -p --follow -S "generateFriendlyId" -- index.ts 2>/dev/null | head -100 || echo "Git history check failed or not available"

Repository: triggerdotdev/trigger.dev

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Check if the file exists at the expected path
echo "=== Checking if friendlyId.js exists ==="
ls -la packages/core/src/v3/isomorphic/ 2>/dev/null || echo "isomorphic directory not found"

Repository: triggerdotdev/trigger.dev

Length of output: 721


🏁 Script executed:

#!/bin/bash
# Check the actual v3/index.ts file
echo "=== Checking v3/index.ts ==="
cat -n packages/core/src/v3/index.ts 2>/dev/null | grep -A 2 -B 2 "friendlyId" || echo "No friendlyId exports found"

Repository: triggerdotdev/trigger.dev

Length of output: 328


🏁 Script executed:

#!/bin/bash
# List all changesets to see their contents
echo "=== Changesets content ==="
for file in .changeset/*.md; do
  if [ "$file" != ".changeset/README.md" ]; then
    echo "=== $file ==="
    cat "$file"
    echo ""
  fi
done | head -200

Repository: triggerdotdev/trigger.dev

Length of output: 2252


🏁 Script executed:

#!/bin/bash
# Check the friendlyId.ts file to confirm the functions exist
echo "=== Checking friendlyId.ts exports ==="
cat packages/core/src/v3/isomorphic/friendlyId.ts | head -50

Repository: triggerdotdev/trigger.dev

Length of output: 1370


🏁 Script executed:

#!/bin/bash
# Check if there's a changeset that covers this specific change but uses different naming
echo "=== Checking all changesets for 'friendlyId' case-insensitive ==="
grep -r -i "friendlyid\|generateFriendlyId\|fromFriendlyId" .changeset/ 2>/dev/null || echo "No matches found in changesets"

Repository: triggerdotdev/trigger.dev

Length of output: 165


🏁 Script executed:

#!/bin/bash
# Verify this is a new addition - check git diff or status
echo "=== Check git status for changes ==="
git status --short packages/core/src/v3/index.ts 2>/dev/null || echo "Not available"
echo ""
echo "=== Check git diff for this file ==="
git diff HEAD packages/core/src/v3/index.ts 2>/dev/null | head -30 || echo "Not available"

Repository: triggerdotdev/trigger.dev

Length of output: 143


New public API exports — ensure a changeset is included.

Exporting generateFriendlyId and fromFriendlyId from @trigger.dev/core/v3 expands the public API surface. As per coding guidelines, modifications to public packages in packages/* require a changeset via pnpm run changeset:add.

🤖 Prompt for AI Agents
In `@packages/core/src/v3/index.ts` at line 52, You added new public exports
generateFriendlyId and fromFriendlyId (export line in index.ts) which expands
the package API; create and commit a changeset so release tooling and changelogs
capture this public change by running the repository's changeset flow (e.g., run
pnpm run changeset:add), select the appropriate version bump and provide a brief
summary mentioning generateFriendlyId and fromFriendlyId, then commit the
generated changeset file alongside your export change.

Comment on lines 105 to 112
 public disable() {
-  unregisterGlobal(API_NAME);
+  this._runDisabled = true;
 }

 public setGlobalTaskContext(taskContext: TaskContext): boolean {
-  return registerGlobal(API_NAME, taskContext);
+  this._runDisabled = false;
+  return registerGlobal(API_NAME, taskContext, true);
 }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all usages of disable() and isInsideTask to check if any caller
# depends on isInsideTask becoming false after disable()

echo "=== Callers of .disable() on TaskContextAPI ==="
rg -n --type=ts 'taskContext.*\.disable\(\)|\.disable\(\)' -A3 -B1

echo ""
echo "=== Usages of isInsideTask ==="
rg -n --type=ts '\.isInsideTask' -A2 -B1

echo ""
echo "=== Usages of isRunDisabled ==="
rg -n --type=ts '\.isRunDisabled' -A2 -B1

echo ""
echo "=== File: packages/core/src/v3/taskContext/index.ts (context) ==="
sed -n '1,50p' packages/core/src/v3/taskContext/index.ts
sed -n '100,120p' packages/core/src/v3/taskContext/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 7524


🏁 Script executed:

#!/bin/bash
echo "=== Context around isInsideTask usage in apiClient ==="
sed -n '1420,1450p' packages/core/src/v3/apiClient/index.ts

echo ""
echo "=== Check if isRunDisabled is used in apiClient ==="
rg -n 'isRunDisabled' packages/core/src/v3/apiClient/index.ts -A2 -B2

Repository: triggerdotdev/trigger.dev

Length of output: 1002


Fix apiClient header injection logic to check isRunDisabled.

The behavioral change where disable() no longer unregisters the task context global introduces a bug: after disable() is called, isInsideTask remains true (the context is deliberately preserved between runs), but the run is actually disabled. The apiClient injects worker-specific headers (x-trigger-worker, x-trigger-engine-version) based solely on isInsideTask and will incorrectly inject these headers during disabled states (between runs).

Update the check at packages/core/src/v3/apiClient/index.ts:1431 to also verify that isRunDisabled is false:

if (taskContext.isInsideTask && !taskContext.isRunDisabled) {
  headers["x-trigger-worker"] = "true";
  headers["x-trigger-engine-version"] = "V2";
  // ...
}

This aligns with how other systems (e.g., otelProcessors) already handle the distinction between a registered context and an active run.

🤖 Prompt for AI Agents
In `@packages/core/src/v3/taskContext/index.ts` around lines 105 - 112, The
apiClient header injection currently checks only taskContext.isInsideTask and
therefore still injects worker headers when a run is disabled; update the
conditional in the api client (around the header injection logic that references
taskContext.isInsideTask) to also require taskContext.isRunDisabled === false
(e.g., if (taskContext.isInsideTask && !taskContext.isRunDisabled) { ... }),
ensuring disable() (which sets _runDisabled via disable()) prevents injection;
verify related symbols: disable(), setGlobalTaskContext(), isInsideTask,
isRunDisabled, and the header keys "x-trigger-worker"/"x-trigger-engine-version"
are guarded by the new condition.

Comment on lines +143 to +173
if (taskContext.isRunDisabled) {
  // Between runs: keep environment/project/org/machine attrs, strip run-specific ones
  contextAttrs = {
    [SemanticInternalAttributes.ENVIRONMENT_ID]: ctx.environment.id,
    [SemanticInternalAttributes.ENVIRONMENT_TYPE]: ctx.environment.type,
    [SemanticInternalAttributes.ORGANIZATION_ID]: ctx.organization.id,
    [SemanticInternalAttributes.PROJECT_ID]: ctx.project.id,
    [SemanticInternalAttributes.MACHINE_PRESET_NAME]: ctx.machine?.name,
  };
} else {
  // During a run: full context attrs
  contextAttrs = {
    [SemanticInternalAttributes.RUN_ID]: ctx.run.id,
    [SemanticInternalAttributes.TASK_SLUG]: ctx.task.id,
    [SemanticInternalAttributes.ATTEMPT_NUMBER]: ctx.attempt.number,
    [SemanticInternalAttributes.ENVIRONMENT_ID]: ctx.environment.id,
    [SemanticInternalAttributes.ORGANIZATION_ID]: ctx.organization.id,
    [SemanticInternalAttributes.PROJECT_ID]: ctx.project.id,
    [SemanticInternalAttributes.MACHINE_PRESET_NAME]: ctx.machine?.name,
    [SemanticInternalAttributes.ENVIRONMENT_TYPE]: ctx.environment.type,
  };
}

if (taskContext.worker) {
  contextAttrs[SemanticInternalAttributes.WORKER_ID] = taskContext.worker.id;
  contextAttrs[SemanticInternalAttributes.WORKER_VERSION] = taskContext.worker.version;
}

if (!taskContext.isRunDisabled && ctx.run.tags?.length) {
  contextAttrs[SemanticInternalAttributes.RUN_TAGS] = ctx.run.tags;
}

⚠️ Potential issue | 🟠 Major

High-cardinality attributes injected into every metric data point.

The coding guidelines explicitly prohibit using UUIDs/IDs (envId, runId, projectId, organizationId), unbounded integers (attemptNumber), and free-form strings (taskSlug, runTags) as OTEL metric attributes because they cause cardinality explosion in Prometheus/Grafana.

Attributes that violate the guideline:

  • ENVIRONMENT_ID, ORGANIZATION_ID, PROJECT_ID, RUN_ID, WORKER_ID — all IDs
  • TASK_SLUG — unbounded string
  • ATTEMPT_NUMBER — unbounded integer
  • RUN_TAGS — free-form string array

Only ENVIRONMENT_TYPE and MACHINE_PRESET_NAME appear to be low-cardinality.

Consider moving the high-cardinality identifiers to resource attributes (set once on the Resource object) rather than per-data-point attributes if they are needed for correlation, or drop them from metrics entirely and rely on trace/log correlation for drill-down. If these are intentionally scoped to internal infrastructure metrics (not user-facing), document that exception clearly.

As per coding guidelines: "Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)." Also based on learnings about OTEL metric attribute cardinality.
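
A minimal sketch of the resource-attribute alternative, assuming the 2.x resources API (attribute keys and wiring are illustrative, not the PR's implementation):

import { resourceFromAttributes } from "@opentelemetry/resources";
import { MeterProvider } from "@opentelemetry/sdk-metrics";

// High-cardinality identifiers go on the Resource, recorded once per process,
// instead of on every metric data point.
const resource = resourceFromAttributes({
  "trigger.environment.id": process.env.TRIGGER_ENV_ID ?? "unknown",
  "trigger.project.id": process.env.TRIGGER_PROJECT_ID ?? "unknown",
});

const meterProvider = new MeterProvider({ resource });
const counter = meterProvider.getMeter("example").createCounter("example.requests");

// Per-data-point attributes stay low-cardinality (bounded value sets only).
counter.add(1, {
  "trigger.environment.type": "PRODUCTION",
  "trigger.machine.preset": "small-1x",
});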

🤖 Prompt for AI Agents
In `@packages/core/src/v3/taskContext/otelProcessors.ts` around lines 143 - 173,
The code is injecting high-cardinality attributes into metric data via
contextAttrs (in the block using taskContext.isRunDisabled and the else branch)
which adds IDs and free-form values; remove ENVIRONMENT_ID, ORGANIZATION_ID,
PROJECT_ID, RUN_ID, WORKER_ID, TASK_SLUG, ATTEMPT_NUMBER, and RUN_TAGS from the
per-data-point metric attributes set on contextAttrs and instead either: 1) move
any needed identifiers to the Resource once (use the Resource API when
initializing telemetry) for correlation, or 2) drop them entirely and only keep
low-cardinality attributes (ENVIRONMENT_TYPE and MACHINE_PRESET_NAME) in
contextAttrs; update the logic that conditionally adds RUN_TAGS and worker
fields (the if (!taskContext.isRunDisabled && ctx.run.tags?.length) and the if
(taskContext.worker) blocks) to avoid attaching high-cardinality values to
metrics.
