feat: OTEL metrics pipeline for task workers #3061
Walkthrough
Adds format-aware rendering for numeric columns (ColumnFormatType) with value-formatting utilities and plumbing into charts, legends, tooltips, tables, and big-number cards. Introduces per-column format metadata propagation in the tsql printer and schema, per-table timeBucketThresholds, and time-bucketing behavior. Adds a ClickHouse metrics table, ClickHouse metrics insert helpers, an OTLP metrics POST route, and OTLPImporter/OTLPExporter support to convert and store ResourceMetrics. Expands OTEL integration across SDK/CLI/worker (metric readers, exporters, host/node runtime metrics, buffering/task-context exporters), env vars, machine ID propagation, and related API surfaces.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal-packages/tsql/src/query/schema.ts (1)
306-330: ⚠️ Potential issue | 🟡 Minor: Prefer a type alias for OutputColumnMetadata.
Since this interface is being extended, converting it to a type keeps the file aligned with the TS guideline. As per coding guidelines, `**/*.{ts,tsx}`: Use types over interfaces for TypeScript.
Suggested change
-export interface OutputColumnMetadata {
+export type OutputColumnMetadata = {
   /** Column name in the result set (after AS aliasing) */
   name: string;
   /** ClickHouse data type (from schema or inferred for computed expressions) */
   type: ClickHouseType;
   ...
-}
+};
🤖 Fix all issues with AI agents
In `@apps/webapp/app/components/code/QueryResultsChart.tsx`:
- Around line 1111-1127: The bytes formatter block in QueryResultsChart.tsx can
mis-handle maxVal when it's -Infinity; update the formatter to guard against
non-finite maxVal by computing a safeMax (e.g., const safeMax =
Number.isFinite(maxVal) ? Math.abs(maxVal) : 1) and use safeMax in the
log/division math instead of Math.abs(maxVal || 1), keeping the existing value
=== 0 early return and the rest of the scaling logic (divisor, units, i
calculation, scaled, return string) unchanged.
In `@apps/webapp/app/components/code/TSQLResultsTable.tsx`:
- Around line 518-542: The current format-hint block is skipped whenever
column.customRenderType is set and also only handles
bytes/decimalBytes/percent/quantity, causing prettyFormat hints like "duration"
or "cost" (and any overrides on columns with customRenderType) to be ignored;
update the conditional to always check column.format regardless of
column.customRenderType and extend the switch in the format handling (the block
that checks column.format) to include "duration" and "cost" cases (using the
existing formatBytes/formatDecimalBytes/formatQuantity helpers as a model—call
the appropriate formatDuration/formatCost utilities or create them if missing)
so that prettyFormat hints are applied consistently.
In `@apps/webapp/app/routes/otel.v1.metrics.ts`:
- Around line 27-29: The response returned from
ExportMetricsServiceResponse.encode(exportResponse).finish() is missing the
Content-Type header; update the Response construction in the export handler (the
return new Response(...) line) to include headers: { "Content-Type":
"application/x-protobuf" } so clients can correctly interpret the protobuf
payload.
- Around line 13-17: The route currently casts the request body unsafely with
"body as ExportMetricsServiceRequest" before calling otlpExporter.exportMetrics,
which bypasses validation; replace this with a zod schema for the
ExportMetricsServiceRequest shape, parse/validate the raw JSON from
request.json() (throwing or returning a 4xx on parse failure), and pass the
validated/typed result to otlpExporter.exportMetrics so that
ExportMetricsServiceRequest, otlpExporter.exportMetrics, and request.json are
used with proper runtime validation rather than an unsafe cast.
In `@apps/webapp/app/services/queryService.server.ts`:
- Around line 155-160: The code currently picks a single time column using a
regex search over options.query which breaks for multi-table queries; update the
logic that computes timeColumn to (1) scan options.query for all referenced
table names from querySchemas (use word-boundary matches and avoid matching
inside string literals/comments by stripping strings/comments or, better, using
a lightweight SQL tokenizer/parser) and map each matched schema to its
timeConstraint, then (2) if the matched set of timeConstraint values contains
more than one distinct column, either reject the query with a clear error
(preferred) or implement per-table constraint enforcement in executeTSQL;
replace the single matchedSchema/timeColumn assignment with this multi-table
detection and add an explicit error path referencing querySchemas,
timeConstraint, options.query, timeColumn, and executeTSQL so callers get a
clear message when conflicting time columns are present.
In `@apps/webapp/app/utils/columnFormat.ts`:
- Around line 8-28: The bug is that for 0 < |bytes| < 1 the computed index i
becomes negative and selects units[-1]; in both formatBytes and
formatDecimalBytes clamp the computed index to be at least 0 and at most
units.length - 1 (e.g., compute rawIndex = Math.floor(Math.log(Math.abs(bytes))
/ Math.log(1024 or 1000)) then set i = Math.max(0, Math.min(rawIndex,
units.length - 1))). Update the calculations in the functions formatBytes and
formatDecimalBytes to use the clamped i before dividing and formatting.
In `@apps/webapp/app/v3/otlpExporter.server.ts`:
- Around line 490-512: The current value selection conflates a real 0 with
protobuf3 defaults by comparing dp.asDouble !== 0 and dp.asInt !== BigInt(0);
instead, detect which value field is actually set (use the protobuf oneof
discriminator such as dp.valueCase / dp.value?.case if available, or check
presence via hasOwnProperty/undefined checks for dp.asDouble vs dp.asInt) and
then choose that field (prefer dp.asDouble when present, even if 0, otherwise
use Number(dp.asInt)). Update the gauge handling where metric.gauge is iterated
(the dp value extraction before rows.push, referenced by metric.gauge,
dp.asDouble, dp.asInt, resolveDataPointContext, floorToTenSecondBucket) to use
the presence/oneof check and proper BigInt -> Number conversion.
In `@apps/webapp/app/v3/querySchemas.ts`:
- Around line 557-563: The attempt_number column is incorrectly defined as a
String which prevents numeric operations; update the attempt_number definition
to use a numeric ClickHouse type (e.g., change column("String", {...}) to
column("Int64", {...})), update the example to a numeric value (e.g., 1) and
keep the expression "attributes.trigger.attempt_number" and description
unchanged so the schema matches the metrics_v1 JSON typing and supports numeric
sorting/operations.
In `@apps/webapp/app/v3/services/aiQueryService.server.ts`:
- Line 58: Update the model string used to initialize the LanguageModelV1
instance to a currently supported Codex mini model: replace the literal
"codex-mini-latest" in the class field declaration (private readonly model:
LanguageModelV1 = openai("codex-mini-latest")) with a supported model name such
as "gpt-5-codex-mini" or "gpt-5.1-codex-mini" so the openai(...) call uses an
available model.
In `@internal-packages/tsql/src/query/time_buckets.ts`:
- Around line 20-27: Replace the exported interface BucketThreshold with a type
alias: change the declaration "export interface BucketThreshold {
maxRangeSeconds: number; interval: TimeBucketInterval; }" to "export type
BucketThreshold = { maxRangeSeconds: number; interval: TimeBucketInterval; }",
preserving the JSDoc comments and export so all usages of BucketThreshold
continue to work unchanged; no other logic changes are necessary.
In `@packages/core/package.json`:
- Around line 179-185: The package.json change updates public package
`@trigger.dev/core` but lacks a changeset; run pnpm run changeset:add to create a
changeset documenting the dependency updates to
`@opentelemetry/exporter-metrics-otlp-http`,
`@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/instrumentation`,
`@opentelemetry/sdk-logs`, `@opentelemetry/sdk-metrics` (and any related
`@opentelemetry` entries), fill in the change type/summary, save and commit the
generated changeset file alongside your package.json changes so the release
tooling will pick up the modification to `@trigger.dev/core`.
- Around line 179-185: Update the `@opentelemetry/host-metrics` dependency to a
0.200.x (unstable 2.0 line) release pinned to an exact version to match the
`@opentelemetry/sdk-metrics`@2.0.1 compatibility; specifically replace
"@opentelemetry/host-metrics": "^0.36.0" with an exact 0.200.x version (e.g.,
"0.200.0") in package.json so the host-metrics package line aligns with the SDK
2.0.x packages, or alternatively revert `@opentelemetry/sdk-metrics` back to the
1.10.x line if you prefer to keep host-metrics@0.36.0.
In `@packages/core/src/v3/index.ts`:
- Line 52: You added new public exports generateFriendlyId and fromFriendlyId
(export line in index.ts) which expands the package API; create and commit a
changeset so release tooling and changelogs capture this public change by
running the repository's changeset flow (e.g., run pnpm run changeset:add),
select the appropriate version bump and provide a brief summary mentioning
generateFriendlyId and fromFriendlyId, then commit the generated changeset file
alongside your export change.
In `@packages/core/src/v3/taskContext/index.ts`:
- Around line 105-112: The apiClient header injection currently checks only
taskContext.isInsideTask and therefore still injects worker headers when a run
is disabled; update the conditional in the api client (around the header
injection logic that references taskContext.isInsideTask) to also require
taskContext.isRunDisabled === false (e.g., if (taskContext.isInsideTask &&
!taskContext.isRunDisabled) { ... }), ensuring disable() (which sets
_runDisabled via disable()) prevents injection; verify related symbols:
disable(), setGlobalTaskContext(), isInsideTask, isRunDisabled, and the header
keys "x-trigger-worker"/"x-trigger-engine-version" are guarded by the new
condition.
In `@packages/core/src/v3/taskContext/otelProcessors.ts`:
- Around line 143-173: The code is injecting high-cardinality attributes into
metric data via contextAttrs (in the block using taskContext.isRunDisabled and
the else branch) which adds IDs and free-form values; remove ENVIRONMENT_ID,
ORGANIZATION_ID, PROJECT_ID, RUN_ID, WORKER_ID, TASK_SLUG, ATTEMPT_NUMBER, and
RUN_TAGS from the per-data-point metric attributes set on contextAttrs and
instead either: 1) move any needed identifiers to the Resource once (use the
Resource API when initializing telemetry) for correlation, or 2) drop them
entirely and only keep low-cardinality attributes (ENVIRONMENT_TYPE and
MACHINE_PRESET_NAME) in contextAttrs; update the logic that conditionally adds
RUN_TAGS and worker fields (the if (!taskContext.isRunDisabled &&
ctx.run.tags?.length) and the if (taskContext.worker) blocks) to avoid attaching
high-cardinality values to metrics.
🧹 Nitpick comments (14)
packages/core/src/v3/taskContext/otelProcessors.ts (3)
224-235: Buffer grows without an upper bound between flush intervals.
If the flush interval is long or export calls are frequent, _buffer accumulates indefinitely until the next time-based flush. Consider adding a max buffer size cap that triggers an early flush to prevent unbounded memory growth.
💡 Proposed safeguard
+  private static readonly MAX_BUFFER_SIZE = 100;
+
   export(metrics: ResourceMetrics, resultCallback: (result: ExportResult) => void): void {
     this._buffer.push(metrics);
     const now = Date.now();
-    if (now - this._lastFlushTime >= this._flushIntervalMs) {
+    if (now - this._lastFlushTime >= this._flushIntervalMs || this._buffer.length >= BufferingMetricExporter.MAX_BUFFER_SIZE) {
       this._lastFlushTime = now;
       const merged = this._mergeBuffer();
       this._innerExporter.export(merged, resultCallback);
258-306: Merge uses only the first batch's resource and keys scopes by name alone.
- If different ResourceMetrics entries carry different resource objects (e.g., after a resource attribute update), all but the first are silently dropped (Line 300).
- Scopes are keyed solely by scope.name (Line 273), so two scopes with the same name but different version or schemaUrl would be merged incorrectly.
These are edge cases but worth a defensive check or at least a comment documenting the assumption.
💡 Use composite scope key
- const scopeKey = sm.scope.name;
+ const scopeKey = `${sm.scope.name}@${sm.scope.version ?? ""}`;
118-130: Duplicated delegation boilerplate across both exporters.
Both TaskContextMetricExporter and BufferingMetricExporter have identical constructor logic for forwarding selectAggregationTemporality and selectAggregation. Consider extracting a small helper or base class to DRY this up if more decorating exporters are planned.
Also applies to: 204-222
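A minimal sketch of what such a shared base could look like, assuming the wrapped exporter implements the SDK's PushMetricExporter shape (the class name and structure here are illustrative, not code from this PR):

```ts
import type { PushMetricExporter, ResourceMetrics } from "@opentelemetry/sdk-metrics";
import type { ExportResult } from "@opentelemetry/core";

// Hypothetical shared base: forwards aggregation selection to the wrapped
// exporter so each decorator only has to override export().
abstract class DelegatingMetricExporter implements PushMetricExporter {
  selectAggregationTemporality?: PushMetricExporter["selectAggregationTemporality"];
  selectAggregation?: PushMetricExporter["selectAggregation"];

  constructor(protected readonly inner: PushMetricExporter) {
    // Only forward the optional selectors the inner exporter actually defines.
    if (inner.selectAggregationTemporality) {
      this.selectAggregationTemporality = inner.selectAggregationTemporality.bind(inner);
    }
    if (inner.selectAggregation) {
      this.selectAggregation = inner.selectAggregation.bind(inner);
    }
  }

  abstract export(
    metrics: ResourceMetrics,
    resultCallback: (result: ExportResult) => void
  ): void;

  forceFlush(): Promise<void> {
    return this.inner.forceFlush();
  }

  shutdown(): Promise<void> {
    return this.inner.shutdown();
  }
}
```

Each decorating exporter would then extend this and only implement its own export() behavior.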
apps/webapp/app/v3/services/aiQueryService.server.ts (2)
377-382: Consider extracting the shared prompt sections into helper methods to reduce duplication.
The "Choosing the Right Table" block, syntax guide, and common patterns are nearly identical between buildSystemPrompt and buildEditSystemPrompt. Extracting shared sections (e.g., buildTableSelectionGuide(), buildSyntaxGuide()) would make future prompt updates less error-prone.
Also applies to: 525-529
589-594: Edit-mode prompt has fewer metrics examples than new-mode.
The new-mode prompt includes detailed example queries for CPU and memory metrics (lines 465–483) and the full list of prettyFormat types (line 461), while the edit-mode prompt omits these. If the user asks to edit a query to add metrics columns or switch tables, the AI will have less guidance. Consider whether this asymmetry is intentional.
packages/cli-v3/src/dev/taskRunProcessPool.ts (2)
44-44: Silence the Biome warning by using a block body in the forEach callback.
The arrow shorthand implicitly returns the result of clearIdleTimer, which Biome flags as useIterableCallbackReturn. Use a block body to make the intent explicit.
♻️ Proposed fix
- processesToKill.forEach((process) => this.clearIdleTimer(process));
+ processesToKill.forEach((process) => {
+   this.clearIdleTimer(process);
+ });
226-243: Unhandled promise from killProcess in the idle timeout callback.
this.killProcess(process) returns a Promise that is neither awaited nor .catch()-ed. While killProcess has internal error handling, any unexpected rejection would become an unhandled promise rejection. Consider adding a .catch():
🛡️ Proposed fix
- this.killProcess(process);
+ this.killProcess(process).catch((err) => {
+   logger.debug("[TaskRunProcessPool] Error killing idle process", { err });
+ });
apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts (1)
966-976: TRIGGER_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS reuses the export interval value.
Both in dev (here) and prod (line 1124), the export timeout is set to the same value as the export interval. This means a slow export could consume the entire interval window before the next export fires. If this is intentional as a "timeout should never exceed interval" heuristic, a brief comment would help future readers. Otherwise, consider exposing a separate DEV_OTEL_METRICS_EXPORT_TIMEOUT_MILLIS env var.
internal-packages/clickhouse/schema/016_create_metrics_v1.sql (1)
1-37: Confirm whether this should follow the raw_ input-table naming convention.
If trigger_dev.metrics_v1 is the ingest/raw metrics table, consider renaming to trigger_dev.raw_metrics_v1 (or adding a raw table + MV) to match the repo's naming rule.
As per coding guidelines, internal-packages/clickhouse/schema/**/*.sql: Follow ClickHouse naming conventions: raw_ prefix for input tables, _v1, _v2 suffixes for versioning, _mv_v1 suffix for materialized views.
internal-packages/tsql/src/query/printer.ts (1)
944-988: Consider centralizing the allowed prettyFormat types.
The hard-coded validFormats list can drift from ColumnFormatType/formatters; a shared exported const would keep things in sync.
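One possible shape for that shared constant, sketched here with an illustrative file name and a format list inferred from the rest of this review (the actual set in the PR may differ):

```ts
// columnFormat.ts (sketch): a single source of truth for prettyFormat() hints.
export const COLUMN_FORMAT_TYPES = [
  "bytes",
  "decimalBytes",
  "percent",
  "quantity",
  "duration",
  "durationSeconds",
  "cost",
  "costInDollars",
] as const;

export type ColumnFormatType = (typeof COLUMN_FORMAT_TYPES)[number];

// The tsql printer can validate against the same list instead of a local copy.
export function isColumnFormatType(value: string): value is ColumnFormatType {
  return (COLUMN_FORMAT_TYPES as readonly string[]).includes(value);
}
```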
apps/webapp/app/components/primitives/charts/ChartBar.tsx (1)
170-188: tooltipValueFormatter is silently ignored when no tooltipLabelFormatter is provided.
In the bar chart, value formatting only takes effect when tooltipLabelFormatter is supplied (line 175), because otherwise the ZoomTooltip component is rendered. This differs from ChartLine.tsx where the formatter is always passed to ChartTooltipContent. If this is intentional (zoom tooltip has its own format), no action needed — but worth confirming this asymmetry is desired.
apps/webapp/app/routes/otel.v1.metrics.ts (1)
36-40: Consider using a structured logger instead of console.error.
Other parts of the webapp use a structured logger for better observability. Also, logging the raw error object might inadvertently leak sensitive data in production logs.
apps/webapp/app/v3/otlpExporter.server.ts (1)
575-582: Use type instead of interface per coding guidelines.
As per coding guidelines: "Use types over interfaces for TypeScript."
Suggested fix
-interface ResourceContext {
-  taskSlug: string | undefined;
-  runId: string | undefined;
-  attemptNumber: number | undefined;
-  machineId: string | undefined;
-  workerId: string | undefined;
-  workerVersion: string | undefined;
-}
+type ResourceContext = {
+  taskSlug: string | undefined;
+  runId: string | undefined;
+  attemptNumber: number | undefined;
+  machineId: string | undefined;
+  workerId: string | undefined;
+  workerVersion: string | undefined;
+};
packages/core/src/v3/otel/tracingSDK.ts (1)
340-354: flush() and shutdown() correctly include the meter provider.
Using Promise.all for parallel shutdown/flush of all three providers is appropriate. One consideration: if one provider's shutdown rejects, Promise.all will reject immediately, potentially leaving other providers in an inconsistent state. Consider Promise.allSettled for shutdown() to ensure all providers get a chance to clean up.
Optional: use allSettled for more resilient shutdown
 public async shutdown() {
-  await Promise.all([
+  await Promise.allSettled([
     this._traceProvider.shutdown(),
     this._logProvider.shutdown(),
     this._meterProvider.shutdown(),
   ]);
 }
if (format === "bytes" || format === "decimalBytes") {
  const divisor = format === "bytes" ? 1024 : 1000;
  const units =
    format === "bytes"
      ? ["B", "KiB", "MiB", "GiB", "TiB"]
      : ["B", "KB", "MB", "GB", "TB"];
  return (value: number): string => {
    if (value === 0) return "0 B";
    // Use consistent unit for all ticks based on max value
    const i = Math.min(
      Math.floor(Math.log(Math.abs(maxVal || 1)) / Math.log(divisor)),
      units.length - 1
    );
    const scaled = value / Math.pow(divisor, i);
    return `${scaled.toFixed(scaled < 10 ? 1 : 0)} ${units[i]}`;
  };
}
Edge case: maxVal could be -Infinity when the formatter is invoked with bytes format.
If maxVal remains -Infinity (e.g., all values are null/non-numeric), then Math.abs(maxVal || 1) evaluates to Math.abs(-Infinity) = Infinity because -Infinity is truthy. This makes i clamp to the last unit index, so a non-zero value would be formatted as e.g. "0.0 TiB" instead of "0 B". In practice, the early returns at lines 977–983 prevent rendering an empty chart, so this is unlikely to trigger. But a small guard would make the formatter more robust.
Proposed defensive fix
return (value: number): string => {
if (value === 0) return "0 B";
// Use consistent unit for all ticks based on max value
+ const refValue = isFinite(maxVal) ? Math.abs(maxVal) : Math.abs(value);
const i = Math.min(
- Math.floor(Math.log(Math.abs(maxVal || 1)) / Math.log(divisor)),
+ Math.floor(Math.log(refValue || 1) / Math.log(divisor)),
units.length - 1
  );
🤖 Prompt for AI Agents
In `@apps/webapp/app/components/code/QueryResultsChart.tsx` around lines 1111 -
1127, The bytes formatter block in QueryResultsChart.tsx can mis-handle maxVal
when it's -Infinity; update the formatter to guard against non-finite maxVal by
computing a safeMax (e.g., const safeMax = Number.isFinite(maxVal) ?
Math.abs(maxVal) : 1) and use safeMax in the log/division math instead of
Math.abs(maxVal || 1), keeping the existing value === 0 early return and the
rest of the scaling logic (divisor, units, i calculation, scaled, return string)
unchanged.
// Check format hint for new format types (from prettyFormat())
if (column.format && !column.customRenderType) {
  switch (column.format) {
    case "bytes":
      if (typeof value === "number") {
        return <span className="tabular-nums">{formatBytes(value)}</span>;
      }
      break;
    case "decimalBytes":
      if (typeof value === "number") {
        return <span className="tabular-nums">{formatDecimalBytes(value)}</span>;
      }
      break;
    case "percent":
      if (typeof value === "number") {
        return <span className="tabular-nums">{value.toFixed(2)}%</span>;
      }
      break;
    case "quantity":
      if (typeof value === "number") {
        return <span className="tabular-nums">{formatQuantity(value)}</span>;
      }
      break;
  }
}
prettyFormat hints are skipped when customRenderType exists (and duration/cost formats are ignored).
The format-hint block only runs when !column.customRenderType and only handles bytes/decimal/percent/quantity. That means prettyFormat(..., "duration" | "cost") or overrides on columns with customRenderType won’t render as intended. Prefer column.format and handle all supported formats.
Suggested change
- if (column.format && !column.customRenderType) {
+ if (column.format) {
switch (column.format) {
case "bytes":
if (typeof value === "number") {
return <span className="tabular-nums">{formatBytes(value)}</span>;
}
break;
case "decimalBytes":
if (typeof value === "number") {
return <span className="tabular-nums">{formatDecimalBytes(value)}</span>;
}
break;
case "percent":
if (typeof value === "number") {
return <span className="tabular-nums">{value.toFixed(2)}%</span>;
}
break;
case "quantity":
if (typeof value === "number") {
return <span className="tabular-nums">{formatQuantity(value)}</span>;
}
break;
+ case "duration":
+ if (typeof value === "number") {
+ return (
+ <span className="tabular-nums">
+ {formatDurationMilliseconds(value, { style: "short" })}
+ </span>
+ );
+ }
+ break;
+ case "durationSeconds":
+ if (typeof value === "number") {
+ return (
+ <span className="tabular-nums">
+ {formatDurationMilliseconds(value * 1000, { style: "short" })}
+ </span>
+ );
+ }
+ break;
+ case "costInDollars":
+ if (typeof value === "number") {
+ return <span className="tabular-nums">{formatCurrencyAccurate(value)}</span>;
+ }
+ break;
+ case "cost":
+ if (typeof value === "number") {
+ return <span className="tabular-nums">{formatCurrencyAccurate(value / 100)}</span>;
+ }
+ break;
    }
  }
🤖 Prompt for AI Agents
In `@apps/webapp/app/components/code/TSQLResultsTable.tsx` around lines 518 - 542,
The current format-hint block is skipped whenever column.customRenderType is set
and also only handles bytes/decimalBytes/percent/quantity, causing prettyFormat
hints like "duration" or "cost" (and any overrides on columns with
customRenderType) to be ignored; update the conditional to always check
column.format regardless of column.customRenderType and extend the switch in the
format handling (the block that checks column.format) to include "duration" and
"cost" cases (using the existing formatBytes/formatDecimalBytes/formatQuantity
helpers as a model—call the appropriate formatDuration/formatCost utilities or
create them if missing) so that prettyFormat hints are applied consistently.
const body = await request.json();

const exportResponse = await otlpExporter.exportMetrics(
  body as ExportMetricsServiceRequest
);
🧩 Analysis chain
🏁 Scripts executed:
# First, let's look at the actual file
head -50 apps/webapp/app/routes/otel.v1.metrics.ts

# Search for ExportMetricsServiceRequest to understand where it comes from
rg 'ExportMetricsServiceRequest' -n --type ts apps/webapp/app/routes/otel.v1.metrics.ts

# Check what ExportMetricsServiceRequest is imported from
rg 'import.*ExportMetricsServiceRequest' -n --type ts

# Check if ExportMetricsServiceRequest has a fromJSON method in the codebase
rg -B5 -A10 'fromJSON' --type ts | head -100

# Search for ExportMetricsServiceRequest definition and its fromJSON method
fd 'metrics' --type f -e ts internal-packages/otlp-importer/src/generated/ | head -20

# Look specifically for ExportMetricsServiceRequest
rg 'ExportMetricsServiceRequest' internal-packages/otlp-importer/src/generated -A 30 | grep -E '(fromJSON|class|interface|export)'

# Check how zod is used in the webapp for validation of API inputs
rg 'zod|z\.parse|z\.validate' apps/webapp/app/routes -l --type ts | head -10

# Look at a specific example of zod validation in webapp routes
head -80 apps/webapp/app/routes/admin.api.v1.environments.\$environmentId.ts

# Check if there's a pattern for API body validation using zod in the file or nearby
rg -A5 'z\.parse|request\.json()' apps/webapp/app/routes/admin.api.v1.environments.\$environmentId.ts --type ts

# Let's examine the full ExportMetricsServiceRequest.fromJSON implementation
rg -A 20 'fromJSON\(object: any\): ExportMetricsServiceRequest' internal-packages/otlp-importer/src/generated/opentelemetry/proto/collector/metrics/v1/metrics_service.ts
Use zod validation for the JSON body instead of an unsafe cast.
The unvalidated cast body as ExportMetricsServiceRequest bypasses type safety. Per the coding guidelines, zod should be used for validation in apps/webapp. Define a validation schema for the request and parse the body against it:
const body = await request.json();
+ const exportRequest = ExportMetricsServiceRequest.fromJSON(body);
+
const exportResponse = await otlpExporter.exportMetrics(
- body as ExportMetricsServiceRequest
+ exportRequest
);
Note: While ExportMetricsServiceRequest.fromJSON() is available in the generated protobuf code, consider using a zod schema instead to align with project validation standards in the webapp.
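For instance, a minimal zod guard for the top-level shape could look like the sketch below (the schema is illustrative and intentionally partial, not the full OTLP metrics proto):

```ts
import { z } from "zod";

// Illustrative, partial schema: validates only the outer ResourceMetrics shape
// and leaves the nested metric payloads loosely typed.
const ExportMetricsRequestSchema = z.object({
  resourceMetrics: z.array(
    z.object({
      resource: z.object({ attributes: z.array(z.unknown()).optional() }).optional(),
      scopeMetrics: z.array(z.unknown()).default([]),
    })
  ),
});

const parsed = ExportMetricsRequestSchema.safeParse(await request.json());
if (!parsed.success) {
  return new Response(JSON.stringify({ error: parsed.error.flatten() }), { status: 400 });
}
// The validated body can still be narrowed/converted before export,
// e.g. via ExportMetricsServiceRequest.fromJSON(parsed.data).
```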
🤖 Prompt for AI Agents
In `@apps/webapp/app/routes/otel.v1.metrics.ts` around lines 13 - 17, The route
currently casts the request body unsafely with "body as
ExportMetricsServiceRequest" before calling otlpExporter.exportMetrics, which
bypasses validation; replace this with a zod schema for the
ExportMetricsServiceRequest shape, parse/validate the raw JSON from
request.json() (throwing or returning a 4xx on parse failure), and pass the
validated/typed result to otlpExporter.exportMetrics so that
ExportMetricsServiceRequest, otlpExporter.exportMetrics, and request.json are
used with proper runtime validation rather than an unsafe cast.
return new Response(ExportMetricsServiceResponse.encode(exportResponse).finish(), {
  status: 200,
});
Missing Content-Type header on protobuf response.
The protobuf response should include Content-Type: application/x-protobuf so clients can correctly interpret the response format.
Suggested fix
return new Response(ExportMetricsServiceResponse.encode(exportResponse).finish(), {
status: 200,
+ headers: { "Content-Type": "application/x-protobuf" },
});
🤖 Prompt for AI Agents
In `@apps/webapp/app/routes/otel.v1.metrics.ts` around lines 27 - 29, The response
returned from ExportMetricsServiceResponse.encode(exportResponse).finish() is
missing the Content-Type header; update the Response construction in the export
handler (the return new Response(...) line) to include headers: {
"Content-Type": "application/x-protobuf" } so clients can correctly interpret
the protobuf payload.
// Detect which table the query targets to determine the time column
// Each table schema declares its primary time column via timeConstraint
const matchedSchema = querySchemas.find((s) =>
  new RegExp(`\\bFROM\\s+${s.name}\\b`, "i").test(options.query)
);
const timeColumn = matchedSchema?.timeConstraint ?? "triggered_at";
🧩 Analysis chain
🏁 Scripts executed:
#!/bin/bash
# First, find the querySchemas definition and schema declarations
rg -n "querySchemas" apps/webapp/app/services/queryService.server.ts -A 5 -B 5

# Find all timeConstraint declarations across the codebase
rg -n "timeConstraint" apps/webapp/ --type=ts -C 2

# Check the full context around lines 155-160 and surrounding code
sed -n '140,260p' apps/webapp/app/services/queryService.server.ts | cat -n

# Search for JOIN queries in tests or examples
rg -n "JOIN" apps/webapp/ --type=ts -C 2 | head -100

# Find the executeTSQL function to understand how it handles enforcedWhereClause
rg -n "executeTSQL" apps/webapp/ --type=ts -l

# Check if there are any documented multi-table queries or comments about JOIN support
rg -n "JOIN\|multi.table\|multiple.table" apps/webapp/app/services/queryService.server.ts -i -C 3

# Look at the full querySchemas to see all available tables
head -450 apps/webapp/app/v3/querySchemas.ts | tail -50

# Find the executeTSQL implementation
find apps/webapp -name "*.ts" -type f | xargs grep -l "executeTSQL" | head -10

# Look for the actual executeTSQL function definition
rg -n "^(export\s+)?(async\s+)?function executeTSQL|const executeTSQL" apps/webapp --type=ts -A 20

# Check if executeTSQL is imported from an external package
rg -n "executeTSQL" apps/webapp/app/services/queryService.server.ts -B 30 | grep -E "import|from"

# Search more carefully for where executeTSQL is defined or imported
rg "executeTSQL" apps/webapp/app/services/queryService.server.ts -B 50 | head -100

# Search for executeTSQL in the entire webapp to find where it's defined
fd -e ts -e tsx apps/webapp | xargs grep -l "executeTSQL"

# Look at imports at the top of queryService.server.ts
head -50 apps/webapp/app/services/queryService.server.ts

# Check if there are any query examples or integration tests that show multi-table queries being used
rg -n "FROM.*JOIN|JOIN.*metrics|JOIN.*runs" apps/webapp/app --type=ts -C 2

# Verify the test example to see how queryService is tested with queries
rg -n "executeQuery\|queryService" apps/webapp/test --type=ts -l 2>/dev/null | head -5
Multi-table queries with different time columns will fail or apply incorrect constraints.
When a query JOINs tables with different timeConstraint values (e.g., FROM runs JOIN metrics_v1), the regex-based detection at line 157–158 matches only the first table name, causing timeColumn to be set incorrectly for the other table. For instance, runs uses "triggered_at" while metrics_v1 uses "bucket_start". The enforced WHERE clause at line 206 and fallback at line 248 will then reference the wrong column name for the non-primary table, leading to query failures or silently incorrect filtering.
The regex can also match table names inside string literals or comments, though this is a lower-risk edge case.
Verify whether the downstream executeTSQL layer handles per-table constraint enforcement, or whether multi-table queries should be explicitly rejected if they involve tables with conflicting time columns.
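A sketch of what multi-table detection could look like (the helper name, schema shape, and the simple string/comment stripping below are illustrative assumptions, not code from this PR):

```ts
// Illustrative sketch: find every schema table referenced in the query and
// refuse to enforce a time constraint when the tables disagree on the column.
function resolveTimeColumn(
  query: string,
  schemas: { name: string; timeConstraint?: string }[]
): string {
  // Rough pass to avoid matching table names inside string literals or comments.
  const stripped = query
    .replace(/'[^']*'/g, "''")
    .replace(/--[^\n]*/g, "")
    .replace(/\/\*[\s\S]*?\*\//g, "");

  const referenced = schemas.filter((s) =>
    new RegExp(`\\b${s.name}\\b`, "i").test(stripped)
  );

  const timeColumns = new Set(
    referenced.map((s) => s.timeConstraint).filter((c): c is string => Boolean(c))
  );

  if (timeColumns.size > 1) {
    throw new Error(
      `Query references tables with conflicting time columns: ${[...timeColumns].join(", ")}`
    );
  }

  return timeColumns.values().next().value ?? "triggered_at";
}
```

Rejecting conflicting queries up front keeps the enforced WHERE clause unambiguous; per-table enforcement inside executeTSQL would be the more involved alternative.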
🧰 Tools
🪛 ast-grep (0.40.5)
[warning] 157-157: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\bFROM\\s+${s.name}\\b, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
In `@apps/webapp/app/services/queryService.server.ts` around lines 155 - 160, The
code currently picks a single time column using a regex search over
options.query which breaks for multi-table queries; update the logic that
computes timeColumn to (1) scan options.query for all referenced table names
from querySchemas (use word-boundary matches and avoid matching inside string
literals/comments by stripping strings/comments or, better, using a lightweight
SQL tokenizer/parser) and map each matched schema to its timeConstraint, then
(2) if the matched set of timeConstraint values contains more than one distinct
column, either reject the query with a clear error (preferred) or implement
per-table constraint enforcement in executeTSQL; replace the single
matchedSchema/timeColumn assignment with this multi-table detection and add an
explicit error path referencing querySchemas, timeConstraint, options.query,
timeColumn, and executeTSQL so callers get a clear message when conflicting time
columns are present.
 * A threshold mapping a maximum time range duration to a bucket interval.
 */
export interface BucketThreshold {
  /** Maximum range duration in seconds for this threshold to apply */
  maxRangeSeconds: number;
  /** The bucket interval to use when the range is under maxRangeSeconds */
  interval: TimeBucketInterval;
}
Use a type alias for BucketThreshold.
The TS guideline prefers type aliases over interface for new definitions.
Suggested change
-export interface BucketThreshold {
+export type BucketThreshold = {
/** Maximum range duration in seconds for this threshold to apply */
maxRangeSeconds: number;
/** The bucket interval to use when the range is under maxRangeSeconds */
interval: TimeBucketInterval;
-}
+};
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 * A threshold mapping a maximum time range duration to a bucket interval.
 */
export type BucketThreshold = {
  /** Maximum range duration in seconds for this threshold to apply */
  maxRangeSeconds: number;
  /** The bucket interval to use when the range is under maxRangeSeconds */
  interval: TimeBucketInterval;
};
🤖 Prompt for AI Agents
In `@internal-packages/tsql/src/query/time_buckets.ts` around lines 20 - 27,
Replace the exported interface BucketThreshold with a type alias: change the
declaration "export interface BucketThreshold { maxRangeSeconds: number;
interval: TimeBucketInterval; }" to "export type BucketThreshold = {
maxRangeSeconds: number; interval: TimeBucketInterval; }", preserving the JSDoc
comments and export so all usages of BucketThreshold continue to work unchanged;
no other logic changes are necessary.
| "@opentelemetry/exporter-metrics-otlp-http": "0.203.0", | ||
| "@opentelemetry/host-metrics": "^0.36.0", | ||
| "@opentelemetry/exporter-trace-otlp-http": "0.203.0", | ||
| "@opentelemetry/instrumentation": "0.203.0", | ||
| "@opentelemetry/resources": "2.0.1", | ||
| "@opentelemetry/sdk-logs": "0.203.0", | ||
| "@opentelemetry/sdk-metrics": "2.0.1", |
🧩 Analysis chain
🏁 Scripts executed:
# Check if a changeset was added in this PR for `@trigger.dev/core`
fd --type f '.' '.changeset' --exec grep -l '@trigger.dev/core' {} 2>/dev/null || echo "No changeset found referencing `@trigger.dev/core`"

# Also check the actual package.json to verify it's a public package
cat packages/core/package.json | head -20

# List changesets to see what exists
ls -la .changeset/ 2>/dev/null | head -20

# Check if these are new dependencies or version updates by examining git diff
git diff HEAD -- packages/core/package.json | head -50

# Also check the specific lines mentioned in the review
sed -n '179,185p' packages/core/package.json
Add a changeset for modifications to this public package.
@trigger.dev/core is a public package, and modifications require a changeset. Run pnpm run changeset:add to document these dependency changes.
🤖 Prompt for AI Agents
In `@packages/core/package.json` around lines 179 - 185, The package.json change
updates public package `@trigger.dev/core` but lacks a changeset; run pnpm run
changeset:add to create a changeset documenting the dependency updates to
`@opentelemetry/exporter-metrics-otlp-http`,
`@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/instrumentation`,
`@opentelemetry/sdk-logs`, `@opentelemetry/sdk-metrics` (and any related
`@opentelemetry` entries), fill in the change type/summary, save and commit the
generated changeset file alongside your package.json changes so the release
tooling will pick up the modification to `@trigger.dev/core`.
🧩 Analysis chain
🏁 Scripts executed:
# Check the actual state of the file
cat -n packages/core/package.json | sed -n '175,190p'

# Search for changesets in the repository
fd -t f 'changeset' packages/core/

# Also check if there's a CHANGELOG or package.json for host-metrics to understand sdk-metrics dependency
# And verify the full state of the changeset requirement
git log --oneline --all -S "@opentelemetry/host-metrics" | head -20

🌐 Web query: @opentelemetry/host-metrics 0.36.0 peer dependencies npm
💡 Result: For @opentelemetry/host-metrics@0.36.0, the npm peer dependency is @opentelemetry/api: ^1.3.0. [1]

🌐 Web query: @opentelemetry/host-metrics 0.36 sdk-metrics 2.0 compatibility support
💡 Result: @opentelemetry/host-metrics@0.36.0 is an experimental (0.x) package from opentelemetry-js-contrib and is expected to pair with the stable SDK 1.10.x line (compatibility matrix: stable 1.10.x ↔ experimental 0.36.x). It is not in the expected-compatible set for @opentelemetry/sdk-metrics@2.0.x, which pairs with experimental 0.200.x packages. [1]
Also, as of the npm package metadata, @opentelemetry/host-metrics' latest published version is still 0.36.0, not a 0.200.x-aligned release. [2]
So: no, @opentelemetry/host-metrics@0.36.0 is not considered compatible with @opentelemetry/sdk-metrics@2.0.x. Use a 1.10.x-era @opentelemetry/sdk-metrics with host-metrics 0.36.x, or wait for/upgrade to a 0.200.x-era host-metrics if/when it's released. [1][3]
Sources:
[1] OpenTelemetry JS "Package Version Compatibility" matrix (opentelemetry-js repo)
[2] npm package page for @opentelemetry/host-metrics (shows 0.36.0 as current)
[3] OpenTelemetry blog: JS SDK 2.0 uses stable >=2.0.0 with unstable >=0.200.0 packages
Replace @opentelemetry/host-metrics@^0.36.0 with version 0.200.x or later to match SDK 2.0.x compatibility.
@opentelemetry/host-metrics@0.36.0 is incompatible with @opentelemetry/sdk-metrics@2.0.1. The 0.36.x line is designed for SDK 1.10.x; SDK 2.0.x requires the 0.200.x unstable package line. Either upgrade host-metrics to 0.200.x (once released) or downgrade the SDK to 1.10.x. Additionally, pin the host-metrics version to an exact number (e.g., 0.200.0 or the compatible 0.36.0 if reverting the SDK) for consistency with other OTel dependencies, which are all pinned exact.
🤖 Prompt for AI Agents
In `@packages/core/package.json` around lines 179 - 185, Update the
`@opentelemetry/host-metrics` dependency to a 0.200.x (unstable 2.0 line) release
pinned to an exact version to match the `@opentelemetry/sdk-metrics`@2.0.1
compatibility; specifically replace "@opentelemetry/host-metrics": "^0.36.0"
with an exact 0.200.x version (e.g., "0.200.0") in package.json so the
host-metrics package line aligns with the SDK 2.0.x packages, or alternatively
revert `@opentelemetry/sdk-metrics` back to the 1.10.x line if you prefer to keep
host-metrics@0.36.0.
  NULL_SENTINEL,
} from "./utils/flattenAttributes.js";
export { omit } from "./utils/omit.js";
export { generateFriendlyId, fromFriendlyId } from "./isomorphic/friendlyId.js";
🧩 Analysis chain
🏁 Scripts executed:
#!/bin/bash
# Check for changesets in the .changeset directory
echo "=== Checking for changesets ==="
find .changeset -name "*.md" -type f 2>/dev/null | head -20
echo ""
echo "=== Looking for changesets mentioning core or friendlyId ==="
fd -e md . .changeset/ 2>/dev/null --exec grep -l -i "friendlyId\|core\|generateFriendlyId" {} \; 2>/dev/null || echo "No matching changesets found"

# Verify the friendlyId.js file exists and contains the exports
echo "=== Checking friendlyId.js file ==="
cat packages/core/src/v3/isomorphic/friendlyId.js 2>/dev/null | head -50

# Check if these functions were previously exported from v3/index.ts
echo "=== Checking git history for this line ==="
cd packages/core/src/v3 && git log -p --follow -S "generateFriendlyId" -- index.ts 2>/dev/null | head -100 || echo "Git history check failed or not available"

# Check if the file exists at the expected path
echo "=== Checking if friendlyId.js exists ==="
ls -la packages/core/src/v3/isomorphic/ 2>/dev/null || echo "isomorphic directory not found"

# Check the actual v3/index.ts file
echo "=== Checking v3/index.ts ==="
cat -n packages/core/src/v3/index.ts 2>/dev/null | grep -A 2 -B 2 "friendlyId" || echo "No friendlyId exports found"

# List all changesets to see their contents
echo "=== Changesets content ==="
for file in .changeset/*.md; do
  if [ "$file" != ".changeset/README.md" ]; then
    echo "=== $file ==="
    cat "$file"
    echo ""
  fi
done | head -200

# Check the friendlyId.ts file to confirm the functions exist
echo "=== Checking friendlyId.ts exports ==="
cat packages/core/src/v3/isomorphic/friendlyId.ts | head -50

# Check if there's a changeset that covers this specific change but uses different naming
echo "=== Checking all changesets for 'friendlyId' case-insensitive ==="
grep -r -i "friendlyid\|generateFriendlyId\|fromFriendlyId" .changeset/ 2>/dev/null || echo "No matches found in changesets"

# Verify this is a new addition - check git diff or status
echo "=== Check git status for changes ==="
git status --short packages/core/src/v3/index.ts 2>/dev/null || echo "Not available"
echo ""
echo "=== Check git diff for this file ==="
git diff HEAD packages/core/src/v3/index.ts 2>/dev/null | head -30 || echo "Not available"
New public API exports — ensure a changeset is included.
Exporting generateFriendlyId and fromFriendlyId from @trigger.dev/core/v3 expands the public API surface. As per coding guidelines, modifications to public packages in packages/* require a changeset via pnpm run changeset:add.
🤖 Prompt for AI Agents
In `@packages/core/src/v3/index.ts` at line 52, You added new public exports
generateFriendlyId and fromFriendlyId (export line in index.ts) which expands
the package API; create and commit a changeset so release tooling and changelogs
capture this public change by running the repository's changeset flow (e.g., run
pnpm run changeset:add), select the appropriate version bump and provide a brief
summary mentioning generateFriendlyId and fromFriendlyId, then commit the
generated changeset file alongside your export change.
  public disable() {
-   unregisterGlobal(API_NAME);
+   this._runDisabled = true;
  }

  public setGlobalTaskContext(taskContext: TaskContext): boolean {
-   return registerGlobal(API_NAME, taskContext);
+   this._runDisabled = false;
+   return registerGlobal(API_NAME, taskContext, true);
  }
🧩 Analysis chain
🏁 Scripts executed:
#!/bin/bash
# Find all usages of disable() and isInsideTask to check if any caller
# depends on isInsideTask becoming false after disable()
echo "=== Callers of .disable() on TaskContextAPI ==="
rg -n --type=ts 'taskContext.*\.disable\(\)|\.disable\(\)' -A3 -B1
echo ""
echo "=== Usages of isInsideTask ==="
rg -n --type=ts '\.isInsideTask' -A2 -B1
echo ""
echo "=== Usages of isRunDisabled ==="
rg -n --type=ts '\.isRunDisabled' -A2 -B1
echo ""
echo "=== File: packages/core/src/v3/taskContext/index.ts (context) ==="
sed -n '1,50p' packages/core/src/v3/taskContext/index.ts
sed -n '100,120p' packages/core/src/v3/taskContext/index.ts

echo "=== Context around isInsideTask usage in apiClient ==="
sed -n '1420,1450p' packages/core/src/v3/apiClient/index.ts
echo ""
echo "=== Check if isRunDisabled is used in apiClient ==="
rg -n 'isRunDisabled' packages/core/src/v3/apiClient/index.ts -A2 -B2
Fix apiClient header injection logic to check isRunDisabled.
The behavioral change where disable() no longer unregisters the task context global introduces a bug: after disable() is called, isInsideTask remains true (the context is deliberately preserved between runs), but the run is actually disabled. The apiClient injects worker-specific headers (x-trigger-worker, x-trigger-engine-version) based solely on isInsideTask and will incorrectly inject these headers during disabled states (between runs).
Update the check at packages/core/src/v3/apiClient/index.ts:1431 to also verify that isRunDisabled is false:
if (taskContext.isInsideTask && !taskContext.isRunDisabled) {
headers["x-trigger-worker"] = "true";
headers["x-trigger-engine-version"] = "V2";
// ...
}
This aligns with how other systems (e.g., otelProcessors) already handle the distinction between a registered context and an active run.
🤖 Prompt for AI Agents
In `@packages/core/src/v3/taskContext/index.ts` around lines 105 - 112, The
apiClient header injection currently checks only taskContext.isInsideTask and
therefore still injects worker headers when a run is disabled; update the
conditional in the api client (around the header injection logic that references
taskContext.isInsideTask) to also require taskContext.isRunDisabled === false
(e.g., if (taskContext.isInsideTask && !taskContext.isRunDisabled) { ... }),
ensuring disable() (which sets _runDisabled via disable()) prevents injection;
verify related symbols: disable(), setGlobalTaskContext(), isInsideTask,
isRunDisabled, and the header keys "x-trigger-worker"/"x-trigger-engine-version"
are guarded by the new condition.
if (taskContext.isRunDisabled) {
  // Between runs: keep environment/project/org/machine attrs, strip run-specific ones
  contextAttrs = {
    [SemanticInternalAttributes.ENVIRONMENT_ID]: ctx.environment.id,
    [SemanticInternalAttributes.ENVIRONMENT_TYPE]: ctx.environment.type,
    [SemanticInternalAttributes.ORGANIZATION_ID]: ctx.organization.id,
    [SemanticInternalAttributes.PROJECT_ID]: ctx.project.id,
    [SemanticInternalAttributes.MACHINE_PRESET_NAME]: ctx.machine?.name,
  };
} else {
  // During a run: full context attrs
  contextAttrs = {
    [SemanticInternalAttributes.RUN_ID]: ctx.run.id,
    [SemanticInternalAttributes.TASK_SLUG]: ctx.task.id,
    [SemanticInternalAttributes.ATTEMPT_NUMBER]: ctx.attempt.number,
    [SemanticInternalAttributes.ENVIRONMENT_ID]: ctx.environment.id,
    [SemanticInternalAttributes.ORGANIZATION_ID]: ctx.organization.id,
    [SemanticInternalAttributes.PROJECT_ID]: ctx.project.id,
    [SemanticInternalAttributes.MACHINE_PRESET_NAME]: ctx.machine?.name,
    [SemanticInternalAttributes.ENVIRONMENT_TYPE]: ctx.environment.type,
  };
}

if (taskContext.worker) {
  contextAttrs[SemanticInternalAttributes.WORKER_ID] = taskContext.worker.id;
  contextAttrs[SemanticInternalAttributes.WORKER_VERSION] = taskContext.worker.version;
}

if (!taskContext.isRunDisabled && ctx.run.tags?.length) {
  contextAttrs[SemanticInternalAttributes.RUN_TAGS] = ctx.run.tags;
}
High-cardinality attributes injected into every metric data point.
The coding guidelines explicitly prohibit using UUIDs/IDs (envId, runId, projectId, organizationId), unbounded integers (attemptNumber), and free-form strings (taskSlug, runTags) as OTEL metric attributes because they cause cardinality explosion in Prometheus/Grafana.
Attributes that violate the guideline:
- ENVIRONMENT_ID, ORGANIZATION_ID, PROJECT_ID, RUN_ID, WORKER_ID — all IDs
- TASK_SLUG — unbounded string
- ATTEMPT_NUMBER — unbounded integer
- RUN_TAGS — free-form string array
Only ENVIRONMENT_TYPE and MACHINE_PRESET_NAME appear to be low-cardinality.
Consider moving the high-cardinality identifiers to resource attributes (set once on the Resource object) rather than per-data-point attributes if they are needed for correlation, or drop them from metrics entirely and rely on trace/log correlation for drill-down. If these are intentionally scoped to internal infrastructure metrics (not user-facing), document that exception clearly.
As per coding guidelines: "Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)." Also based on learnings about OTEL metric attribute cardinality.
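If the identifiers are kept for correlation, one option is to attach them once as resource attributes when the meter provider is created, rather than on every data point. A rough sketch, assuming the resourceFromAttributes helper from @opentelemetry/resources 2.x (the attribute keys shown and the ctx/exporter variables are illustrative):

```ts
import { resourceFromAttributes } from "@opentelemetry/resources";
import { MeterProvider, PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";

// Identifiers set once on the Resource, so each data point only carries
// low-cardinality attributes like environment type and machine preset.
const resource = resourceFromAttributes({
  "trigger.environment.id": ctx.environment.id,
  "trigger.project.id": ctx.project.id,
  "trigger.organization.id": ctx.organization.id,
});

const meterProvider = new MeterProvider({
  resource,
  readers: [new PeriodicExportingMetricReader({ exporter })],
});
```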
🤖 Prompt for AI Agents
In `@packages/core/src/v3/taskContext/otelProcessors.ts` around lines 143 - 173,
The code is injecting high-cardinality attributes into metric data via
contextAttrs (in the block using taskContext.isRunDisabled and the else branch)
which adds IDs and free-form values; remove ENVIRONMENT_ID, ORGANIZATION_ID,
PROJECT_ID, RUN_ID, WORKER_ID, TASK_SLUG, ATTEMPT_NUMBER, and RUN_TAGS from the
per-data-point metric attributes set on contextAttrs and instead either: 1) move
any needed identifiers to the Resource once (use the Resource API when
initializing telemetry) for correlation, or 2) drop them entirely and only keep
low-cardinality attributes (ENVIRONMENT_TYPE and MACHINE_PRESET_NAME) in
contextAttrs; update the logic that conditionally adds RUN_TAGS and worker
fields (the if (!taskContext.isRunDisabled && ctx.run.tags?.length) and the if
(taskContext.worker) blocks) to avoid attaching high-cardinality values to
metrics.
- @opentelemetry/host-metrics) and Node.js runtime metrics (event loop utilization, event loop delay, heap usage)
- otel.metrics.getMeter() from @trigger.dev/sdk
- prettyFormat() for human-readable values, and per-schema time bucket thresholds
- references/hello-world/src/trigger/metrics.ts) demonstrating CPU-intensive, memory-ramp, bursty workload, and custom metrics patterns

What changed
Metrics collection (packages/core, packages/cli-v3)
- TracingSDK now sets up a MeterProvider with a PeriodicExportingMetricReader that chains through TaskContextMetricExporter (adds run context attributes) and BufferingMetricExporter (batches exports to reduce overhead)
- @opentelemetry/host-metrics for process CPU, memory, and system-level metrics
- nodejsRuntimeMetrics.ts module using performance.eventLoopUtilization(), monitorEventLoopDelay(), and process.memoryUsage() to emit 6 observable gauges
- otel.metrics from @trigger.dev/sdk so users can create counters, histograms, and gauges in their tasks
- Drops system.* metrics to reduce noise, keeps sending metrics between runs in warm workers
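As an illustration of the task-facing surface described above, custom metrics in a task could look roughly like the sketch below (the exact otel.metrics import shape and all instrument and task names here are assumptions, not code from this PR):

```ts
import { task, otel } from "@trigger.dev/sdk";

// Illustrative only: meter and instrument names are made up for this sketch.
const meter = otel.metrics.getMeter("my-task-metrics");
const itemsProcessed = meter.createCounter("items_processed", {
  description: "Number of items processed per run",
});
const batchDuration = meter.createHistogram("batch_duration_ms", { unit: "ms" });

export const processBatch = task({
  id: "process-batch",
  run: async (payload: { items: string[] }) => {
    const start = performance.now();
    for (const item of payload.items) {
      // ... do work for each item ...
      itemsProcessed.add(1);
    }
    batchDuration.record(performance.now() - start);
  },
});
```

The counters and histograms recorded this way would flow through the same buffering/task-context exporters and end up in the metrics_v1 table.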
Metrics ingestion (apps/webapp)
- otel.v1.metrics.ts accepts OTEL metric export requests (JSON and protobuf), converts to ClickHouse rows
- 016_create_metrics_v1.sql with 10-second aggregation buckets, JSON attributes column, 30-day TTL, and materialized views for 1m/5m rollups

Query engine (internal-packages/tsql, apps/webapp)
- task_identifier, run_id, machine_name, worker_version, etc.) extracted from the JSON attributes column
- prettyFormat() — TSQL function that annotates columns with format hints (bytes, percent, durationSeconds) for frontend rendering without changing the underlying data

Reference project
- references/hello-world/src/trigger/metrics.ts — 6 example tasks: cpu-intensive, memory-ramp, bursty-workload, sustained-workload, concurrent-load, custom-metrics

Test plan
- Ran the cpu-intensive, memory-ramp, and custom-metrics tasks
- Checked SELECT DISTINCT metric_name FROM metrics_v1
- Verified prettyFormat renders correctly in chart tooltips and table cells
- Confirmed the exporter drops system.* metrics but keeps process.* and nodejs.*