refactor(ci): migrate to updated slo action #748

polRk · 2026-01-16T13:24:26Z

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Other information

Copilot

Pull request overview

This PR refactors the SLO (Service Level Objective) testing infrastructure to migrate from Prometheus Pushgateway to OpenTelemetry OTLP/HTTP metrics export. The changes modernize the metrics collection approach and update the CI workflows to run baseline comparison tests.

Changes:

Migrated from Prometheus Pushgateway + prometheus-client to OpenTelemetry OTLP/HTTP metrics export
Replaced ratelimiter dependency with a custom SyncRateLimiter implementation
Updated GitHub Actions workflows to run current vs. baseline workload comparison tests
Removed local playground configurations (compose files) in favor of external infrastructure

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/slo/src/core/metrics.py	Complete rewrite of metrics implementation using OpenTelemetry SDK instead of prometheus-client
tests/slo/src/jobs/base.py	Added custom SyncRateLimiter class to replace ratelimiter dependency
tests/slo/src/jobs/topic_jobs.py	Updated to use SyncRateLimiter and otlp_endpoint parameter
tests/slo/src/jobs/table_jobs.py	Updated to use SyncRateLimiter and otlp_endpoint parameter
tests/slo/src/jobs/async_topic_jobs.py	Updated for OTLP metrics, improved type hints and error handling
tests/slo/src/runners/base.py	Added type hints for driver property
tests/slo/src/runners/topic_runner.py	Added driver assertions and updated to use otlp_endpoint
tests/slo/src/runners/table_runner.py	Added driver assertions and updated to use otlp_endpoint
tests/slo/src/options.py	Renamed --prom-pgw to --otlp-endpoint with updated help text
tests/slo/requirements.txt	Replaced prometheus-client and ratelimiter with OpenTelemetry packages
tests/slo/Dockerfile	New Dockerfile for packaging the SLO workload as a container
tests/slo/slo_runner.sh	Complete rewrite as a bash script to run workloads in Docker containers
tests/slo/README.md	Updated documentation for new OTLP-based approach
tests/slo/src/summary.py	Deleted custom Summary implementation (no longer needed)
tests/slo/playground/*	Removed local playground configs (moved to external repo)
.github/workflows/slo.yml	Completely rewritten to run current vs. baseline comparisons in parallel
.github/workflows/slo-report.yml	Updated action version and added SLO label removal job

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-16T13:38:48Z

.github/workflows/slo.yml

-
    name: Run YDB SLO Tests
-    runs-on: ubuntu-latest
+    runs-on: "large-runner"


The runs-on value 'ubuntu-latest' conflicts with the comment on line 37 in the diff which shows 'large-runner'. The actual runner type in the code doesn't match what appears to be intended based on the diff context. If large-runner is required for SLO tests, this should be 'large-runner', not 'ubuntu-latest'.

Suggested change

runs-on: "large-runner"

runs-on: ubuntu-latest

Copilot · 2026-01-16T13:38:48Z

tests/slo/src/jobs/async_topic_jobs.py

+        async with self.aio_driver.topic_client.writer(
            self.args.path,
-            codec=ydb.TopicCodec.GZIP,
+            codec=getattr(ydb, "PublicCodec", ydb.TopicCodec).GZIP,


The use of getattr with a fallback to access codecs is fragile and unclear. Consider documenting why this fallback exists or using a more explicit version check if this is for backward compatibility. The attribute name 'PublicCodec' suggests a newer API, but without documentation, maintainability suffers.

Copilot · 2026-01-16T13:38:49Z

tests/slo/src/core/metrics.py

        pass

+    @abstractmethod
+    def stop(self, labels, start_time: float, attempts: int = 1, error: Exception = None) -> None:


The error parameter should be typed as Optional[Exception] rather than Exception = None. Using None as a default with a non-optional type annotation is inconsistent and can confuse type checkers.

Copilot · 2026-01-16T13:38:49Z

tests/slo/src/core/metrics.py


-    def stop(self, labels, start_time, attempts=1, error=None):
-        duration = time.time() - start_time
+    def stop(self, labels, start_time: float, attempts: int = 1, error: Exception = None) -> None:


The error parameter should be typed as Optional[Exception] rather than Exception = None. Using None as a default with a non-optional type annotation is inconsistent and can confuse type checkers.

Copilot · 2026-01-16T13:38:49Z

tests/slo/src/core/metrics.py

+        self._pending.add(1, attributes={"operation_type": labels_t[0]})
+        return time.time()
+
+    def stop(self, labels, start_time: float, attempts: int = 1, error: Exception = None) -> None:


The error parameter should be typed as Optional[Exception] rather than Exception = None. Using None as a default with a non-optional type annotation is inconsistent and can confuse type checkers.

Suggested change

def stop(self, labels, start_time: float, attempts: int = 1, error: Exception = None) -> None:

def stop(self, labels, start_time: float, attempts: int = 1, error: Optional[Exception] = None) -> None:

Copilot · 2026-01-16T13:38:49Z

.github/workflows/slo.yml


-      - name: Run SLO Tests
+      - name: Run SLO Tests (current + baseline in parallel)
+        timeout-minutes: 15


The timeout of 15 minutes is insufficient for SLO tests with a default duration of 600 seconds (10 minutes). The timeout should account for workload startup, execution (10 minutes), and shutdown grace period. Consider increasing to at least 20 minutes to prevent false failures.

Suggested change

timeout-minutes: 15

timeout-minutes: 20

Copilot · 2026-01-16T13:38:49Z

tests/slo/requirements.txt

 quantile-estimator==0.1.2
+
+# OpenTelemetry (OTLP/HTTP exporter)
+# NOTE: OpenTelemetry 1.39.1 requires Python >= 3.9.


The comment states OpenTelemetry 1.39.1 requires Python >= 3.9, but the Dockerfile uses Python 3.11. Consider documenting the minimum Python version requirement more prominently, or adding a version check to ensure compatibility.

Suggested change

# NOTE: OpenTelemetry 1.39.1 requires Python >= 3.9.

# NOTE: These SLO tests require Python >= 3.9 (OpenTelemetry 1.39.1) and are tested with Python 3.11.

Copilot · 2026-01-16T13:38:50Z

tests/slo/src/jobs/base.py

+    def __init__(self, min_interval_s: float) -> None:
+        self._min_interval_s = max(0.0, float(min_interval_s))
+        self._lock = threading.Lock()
+        self._next_allowed_ts = 0.0  # monotonic timestamp


The comment says 'monotonic timestamp' but the initialization uses time.time() semantics (wall clock time). While time.monotonic() is used later in enter, the initial value of 0.0 could be confusing. Consider clarifying this is initialized to 0 to allow the first call immediately.

Suggested change

self._next_allowed_ts = 0.0 # monotonic timestamp

# Next allowed time in the time.monotonic() time base; initialized to 0.0

# so that the first acquisition is allowed immediately.

self._next_allowed_ts = 0.0

Copilot · 2026-01-16T13:38:50Z

.github/workflows/slo-report.yml

@@ -1,8 +1,8 @@
-name: SLO Report
+name: slo-report


The workflow name 'slo-report' uses lowercase with hyphen, which is inconsistent with the original 'SLO Report' that used title case. While not a functional issue, maintaining consistent naming conventions across workflows improves readability.

Suggested change

name: slo-report

name: SLO Report

refactor(ci): migrate to updated slo action

a313669

Copilot AI review requested due to automatic review settings January 16, 2026 13:24

runs on larger runner

b68a0c5

Copilot started reviewing on behalf of polRk January 16, 2026 13:27 View session

polRk added the SLO label Jan 16, 2026

Copilot AI reviewed Jan 16, 2026

View reviewed changes

vgvoleg approved these changes Jan 16, 2026

View reviewed changes

vgvoleg merged commit 7ea41c1 into main Jan 16, 2026
23 of 25 checks passed

vgvoleg deleted the update-slo-action branch January 16, 2026 14:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(ci): migrate to updated slo action #748

refactor(ci): migrate to updated slo action #748

Uh oh!

polRk commented Jan 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Copilot AI Jan 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	def stop(self, labels, start_time: float, attempts: int = 1, error: Exception = None) -> None:
	def stop(self, labels, start_time: float, attempts: int = 1, error: Optional[Exception] = None) -> None:

	# NOTE: OpenTelemetry 1.39.1 requires Python >= 3.9.
	# NOTE: These SLO tests require Python >= 3.9 (OpenTelemetry 1.39.1) and are tested with Python 3.11.

refactor(ci): migrate to updated slo action #748

refactor(ci): migrate to updated slo action #748

Uh oh!

Conversation

polRk commented Jan 16, 2026

Pull request type

What is the current behavior?

What is the new behavior?

Other information

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants