feat: introduce e2e benchmarking by alpe · Pull Request #3078 · evstack/ev-node

alpe · 2026-02-17T17:09:34Z

Initial workflow

Runs and publishes results when running on main only

… Actions workflow.

claude · 2026-02-17T17:09:55Z

Claude finished @alpe's task in 1m 30s —— View job

Code Review: feat: introduce e2e benchmarking

Gather context and read changed files
Review benchmark implementation
Review test helper refactoring
Review workflow file
Assess security, performance, and error handling
Post comprehensive review

Summary

Good foundation for E2E benchmarking with OTLP trace collection. The testing.TB refactor is clean and well-executed. A few issues worth addressing before merge, ranging from correctness concerns to minor style nits.

Issues

Correctness / Design

b.Loop() vs manual iteration index (test/e2e/evm_contract_bench_test.go:97-113)

The benchmark uses b.Loop() (Go 1.24+) but manages its own i counter alongside it to index into signedTxs. If b.N exceeds maxIter=1024, the require.Less(b, i, maxIter) assertion will fail the benchmark rather than report an error, which is confusing UX. Consider panicking or b.Fatalf with a clear message, or deriving i from a per-benchmark counter that wraps. More importantly, the current for b.Loop() idiom is intended to replace the classic for i := 0; i < b.N; i++ — mixing them by maintaining a separate i is fragile.

A simpler pattern: pre-build b.N transactions after b.ResetTimer() is called, or cap and document the limit clearly in the docstring.

Missing shutdown error propagation (test/e2e/evm_contract_bench_test.go:51)

defer collector.close() // nolint: errcheck // test only

chatton already noted this. The close() error from http.Server.Shutdown should at least be logged via b.Logf. The nolint suppression is fine for linting, but silently swallowing a shutdown error could hide port leak issues in CI.

waitForReceipt timeout is too short for a benchmark (test/e2e/evm_contract_bench_test.go:292)

require.Eventually(t, ..., 2*time.Second, 50*time.Millisecond, ...)

The E2E test in evm_contract_e2e_test.go uses a 15-second timeout for receipt polling. The benchmark uses 2 seconds. Under CI load or with slower block times, this will produce spurious failures rather than measuring latency. Either increase the timeout or surface it as a constant alongside DefaultTestTimeout.

Global globalNonce race (test/e2e/evm_test_common.go:467)

var globalNonce uint64 = 0

This package-level mutable state is reset in setupCommonEVMTest but is not protected by a mutex. If tests run in parallel (e.g., -parallel), there's a data race. Consider passing nonce state explicitly per test, or using sync/atomic.

OTLP collector has no ReadTimeout (test/e2e/evm_contract_bench_test.go:149)

c.server = &http.Server{Handler: mux}

The HTTP server has no timeouts set. A misbehaving ev-node could stall the benchmark indefinitely. Add at least ReadTimeout and WriteTimeout.

Error Handling

proto.Marshal error silently ignored (test/e2e/evm_contract_bench_test.go:193)

out, _ := proto.Marshal(resp)

If marshaling the empty response fails (unlikely, but possible), the handler sends an empty body with 200 OK. The node's OTLP exporter may then retry, causing duplicate span collection. Use require.NoError or at minimum log the error.

deployContract uses require.Eventually but ignores receipt status path (test/e2e/evm_contract_e2e_test.go:283-288)

The function verifies receipt.Status == 1 after the eventually, which is correct, but an EVM revert would pass the Eventually (receipt is non-nil) and then fail the status check with a confusing message. A receipt.Status check inside the Eventually predicate would give a clearer failure.

Code Quality

Magic numbers and hex literals (test/e2e/evm_contract_bench_test.go:73,83-84,91)

Gemini's suggestion is valid. The repeated hex 0x000...002a, gas limit 500000, and gas price 30000000000 appear identically in both evm_contract_bench_test.go and evm_contract_e2e_test.go. These should be constants, ideally shared from evm_test_common.go:

const (
    StoreValue42Hex = "0x000000000000000000000000000000000000000000000000000000000000002a"
    StoreTxGasLimit = 500_000
    DefaultGasPrice = 30_000_000_000
)

sort.Slice can use slices.SortFunc (test/e2e/evm_contract_bench_test.go:253)

Minor: sort.Slice is fine but slices.SortFunc is idiomatic for Go 1.21+.

Unused jwtSecret parameter in restartDAAndSequencer / restartDAAndSequencerLazy (test/e2e/evm_test_common.go:639,689)

Both functions accept a jwtSecret parameter but don't use it — they reconstruct the path filepath.Join(sequencerHome, "jwt-secret.hex") directly. The parameter should be removed or the path should be derived from it.

time.Sleep in restart helpers (test/e2e/evm_test_common.go:649,699)

time.Sleep(25 * time.Millisecond) between DA restart and sequencer start is a fragile timing hack. Consider polling the DA endpoint for readiness instead.

Security

Hardcoded private key (test/e2e/evm_test_common.go:209)

TestPrivateKey = "cece4f25ac74deb1468965160c7185e07dff413f23fcadb611b05ca37ab0a52e"

This is a test-only key for a local devnet, so the risk is low. However, if the chain ID or genesis config were accidentally reused against a real network, funds at this address would be at risk. Consider adding a comment explicitly stating this key controls a well-known test account and must never be used outside local devnets.

Weak passphrase (test/e2e/evm_test_common.go:211)

TestPassphrase = "secret"

Same rationale — acceptable for local tests, but worth a comment clarifying test-only scope.

Workflow

No benchtime flag in CI command (.github/workflows/benchmark.yml:30-31)

go test -tags evm -bench=. -benchmem -run='^$' -timeout=10m --evm-binary=...

Without -benchtime, Go uses the default of 1 second per benchmark. The b.Loop() pattern introduced in Go 1.24 controls iterations differently — double-check that the benchmark produces consistent output with the benchmark-action/github-action-benchmark parser when iteration counts vary.

Missing -v flag for benchmark output — the benchmark-action parser may need verbose output or specific format from go test -bench to store results correctly. Verify the output.txt format matches what the action expects.

auto-push: true pushes to the default gh-pages branch — confirm that branch exists and the contents: write permission is sufficient for the repo.

Positive Observations

The OTLP trace collection approach for breaking down latency by span is a nice touch for profiling.
The testing.TB refactor is clean throughout and doesn't introduce any regressions.
reth.NodeBuilder usage in SetupTestRethNode is a good improvement over the previous approach.
The pre-built signed transactions pattern avoids signing overhead in the timed loop.
The span aggregation report (printCollectedTraceReport) is well-structured and readable.

gemini-code-assist · 2026-02-17T17:09:56Z

Summary of Changes

Hello @alpe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational end-to-end benchmarking framework for EVM contract interactions, specifically targeting the performance of store and retrieve operations. The primary goal is to establish a baseline for measuring latency and throughput in a realistic environment. To support this, existing test infrastructure has been generalized to accommodate both traditional tests and the new benchmarks, ensuring consistent utility across different testing paradigms.

Highlights

E2E Benchmarking Introduction: A new end-to-end benchmark (BenchmarkEvmContractRoundtrip) has been added to measure the performance of EVM contract store and retrieve operations against a real Reth node. This benchmark focuses on transaction submission, receipt waiting, and data verification.
Test Helper Refactoring: Numerous test helper functions across execution/evm/test/test_helpers.go, execution/evm/test_helpers.go, test/e2e/evm_contract_e2e_test.go, test/e2e/evm_test_common.go, and test/e2e/sut_helper.go have been refactored to accept testing.TB instead of *testing.T. This change enhances flexibility, allowing these helpers to be used seamlessly in both standard tests and benchmarks.
Enhanced Reth Node Setup: The SetupTestRethNode function now utilizes a reth.NodeBuilder for more structured node configuration and integrates zap logging, providing better visibility into Reth node operations during testing and benchmarking.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

execution/evm/test/test_helpers.go
- Imported go.uber.org/zap and go.uber.org/zap/zaptest for logging.
- Updated getTestScopedDockerSetup, SetupTestRethNode, and waitForRethContainer functions to accept testing.TB instead of *testing.T.
- Modified SetupTestRethNode to use reth.NodeBuilder and integrate zap logger based on verbose testing flag.
execution/evm/test_helpers.go
- Updated GetRandomTransaction function to accept testing.TB instead of *testing.T.
test/e2e/evm_contract_bench_test.go
- Added new file evm_contract_bench_test.go.
- Implemented BenchmarkEvmContractRoundtrip for E2E benchmarking of EVM contract store/retrieve operations.
- Included waitForReceipt helper function for polling transaction receipts.
test/e2e/evm_contract_e2e_test.go
- Updated setupTestSequencer and deployContract functions to accept testing.TB instead of *testing.T.
test/e2e/evm_test_common.go
- Updated mustGetAvailablePort, createPassphraseFile, createJWTSecretFile, getNodeP2PAddress, setupSequencerNode, setupSequencerNodeLazy, setupFullNode, submitTransactionAndGetBlockNumber, setupCommonEVMTest, checkBlockInfoAt, setupSequencerOnlyTest, restartDAAndSequencer, restartDAAndSequencerLazy, restartSequencerNode, and verifyNoBlockProduction functions to accept testing.TB instead of *testing.T.
test/e2e/sut_helper.go
- Updated SystemUnderTest struct field t type from *testing.T to testing.TB.
- Updated NewSystemUnderTest constructor to accept testing.TB.
- Updated AwaitNodeUp, AwaitNodeLive, AwaitNBlocks, MustCopyFile, and NodeID methods to accept testing.TB instead of *testing.T.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/benchmark.yml

Activity

The pull request was created with the title 'feat: introduce e2e benchmarking' and the description 'Initial workflow'.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

github-actions · 2026-02-17T17:09:58Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Feb 18, 2026, 1:50 PM

codecov · 2026-02-17T17:14:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.14%. Comparing base (d4394f6) to head (b3b0881).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3078      +/-   ##
==========================================
+ Coverage   61.11%   61.14%   +0.03%     
==========================================
  Files         113      113              
  Lines       11444    11444              
==========================================
+ Hits         6994     6998       +4     
+ Misses       3661     3657       -4     
  Partials      789      789

Flag	Coverage Δ
combined	`61.14% <ø> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gemini-code-assist

Code Review

This pull request introduces an end-to-end benchmark for EVM contract roundtrip latency and refactors existing test helper functions to accept testing.TB for improved reusability. A security audit identified two medium-severity vulnerabilities related to hardcoded secrets (a private key and a passphrase) within the test suite. While these are confined to test files, they represent a significant security anti-pattern. Additionally, suggestions were made for the new benchmark file to improve maintainability by using constants for magic values and to ensure compatibility with older Go versions. Remediation of the hardcoded secrets is recommended to improve the security posture and adhere to best practices.

gemini-code-assist · 2026-02-17T17:15:04Z

test/e2e/evm_contract_bench_test.go

+	defer cleanup()
+
+	ctx := b.Context()
+	privateKey, err := crypto.HexToECDSA(TestPrivateKey)


A hardcoded private key (TestPrivateKey) is used. Hardcoding secrets, even for testing, is a significant security risk. This key could be inadvertently used in production-like environments or on public testnets, leading to compromised assets. It also sets a poor security precedent. Keys should be loaded from environment variables or a secrets manager, or generated dynamically for tests.

gemini-code-assist · 2026-02-17T17:15:04Z

test/e2e/evm_test_common.go

 // - genesisHash: Hash of the genesis block for chain validation
 // - endpoints: TestEndpoints struct containing unique port assignments
-func setupSequencerNode(t *testing.T, sut *SystemUnderTest, sequencerHome, jwtSecret, genesisHash string, endpoints *TestEndpoints) {
+func setupSequencerNode(t testing.TB, sut *SystemUnderTest, sequencerHome, jwtSecret, genesisHash string, endpoints *TestEndpoints) {


A weak, hardcoded passphrase ("secret") is used for the test sequencer's key file. Hardcoding credentials, especially weak ones, is a poor security practice that can be replicated in production. Passphrases should be loaded from a secure source like environment variables or a secrets manager.

gemini-code-assist · 2026-02-17T17:15:04Z

test/e2e/evm_contract_bench_test.go

+	storeData, err := hexutil.Decode("0x000000000000000000000000000000000000000000000000000000000000002a")
+	require.NoError(b, err)
+
+	const maxIter = 1024
+	signedTxs := make([]*types.Transaction, maxIter)
+	for i := range maxIter {
+		tx := types.NewTx(&types.LegacyTx{
+			Nonce:    nonce + uint64(i),
+			To:       &contractAddr,
+			Value:    big.NewInt(0),
+			Gas:      500000,
+			GasPrice: big.NewInt(30000000000),
+			Data:     storeData,
+		})
+		signedTxs[i], err = types.SignTx(tx, signer, privateKey)
+		require.NoError(b, err)
+	}
+
+	expected := common.HexToHash("0x000000000000000000000000000000000000000000000000000000000000002a").Bytes()


The benchmark uses several magic numbers and strings for transaction parameters and expected values. To improve readability and maintainability, consider defining these as constants at the top of the file.

For example:

const ( benchGasLimit = 500000 benchGasPrice = 30000000000 storedValueHex = "0x000000000000000000000000000000000000000000000000000000000000002a" )

Then you can use these constants in the code:

storeData, err := hexutil.Decode(storedValueHex) // ... Gas: benchGasLimit, GasPrice: big.NewInt(benchGasPrice), // ... expected := common.HexToHash(storedValueHex).Bytes()

gemini-code-assist · 2026-02-17T17:15:05Z

test/e2e/evm_contract_bench_test.go

+
+	const maxIter = 1024
+	signedTxs := make([]*types.Transaction, maxIter)
+	for i := range maxIter {


The loop for i := range maxIter uses a new feature from Go 1.22. If this project needs to maintain compatibility with older Go versions, this should be changed to a traditional for loop to avoid breaking the build on older toolchains.

If Go 1.22 is the minimum required version for the project, then this is fine.

Suggested change

for i := range maxIter {

for i := 0; i < maxIter; i++ {

chatton

I know the PR is still in draft so just leaving some thoughts! nice work so far

chatton · 2026-02-18T09:08:49Z

test/e2e/evm_contract_bench_test.go

+// otlpCollector is a lightweight OTLP/HTTP receiver that collects trace spans
+// in memory. It serves the /v1/traces endpoint that the node's OTLP exporter
+// posts protobuf-encoded ExportTraceServiceRequest messages to.
+type otlpCollector struct {


this is a really nice idea, I imagine this would be used quite a lot in these tests, it's probably worth pulling this out into a separate file file / package to be imported into these tests

It definitively makes sense to pull this out when it is used by a second benchmark.

test/e2e/evm_contract_bench_test.go

chatton · 2026-02-18T09:29:17Z

test/e2e/evm_contract_bench_test.go

+//
+//	PATH="/path/to/binaries:$PATH" go test -tags evm \
+//	  -bench BenchmarkEvmContractRoundtrip -benchmem -benchtime=5x \
+//	  -run='^$' -timeout=10m -v --evm-binary=/path/to/evm .


ah we need to specify the evm binary because we're mixing and matching docker vs local binaries right?

If we are going to create a new set of tests, I would say it's worth while to just do everything via docker for these, we likely won't run them as frequently so I don't think them being slightly heavier is a bad thing in that regard.

It can also be quite confusing having a mix WDYT?

As I said in the standup. Standalone is my personal preference but I would go with the majority. Can we get this in and refactor in a new PR?

This reverts commit 3b74695.

This reverts commit c7a5914.

chatton

LGTM happy to get this merged in as a jumping off point 💪

chatton · 2026-02-18T13:58:38Z

execution/evm/test/test_helpers.go

+	n, err := new(reth.NodeBuilder).
+		WithTestName(testName).
+		WithLogger(logger).
+		WithImage(reth.DefaultImage()).
+		WithBin("ev-reth").


I don't think any of this is required, the default behaviour we get from NewNodeBuilderWithTestName should be fine, if we want to set a logger we can just chain the logger after directly.

feat: introduce EVM contract benchmarking with new tests and a GitHub…

aeaf7fa

… Actions workflow.

github-actions bot assigned alpe Feb 17, 2026

gemini-code-assist bot reviewed Feb 17, 2026

View reviewed changes

alpe added 3 commits February 17, 2026 19:08

Capture otel

c322838

Go mod tidy

b7ed06a

Enable benchmark on PRs - revert before merge

c7a5914

chatton reviewed Feb 18, 2026

View reviewed changes

alpe added 2 commits February 18, 2026 10:49

Push benchmark results on PR - revert later

3b74695

Review feedback

d2512dd

alpe marked this pull request as ready for review February 18, 2026 13:33

alpe added 2 commits February 18, 2026 14:49

Revert "Push benchmark results on PR - revert later"

e747d75

This reverts commit 3b74695.

Revert "Enable benchmark on PRs - revert before merge"

b3b0881

This reverts commit c7a5914.

chatton approved these changes Feb 18, 2026

View reviewed changes

julienrbrt approved these changes Feb 18, 2026

View reviewed changes

Conversation

alpe commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: feat: introduce e2e benchmarking

Summary

Issues

Correctness / Design

Error Handling

Code Quality

Security

Workflow

Positive Observations

Uh oh!

gemini-code-assist bot commented Feb 17, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

github-actions bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatton left a comment

Choose a reason for hiding this comment

Uh oh!

chatton Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

alpe Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatton Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

alpe Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatton left a comment

Choose a reason for hiding this comment

Uh oh!

chatton Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

alpe commented Feb 17, 2026 •

edited

Loading

claude bot commented Feb 17, 2026 •

edited

Loading

github-actions bot commented Feb 17, 2026 •

edited

Loading

codecov bot commented Feb 17, 2026 •

edited

Loading