Skip to content

feat: introduce e2e benchmarking #3078

Open
alpe wants to merge 8 commits intomainfrom
alex/bench_e2e_start
Open

feat: introduce e2e benchmarking #3078
alpe wants to merge 8 commits intomainfrom
alex/bench_e2e_start

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Feb 17, 2026

Initial workflow

Runs and publishes results when running on main only

@claude
Copy link
Contributor

claude bot commented Feb 17, 2026

Claude finished @alpe's task in 1m 30s —— View job


Code Review: feat: introduce e2e benchmarking

  • Gather context and read changed files
  • Review benchmark implementation
  • Review test helper refactoring
  • Review workflow file
  • Assess security, performance, and error handling
  • Post comprehensive review

Summary

Good foundation for E2E benchmarking with OTLP trace collection. The testing.TB refactor is clean and well-executed. A few issues worth addressing before merge, ranging from correctness concerns to minor style nits.


Issues

Correctness / Design

b.Loop() vs manual iteration index (test/e2e/evm_contract_bench_test.go:97-113)

The benchmark uses b.Loop() (Go 1.24+) but manages its own i counter alongside it to index into signedTxs. If b.N exceeds maxIter=1024, the require.Less(b, i, maxIter) assertion will fail the benchmark rather than report an error, which is confusing UX. Consider panicking or b.Fatalf with a clear message, or deriving i from a per-benchmark counter that wraps. More importantly, the current for b.Loop() idiom is intended to replace the classic for i := 0; i < b.N; i++ — mixing them by maintaining a separate i is fragile.

A simpler pattern: pre-build b.N transactions after b.ResetTimer() is called, or cap and document the limit clearly in the docstring.

Missing shutdown error propagation (test/e2e/evm_contract_bench_test.go:51)

defer collector.close() // nolint: errcheck // test only

chatton already noted this. The close() error from http.Server.Shutdown should at least be logged via b.Logf. The nolint suppression is fine for linting, but silently swallowing a shutdown error could hide port leak issues in CI.

waitForReceipt timeout is too short for a benchmark (test/e2e/evm_contract_bench_test.go:292)

require.Eventually(t, ..., 2*time.Second, 50*time.Millisecond, ...)

The E2E test in evm_contract_e2e_test.go uses a 15-second timeout for receipt polling. The benchmark uses 2 seconds. Under CI load or with slower block times, this will produce spurious failures rather than measuring latency. Either increase the timeout or surface it as a constant alongside DefaultTestTimeout.

Global globalNonce race (test/e2e/evm_test_common.go:467)

var globalNonce uint64 = 0

This package-level mutable state is reset in setupCommonEVMTest but is not protected by a mutex. If tests run in parallel (e.g., -parallel), there's a data race. Consider passing nonce state explicitly per test, or using sync/atomic.

OTLP collector has no ReadTimeout (test/e2e/evm_contract_bench_test.go:149)

c.server = &http.Server{Handler: mux}

The HTTP server has no timeouts set. A misbehaving ev-node could stall the benchmark indefinitely. Add at least ReadTimeout and WriteTimeout.


Error Handling

proto.Marshal error silently ignored (test/e2e/evm_contract_bench_test.go:193)

out, _ := proto.Marshal(resp)

If marshaling the empty response fails (unlikely, but possible), the handler sends an empty body with 200 OK. The node's OTLP exporter may then retry, causing duplicate span collection. Use require.NoError or at minimum log the error.

deployContract uses require.Eventually but ignores receipt status path (test/e2e/evm_contract_e2e_test.go:283-288)

The function verifies receipt.Status == 1 after the eventually, which is correct, but an EVM revert would pass the Eventually (receipt is non-nil) and then fail the status check with a confusing message. A receipt.Status check inside the Eventually predicate would give a clearer failure.


Code Quality

Magic numbers and hex literals (test/e2e/evm_contract_bench_test.go:73,83-84,91)

Gemini's suggestion is valid. The repeated hex 0x000...002a, gas limit 500000, and gas price 30000000000 appear identically in both evm_contract_bench_test.go and evm_contract_e2e_test.go. These should be constants, ideally shared from evm_test_common.go:

const (
    StoreValue42Hex = "0x000000000000000000000000000000000000000000000000000000000000002a"
    StoreTxGasLimit = 500_000
    DefaultGasPrice = 30_000_000_000
)

sort.Slice can use slices.SortFunc (test/e2e/evm_contract_bench_test.go:253)

Minor: sort.Slice is fine but slices.SortFunc is idiomatic for Go 1.21+.

Unused jwtSecret parameter in restartDAAndSequencer / restartDAAndSequencerLazy (test/e2e/evm_test_common.go:639,689)

Both functions accept a jwtSecret parameter but don't use it — they reconstruct the path filepath.Join(sequencerHome, "jwt-secret.hex") directly. The parameter should be removed or the path should be derived from it.

time.Sleep in restart helpers (test/e2e/evm_test_common.go:649,699)

time.Sleep(25 * time.Millisecond) between DA restart and sequencer start is a fragile timing hack. Consider polling the DA endpoint for readiness instead.


Security

Hardcoded private key (test/e2e/evm_test_common.go:209)

TestPrivateKey = "cece4f25ac74deb1468965160c7185e07dff413f23fcadb611b05ca37ab0a52e"

This is a test-only key for a local devnet, so the risk is low. However, if the chain ID or genesis config were accidentally reused against a real network, funds at this address would be at risk. Consider adding a comment explicitly stating this key controls a well-known test account and must never be used outside local devnets.

Weak passphrase (test/e2e/evm_test_common.go:211)

TestPassphrase = "secret"

Same rationale — acceptable for local tests, but worth a comment clarifying test-only scope.


Workflow

No benchtime flag in CI command (.github/workflows/benchmark.yml:30-31)

go test -tags evm -bench=. -benchmem -run='^$' -timeout=10m --evm-binary=...

Without -benchtime, Go uses the default of 1 second per benchmark. The b.Loop() pattern introduced in Go 1.24 controls iterations differently — double-check that the benchmark produces consistent output with the benchmark-action/github-action-benchmark parser when iteration counts vary.

Missing -v flag for benchmark output — the benchmark-action parser may need verbose output or specific format from go test -bench to store results correctly. Verify the output.txt format matches what the action expects.

auto-push: true pushes to the default gh-pages branch — confirm that branch exists and the contents: write permission is sufficient for the repo.


Positive Observations

  • The OTLP trace collection approach for breaking down latency by span is a nice touch for profiling.
  • The testing.TB refactor is clean throughout and doesn't introduce any regressions.
  • reth.NodeBuilder usage in SetupTestRethNode is a good improvement over the previous approach.
  • The pre-built signed transactions pattern avoids signing overhead in the timed loop.
  • The span aggregation report (printCollectedTraceReport) is well-structured and readable.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @alpe, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational end-to-end benchmarking framework for EVM contract interactions, specifically targeting the performance of store and retrieve operations. The primary goal is to establish a baseline for measuring latency and throughput in a realistic environment. To support this, existing test infrastructure has been generalized to accommodate both traditional tests and the new benchmarks, ensuring consistent utility across different testing paradigms.

Highlights

  • E2E Benchmarking Introduction: A new end-to-end benchmark (BenchmarkEvmContractRoundtrip) has been added to measure the performance of EVM contract store and retrieve operations against a real Reth node. This benchmark focuses on transaction submission, receipt waiting, and data verification.
  • Test Helper Refactoring: Numerous test helper functions across execution/evm/test/test_helpers.go, execution/evm/test_helpers.go, test/e2e/evm_contract_e2e_test.go, test/e2e/evm_test_common.go, and test/e2e/sut_helper.go have been refactored to accept testing.TB instead of *testing.T. This change enhances flexibility, allowing these helpers to be used seamlessly in both standard tests and benchmarks.
  • Enhanced Reth Node Setup: The SetupTestRethNode function now utilizes a reth.NodeBuilder for more structured node configuration and integrates zap logging, providing better visibility into Reth node operations during testing and benchmarking.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • execution/evm/test/test_helpers.go
    • Imported go.uber.org/zap and go.uber.org/zap/zaptest for logging.
    • Updated getTestScopedDockerSetup, SetupTestRethNode, and waitForRethContainer functions to accept testing.TB instead of *testing.T.
    • Modified SetupTestRethNode to use reth.NodeBuilder and integrate zap logger based on verbose testing flag.
  • execution/evm/test_helpers.go
    • Updated GetRandomTransaction function to accept testing.TB instead of *testing.T.
  • test/e2e/evm_contract_bench_test.go
    • Added new file evm_contract_bench_test.go.
    • Implemented BenchmarkEvmContractRoundtrip for E2E benchmarking of EVM contract store/retrieve operations.
    • Included waitForReceipt helper function for polling transaction receipts.
  • test/e2e/evm_contract_e2e_test.go
    • Updated setupTestSequencer and deployContract functions to accept testing.TB instead of *testing.T.
  • test/e2e/evm_test_common.go
    • Updated mustGetAvailablePort, createPassphraseFile, createJWTSecretFile, getNodeP2PAddress, setupSequencerNode, setupSequencerNodeLazy, setupFullNode, submitTransactionAndGetBlockNumber, setupCommonEVMTest, checkBlockInfoAt, setupSequencerOnlyTest, restartDAAndSequencer, restartDAAndSequencerLazy, restartSequencerNode, and verifyNoBlockProduction functions to accept testing.TB instead of *testing.T.
  • test/e2e/sut_helper.go
    • Updated SystemUnderTest struct field t type from *testing.T to testing.TB.
    • Updated NewSystemUnderTest constructor to accept testing.TB.
    • Updated AwaitNodeUp, AwaitNodeLive, AwaitNBlocks, MustCopyFile, and NodeID methods to accept testing.TB instead of *testing.T.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/benchmark.yml
Activity
  • The pull request was created with the title 'feat: introduce e2e benchmarking' and the description 'Initial workflow'.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 17, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 18, 2026, 1:50 PM

@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.14%. Comparing base (d4394f6) to head (b3b0881).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3078      +/-   ##
==========================================
+ Coverage   61.11%   61.14%   +0.03%     
==========================================
  Files         113      113              
  Lines       11444    11444              
==========================================
+ Hits         6994     6998       +4     
+ Misses       3661     3657       -4     
  Partials      789      789              
Flag Coverage Δ
combined 61.14% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an end-to-end benchmark for EVM contract roundtrip latency and refactors existing test helper functions to accept testing.TB for improved reusability. A security audit identified two medium-severity vulnerabilities related to hardcoded secrets (a private key and a passphrase) within the test suite. While these are confined to test files, they represent a significant security anti-pattern. Additionally, suggestions were made for the new benchmark file to improve maintainability by using constants for magic values and to ensure compatibility with older Go versions. Remediation of the hardcoded secrets is recommended to improve the security posture and adhere to best practices.

defer cleanup()

ctx := b.Context()
privateKey, err := crypto.HexToECDSA(TestPrivateKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

A hardcoded private key (TestPrivateKey) is used. Hardcoding secrets, even for testing, is a significant security risk. This key could be inadvertently used in production-like environments or on public testnets, leading to compromised assets. It also sets a poor security precedent. Keys should be loaded from environment variables or a secrets manager, or generated dynamically for tests.

// - genesisHash: Hash of the genesis block for chain validation
// - endpoints: TestEndpoints struct containing unique port assignments
func setupSequencerNode(t *testing.T, sut *SystemUnderTest, sequencerHome, jwtSecret, genesisHash string, endpoints *TestEndpoints) {
func setupSequencerNode(t testing.TB, sut *SystemUnderTest, sequencerHome, jwtSecret, genesisHash string, endpoints *TestEndpoints) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

A weak, hardcoded passphrase ("secret") is used for the test sequencer's key file. Hardcoding credentials, especially weak ones, is a poor security practice that can be replicated in production. Passphrases should be loaded from a secure source like environment variables or a secrets manager.

Comment on lines +50 to +68
storeData, err := hexutil.Decode("0x000000000000000000000000000000000000000000000000000000000000002a")
require.NoError(b, err)

const maxIter = 1024
signedTxs := make([]*types.Transaction, maxIter)
for i := range maxIter {
tx := types.NewTx(&types.LegacyTx{
Nonce: nonce + uint64(i),
To: &contractAddr,
Value: big.NewInt(0),
Gas: 500000,
GasPrice: big.NewInt(30000000000),
Data: storeData,
})
signedTxs[i], err = types.SignTx(tx, signer, privateKey)
require.NoError(b, err)
}

expected := common.HexToHash("0x000000000000000000000000000000000000000000000000000000000000002a").Bytes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The benchmark uses several magic numbers and strings for transaction parameters and expected values. To improve readability and maintainability, consider defining these as constants at the top of the file.

For example:

const (
	benchGasLimit  = 500000
	benchGasPrice  = 30000000000
	storedValueHex = "0x000000000000000000000000000000000000000000000000000000000000002a"
)

Then you can use these constants in the code:

storeData, err := hexutil.Decode(storedValueHex)
// ...
Gas:      benchGasLimit,
GasPrice: big.NewInt(benchGasPrice),
// ...
expected := common.HexToHash(storedValueHex).Bytes()


const maxIter = 1024
signedTxs := make([]*types.Transaction, maxIter)
for i := range maxIter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The loop for i := range maxIter uses a new feature from Go 1.22. If this project needs to maintain compatibility with older Go versions, this should be changed to a traditional for loop to avoid breaking the build on older toolchains.

If Go 1.22 is the minimum required version for the project, then this is fine.

Suggested change
for i := range maxIter {
for i := 0; i < maxIter; i++ {

Copy link
Contributor

@chatton chatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the PR is still in draft so just leaving some thoughts! nice work so far

// otlpCollector is a lightweight OTLP/HTTP receiver that collects trace spans
// in memory. It serves the /v1/traces endpoint that the node's OTLP exporter
// posts protobuf-encoded ExportTraceServiceRequest messages to.
type otlpCollector struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really nice idea, I imagine this would be used quite a lot in these tests, it's probably worth pulling this out into a separate file file / package to be imported into these tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitively makes sense to pull this out when it is used by a second benchmark.

//
// PATH="/path/to/binaries:$PATH" go test -tags evm \
// -bench BenchmarkEvmContractRoundtrip -benchmem -benchtime=5x \
// -run='^$' -timeout=10m -v --evm-binary=/path/to/evm .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah we need to specify the evm binary because we're mixing and matching docker vs local binaries right?

If we are going to create a new set of tests, I would say it's worth while to just do everything via docker for these, we likely won't run them as frequently so I don't think them being slightly heavier is a bad thing in that regard.

It can also be quite confusing having a mix WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in the standup. Standalone is my personal preference but I would go with the majority. Can we get this in and refactor in a new PR?

@alpe alpe marked this pull request as ready for review February 18, 2026 13:33
Copy link
Contributor

@chatton chatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM happy to get this merged in as a jumping off point 💪

Comment on lines +75 to +79
n, err := new(reth.NodeBuilder).
WithTestName(testName).
WithLogger(logger).
WithImage(reth.DefaultImage()).
WithBin("ev-reth").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of this is required, the default behaviour we get from NewNodeBuilderWithTestName should be fine, if we want to set a logger we can just chain the logger after directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments