This repository provides a standalone version of PromSketch. It scrapes samples from Prometheus exporters, caches rule queries as intermediate sketches, and forwards all raw samples to Prometheus for backup.
- Create and activate a Python virtual environment, then install the required dependencies.
  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install prometheus-client
  pip install numpy
  pip install pyyaml
  pip install requests
  pip install aiohttp
  pip install pyshark
  ```

- Download the CAIDA dataset and use `ExporterStarter/datasets/pcap_process.py` to convert it into `.txt` format.

  ```bash
  sudo apt update
  sudo apt install tshark  # For converting CAIDA pcap traces
  ```
This repository contains the implementation of standalone PromSketch, a sketch-based time series processing server, along with supporting tools for ingestion, query testing, visualization, and performance benchmarking.
- **Main PromSketch Server** (`PromsketchServer/main.go:64`): Hosts the control API, sketch storage, and query execution. The server reads env flags during `init` to wire features like ingestion concurrency and remote write forwarding (`PromsketchServer/main.go:153-209`). HTTP handlers for `/ingest` and dynamic port partitions live in the same file (`PromsketchServer/main.go:645-689`).
- **Remote write forwarder** (`PromsketchServer/remote_write.go:25-90`): Background worker that converts each ingest payload into Prometheus remote write samples and ships them over HTTP with bounded timeouts and monotonic timestamps. It is wired into the ingest path at `PromsketchServer/main.go:688-689`.
- **Custom Ingester** (`ExporterStarter/custom_ingester.py`): Forwards raw data into the PromSketch server through dynamically created multiport ingestion endpoints.
- **Export Manager** (`ExporterStarter/ExportManager.py`): Generates synthetic time series data for ingestion tests.
- **PromTools** (`PromsketchServer/promtools.py`): Issues PromQL queries to both the PromSketch server and a Prometheus server at fixed intervals (default: every 5 seconds) for side-by-side comparison.
From `PromsketchServer/`, run:
```bash
cd PromsketchServer/
MAX_INGEST_GOROUTINES=n go run .
# or use defaults (MAX_INGEST_GOROUTINES=1024)
go run .
```

- `MAX_INGEST_GOROUTINES` controls concurrency for ingestion.
- On startup, the server automatically rewrites `prometheus.yml` based on the number of multiport ingestion endpoints.
- Control endpoints on :7000, pprof on localhost:6060, and ingestion partitions on 71xx appear after the ingester registers.
- If remote write is on (default `http://localhost:9090/api/v1/write`), each ingest payload is also serialized and posted asynchronously; set `PROMSKETCH_REMOTE_WRITE_ENDPOINT=""` to skip forwarding or point it at another TSDB.
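For scripted runs, the same flags can be set programmatically. A minimal Python launcher sketch, assuming it is executed from the repository root; the concrete values are illustrative, not defaults from the repo:

```python
# Illustrative launcher (not part of the repo): starts the server with the two
# documented environment flags set explicitly. Adjust values to your workload.
import os
import subprocess

env = os.environ.copy()
env["MAX_INGEST_GOROUTINES"] = "256"          # ingestion concurrency (default 1024)
env["PROMSKETCH_REMOTE_WRITE_ENDPOINT"] = ""  # empty string disables remote write

# `go run .` must be executed from PromsketchServer/ so the server package is found.
subprocess.run(["go", "run", "."], cwd="PromsketchServer", env=env, check=True)
```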
Inside `ExporterStarter/`, run the following to generate and ingest synthetic data.
In one terminal:
```bash
cd ExporterStarter/
# Start Export Manager
python3 ExportManager.py \
  --config=num_samples_config.yml \
  --targets=8 \
  --timeseries=10000 \
  --max_windowsize=100000 \
  --querytype=entropy \
  --waiteval=60
```

In another terminal:
```bash
cd ExporterStarter/
# Start Custom Ingester
python3 custom_ingester.py --config=num_samples_config.yml
```

Ingester workflow (see `PromsketchServer/main.go` and `PromsketchServer/README.md` for details; a minimal end-to-end sketch follows the list):
- Read `targets` from YAML (`num_samples_config.yml`).
- `POST /register_config` to :7000 with capacity hints (e.g., `estimated_timeseries`) so the server can decide how many 71xx ports to spawn.
- Scrape each target's `/metrics` and parse samples `(name, labels, value)`.
- Map `machineid` → 71xx port using `MACHINES_PER_PORT`.
- Batch `POST http://localhost:71xx/ingest` on a fixed interval.
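The same loop, condensed into a hedged Python sketch. The payload shapes for `/register_config` and `/ingest`, the config layout, and the port-mapping constants are assumptions for illustration; `custom_ingester.py` is the authoritative implementation:

```python
# Minimal sketch of the workflow above, not the real ingester: the JSON payload
# shapes, the config layout, MACHINES_PER_PORT, and the base 71xx port are
# illustrative assumptions; see ExporterStarter/custom_ingester.py for the
# actual schema and port mapping.
import re
import time

import requests
import yaml

MACHINES_PER_PORT = 100   # assumed sharding factor
BASE_INGEST_PORT = 7100   # assumed first 71xx partition port
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([-+.eE\d]+)$')  # name{labels} value

with open("num_samples_config.yml") as f:
    cfg = yaml.safe_load(f)
targets = [t for sc in cfg["scrape_configs"]
           for static in sc["static_configs"] for t in static["targets"]]

# Register capacity hints so the server can decide how many 71xx ports to spawn.
requests.post("http://localhost:7000/register_config",
              json={"estimated_timeseries": 10000}, timeout=5)

while True:
    batches = {}  # 71xx port -> list of parsed samples
    for target in targets:
        text = requests.get(f"http://{target}/metrics", timeout=5).text
        for line in text.splitlines():
            m = SAMPLE_RE.match(line)
            if not m:
                continue
            name, labels, value = m.group(1), m.group(2), float(m.group(3))
            mid = re.search(r'machineid="(\d+)"', labels)
            if not mid:
                continue  # samples without machineid cannot be routed
            port = BASE_INGEST_PORT + int(mid.group(1)) // MACHINES_PER_PORT
            batches.setdefault(port, []).append(
                {"name": name, "labels": labels, "value": value})
    for port, samples in batches.items():
        requests.post(f"http://localhost:{port}/ingest", json=samples, timeout=5)
    time.sleep(1)  # fixed ingest interval
```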
From the Prometheus build directory:
```bash
cd PromsketchServer/prometheus/   # download Prometheus and compile here
./prometheus --config.file=../prometheus-config/prometheus.yml \
  --enable-feature=remote-write-receiver --web.enable-lifecycle
```

Ensure that the `prometheus.yml` path points to the file rewritten by the server.
If you enable remote write on PromSketch, keep `--enable-feature=remote-write-receiver` and set `PROMSKETCH_REMOTE_WRITE_ENDPOINT` (for example `http://localhost:9090/api/v1/write`) before starting the Go server.

Prometheus can also scrape the partition RAW endpoints (`:71xx/metrics`) via the `promsketch_raw_groups` job in `prometheus-config/prometheus.yml` to observe per-partition ingest.
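Before adding the scrape job, you can confirm a partition is exposing raw samples by hitting its `/metrics` endpoint directly; the concrete port below is an assumption, since the actual 71xx ports depend on how many partitions the server spawned:

```python
# Quick check of one partition RAW endpoint; :7100 is an assumed example port.
import requests

resp = requests.get("http://localhost:7100/metrics", timeout=5)
resp.raise_for_status()
samples = [line for line in resp.text.splitlines()
           if line and not line.startswith("#")]
print(f"partition :7100 currently exposes {len(samples)} samples")
```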
To use the extended Prometheus build that supports `l2_over_time`, `entropy_over_time`, and `distinct_over_time`, run:
```bash
cd PromSketch-Standalone/
git submodule update --init --recursive
cd PromsketchServer/external/prometheus-sketch-VLDB/prometheus-extended/prometheus
make build
./prometheus --config.file=../../../../prometheus-config/prometheus.yml \
  --enable-feature=remote-write-receiver --web.enable-lifecycle
```

Run PromTools from `PromsketchServer/` to continuously send PromQL queries:
```bash
cd PromsketchServer/
python3 promtools.py
```

Queries such as `avg_over_time`, `entropy_over_time`, and `quantile_over_time` are executed every 5 seconds.
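If you want to script your own comparison instead of using `promtools.py`, a rough sketch of the loop looks like the following. The PromSketch query URL, the metric name, and the query expressions are assumptions here; check `promtools.py` for the endpoints and expressions it actually targets:

```python
# Hedged side-by-side query loop in the spirit of promtools.py. The PromSketch
# endpoint (a Prometheus-style /api/v1/query on :7000) and the metric name
# `fake_metric` are illustrative assumptions.
import time

import requests

PROMETHEUS = "http://localhost:9090/api/v1/query"
PROMSKETCH = "http://localhost:7000/api/v1/query"   # assumed endpoint
QUERIES = [
    "avg_over_time(fake_metric[1m])",
    "entropy_over_time(fake_metric[1m])",        # extended Prometheus only
    "quantile_over_time(0.9, fake_metric[1m])",
]

def timed_query(base_url: str, expr: str):
    """Return (HTTP status, latency in ms) for one instant query."""
    start = time.perf_counter()
    resp = requests.get(base_url, params={"query": expr}, timeout=10)
    return resp.status_code, (time.perf_counter() - start) * 1000

while True:
    for expr in QUERIES:
        for name, url in (("prometheus", PROMETHEUS), ("promsketch", PROMSKETCH)):
            status, ms = timed_query(url, expr)
            print(f"{name:11s} {expr:45s} status={status} latency={ms:.1f}ms")
    time.sleep(5)  # matches the default 5-second comparison interval
```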
From `PromsketchServer/demo/`, start the live dashboard:
```bash
cd PromsketchServer/demo
streamlit run demo.py
```

You'll see live latency charts (Prometheus vs PromSketch), per-expression metric values, and a cost panel fed by Prometheus and PromSketch counters.
- Connect Grafana to your Prometheus instance.
- Create dashboards and panels to display ingested metrics and query outputs.
- Enable auto-refresh for live visualization.
You can benchmark ingestion and query execution as follows:
- **Ingestion Throughput Test**: Increase `--targets`, `--timeseries`, or `numClients` in `ExportManager.py` and `custom_ingester.py` to simulate high ingestion rates (a rough single-client probe is sketched after this list).
- **Query Latency Test**: Use `promtools.py` to measure query response times while ingestion load is active.
- **System Profiling**: Enable Go's built-in `pprof` for CPU and memory profiling:

  ```bash
  go tool pprof http://localhost:7000/debug/pprof/profile?seconds=30
  ```
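For a quick, single-client sanity number before running the full Export Manager setup, something like the probe below can be used. The partition port and the `/ingest` payload shape are the same assumptions as in the ingester sketch earlier:

```python
# Rough single-client ingestion probe; :7100 and the payload shape are assumed.
import time

import requests

PORT = 7100
BATCH = [{"name": "fake_metric", "labels": f'machineid="{i}"', "value": 1.0}
         for i in range(1000)]

sent = 0
start = time.perf_counter()
while time.perf_counter() - start < 10:          # 10-second measurement window
    requests.post(f"http://localhost:{PORT}/ingest", json=BATCH, timeout=5)
    sent += len(BATCH)
elapsed = time.perf_counter() - start
print(f"~{sent / elapsed:,.0f} samples/s from one client to :{PORT}")
```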
- Multiport ingestion endpoints handle raw data ingestion and forward metrics directly to Prometheus.
- Main server (7000) is responsible for sketch aggregation and query execution. It must be active for queries to run.
- Default state: remote write is enabled and targets `http://localhost:9090/api/v1/write` (`PromsketchServer/main.go:193-208`). The writer is only created when the endpoint string is non-empty.
- Disable options:
  - Env-only toggle: export `PROMSKETCH_REMOTE_WRITE_ENDPOINT=""` before starting the server to skip forwarding while keeping local sketch ingestion intact.
  - Default-off build: set `remoteWriteEndpoint: ""` in `defaults` inside `PromsketchServer/main.go` (around lines 80-85), rebuild, and the writer will remain disabled unless you pass a non-empty env var.
- Point to another TSDB/gateway: set `PROMSKETCH_REMOTE_WRITE_ENDPOINT="http://your-host:port/api/v1/write"` and optionally tune `PROMSKETCH_REMOTE_WRITE_TIMEOUT` (a Go duration, e.g., `5s`, `1m`) to bound delivery latency (`PromsketchServer/main.go:198-207`).
- Delivery path: payloads accepted by `/ingest` are forwarded asynchronously, so ingest latency is unaffected (`PromsketchServer/main.go:645-690`). The background worker that serializes and posts the remote write request lives in `PromsketchServer/remote_write.go:25-91` (a quick verification sketch follows this list).
- Safety notes: timestamps are made monotonic per series and backpressure is applied with a bounded queue; check the log prefix `[REMOTE WRITE]` to confirm deliveries (`PromsketchServer/remote_write.go:39-91`).
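To confirm forwarded samples are actually landing in the downstream TSDB, you can query Prometheus's standard HTTP API; the selector below simply counts all visible series and is only an example:

```python
# Count the series currently visible in Prometheus after remote write delivery.
import requests

resp = requests.get("http://localhost:9090/api/v1/query",
                    params={"query": 'count({__name__=~".+"})'}, timeout=10)
resp.raise_for_status()
result = resp.json()["data"]["result"]
print("series visible in Prometheus:", result[0]["value"][1] if result else 0)
```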
- **Ingestion/scrape config** (`config.yaml` or `num_samples_config.yml`)
  - Location: place alongside the ingester (commonly `ExporterStarter/`) and pass via `--config`.
  - Structure: Prometheus-style `scrape_configs` with `targets`, `scrape_interval`, and labels; ensure every target emits a `machineid` label for sharding (a small pre-flight check is sketched after this list).
  - Example:

    ```yaml
    scrape_configs:
      - job_name: fake-exporter
        scrape_interval: 1s
        static_configs:
          - targets: ["localhost:8000", "localhost:8001"]
    ```

  - Purpose: drives the `POST /register_config` capacity hints and how many 71xx ingest ports the server spawns.
- **Prometheus config** (`PromsketchServer/prometheus-config/prometheus.yml`)
  - Use this when running Prometheus alongside PromSketch, either to scrape the partition RAW endpoints (`promsketch_raw_groups`) or to accept remote write.
  - The server's `UpdatePrometheusYML` helper rewrites this file with the active 71xx ports; keep Prometheus pointed at this path.
  - Rule files in the same folder: `prometheus-rules.yml`, `promsketch-latency.yml`.
- **Prep tips**
  - Ensure `machineid` exists on exporter samples for correct partition routing.
  - Tune `scrape_interval` to the workload rate and size `targets` appropriately.
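A small pre-flight check for the ingestion/scrape config, assuming the Prometheus-style layout shown in the example above; it only summarizes what the ingester would see (the `machineid` label itself lives on exporter samples, not in this file):

```python
# Summarize the scrape config before starting the ingester. Path and field
# names follow the example layout above.
import yaml

with open("ExporterStarter/num_samples_config.yml") as f:
    cfg = yaml.safe_load(f)

for sc in cfg.get("scrape_configs", []):
    targets = [t for static in sc.get("static_configs", [])
               for t in static.get("targets", [])]
    print(f"job={sc.get('job_name')} interval={sc.get('scrape_interval')} "
          f"targets={len(targets)}")
    if not targets:
        print("  warning: no targets; no samples will reach the 71xx partitions for this job")
```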