Middleware pipes FastAPI telemetry to Prometheus and Grafana. The Grafana dashboard can be imported from infra/grafana/dashboard.json to view API metrics with no additional setup.
Adapted from https://grafana.com/grafana/dashboards/16110-fastapi-observability/
Generally you can use commands from /scripts or code from /tests to isolate the issue. The DB can be run in isolation, but the API depends on the DB. Visit http://localhost:8080/docs when the service is up to view FastAPI docs.
bash scripts/build_container.sh
docker compose up
docker compose watch retrieval_service

docker compose exec retrieval_service /bin/bash
docker compose exec retrieval_service \
/bin/bash -c "source scripts/api_debug_setup.sh"

Data persists between sessions. Delete data like so:

bash scripts/wipe_database.sh

Warning: This will wipe real data too! Only use on test runs.

docker compose down
docker compose ps
docker compose restart retrieval_service

Prereqs:
- Docker
- Conda
conda env create -f environment.yaml
conda activate poetry_env
poetry install

Add the dependency change to pyproject.toml, then update the poetry.lock file like so:
conda activate poetry_env
poetry update
Finally, rebuild the Docker containers.
curl -X 'POST' \
'http://localhost:8080/similar' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"embedding": [
0.5,
0.2,
-0.1
],
"k": 50,
"metric": "cosine_distance"
}'

Supported metrics are max_inner_product, cosine_distance, and L2_distance.
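
As a rough check on what cosine_distance means here, the sketch below computes 1 minus cosine similarity in plain Python and ranks a few stored vectors by brute force. The helper names (cosine_distance, top_k) and the in-memory ITEMS dict are illustrative and not part of the service. The item values are copied from the sample bulk_similar output in this document, and the distances computed this way agree with that output to several decimal places. The other two metrics are left out because their exact sign conventions are not stated here.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Brute-force search: score every item, keep the k smallest distances."""
    scored = ((cosine_distance(query, emb), name) for name, emb in items.items())
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


# Hypothetical in-memory stand-in for the vector DB
# (names and embeddings taken from the sample output).
ITEMS = {
    "sherbert": [11.0, 2.0, 3.0],
    "kuma": [1.0, 6.0, 3.0],
    "mike": [1.0, 2.0, 3.0],
}

if __name__ == "__main__":
    for name, dist in top_k([0.5, 0.2, -0.1], ITEMS, k=2):
        print(f"{name}: {dist:.6f}")
```

Because every stored vector is scored on each query, the cost grows linearly with the number of items.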
curl -X 'POST' \
'http://localhost:8080/bulk_similar' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"embedding_list": [
[0.5,0.2,-0.1],
[0.1,0.89,-0.21]
],
"k": 2,
"metric": "cosine_distance"
}' \
| jq .

Output:
{
"most_similar": [
[
{
"name": "sherbert",
"embedding": [
11.0,
2.0,
3.0
],
"distance": 0.11676757169186391
},
{
"name": "kuma",
"embedding": [
1.0,
6.0,
3.0
],
"distance": 0.6231326316557115
}
],
[
{
"name": "kuma",
"embedding": [
1.0,
6.0,
3.0
],
"distance": 0.22904390253150264
},
{
"name": "mike",
"embedding": [
1.0,
2.0,
3.0
],
"distance": 0.6368304002307885
}
]
]
}

To ping in bulk for testing:
python scripts/spam_requests.py --spam_seconds=10
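
The contents of scripts/spam_requests.py are not shown here, so the following is only a sketch of the same idea: send /similar requests in a loop until a time budget runs out. The function names (build_similar_payload, spam, post_similar), the random embeddings, and the default dimension of 3 (chosen to match the curl examples) are all assumptions, not the script's actual behavior.

```python
import json
import random
import time
import urllib.request

# Metric names as listed in this README.
SUPPORTED_METRICS = ("max_inner_product", "cosine_distance", "L2_distance")


def build_similar_payload(embedding, k, metric="cosine_distance"):
    """Build the JSON body for POST /similar, rejecting unsupported metrics."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    return {"embedding": list(embedding), "k": k, "metric": metric}


def spam(send, seconds, dim=3, k=5, clock=time.monotonic):
    """Call send(payload) repeatedly until `seconds` elapse; return the call count."""
    deadline = clock() + seconds
    sent = 0
    while clock() < deadline:
        # Random embedding; dim=3 is an assumption matching the examples.
        embedding = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        send(build_similar_payload(embedding, k))
        sent += 1
    return sent


def post_similar(payload, url="http://localhost:8080/similar"):
    """Send one request with the standard library (requires the service to be up)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the service running, `spam(post_similar, seconds=10)` would issue requests for roughly ten seconds. Passing `send` and `clock` as parameters keeps the loop testable without a live API.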
Metrics from the Retrieval API are scraped by Prometheus and can be visualized in Grafana.
Links:
- Prometheus targets: http://localhost:9090/targets
- Supported metrics via Retrieval App: http://localhost:8080/metrics/
- Prometheus config: http://localhost:9090/config
- Grafana: http://localhost:3000/explore
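
For a quick look at what the /metrics/ endpoint exposes without opening Prometheus, a small parser for the Prometheus text exposition format can help. This is a simplified sketch: it skips comment lines, keeps labels as a raw string, and ignores timestamps. The metric name in SAMPLE_TEXT is hypothetical and may not match what the Retrieval App actually exports.

```python
import re
import urllib.request

SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?P<labels>\{.*\})?"                   # optional {label="value",...}
    r"\s+(?P<value>\S+)"                     # sample value
    r"(?:\s+-?\d+)?$"                        # optional timestamp
)

# Hypothetical payload for illustration only.
SAMPLE_TEXT = """\
# HELP fastapi_requests_total Total requests (hypothetical metric name).
# TYPE fastapi_requests_total counter
fastapi_requests_total{method="POST",path="/similar"} 42
fastapi_requests_total{method="POST",path="/bulk_similar"} 7
"""


def parse_metrics(text):
    """Return a list of (name, raw_labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified pattern cannot read
        samples.append((match["name"], match["labels"] or "", float(match["value"])))
    return samples


def fetch_metrics(url="http://localhost:8080/metrics/"):
    """Download and parse live metrics (requires the service to be up)."""
    with urllib.request.urlopen(url) as response:
        return parse_metrics(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_TEXT):
        print(name, labels, value)
```
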
- Vector Database via custom extension on Postgres
- API for querying the nearest neighbors
- Limited upload support
- Metrics/Monitoring dash via Prometheus/Grafana
- Dashboard for API and Database
- Bulk inference
- Upload item to vector DB in API call
- Upload from file?
- Integrate with meaningful embeddings
- Move embedding size from static file to ENV
- Indexing to avoid brute force search
- Hot reloads for development. Currently unreliable.


