In this tutorial you'll go from git clone to "PR opened in GitHub" in under
30 minutes on a fresh laptop. You'll register a local Elasticsearch cluster,
generate LLM relevance judgments against 1,000 sample products, run a 10-trial
Optuna study to tune multi_match field boosts, read the LLM-generated
digest, and (optionally) open a PR against the public
SoundMindsAI/relyloop-test-configs
config repo with the recommended parameters.
This is the same operator path the smoke test exercises in CI on every PR — if you hit a step that doesn't work, file an issue and link this guide.
| Requirement | Details |
|---|---|
Docker (incl. docker compose v2) |
24+ — make up orchestrates 7 containers |
| 16 GB RAM | Elasticsearch + OpenSearch each consume ~1 GB; 8 GB will OOM |
| One LLM provider (see below) | Required for judgment generation + digest narrative |
git and make |
Standard host tooling |
| GitHub Personal Access Token (PAT) | Optional — only needed for Step 10 ("Open PR") |
You need one of these LLM provider paths:
echo "sk-..." > ./secrets/openai_key # the file is git-ignoredDefault model is gpt-4o-mini. Cost ceiling for the full tutorial:
~$0.05 (5-query judgment generation + 10-trial study + digest narrative).
Set OPENAI_BASE_URL and OPENAI_MODEL in .env before make up. Example
for Ollama:
echo 'OPENAI_BASE_URL=http://host.docker.internal:11434/v1' >> .env
echo 'OPENAI_MODEL=llama3.1:70b-instruct' >> .env
echo 'placeholder' > ./secrets/openai_key # local servers don't validate the keyThe startup capability check probes your local endpoint and surfaces missing
capabilities (chat / function-calling / structured-output) in /healthz.
Important: judgment generation needs structured-output support — pick a
model from the tested matrix in
docs/01_architecture/llm-orchestration.md §"OpenAI-compatible endpoints".
If your local model lacks structured output, judgment generation will surface
LLM_PROVIDER_INCAPABLE and you'll need to use Path A or a different model.
Both paths are first-class. The rest of the tutorial is identical regardless of which one you pick.
The tutorial defaults to Elasticsearch, but every step works against
Apache Solr. The local Compose stack brings up solr:10.0 on
127.0.0.1:8983 alongside the Elasticsearch + OpenSearch containers.
To pick Solr instead:
make up—bootstrap-security.shgenerates the Solr admin credentials on first boot.make seed-solr— creates theproductscollection (UBI + LTR enabled via therelyloop_productsconfigset) + theubi_queries/ubi_eventsUBI collections, then bulk-indexessamples/products.json.make seed-clusters— registerslocal-solras a cluster row alongsidelocal-esandlocal-opensearch.- In the create-study modal, pick
local-solrfrom the cluster dropdown and pick theproducts_edismaxtemplate (orproducts_dismax/products_lucene). Search-space dimensions match the ES path (title_boost,description_boost,bullet_points_boost,tie,mm).
The Optuna loop runs unchanged — the engine difference is hidden behind
the SolrAdapter. See
solr-cluster-registration.md
for the runbook covering /reprobe, LTR model upload, and the optional
UBI on-ramp.
git clone https://github.com/SoundMindsAI/relyloop.git
cd relyloop
make upmake up runs scripts/install.sh (auto-generates required secrets +
placeholder optional ones), then docker compose up -d. First-run cold time:
~90 seconds. Brings up 7 containers: postgres, redis, api, worker,
elasticsearch, opensearch, ui. The first run also builds the ui
image locally (~3–5 min cold; cached on subsequent runs).
Wait until the API is healthy:
curl -s http://localhost:8000/healthz | jq .status
# → "ok"If /healthz shows subsystems.openai = missing_key, you skipped Step 0
Path A or Path B — go back and configure one before continuing.
If something went wrong: check docker compose ps for any container in
Restarting. If Elasticsearch keeps crashing, your host probably needs more
RAM allocated to Docker Desktop (Settings → Resources → at least 8 GB).
make logs tails the api + worker logs.
make migrateApplies the Alembic chain to the head revision (currently
0007_conversations_messages) and initializes the Optuna RDB schema. Without
this, every API call returns 500 with relation "..." does not exist.
If something went wrong: if make migrate errors with
api container is not running, the API container failed Step 1 — re-run
make up and check make logs.
make seed-clustersRegisters two cluster rows: local-es (Elasticsearch 9 at
http://elasticsearch:9200) and local-opensearch (OpenSearch 2 at
http://opensearch:9200). Idempotent — safe to re-run.
You can also register a cluster via the UI at
http://localhost:3000/clusters, but the
seed script saves you the form-filling.
Must run before Step 4 because seed_es.py resolves the cluster URL via
cluster_repo.get_active_cluster_by_name("local-es").
If something went wrong: if both clusters fail with ClusterUnreachable,
your Elasticsearch / OpenSearch containers haven't finished their healthchecks
yet. Wait 30 seconds and try again.
make seed-esWraps docker compose exec api python -m backend.app.scripts.seed_es. Loads
1,000 sample products (Amazon ESCI subset, CC-BY-4.0) from
samples/products.json into the local-es products index. Idempotent —
DELETE+recreates the index every run, so re-running with edited samples
cleanly removes orphans.
Verify the count:
curl -s http://localhost:9200/products/_count | jq .count
# → 1000If something went wrong: if the script reports local-es cluster not registered, you skipped Step 3.
Open the UI:
open http://localhost:3000/query-setsClick "Create query set" and fill the form:
- Name:
tutorial_queries - Cluster: pick
local-esfrom the dropdown — no UUID typing required; the field is a searchable list of every cluster the API knows about, with health-status dots so you can see at a glance which clusters are reachable - Description: anything (e.g. "Tutorial queries from ESCI")
Submit. On the query-set detail page, click "Add queries" and
upload samples/queries.csv — the dialog accepts JSON or CSV with the
file's query_id,query_text shape directly.
48 queries should now be attached to the query set.
You can do the same via the API. The example below resolves the cluster's UUIDv7 from the registry first, then posts the query set using a shell variable — no manual UUID copy-paste:
# Resolve the cluster UUID from its name (no copy-paste, no typos).
LOCAL_ES_ID=$(curl -s http://localhost:8000/api/v1/clusters \
| jq -r '.data[] | select(.name=="local-es") | .id')
QS_ID=$(curl -s -X POST http://localhost:8000/api/v1/query-sets \
-H "Content-Type: application/json" \
-d "{\"name\":\"tutorial_queries\",\"cluster_id\":\"$LOCAL_ES_ID\",\"description\":\"ESCI tutorial\"}" \
| jq -r .id)
curl -X POST http://localhost:8000/api/v1/query-sets/$QS_ID/queries \
-H "Content-Type: application/json" \
-d "$(jq -Rsn --rawfile csv samples/queries.csv \
'{queries: ($csv | split("\n") | .[1:] | map(select(length>0)) | map(split(",") | {query_text: .[1]}))}')"If something went wrong: if LOCAL_ES_ID came back empty, the
cluster wasn't registered — re-run Step 4 first.
This is the only LLM round-trip the tutorial walks you through. Cost is
~$0.01–$0.05 with gpt-4o-mini against a 5-query subset (or all 48).
In the UI, open the query-set detail page you created in Step 5 (e.g.
http://localhost:3000/query-sets/<id>) and click "Generate judgments"
in the associated-judgment-lists card. Fill in:
- Cluster:
local-es - Target index:
products - Current template: create one in Step 7 first if needed (or use the
smoke template in
samples/templates/product_search.j2) - Rubric:
Rate 0-3 by relevance to the query.
Click Generate. The judgment-list status moves through
generating → complete (~30–60s).
If you see OPENAI_NOT_CONFIGURED: you skipped Step 0 — populate
./secrets/openai_key (or OPENAI_BASE_URL for local LLM) and
make down && make up.
If you see LLM_PROVIDER_INCAPABLE: your local model doesn't support
structured output. Switch to a model from the tested matrix in
llm-orchestration.md.
Open:
open http://localhost:3000/templatesClick "Create template". Paste the contents of
samples/templates/product_search.j2 into the Body
field. The template renders an Elasticsearch multi_match query with
three declared params:
| Param | Type | Range |
|---|---|---|
title_boost |
float | 0.5 – 10 |
description_boost |
float | 0.5 – 10 |
bullet_points_boost |
float | 0.5 – 10 |
tie_breaker and fuzziness are intentionally hard-coded in the template so
the search-space stays under the platform's 10^6 cardinality cap on a
10-trial budget. To tune them too, switch to a 30-trial study and pull them
into the search-space.
Submit. Note the template ID for Step 8.
open http://localhost:3000/chatSend a message like:
Tune
product_search v1againsttutorial_queriesonlocal-es:products, max 10 trials.
The agent will introspect the cluster + judgment list, propose a
create_study tool call with a search-space, and ask you to confirm.
Reply "yes". The study queues immediately.
The chat agent is one of two ways to create a study; you can also click
"Create study" on /studies directly.
The 5-step wizard auto-fills Step 4 ("Search space") from the selected
template's declared params, so you only need to tweak the JSON if the
defaults don't match the ranges you want to tune — no paste-from-file
step required.
Open:
open http://localhost:3000/studiesClick the study you just created. The detail page shows trials filling in real-time (10 total, ~30 seconds each).
A few orientation surfaces above the panels:
- Linked entities row — named, clickable links to the cluster, query set, judgment list, and template this study ran against. Click any to drill into the source of truth.
- View-proposal link — once you promote a proposal (Step 10), a
Proposal: view proposal (<status>)link appears below the header for the round-trip from study → proposal. - Glossary tooltips —
(i)icons next to Target, Trials, Best metric, and other column headings. Hover for the short definition; the Guide button (bottom-right) opens the full glossary.
Once status = completed, the digest tab renders:
- Narrative summary — 2–3 sentences describing what won and why
- Recommended config —
*.params.jsonshape, ready to paste into your search-config repo - Parameter importance — bar chart from Optuna's importance evaluator
The Confidence panel sits between the trials table and the digest.
It tells you whether the winner is statistically reliable: the headline
metric with a 95% CI band, per-query outcome chips
(X Improved · Y Unchanged · Z Regressed), named Queries that
improved and Queries that regressed tables, and three secondary
callouts (runner-up gap, late-trial 1σ, convergence regime). Every
(i) icon opens a glossary definition. For a 5-query smoke-test study
the CI band will be wide and that's honest; the value is in seeing
which queries gained and lost, not just the aggregate lift.
If the digest narrative is empty or you see OPENAI_NOT_CONFIGURED in the
worker logs, your LLM provider is misconfigured — re-do Step 0.
Stop here if you don't have a GitHub PAT. You've completed the Karpathy loop end-to-end. Step 10 is the optional "ship it" path.
In the study's digest, click the "Promote to proposal" button on the recommended config. You're routed to the proposal detail page; click "Open PR" in the top-right.
This opens a Pull Request against the public
SoundMindsAI/relyloop-test-configs
repo with the recommended params written into the corresponding
*.params.json file. GitHub PATs are registered per config_repo: drop
the PAT at ./secrets/<auth_ref> (the same auth_ref you provided when
registering the config repo via POST /api/v1/config-repos). Without
the per-repo PAT file the button surfaces a configuration error —
that's the only step that needs write access to the repo.
To use your own config repo: register it via the API (no UI form ships in
MVP1) with your fork's URL + an auth_ref value, drop the PAT at
./secrets/<auth_ref>, then re-run from Step 8.
curl -X POST http://localhost:8000/api/v1/config-repos \
-H "Content-Type: application/json" \
-d '{"name":"my-config-fork","provider":"github","repo_url":"https://github.com/<you>/<repo>","config_path":"params/","auth_ref":"my_pat"}'This step is optional and requires an instrumented cluster. The tutorial completes fully on the LLM path (Steps 1–10) with no UBI cluster. If you have a cluster running the OpenSearch UBI plugin (or the o19s ES UBI fork) with captured click/dwell traffic, you can swap the LLM-graded judgment list for one derived from real user behavior — no LLM cost for the pure converters.
- On the query-set detail page, click Generate judgments again.
- The dialog now shows a Method picker. If your cluster has UBI
traffic for the target index, the picker defaults to a UBI converter
based on the readiness rung (dense traffic →
UBI (click-through); sparse →Hybrid UBI + LLM). If the cluster has no UBI plugin, you'll see the on-ramp nudge with install instructions and the picker stays onLLM-as-judge. - Pick Hybrid UBI + LLM to rate the dense head from clicks and let the LLM fill the long tail (requires a template + rubric, same as the LLM path). Set the UBI window (defaults to the last 30 days).
- Submit. When generation completes, the judgment-list detail page shows a "What real signals bought you" value-delta card comparing the UBI coverage against your prior LLM list.
- Re-run your study (Step 8) against the new judgment list to see how the recommendation shifts when grounded in real behavior.
See the UBI judgment-generation runbook for per-engine plugin install + converter selection guidance.
If you ran make seed-demo (or clicked Reset to demo state on the
dashboard), three demo clusters already carry synthetic UBI
clickstream so this step is browser-walkable without a real
instrumented cluster:
acme-products-prodreportsrung_3— the picker defaults toUBI (click-through)and the resulting judgment list grades against the synthetic events end-to-end.corp-docs-search(rung_1) andjobs-marketplace-prod(rung_2) default toHybrid UBI + LLMso the LLM fills the long tail.news-search-stagingstays atrung_0— the on-ramp nudge appears with engine-specific install guidance. Use this cluster to see what the no-UBI path looks like before you instrument your own.
Every UBI surface on the three synthetic clusters carries a
"Synthetic demo data" chip with a tooltip explaining the data was
fabricated by the demo reseed. The chip never appears on real operator
clusters or on news-search-staging. The deferred Phase 2 feature
feat_ubi_llm_study_comparison
will add a side-by-side Compare two studies view so you can see
the LLM-vs-UBI study output diff for the same query set.
A wide search space is more than one study can sample in a single run. Overnight autopilot makes each next study start where the last one left off — every follow-up narrows around the previous winner, runs deterministically, and stops on its own when the lift plateaus.
- Open the Create study wizard. Pick the Deep (1000) preset.
- Set 🌙 Run overnight (compound automatically) to depth 3.
- Click Create study before you log off.
- In the morning, open the study detail page. The Overnight chain panel summarises what ran, the cumulative lift across the chain, which link won, and why the chain stopped.
- The summary points at a proposal — click it, review the diff, open the
PR. (You can also cancel any mid-chain study with
?cascade=true(the default) to halt pending children.)
RelyLoop runs the exploration overnight unattended, but it never opens a PR on your behalf. The chain ends with a proposal you review and merge — your one decision.
The tutorial registered product_search.j2 — a deliberately minimal
demo template. RelyLoop ships a curated runnable template library
covering function-score decay, bool boosting, and phrase rescore on
ES/OpenSearch + edismax basic and recency-decay on Solr. Each library
template ships with a checked-in .search_space.json starter and a
copy-paste registration block.
samples/templates/README.md— the four runnable ES/OpenSearch templates (multi_match_basic,function_score_decay,bool_boosted,rescore_phrase) with per-template "when to use", expected metric behavior, and a copy-pastecurlregistration block per template.samples/templates/solr/README.md— the two runnable Solr templates (edismax_basic,boost_decay).
Each tunable knob has a per-engine reference page with native + unified names, valid ranges, "when to tune", caveats, and the templates that declare it.
docs/06_vendor_docs/elasticsearch-tunable-params.mddocs/06_vendor_docs/opensearch-tunable-params.md(covers OpenSearch's hybrid normalization-processor — NOT the ESrrfretriever)docs/06_vendor_docs/solr-tunable-params.md(grounded in the checked-in Solr 9 / 10 ref-guide source)
- The full feature set is in
docs/02_product/mvp1-user-stories.md. - The architectural decisions are in
docs/01_architecture/— start withsystem-overview.md. - The umbrella product spec is at
docs/00_overview/relyloop-spec.md. - File feedback or bug reports at GitHub Discussions.
Welcome to RelyLoop.