When to use this runbook: deploying log aggregation for a Powernode environment, adding a new service to log routing, debugging why expected logs aren't showing up in Grafana.
- Stack overview
- Loki + Promtail deployment
- Grafana datasource wiring
- Retention
- Log labels and queries
- Application logging conventions
- Metrics (Prometheus)
- Troubleshooting
Powernode ships configuration scaffolding for a Grafana-Loki-Promtail-Prometheus stack. Operators run the actual containers themselves — the repo does not deploy them.
| Component | Repo config | Role |
|---|---|---|
| Loki | configs/logging/loki-config.yml | Log storage + query (port 3100) |
| Promtail | configs/logging/promtail-config.yml | Log shipper, scrapes Docker container stdout (port 9080) |
| Grafana | configs/monitoring/grafana-datasources.yml, configs/monitoring/grafana-dashboards.yml | UI + alerting |
| Prometheus | (operator-deployed) | Metrics scrape + storage (port 9090, default Grafana datasource) |
The configs assume single-node defaults (replication_factor: 1, filesystem storage). For multi-host fleets, scale via the Loki microservices mode — repo configs are operator-extensible.
Create docker-compose.observability.yml adjacent to the existing Powernode compose:

    services:
      loki:
        image: grafana/loki:2.9.0
        ports:
          - "3100:3100"
        volumes:
          - ./configs/logging/loki-config.yml:/etc/loki/local-config.yaml:ro
          - loki-data:/tmp/loki
        command: -config.file=/etc/loki/local-config.yaml
        restart: unless-stopped
      promtail:
        image: grafana/promtail:2.9.0
        volumes:
          - ./configs/logging/promtail-config.yml:/etc/promtail/config.yml:ro
          - /var/log:/var/log:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        command: -config.file=/etc/promtail/config.yml
        depends_on:
          - loki
        restart: unless-stopped
    volumes:
      loki-data:

Bring it up:

    docker compose -f docker-compose.observability.yml up -d
    docker compose -f docker-compose.observability.yml logs -f loki | head -30   # smoke

Promtail only scrapes containers labeled logging=promtail (per the config's docker_sd_configs.filters). When deploying Powernode services, tag them:

    services:
      backend:
        image: powernode/backend
        labels:
          - logging=promtail

This avoids accidentally shipping infrastructure-component logs to Loki and ballooning storage.
For Docker Swarm, deploy as a global service so Promtail runs on every node:

    deploy:
      mode: global
      placement:
        constraints:
          - node.platform.os == linux

See docs/operations/docker-swarm.md for the broader Swarm operations runbook.
The shipped configs/monitoring/grafana-datasources.yml provisions Prometheus as the default datasource. Loki is commented out — uncomment when deploying Loki:

    # Edit configs/monitoring/grafana-datasources.yml and uncomment:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      isDefault: false
      version: 1
      editable: true

Mount the file into your Grafana container at /etc/grafana/provisioning/datasources/datasources.yml and restart Grafana to pick it up.
| Layer | Default | Configured in | How to change |
|---|---|---|---|
| Loki logs | 7 days | configs/logging/loki-config.yml (retention_period: 168h) | Edit value, restart Loki |
| Loki compactor sweep | every 10 min | same file (compaction_interval) | rarely changed |
| Promtail positions | persistent | /tmp/positions.yaml inside Promtail container | use a named volume to survive restarts |
For compliance regimes that require longer retention (PCI: 1 year minimum), increase retention_period and ensure storage volume is sized accordingly. The compactor will free disk space automatically once retention_delete_delay: 2h has elapsed.
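
A rough estimate helps when judging whether a volume is sized for a given retention_period. The sketch below assumes you have measured how much Loki writes to its data volume per day; the function name, the 25% headroom, and the 2 GiB/day sample figure are illustrative assumptions, not values from the repo.

```python
def estimate_loki_disk_gib(daily_ingest_gib, retention_hours,
                           delete_delay_hours=2.0, headroom=1.25):
    """Rough upper bound on disk needed for Loki data, in GiB.

    daily_ingest_gib   -- measured on-disk growth per day (assumption: you measure this)
    retention_hours    -- retention_period from loki-config.yml, in hours
    delete_delay_hours -- retention_delete_delay; expired data stays on disk this long
    headroom           -- safety multiplier (assumed 25%, adjust as needed)
    """
    days_on_disk = (retention_hours + delete_delay_hours) / 24.0
    return daily_ingest_gib * days_on_disk * headroom


if __name__ == "__main__":
    # Hypothetical 2 GiB/day ingest: shipped 7-day default vs. a 1-year regime.
    for label, hours in (("7 days", 168), ("1 year", 8760)):
        print(f"{label}: {estimate_loki_disk_gib(2.0, hours):.1f} GiB")
```

Treat the result as a planning figure only; index and compactor working files add overhead that this sketch does not model.
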
Promtail's relabel rules surface these labels for every scraped container log line:
| Label | Source | Example |
|---|---|---|
| container | Docker container name | powernode-backend-1 |
| logstream | stdout / stderr | stderr |
| service_name | Swarm service label | powernode_backend |
| stack | Swarm stack label | powernode-production |
Common LogQL queries (paste into Grafana → Explore → Loki):

    # All ERROR lines from backend in the last hour
    {service_name="powernode_backend"} |= "ERROR"

    # Worker job failures
    {service_name="powernode_worker"} |~ "Failed .* Job after"

    # Backend 5xx
    {service_name="powernode_backend"} |~ "Completed 5\\d\\d"

    # Audit log writes from the model layer
    {service_name="powernode_backend"} |= "AuditLog" |= "created"

    # Report request lifecycle for a specific id
    {service_name=~"powernode_(backend|worker)"} |= "019e3c6c-9e1a"

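
The same queries can be run outside Grafana through Loki's HTTP API, which is handy in scripts or when Grafana itself is the thing being debugged. The sketch below targets the /loki/api/v1/query_range endpoint using only the standard library. The base URL assumes the port published by the compose example, and the helper names are illustrative, not part of the repo.

```python
import json
import time
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # assumption: Loki's published port


def build_query_range_url(logql, start_ns, end_ns, limit=100, base_url=LOKI_URL):
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,        # Unix epoch in nanoseconds
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"


def last_hour_url(logql, now_s=None):
    """Query the hour ending at now_s (seconds since epoch, defaults to now)."""
    now_s = time.time() if now_s is None else now_s
    end_ns = int(now_s * 1_000_000_000)
    start_ns = end_ns - 3600 * 1_000_000_000
    return build_query_range_url(logql, start_ns, end_ns)


def fetch_log_lines(url, timeout=10):
    """Run the query and flatten the returned streams into a list of log lines."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return [line
            for stream in body["data"]["result"]
            for _ts, line in stream["values"]]
```

For example, `fetch_log_lines(last_hour_url('{service_name="powernode_backend"} |= "ERROR"'))` would return the matching lines from the last hour, provided Loki is reachable at the assumed URL.
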
Powernode services follow consistent log emission to make LogQL queries reliable:
- Rails backend uses Rails.logger only (per feedback_clean_implementations and frontend/CLAUDE.md; no puts/print). Output is JSON-ish lines on stdout.
- Worker uses BaseJob helpers (log_info, log_error) that emit structured fields including job class + JID.
- Frontend (browser) uses logger from @/shared/utils/logger; no console.log in production (caught by scripts/cleanup-all-console-logs.sh).
- Request IDs: each HTTP request gets a request.uuid that Rails sets. Include it when logging from a request path so cross-service traces correlate.
If you add a new component that should ship logs to Loki:
- Add the logging=promtail label.
- Ensure stdout/stderr is unbuffered (Ruby: STDOUT.sync = true; Node: process.stdout.write is line-buffered when TTY).
- Use a structured format (JSON or key=value pairs) so LogQL can parse fields with | json or | logfmt.
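
As an illustration of the last two points, here is a minimal sketch of a key=value (logfmt-style) emitter for a hypothetical Python component. Powernode's own services are Ruby and JavaScript, so the function names and field names here are assumptions chosen for the example, not existing helpers.

```python
import sys


def format_logfmt(**fields):
    """Render keyword arguments as one logfmt line: key=value key2="quoted value"."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        # Quote values that are empty or contain a space, a double quote, or '='.
        if text == "" or any(ch in text for ch in ' "='):
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def log_line(**fields):
    """Write one structured line to stdout and flush so Promtail sees it promptly."""
    sys.stdout.write(format_logfmt(**fields) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    # request_id is a hypothetical field name carrying the Rails request UUID.
    log_line(level="error", msg="report export failed", request_id="019e3c6c-9e1a")
```

Lines in this shape can then be filtered by field in LogQL, for example `{container="my-component"} | logfmt | level="error"`, where the container name is a placeholder.
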
The repo ships configs/monitoring/grafana-dashboards.yml and a grafana-dashboards/ directory. Prometheus scrape config is operator-owned — point Prometheus at:
| Endpoint | What it exposes |
|---|---|
| http://backend:3000/metrics | Rails app metrics (request counts, latency histograms via yabeda-rails); emitted only if the yabeda-prometheus gem is enabled. As of 2026-05, NOT enabled by default; uncomment in server/Gemfile and re-bundle to opt in. |
| http://worker:4567/metrics | Worker HTTP API metrics (job dispatch counts, queue depth) |
| cAdvisor, node_exporter | Standard host + container metrics; deploy via the same observability compose file |
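
Before adding an endpoint to the Prometheus scrape config, it can be worth confirming that it really serves the metric you expect. The sketch below sums the samples of one metric from text in the Prometheus exposition format (for instance the output of a curl against one of the endpoints above). The parser is deliberately simplified and assumes label values contain no `}` character; the metric name in the demo is hypothetical.

```python
def metric_total(exposition_text, metric_name):
    """Sum all samples of metric_name in Prometheus text exposition output."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1]
        else:
            name, _, rest = line.partition(" ")
        if name != metric_name:
            continue
        # First token after the name/labels is the value; a timestamp may follow.
        total += float(rest.split()[0])
    return total


if __name__ == "__main__":
    sample = (
        "# TYPE example_requests_total counter\n"
        'example_requests_total{status="200"} 40\n'
        'example_requests_total{status="500"} 2\n'
    )
    print(metric_total(sample, "example_requests_total"))
```

A result of 0.0 means the metric is absent or has no samples yet, which for the backend usually points at the exporter gem not being enabled.
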
- Is Loki receiving them?

      curl -s http://localhost:3100/ready
      curl -s http://localhost:3100/metrics | grep loki_ingester_streams_created_total

  loki_ingester_streams_created_total should be non-zero and growing.
- Is Promtail scraping?

      curl -s http://localhost:9080/targets | head -30

  Expected: one entry per logging=promtail-labeled container. This assumes port 9080 is published on the host; the compose example earlier in this runbook does not publish it.
- Is the container actually labeled?

      docker ps --format '{{.Names}}\t{{.Labels}}' | grep -i logging

- Datasource configured in Grafana? Grafana → Configuration → Data Sources → Loki should show "Data source is working".
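
The label check above lists containers that have the label; the opposite question (which running containers are missing it) is often more useful. The sketch below parses the tab-separated output of the docker ps command from the checklist and returns the names that lack logging=promtail. It assumes label values contain no commas, since Docker prints labels as a comma-separated list; the function name is illustrative.

```python
def unlabeled_containers(docker_ps_output, label="logging=promtail"):
    """Return container names whose label list does not include `label`.

    Expects the output of:
        docker ps --format '{{.Names}}\t{{.Labels}}'
    """
    missing = []
    for row in docker_ps_output.splitlines():
        if not row.strip():
            continue
        name, _, labels = row.partition("\t")
        if label not in labels.split(","):
            missing.append(name)
    return missing
```

Feed it the captured command output, for example via `subprocess.run(..., capture_output=True, text=True).stdout`, and any name it returns will be ignored by Promtail under the label filter described earlier.
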
Promtail's relabel_configs only attach Swarm labels if Swarm metadata is present. For plain Docker Compose deploys, service_name and stack are empty — query by container instead.
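
A small helper can hide this difference from scripts that build queries. The sketch below picks the stream selector based on the deploy mode, following the naming shown in the label examples (powernode_backend for Swarm services, powernode-backend-1 for Compose containers). The function name and the "powernode" prefix default are assumptions; adjust them to your actual stack and project names.

```python
def stream_selector(service, swarm, prefix="powernode"):
    """Return a LogQL stream selector for a service.

    swarm=True  -> match on the service_name label (Swarm metadata present)
    swarm=False -> match on container name, since service_name is empty
    """
    if swarm:
        return f'{{service_name="{prefix}_{service}"}}'
    # Loki regex matchers are fully anchored, so the trailing .* is required
    # to cover the replica suffix that Compose appends (-1, -2, ...).
    return f'{{container=~"{prefix}-{service}-.*"}}'
```

For example, `stream_selector("backend", swarm=False) + ' |= "ERROR"'` yields a query that works on a plain Docker Compose deploy.
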
The compactor needs time to free space (retention_delete_delay: 2h). If disk is filling faster than retention deletes free, either:
- Reduce retention_period
- Increase the host volume
- Add more aggressive label filters in promtail-config.yml to drop noisy logs at the ingest boundary
- production-deployment.md — overall production setup
- docker-swarm.md — Swarm-specific deploys
- incident-response.md — uses these logs during incidents
- performance-tuning.md — metrics-driven tuning