Monitoring

Thyme's web UI has a /monitoring page that embeds a set of Grafana dashboards covering the running platform. This page describes what those dashboards show so you can quickly find the right view when something looks off.

Dashboards

Dashboard	What it shows
System Health Overview	High-level status of all services, error rates, uptime
Engine Performance	Events processed per second, write latency, backfill progress
Query Server Performance	Query latency percentiles (P50 / P95 / P99), QPS, extractor execution time
Definition Service	Commit history, definition counts, topic-creation events
Load Test	Write throughput, read latency vs SLA targets, online/offline parity

When to use which dashboard

Queries return 404 / null after a commit - Definition Service dashboard, then Engine Performance to see whether the new pipeline has produced output.
Latency regressed - Query Server Performance, then drill into specific featuresets via Query Runs.
Throughput looks low - Engine Performance for events/sec per worker, then check the Sources page for stuck cursors.
Something is on fire and you don't know what - System Health Overview, then follow the red.

Where the metrics come from

Every Thyme service exposes Prometheus metrics on /metrics:

Service	Port and path	Key metrics
Definition Service	`:8080/metrics`	Commit counts, commit latency, topic creation
Engine	`:8081/metrics`	Events processed, aggregation latency, write latency
Query Server	`:8081/metrics`	Query latency (P50/P95/P99), extractor execution time, QPS

These endpoints are excluded from auth. Your platform team scrapes them into Prometheus and renders the dashboards above; you consume them through the web UI.
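
If you have network access to a service and want to look at the raw numbers behind a dashboard, the sketch below shows one way to read a /metrics endpoint directly. It is a minimal illustration, not a Thyme API: the helper names (`parse_metrics`, `fetch_metrics`), the host in the usage comment, and the assumption that the endpoint is reachable from your machine are all hypothetical. It parses the standard Prometheus text exposition format and ignores `# HELP` and `# TYPE` comment lines.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}.

    The series key is the metric name plus its label set, exactly as
    it appears in the exposition, e.g. 'latency{quantile="0.99"}'.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled series: everything up to the last '}' is the key.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue
        try:
            # rest[0] is the sample value; an optional timestamp may follow.
            samples[series] = float(rest[0])
        except ValueError:
            continue  # Not a numeric sample; ignore it in this sketch.
    return samples


def fetch_metrics(base_url, timeout=5.0):
    """Fetch and parse <base_url>/metrics (hypothetical helper)."""
    with urlopen(base_url + "/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Example usage (host is an assumption; the port comes from the table above):
#   metrics = fetch_metrics("http://localhost:8080")
#   for series, value in sorted(metrics.items()):
#       print(series, value)
```

The parser keeps the label set as part of the key rather than decoding it, which is enough for spot checks. For anything beyond that, the Prometheus instance your platform team runs is the better place to query.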
