Monitoring

The Grafana dashboards and metrics surfaced on the in-product Monitoring page.

Thyme's web UI has a /monitoring page that embeds a set of Grafana dashboards covering the running platform. This page describes what those dashboards show so you can quickly find the right view when something looks off.

Dashboards

Dashboard | What it shows
--- | ---
System Health Overview | High-level status of all services, error rates, uptime
Engine Performance | Events processed per second, write latency, backfill progress
Query Server Performance | Query latency percentiles (P50 / P95 / P99), QPS, extractor execution time
Definition Service | Commit history, definition counts, topic-creation events
Load Test | Write throughput, read latency vs SLA targets, online/offline parity
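
Percentile panels such as P50 / P95 / P99 are commonly built in Grafana with PromQL's histogram_quantile() over a latency histogram. Assuming the Query Server exports query latency as a Prometheus histogram (the bucket boundaries and counts below are hypothetical), this Python sketch reproduces that bucket interpolation so you can sanity-check a panel value by hand.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by upper
    bound and ending with (math.inf, total_count), like Prometheus `le` buckets.
    Within the bucket that contains the target rank, the value is linearly
    interpolated, in the spirit of PromQL's histogram_quantile().
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lands in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical query-latency buckets in seconds: 100 queries in total.
buckets = [(0.05, 50), (0.1, 80), (0.25, 95), (0.5, 99), (1.0, 100), (math.inf, 100)]
for q in (0.50, 0.95, 0.99):
    print(f"P{round(q * 100)}: {histogram_quantile(q, buckets):.3f}s")
```

The estimate is only as precise as the bucket layout: any quantile that falls inside a bucket is a linear guess between that bucket's bounds, which is worth remembering when comparing a P99 against an SLA target on the Load Test dashboard.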

When to use which dashboard

  • You committed something and queries return 404 / null: open the Definition Service dashboard, then Engine Performance to see whether the new pipeline has produced output.
  • Latency regressed: open Query Server Performance, then drill into specific featuresets via Query Runs.
  • Throughput looks low: check Engine Performance for events/sec per worker, then look for stuck cursors on the Sources page.
  • Something is on fire and you don't know what: start with System Health Overview and follow the red.
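
An events/sec-per-worker view is normally a rate computed over a monotonically increasing counter. The Python sketch below shows that calculation from raw samples, assuming a per-worker events-processed counter; the worker labels and sample values are hypothetical. It treats a drop in value as a counter reset the way Prometheus does but, unlike PromQL's rate(), does not extrapolate to the window edges.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a monotonic counter over a window.

    `samples` holds (unix_seconds, counter_value) pairs in time order.
    A drop in value is treated as a counter reset (for example, a worker
    restart). No extrapolation to the window edges is performed.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive interval")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts at zero, so `cur` is the whole increase.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


def rates_by_worker(series: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """Events/sec for each worker, keyed by a hypothetical worker label."""
    return {worker: counter_rate(samples) for worker, samples in series.items()}


# Hypothetical events-processed counter samples, scraped every 30 s.
series = {
    "worker-0": [(0, 1_000), (30, 4_000), (60, 7_000)],
    "worker-1": [(0, 2_000), (30, 2_300), (60, 400)],  # restarted before t=60
}
rates = rates_by_worker(series)
slowest = min(rates, key=rates.get)
print(rates, "slowest:", slowest)
```

A worker whose rate sits far below its peers is a good candidate to cross-check against the Sources page for a stuck cursor.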

Where the metrics come from

Every Thyme service exposes Prometheus metrics on /metrics:

Service | Path | Key metrics
--- | --- | ---
Definition Service | :8080/metrics | Commit counts, commit latency, topic creation
Engine | :8081/metrics | Events processed, aggregation latency, write latency
Query Server | :8081/metrics | Query latency (P50/P95/P99), extractor execution time, QPS

These endpoints do not require authentication. Your platform team scrapes them into Prometheus and builds the Grafana dashboards above from that data; you view them through the web UI.
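
Because the endpoints need no credentials, you can read a service's raw metrics directly when a dashboard panel looks wrong. This Python sketch fetches a /metrics endpoint with the standard library and parses the Prometheus text exposition format into a dictionary. The host name and metric names are hypothetical, and the parser is deliberately minimal: it does not handle every escaping rule for label values.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Keeps the full series identifier (name plus any {labels}), skips
    # HELP / # TYPE comments and blank lines, and ignores an optional
    trailing timestamp. A minimal sketch, not a full exposition-format parser.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            close = line.rfind("}")
            if close == -1:
                continue  # malformed label set; skip it
            series, fields = line[: close + 1], line[close + 1 :].split()
        else:
            series, *fields = line.split()
        if fields:
            samples[series] = float(fields[0])  # float() accepts "+Inf" and "NaN"
    return samples


def fetch_metrics(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse one /metrics endpoint (no credentials needed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical host name; only the port and path come from this page:
# commits = fetch_metrics("http://definition-service:8080/metrics")
```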
