Read & Write Paths

Two paths, one definition. How Thyme keeps online and offline features consistent.

Thyme runs two paths over the same feature definition. The write path continuously ingests events, applies windowed aggregations, and stores state keyed by entity and event time. The read path answers queries - online and point-in-time - by reading that same state and running extractors on top.

WRITE PATH ▸continuous ingestion

on query▸ READ PATH

Source

Streaming

Kafka · Kinesis

Source

Polling

Postgres · Iceberg · S3

Dataset

Raw Dataset

event-time keyed

Pipeline

Sum · Count · Avg · Min · Max

Shared state

Aggregated Dataset

event-time · exactly-once

HTTP

Pipeline

Query Server

Pipeline

Extractor

composes features

Featureset

Response

online · point-in-time

Both paths share a single source of truth: the feature store, an event-time-keyed state written transactionally by the engine and read non-blocking by the query server. Because online and offline queries read the same materialized state - not a copy, not a reconciled view - they return the same numbers.

Write path

The write path is always running. It is not a batch job; there is no schedule.

Sources. Every dataset has a source - streaming (Kafka, Kinesis) or polling (Postgres, Iceberg, S3, BigQuery, Snowflake). Polling sources track a cursor field and publish new rows onto a per-dataset Kafka topic as they arrive.
Raw Dataset. The landing zone for each source. Event time and entity key are extracted here so all downstream operators see canonical records.
Windowed Pipeline. Your @pipeline decorator defines groupby + aggregate over one or more time windows. Supported operators are Sum, Count, Avg, Min, Max, and ApproxPercentile. Sum/Count/Avg are invertible - old records falling out of a window can be subtracted without a recompute. ApproxPercentile uses tiled t-digest sketches, so percentile rank is ready at query time, not computed on the fly.
Aggregated Dataset. The pipeline's output is itself a queryable, event-time-keyed dataset. This is what your featureset reads from.
Commit. Each pipeline tick commits atomically - output state, internal accumulator state, and source progress advance together, paired with a transactional Kafka produce - giving end-to-end exactly-once semantics.

You do not manage Kafka consumers, topic partitions, watermark advancement, late-event handling, or checkpoint recovery. You declare the feature; the engine runs it.

Read path

The read path runs on demand, against the same state the write path produced.

Query Server. An HTTP service receives a featureset query. Optional ts= makes the query point-in-time; omitted, it serves the latest value.
Extractor DAG. Your @extractor functions form a DAG. The server resolves dependencies, topologically sorts them, and runs the Python extractors in-process - no network hop to a separate Python worker.
Featureset response. Composed features are returned as JSON. P50 server-side latency is well under 1 millisecond in production, because all state is read from a local replica of the feature store with no network reads.
Query-run record. After responding, the server spawns a detached task that records a metadata-only audit row and returns the run's UUID on the X-Query-Run-Id response header. The SDK surfaces this as ThymeResult.query_run_id; the CLI prints a Results: link to the web UI. Persistence is best-effort - a failed audit insert never fails the user's query. See Query Runs for what's captured and what isn't.

Why this shape

The write path and the read path share one source of truth. Neither is a projection of the other. That is the structural reason training/serving skew doesn't exist in Thyme:

A training job issuing a point-in-time query hits the same state, through the same extractor code, as an online request issued during inference.
There is no batch snapshot to reconcile against a streaming job; there is no streaming job to reconcile against a batch snapshot. There is one pipeline, two modes of query.
New features go live as soon as the pipeline advances. No backfill coordination, no offline-online deployment dance.

For the contracts this shape gives you - exactly-once processing, watermark and lateness behaviour, online/offline parity - see Durability & Consistency.

Read & Write Paths

Write path

Read path

Why this shape

On this page