Why Thyme

One definition, two runtimes

Most streaming feature systems maintain two pipelines: a batch job to compute training features over history, and a streaming job to keep serving features fresh. The two drift. A fix in one does not propagate to the other. The model you evaluated offline is not the model you're serving.

Thyme runs one pipeline. The same @pipeline definition produces state that is queried two ways - online for inference, point-in-time for training. There is no second pipeline to drift against.

The rest of this page covers the consequences of that one design choice.

1. Kappa, not lambda

Thyme is a kappa-architecture system: every feature is the output of a streaming aggregation, end of story. Historical training features are not a separate thing - they are the same aggregation answered at an earlier event-time.

No batch ETL to maintain
No reconciliation job to keep the two in sync
No "the batch pipeline computed avg_7d differently" tickets

You write the feature once. The engine keeps it fresh forever. Training and serving read the same state.
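As a rough illustration of "the same aggregation answered at an earlier event-time", here is a minimal plain-Python sketch. It does not use Thyme's API: `Event`, `running_avg`, and `value_as_of` are hypothetical names invented for this example, event times are plain integers, and all events belong to a single entity.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    ts: int        # event time (plain integers keep the example short)
    amount: float


def running_avg(events):
    """Fold events in event-time order, yielding (ts, average so far) after each one."""
    total, count = 0.0, 0
    for e in sorted(events, key=lambda e: e.ts):
        total += e.amount
        count += 1
        yield e.ts, total / count


def value_as_of(events, ts):
    """Value of the aggregation at event time `ts`; None if nothing had arrived yet."""
    value = None
    for event_ts, avg in running_avg(events):
        if event_ts > ts:
            break
        value = avg
    return value


events = [
    Event("user_42", 100, 10.0),
    Event("user_42", 200, 30.0),
    Event("user_42", 300, 50.0),
]

online_value = value_as_of(events, ts=float("inf"))  # latest value, as serving would read it
training_value = value_as_of(events, ts=250)         # same aggregation, earlier event time
```

Both values come from one function, so there is no second implementation that could disagree with the first.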

2. Event-time correctness

Every record in Thyme carries an event time. The state store is keyed by (entity, event_time). Online queries return the latest value; point-in-time queries pass ts= and get the value as it would have been served at that moment.

# Online
curl "$THYME_BASE_URL/features?featureset=UserFeatures&uid=user_42"

# Point-in-time (training)
curl "$THYME_BASE_URL/features?featureset=UserFeatures&uid=user_42&ts=2024-01-15T12:00:00Z"

Because both paths read the same state through the same extractor code, a training dataset generated this way contains the values the model would have been served at each timestamp. This parity is structural, not a convention you enforce in review.
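One way to picture a store keyed by (entity, event_time) is a per-entity, time-ordered list that is searched by timestamp. The sketch below is an assumption-laden toy, not Thyme's storage engine: `VersionedState` and its methods are hypothetical, and treating `ts` as inclusive is a choice made for this example.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class VersionedState:
    """Toy state store: per entity, a time-ordered list of (event_time, value)."""

    def __init__(self):
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, entity, event_time, value):
        # Insert at the position that keeps the entity's history sorted by event time.
        i = bisect.bisect_right(self._times[entity], event_time)
        self._times[entity].insert(i, event_time)
        self._values[entity].insert(i, value)

    def get(self, entity, ts=None):
        times = self._times.get(entity, [])
        if not times:
            return None
        if ts is None:
            return self._values[entity][-1]      # online: latest value
        i = bisect.bisect_right(times, ts)       # point-in-time: last write at or before ts
        return self._values[entity][i - 1] if i else None


utc = timezone.utc
store = VersionedState()
store.put("user_42", datetime(2024, 1, 10, tzinfo=utc), 12.5)
store.put("user_42", datetime(2024, 1, 20, tzinfo=utc), 18.0)

latest = store.get("user_42")
as_of = store.get("user_42", ts=datetime(2024, 1, 15, 12, 0, tzinfo=utc))
```

Here `latest` reads the most recent write, while `as_of` ignores the January 20 write because it happened after the requested timestamp.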

3. Exactly-once semantics

Every pipeline tick commits as a single atomic unit: output state, internal accumulator state, and source progress advance together, paired with a transactional Kafka produce. On restart the engine resumes from the last committed position. No duplicates, no lost events, no partial state.

You do not have to think about this. You do not have to write idempotency tokens on your aggregations. Exactly-once is what the platform gives you.
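For readers curious what "commits as a single atomic unit" buys, the following toy model may help. It is a sketch under stated assumptions: `Checkpoint` and `ToyEngine` are hypothetical names, the source is an in-memory list rather than Kafka, and a single reference swap stands in for the real atomic commit. It does not model Kafka transactions.

```python
import copy


class Checkpoint:
    """Everything that must advance together: output state, accumulators, source offset."""

    def __init__(self):
        self.output = {}        # entity -> served value
        self.accumulators = {}  # entity -> (sum, count)
        self.offset = 0         # next source position to read


class ToyEngine:
    def __init__(self, log):
        self.log = log
        self.committed = Checkpoint()

    def tick(self, batch_size, crash_before_commit=False):
        # Work on a private copy so a failure leaves the committed checkpoint untouched.
        draft = copy.deepcopy(self.committed)
        for entity, amount in self.log[draft.offset:draft.offset + batch_size]:
            s, c = draft.accumulators.get(entity, (0.0, 0))
            draft.accumulators[entity] = (s + amount, c + 1)
            draft.output[entity] = (s + amount) / (c + 1)
        draft.offset = min(draft.offset + batch_size, len(self.log))
        if crash_before_commit:
            raise RuntimeError("simulated crash mid-tick")
        # One reference swap: all three pieces become visible together or not at all.
        self.committed = draft


log = [("user_42", 10.0), ("user_42", 30.0), ("user_42", 50.0), ("user_42", 70.0)]
engine = ToyEngine(log)
engine.tick(2)
try:
    engine.tick(2, crash_before_commit=True)
except RuntimeError:
    pass  # "restart": nothing from the failed tick was committed
engine.tick(2)  # resumes from the last committed offset and reprocesses the same events
```

The failed tick leaves no trace, so the retried tick counts each event exactly once: the final state reflects four events, not six.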

4. Sub-millisecond reads

The query server runs the extractor DAG in-process and reads aggregated state from a local replica of the feature store. No network hop to fetch state. No sidecar. No JVM.

P50 server-side latency: 0.66–2.23 ms measured in production on AWS, under 1 ms at the low end of that range
Throughput: thousands of events/sec per partition, limited by your Kafka throughput, not by Thyme

Latency this low means you can use Thyme on the hot path of real-time scoring - not just precomputation.
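To make "runs the extractor DAG in-process" concrete, here is a small sketch of evaluating dependent extractors in topological order against local state. Everything in it is assumed for illustration: the `local_state` dict, the `extractors` table, the `extract` function, and the feature names `txn_count_7d` and `spend_7d` are hypothetical and do not come from Thyme.

```python
from graphlib import TopologicalSorter

# Hypothetical local replica of aggregated state: (entity, feature) -> value.
local_state = {
    ("user_42", "avg_amount_7d"): 25.0,
    ("user_42", "txn_count_7d"): 4,
}

# Hypothetical extractors: name -> (dependencies, function of entity and resolved deps).
extractors = {
    "avg_amount_7d": ((), lambda uid, deps: local_state[(uid, "avg_amount_7d")]),
    "txn_count_7d": ((), lambda uid, deps: local_state[(uid, "txn_count_7d")]),
    "spend_7d": (
        ("avg_amount_7d", "txn_count_7d"),
        lambda uid, deps: deps["avg_amount_7d"] * deps["txn_count_7d"],
    ),
}


def extract(uid):
    """Evaluate every extractor once, dependencies first, using only in-memory reads."""
    graph = {name: set(deps) for name, (deps, _) in extractors.items()}
    resolved = {}
    for name in TopologicalSorter(graph).static_order():
        _, fn = extractors[name]
        resolved[name] = fn(uid, resolved)
    return resolved


features = extract("user_42")
```

Every read in `extract` is a dictionary lookup in the same process, which is the property the latency figures above depend on.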

5. Declarative, not operational

You declare what a feature is, in Python:

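from datetime import datetime

# Transaction is the upstream dataset, defined elsewhere.
@dataset(index=True)
class UserStats:
    user_id: str = field(key=True)             # entity key
    ts:      datetime = field(timestamp=True)  # event time
    avg_amount_7d: float

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t):
        # Per-user average of `amount` over a 7-day window.
        return t.groupby("user_id").aggregate(
            avg_amount_7d=Avg(of="amount", window="7d"),
        )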

You do not manage Kafka consumer groups, state-store maintenance, partition rebalancing, checkpoint upload, watermark advancement, or late-event handling. The engine does. Your job is the feature logic; Thyme's job is running it at scale.
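For intuition about what a declaration like `Avg(of="amount", window="7d")` asks the engine to maintain, here is a hand-rolled reference in plain Python. It is only a sketch: `WindowedAvg` is a hypothetical class, it handles a single key, it assumes events arrive in event-time order, and the window boundary (an event exactly 7 days old is evicted) is an assumption rather than documented Thyme behavior.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowedAvg:
    """Trailing-window average for one key, assuming in-order event times."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (event_time, amount), oldest first
        self.total = 0.0

    def add(self, ts, amount):
        self.events.append((ts, amount))
        self.total += amount
        return self.value(ts)

    def value(self, now):
        # Evict events at or before the cutoff; valid only for non-decreasing `now`.
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events) if self.events else None


avg = WindowedAvg(timedelta(days=7))
first = avg.add(datetime(2024, 1, 1), 10.0)
second = avg.add(datetime(2024, 1, 5), 30.0)
third = avg.add(datetime(2024, 1, 9), 50.0)  # the Jan 1 event has left the window
```

Eviction, ordering, and late events are exactly the details the declarative form leaves to the engine.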

Business outcomes

Faster ML iteration. Define a new feature, commit it, and it's live. No pipeline deployment, no backfill coordination, no ops ticket.

Fewer production incidents. Training/serving parity is structural. It is not a convention you enforce manually and it is not a place new engineers can make a mistake.

Smaller infrastructure footprint. One system replaces the batch ETL, the streaming job, and the custom serving layer.

Safe schema evolution. You can add, rename, and version features without breaking downstream consumers - see Feature Versioning.

Audited queries by default. Every CLI or SDK query is recorded as a Query Run in the web UI - latency, hit rate, API key fingerprint, entities touched. No extra instrumentation required.

For real-world walkthroughs of each architectural choice, see the Case Studies section - three production deployments at different points on the events-per-second (EPS) / latency / feature-complexity spectrum.