Thyme

Real-time ML features. Defined in Python. Computed in Rust.

Thyme is a streaming feature platform for ML engineers. You write features once in Python, and Thyme compiles them to a high-throughput Rust engine that keeps them fresh in real time. The same code path serves them for both online inference and point-in-time training.

[Architecture diagram]
Write path (continuous ingestion): streaming sources (Kafka, Kinesis) and polling sources (Postgres, Iceberg, S3) feed a raw dataset keyed by event time. A pipeline (Sum, Count, Avg, Min, Max) turns it into an aggregated dataset (event-time, exactly-once), which is held as shared state.
Read path (on query): the query server receives an HTTP request, an extractor composes features into a featureset, and the response is returned in online or point-in-time mode.

One pipeline, two query modes. The online path and the offline path read the same state, so training/serving skew is ruled out by construction.
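To illustrate why reading the same state removes skew, here is a minimal sketch of a point-in-time lookup. It is not Thyme's implementation or API: `VersionedState`, `write`, and this `lookup` signature are hypothetical names invented for the example. The idea shown is only that an online query (timestamp = now) and a training query (timestamp = some past moment) go through one function over one store.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


class VersionedState:
    """Toy stand-in for shared aggregated state (hypothetical, not Thyme's API)."""

    def __init__(self):
        self._times = {}   # key -> event times, kept sorted
        self._values = {}  # key -> values aligned with _times

    def write(self, key, ts, value):
        # Insert in event-time order so late-arriving writes stay sorted.
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect_right(times, ts)
        times.insert(i, ts)
        values.insert(i, value)

    def lookup(self, key, ts):
        """Return the latest value written at or before ts, or None."""
        times = self._times.get(key, [])
        i = bisect_right(times, ts)
        return self._values[key][i - 1] if i else None


state = VersionedState()
t0 = datetime(2024, 1, 1)
state.write("user_42", t0, 10.0)
state.write("user_42", t0 + timedelta(days=1), 12.5)

# Online inference: ask for the value as of "now".
online = state.lookup("user_42", datetime(2024, 1, 5))

# Training: ask for the value as of a historical label timestamp.
training = state.lookup("user_42", t0 + timedelta(hours=6))

print(online, training)
```

Both calls hit the same `lookup`, so a training row built for a given timestamp sees the value an online query at that moment would have seen.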


Five minutes to features

from datetime import datetime
from thyme.dataset import dataset, field
from thyme.pipeline import pipeline, inputs, Avg, Count
from thyme.featureset import featureset, feature, extractor
from thyme.featureset import extractor_inputs, extractor_outputs

# Raw dataset: one row per transaction, keyed by user_id and timestamped by ts.
@dataset(index=True)
class Transaction:
    user_id: str   = field(key=True)
    amount:  float
    ts:      datetime = field(timestamp=True)

# Aggregated dataset: per-user rolling statistics derived from Transaction.
@dataset(index=True)
class UserStats:
    user_id:       str   = field(key=True)
    ts:            datetime = field(timestamp=True)
    avg_amount_7d: float
    txn_count_30d: int

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t: Transaction):
        return (
            t.groupby("user_id")
             .aggregate(
                 avg_amount_7d=Avg(of="amount", window="7d"),
                 txn_count_30d=Count(of="user_id", window="30d"),
             )
        )

# Featureset: the features served at query time; the extractor fills them from UserStats.
@featureset
class UserFeatures:
    uid:          str   = feature(id=1)
    avg_spend_7d: float = feature(id=2)
    txn_count_30d: int  = feature(id=3)

    @extractor
    @extractor_inputs("uid")
    @extractor_outputs("avg_spend_7d", "txn_count_30d")
    def from_stats(cls, ts, inputs):
        uid = inputs["uid"]
        row = UserStats.lookup(ts, user_id=uid)
        return row["avg_amount_7d"], row["txn_count_30d"]

Then deploy:

thyme commit features.py

Then query. Every call appears in the UI as a Query Run with latency, hit rate, and replay:

thyme query features:UserFeatures -e user_42
# ...table...
# Query run: 7b3e4c...
# Results: $THYME_FRONTEND_URL/query-runs/7b3e4c...
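To sanity-check what the `UserStats` pipeline above is expected to return, the two aggregates can be recomputed by hand in plain Python. This is a reference sketch only: `rolling_stats` is a hypothetical helper, not part of Thyme, and it assumes a window covers events with `as_of - window < ts <= as_of`. Thyme's exact boundary handling may differ.

```python
from datetime import datetime, timedelta


def rolling_stats(events, user_id, as_of):
    """Recompute (avg_amount_7d, txn_count_30d) for one user as of a timestamp.

    events is an iterable of (user_id, amount, ts) tuples.
    Assumption: windows are half-open, (as_of - window, as_of].
    """
    def in_window(ts, days):
        return as_of - timedelta(days=days) < ts <= as_of

    amounts_7d = [a for u, a, ts in events if u == user_id and in_window(ts, 7)]
    count_30d = sum(1 for u, _, ts in events if u == user_id and in_window(ts, 30))
    avg_7d = sum(amounts_7d) / len(amounts_7d) if amounts_7d else None
    return avg_7d, count_30d


now = datetime(2024, 3, 1, 12)
events = [
    ("user_42", 20.0, now - timedelta(days=1)),   # inside both windows
    ("user_42", 40.0, now - timedelta(days=3)),   # inside both windows
    ("user_42", 90.0, now - timedelta(days=10)),  # inside 30d only
    ("user_42", 15.0, now - timedelta(days=45)),  # outside both windows
    ("user_7", 99.0, now - timedelta(days=1)),    # different user
]

avg_7d, count_30d = rolling_stats(events, "user_42", now)
print(avg_7d, count_30d)
```

Comparing a handful of hand-computed values like these against query results is a cheap way to confirm that the window lengths and the key are what you intended.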

How to read these docs

The site is structured around the canonical feature-platform shape. Different readers want different paths through it:

ML engineers building features: start with Concepts to understand datasets, pipelines, featuresets, and extractors. Then read Define features and Aggregations. When you're ready to write tests, Testing covers MockContext.

Platform engineers and operators: start with Operations → Deployment, then Monitoring. Architecture → Durability & Consistency explains the guarantees you can rely on.

Evaluators: start with Why Thyme, then walk through one of the Case Studies end to end.

Anyone trying it for the first time: go to Getting Started → Installation, then run through the Interactive Tour once your administrator has given you a hosted instance URL and API key.

