Thyme

Thyme is a streaming feature platform for ML engineers. You write features once in Python, and Thyme compiles them to a high-throughput Rust engine that keeps them fresh in real time and serves them over the same code path for both online inference and point-in-time training.

Architecture diagram: write path and read path.

Write path (continuous ingestion): Source (streaming: Kafka · Kinesis; polling: Postgres · Iceberg · S3) → Raw Dataset (event-time keyed) → Pipeline (Sum · Count · Avg · Min · Max) → Aggregated Dataset (event-time · exactly-once), held as shared state.

Read path (on query): HTTP → Query Server → Extractor (composes features) → Featureset → Response (online · point-in-time).

One pipeline, two modes of query. The online path and the offline path read the same state, so training/serving skew doesn't exist by construction.
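To make the two query modes concrete, here is a minimal plain-Python sketch of a single event-time-versioned store answering both an online lookup and a point-in-time lookup. It is an illustration only, not Thyme's implementation: the `VersionedState` class, its `write` and `lookup` methods, and the sample values are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedState:
    """Hypothetical stand-in for shared state: per key, rows sorted by event time."""

    def __init__(self):
        self._rows = {}  # key -> list of (event_time, value)

    def write(self, key, ts, value):
        rows = self._rows.setdefault(key, [])
        rows.append((ts, value))
        rows.sort(key=lambda r: r[0])  # keep rows ordered by event time

    def lookup(self, key, as_of):
        """Return the latest value whose event time is <= as_of, or None."""
        rows = self._rows.get(key, [])
        idx = bisect_right([ts for ts, _ in rows], as_of)
        return rows[idx - 1][1] if idx else None


state = VersionedState()
state.write("user_42", datetime(2024, 1, 1), 10.0)
state.write("user_42", datetime(2024, 1, 8), 12.5)

# Online query: ask for the value as of "now" (here, a time after all writes).
online = state.lookup("user_42", as_of=datetime(2024, 1, 10))

# Point-in-time query: ask for the value as of a historical label timestamp.
training = state.lookup("user_42", as_of=datetime(2024, 1, 3))
```

Both calls go through the same `lookup` function over the same rows; only the `as_of` timestamp differs. That is the sense in which a shared read path leaves no room for a separate training-time computation to drift from the serving-time one.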

Five minutes to features

from datetime import datetime
from thyme.dataset import dataset, field
from thyme.pipeline import pipeline, inputs, Avg, Count
from thyme.featureset import featureset, feature, extractor
from thyme.featureset import extractor_inputs, extractor_outputs

@dataset(index=True)
class Transaction:
    user_id: str   = field(key=True)
    amount:  float
    ts:      datetime = field(timestamp=True)

@dataset(index=True)
class UserStats:
    user_id:       str   = field(key=True)
    ts:            datetime = field(timestamp=True)
    avg_amount_7d: float
    txn_count_30d: int

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t: Transaction):
        return (
            t.groupby("user_id")
             .aggregate(
                 avg_amount_7d=Avg(of="amount", window="7d"),
                 txn_count_30d=Count(of="user_id", window="30d"),
             )
        )

@featureset
class UserFeatures:
    uid:          str   = feature(id=1)
    avg_spend_7d: float = feature(id=2)
    txn_count_30d: int  = feature(id=3)

    @extractor
    @extractor_inputs("uid")
    @extractor_outputs("avg_spend_7d", "txn_count_30d")
    def from_stats(cls, ts, inputs):
        uid = inputs["uid"]
        row = UserStats.lookup(ts, user_id=uid)
        return row["avg_amount_7d"], row["txn_count_30d"]

Then deploy:

thyme commit features.py

And query. Every call appears in the UI as a Query Run with latency, hit rate, and replay:

thyme query features:UserFeatures -e user_42
# ...table...
# Query run: 7b3e4c...
# Results: $THYME_FRONTEND_URL/query-runs/7b3e4c...
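For intuition about the values such a query returns, the two aggregates defined on UserStats (a 7-day average of amount and a 30-day transaction count) can be sketched in plain Python. This is a reference computation under stated assumptions, not how the Thyme engine evaluates windows: the `window_stats` helper is hypothetical, windows are assumed to be trailing and to include the query timestamp, and an empty window is assumed to average to 0.0.

```python
from datetime import datetime, timedelta


def window_stats(events, user_id, as_of):
    """Compute (avg_amount_7d, txn_count_30d) for one user as of a timestamp.

    events: list of (user_id, amount, ts) tuples.
    Assumption: a window covers (as_of - length, as_of], i.e. trailing and
    inclusive of as_of. Thyme's actual boundary semantics may differ.
    """
    week_start = as_of - timedelta(days=7)
    month_start = as_of - timedelta(days=30)

    amounts_7d = [
        amount
        for uid, amount, ts in events
        if uid == user_id and week_start < ts <= as_of
    ]
    count_30d = sum(
        1
        for uid, _, ts in events
        if uid == user_id and month_start < ts <= as_of
    )
    # Assumption: an empty 7-day window yields 0.0 rather than a missing value.
    avg_7d = sum(amounts_7d) / len(amounts_7d) if amounts_7d else 0.0
    return avg_7d, count_30d


events = [
    ("user_42", 20.0, datetime(2024, 1, 2)),
    ("user_42", 40.0, datetime(2024, 1, 20)),
    ("user_42", 60.0, datetime(2024, 1, 24)),
    ("user_7", 99.0, datetime(2024, 1, 24)),
]

avg_7d, count_30d = window_stats(events, "user_42", as_of=datetime(2024, 1, 25))
```

With this sample data, only the two most recent user_42 transactions fall inside the 7-day window, while all three fall inside the 30-day window. A streaming engine would maintain these values incrementally instead of rescanning events, but the result for a given timestamp should match a batch computation like this one.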

How to read these docs

The site is structured around the canonical feature-platform shape. Different readers want different paths through it:

ML engineers building features: start with Concepts to understand datasets, pipelines, featuresets, and extractors. Then read Define features and Aggregations. When you're ready to write tests, Testing covers MockContext.

Platform engineers and operators: start with Operations → Deployment, then Monitoring. Architecture → Durability & Consistency explains the guarantees you can rely on.

Evaluators: start with Why Thyme, then walk one of the Case Studies end to end.

Anyone trying it for the first time: go to Getting Started → Installation, then run through the Interactive Tour once your administrator has given you a hosted instance URL and API key.
