Define ML features once in Python. Thyme compiles them to a high-throughput Rust streaming engine — real-time serving, point-in-time correct training, zero skew between the two.
Powerful data engineering workflows, without the infrastructure headaches. Powered by Rust.
Time-windowed aggregations (1m, 24h, 7d) run on a continuous Rust streaming engine. Values update within milliseconds of new events arriving. This is a kappa-style architecture: a single streaming path continuously processes fresh data, with no separate batch layer.
from datetime import datetime

from thyme import *

# Raw event stream: one row per transaction, keyed by user.
@source(name="transactions")
class Transaction:
    user_id: str = field(key=True)
    ts: datetime = field(timestamp=True)
    amount: float

# Derived dataset: rolling average spend per user.
@dataset(index=True)
class UserSpend:
    user_id: str = field(key=True)
    avg_24h: float
    avg_7d: float

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t):
        # Group by user and average `amount` over two trailing windows.
        return t.groupby("user_id").aggregate(
            avg_24h=Avg(of="amount", window="24h"),
            avg_7d=Avg(of="amount", window="7d"),
        )
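To make the semantics of a trailing window concrete, here is a minimal plain-Python sketch of a 24-hour average that updates as events arrive. It is only an illustration of the idea behind `Avg(of="amount", window="24h")`, not Thyme's engine: the `WindowedAvg` class is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowedAvg:
    """Average of amounts whose timestamps fall in (now - window, now].

    Hypothetical helper for illustration; assumes in-order events.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def add(self, ts: datetime, amount: float) -> None:
        self.events.append((ts, amount))
        self.total += amount

    def value(self, now: datetime):
        # Evict events that have aged out of the window.
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.total -= old
        if not self.events:
            return None
        return self.total / len(self.events)


t0 = datetime(2024, 1, 1, 12, 0)
avg_24h = WindowedAvg(timedelta(hours=24))
avg_24h.add(t0, 10.0)
avg_24h.add(t0 + timedelta(hours=20), 30.0)
print(avg_24h.value(t0 + timedelta(hours=21)))  # 20.0 (both events in window)
print(avg_24h.value(t0 + timedelta(hours=25)))  # 30.0 (first event expired)
```

The value changes both when a new event arrives and when an old one ages out, which is why a scheduled batch job can only approximate it.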
Every team building real-time ML hits the same wall. Training features and serving features drift apart, and accuracy quietly erodes in production.
Offline metrics look great. Production accuracy drops within weeks — not because the model is wrong, but because the features it sees in production are computed differently than the features it trained on.
Batch jobs (Spark, dbt) compute training features. Streaming systems (Flink, microservices) compute serving features. A bug fix in one doesn't propagate to the other. The logic drifts.
Batch pipelines run on schedules — hourly, daily. A user's last transaction was 4 minutes ago, but your model sees yesterday's aggregate. You're serving predictions on stale data.
Thyme runs one pipeline. Training and serving read the same state — skew is structurally impossible, not a convention you enforce in review.
From feature computation to serving, Thyme handles the entire lifecycle so your team can focus on building great models.
Features defined in Python are compiled to a high-throughput Rust streaming engine. Real-time aggregations with millisecond freshness.
Point-in-time correct feature retrieval for training. Query any feature exactly as it was known at any past moment.
One definition, two modes. The same feature logic runs in both streaming aggregation and offline point-in-time lookups — no divergence, no silent accuracy drops.
Composable abstractions: datasets define event streams, pipelines apply windowed aggregations, and extractors compute derived features on read.
Distributed leasing, checkpointing, and replay logs ensure exactly-once processing with no data loss or duplication.
No Kafka consumers to manage, no state stores to tune, no checkpoint recovery to handle. You own the feature logic — Thyme owns the infrastructure.
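The exactly-once claim above rests on checkpointing and replay. One common way those two pieces combine is sketched below: state is saved together with the log offset it reflects, so a restart can replay the log and skip everything already counted. This is a generic illustration under that assumption, not a description of Thyme's internals; `Checkpoint` and `process` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    offset: int = -1  # last log offset reflected in `state`
    state: dict = field(default_factory=dict)


def process(log, checkpoint):
    """Apply events after checkpoint.offset and return a new checkpoint."""
    state = dict(checkpoint.state)
    offset = checkpoint.offset
    for i, (user_id, amount) in enumerate(log):
        if i <= checkpoint.offset:
            continue  # already counted before the restart; skip on replay
        state[user_id] = state.get(user_id, 0.0) + amount
        offset = i
    # State and offset are returned together, so they cannot disagree.
    return Checkpoint(offset=offset, state=state)


log = [("u1", 5.0), ("u2", 7.0), ("u1", 3.0)]
cp = process(log[:2], Checkpoint())  # worker stops after two events
cp = process(log, cp)                # restart replays the full log
print(cp.state)  # {'u1': 8.0, 'u2': 7.0}
```

Replaying the whole log a second time leaves the totals unchanged, which is the property that rules out both loss and duplication.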
A streaming write path keeps features fresh; a query-time read path composes them for your model. Both paths read the same event-time-keyed state, so training and serving cannot drift.
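The idea of one event-time-keyed state serving both reads can be sketched in plain Python. The `FeatureHistory` class below is hypothetical and is not Thyme's storage layer; it only shows how an online "latest" read and an offline "as of" read can share the same data, with the offline read never seeing values from the future.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class FeatureHistory:
    """Per-key history of feature values, ordered by event time."""

    def __init__(self):
        self._times = defaultdict(list)   # key -> sorted event times
        self._values = defaultdict(list)  # key -> values aligned with _times

    def write(self, key: str, ts: datetime, value: float) -> None:
        # Write path: insert in event-time order so late events land correctly.
        i = bisect_right(self._times[key], ts)
        self._times[key].insert(i, ts)
        self._values[key].insert(i, value)

    def as_of(self, key: str, ts: datetime):
        # Offline read: latest value with event time <= ts, never a later one.
        i = bisect_right(self._times.get(key, []), ts)
        return self._values[key][i - 1] if i else None

    def latest(self, key: str):
        # Online read: the most recent value for the key.
        values = self._values.get(key)
        return values[-1] if values else None


h = FeatureHistory()
h.write("u1", datetime(2024, 1, 1), 12.5)
h.write("u1", datetime(2024, 1, 3), 40.0)
print(h.as_of("u1", datetime(2024, 1, 2)))  # 12.5, the value known on Jan 2
print(h.latest("u1"))                       # 40.0, the value served online
```

A training row labeled on January 2 gets 12.5, the same value a model serving on that day would have read, so the two cannot disagree.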
Thyme compiles Python feature definitions to a Rust streaming engine. Low latency, zero skew, and a three-command deployment workflow.
P99 Online Latency
One Definition for Online & Offline
Zero Training/Serving Skew
Three Commands to Deploy
from datetime import datetime

from thyme import *

# Transaction is the @source class from the earlier example.
@dataset(index=True)
class UserStats:
    user_id: str = field(key=True)
    ts: datetime = field(timestamp=True)
    avg_spend_7d: float

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t):
        # Trailing 7-day average of `amount` per user.
        return t.groupby("user_id").aggregate(
            avg_spend_7d=Avg(of="amount", window="7d")
        )
Define features in Python. Deploy with thyme commit. Serve in milliseconds.
Join the teams shipping ML features faster with Thyme. Get up and running in minutes, not months.