Aggregations Reference
Every windowed aggregation operator, its signature, and its memory profile.
Aggregation operators are used inside .aggregate() calls within a @pipeline method. Each operator computes a rolling windowed statistic over the events in an input dataset.
Usage
.aggregate(
output_field_name=OperatorClass(of="input_field", window="7d"),
)The keyword argument name becomes a field in the output dataset. Multiple aggregations can be combined in a single .aggregate() call.
Common parameters
All aggregation operators share these parameters:
| Parameter | Type | Description |
|---|---|---|
of | str | The input dataset field to aggregate over |
window | str | Time window size: integer + unit (d, h, m) |
Window format examples: "7d" (7 days), "24h" (24 hours), "30m" (30 minutes).
All windows use event time, not processing time.
At a glance
| Operator | Invertible? | Memory | Output |
|---|---|---|---|
Sum | yes | O(1) | same as input |
Count | yes | O(1) | int |
Avg | yes | O(1) | float |
Min | no - sorted set | O(window) | same as input |
Max | no - sorted set | O(window) | same as input |
ApproxPercentile | no - t-digest tiles | O(tiles) | float |
Invertible operators maintain only the running total and count. When an old record expires out of the window, the engine subtracts its contribution without a recompute. This means Sum/Count/Avg over a 180-day window cost the same memory per entity as over a one-minute window.
Non-invertible operators store enough extra state to answer the query without rescanning the window. Min/Max use sorted sets with tombstones; ApproxPercentile uses tiled t-digest sketches and computes the answer at write time so the read is a single float lookup.
Operators
Sum
Rolling sum of a numeric field.
from thyme.pipeline import Sum
total_spend_7d=Sum(of="amount", window="7d")of: numeric field (floatorint)- Returns: same type as the input field
Count
Rolling count of events within the window.
from thyme.pipeline import Count
txn_count_30d=Count(of="user_id", window="30d")of: any field (used to identify which events to count)- Returns:
int
Avg
Rolling mean of a numeric field.
from thyme.pipeline import Avg
avg_spend_7d=Avg(of="amount", window="7d")of: numeric field (floatorint)- Returns:
float
Min
Rolling minimum of a numeric field.
from thyme.pipeline import Min
min_amount_30d=Min(of="amount", window="30d")of: numeric field (floatorint)- Returns: same type as the input field
Memory is O(distinct values in the window).
Max
Rolling maximum of a numeric field.
from thyme.pipeline import Max
max_amount_7d=Max(of="amount", window="7d")of: numeric field (floatorint)- Returns: same type as the input field
Same structure as Min, inverted.
ApproxPercentile
Percentile rank of an incoming value against its own windowed distribution.
from thyme.pipeline import ApproxPercentile
price_pct_rank_180d=ApproxPercentile(of="price", window="180d", pct=99)of: numeric field (floatorint)pct: the percentile to compute (0–100)- Returns:
float- the value at the given percentile, OR the rank of the latest record, depending on configuration
Backed by tiled t-digest sketches. The window is divided into fixed-duration tiles; each tile stores a sketch; window queries merge the live tiles. Tiles fall out of the window individually in O(1), so a 180-day window is not meaningfully more expensive than a 7-day window.
Complete example
@dataset(index=True)
class UserStats:
user_id: str = field(key=True)
ts: datetime = field(timestamp=True)
avg_amount_7d: float
total_spend_30d: float
txn_count_30d: int
max_amount_7d: float
min_amount_7d: float
p99_amount_30d: float
@pipeline(version=1)
@inputs(Transaction)
def compute(cls, t: Transaction):
return (
t.groupby("user_id")
.aggregate(
avg_amount_7d=Avg(of="amount", window="7d"),
total_spend_30d=Sum(of="amount", window="30d"),
txn_count_30d=Count(of="user_id", window="30d"),
max_amount_7d=Max(of="amount", window="7d"),
min_amount_7d=Min(of="amount", window="7d"),
p99_amount_30d=ApproxPercentile(of="amount", window="30d", pct=99),
)
)