Thyme
Reference

Aggregations Reference

Every windowed aggregation operator, its signature, and its memory profile.

Aggregation operators are used inside .aggregate() calls within a @pipeline method. Each operator computes a rolling windowed statistic over the events in an input dataset.

Usage

.aggregate(
    output_field_name=OperatorClass(of="input_field", window="7d"),
)

The keyword argument name becomes a field in the output dataset. Multiple aggregations can be combined in a single .aggregate() call.


Common parameters

All aggregation operators share these parameters:

ParameterTypeDescription
ofstrThe input dataset field to aggregate over
windowstrTime window size: integer + unit (d, h, m)

Window format examples: "7d" (7 days), "24h" (24 hours), "30m" (30 minutes).

All windows use event time, not processing time.


At a glance

OperatorInvertible?MemoryOutput
SumyesO(1)same as input
CountyesO(1)int
AvgyesO(1)float
Minno - sorted setO(window)same as input
Maxno - sorted setO(window)same as input
ApproxPercentileno - t-digest tilesO(tiles)float

Invertible operators maintain only the running total and count. When an old record expires out of the window, the engine subtracts its contribution without a recompute. This means Sum/Count/Avg over a 180-day window cost the same memory per entity as over a one-minute window.

Non-invertible operators store enough extra state to answer the query without rescanning the window. Min/Max use sorted sets with tombstones; ApproxPercentile uses tiled t-digest sketches and computes the answer at write time so the read is a single float lookup.


Operators

Sum

Rolling sum of a numeric field.

from thyme.pipeline import Sum

total_spend_7d=Sum(of="amount", window="7d")
  • of: numeric field (float or int)
  • Returns: same type as the input field

Count

Rolling count of events within the window.

from thyme.pipeline import Count

txn_count_30d=Count(of="user_id", window="30d")
  • of: any field (used to identify which events to count)
  • Returns: int

Avg

Rolling mean of a numeric field.

from thyme.pipeline import Avg

avg_spend_7d=Avg(of="amount", window="7d")
  • of: numeric field (float or int)
  • Returns: float

Min

Rolling minimum of a numeric field.

from thyme.pipeline import Min

min_amount_30d=Min(of="amount", window="30d")
  • of: numeric field (float or int)
  • Returns: same type as the input field

Memory is O(distinct values in the window).


Max

Rolling maximum of a numeric field.

from thyme.pipeline import Max

max_amount_7d=Max(of="amount", window="7d")
  • of: numeric field (float or int)
  • Returns: same type as the input field

Same structure as Min, inverted.


ApproxPercentile

Percentile rank of an incoming value against its own windowed distribution.

from thyme.pipeline import ApproxPercentile

price_pct_rank_180d=ApproxPercentile(of="price", window="180d", pct=99)
  • of: numeric field (float or int)
  • pct: the percentile to compute (0–100)
  • Returns: float - the value at the given percentile, OR the rank of the latest record, depending on configuration

Backed by tiled t-digest sketches. The window is divided into fixed-duration tiles; each tile stores a sketch; window queries merge the live tiles. Tiles fall out of the window individually in O(1), so a 180-day window is not meaningfully more expensive than a 7-day window.


Complete example

@dataset(index=True)
class UserStats:
    user_id:        str      = field(key=True)
    ts:             datetime = field(timestamp=True)
    avg_amount_7d:  float
    total_spend_30d: float
    txn_count_30d:  int
    max_amount_7d:  float
    min_amount_7d:  float
    p99_amount_30d: float

    @pipeline(version=1)
    @inputs(Transaction)
    def compute(cls, t: Transaction):
        return (
            t.groupby("user_id")
             .aggregate(
                 avg_amount_7d=Avg(of="amount", window="7d"),
                 total_spend_30d=Sum(of="amount", window="30d"),
                 txn_count_30d=Count(of="user_id", window="30d"),
                 max_amount_7d=Max(of="amount", window="7d"),
                 min_amount_7d=Min(of="amount", window="7d"),
                 p99_amount_30d=ApproxPercentile(of="amount", window="30d", pct=99),
             )
        )

On this page