Thyme
Concepts

Sources

Connect external data systems to Thyme datasets.

A source connects an external data system to a Thyme dataset. The engine polls the source on a schedule, converts new rows into events, and publishes them to Kafka for downstream pipelines.

Attaching a source

Use @source to attach a connector to a dataset class. The decorator goes outside @dataset:

from thyme.connectors import IcebergSource, source

@source(
    IcebergSource(catalog="prod", database="events", table="transactions"),
    cursor="ts",
    every="1m",
    disorder="5m",
    cdc="append",
)
@dataset(index=True)
class Transaction:
    user_id: str      = field(key=True)
    amount:  float
    ts:      datetime = field(timestamp=True)

@source parameters

ParameterTypeDefaultDescription
connectorconnector objectrequiredThe data source to read from
cursorstr""Field used for incremental reads (high-water mark)
everystr""Poll interval: "1m", "5m", "1h"
disorderstr""Maximum late-arrival tolerance: "5m", "1h", "1d"
cdcstr"append"Change data capture mode (see below)

cursor

The field the engine uses to track progress. On each poll, the engine fetches rows where cursor > last_seen_value. This enables efficient incremental reads without scanning the full table.

every

How often the engine polls the source. Format: integer + unit (m minutes, h hours, d days).

disorder

The maximum expected delay between event time and ingestion time. Events arriving later than (current_watermark - disorder) are discarded. This bound lets the engine advance the watermark and emit results without waiting forever.

Set this conservatively — too small and you lose late events; too large and your features are unnecessarily delayed.


CDC modes

The cdc parameter tells the engine how to interpret incoming rows.

append (default)

Each row is a new event. The table is insert-only. Use this for event logs, transaction records, click streams:

cdc="append"

upsert

Each row is an upsert keyed on the dataset's key field. The engine interprets rows as the latest state for that key. Use this for tables where rows are updated in place (e.g., a user profile table):

cdc="upsert"

debezium

Rows arrive as full Debezium CDC envelopes with before, after, and op fields. Use this when reading from a Debezium Kafka connector upstream of Thyme:

cdc="debezium"

Debezium mode handles INSERT, UPDATE, and DELETE operations from the envelope metadata.


Connectors

IcebergSource

Reads from an Apache Iceberg table.

from thyme.connectors import IcebergSource

IcebergSource(
    catalog="prod",    # Iceberg catalog name
    database="events", # database/namespace
    table="transactions",  # table name
)
ParameterTypeDescription
catalogstrIceberg catalog name
databasestrDatabase or namespace
tablestrTable name

Additional connector types (Kafka, S3, PostgreSQL) are planned for future releases.

On this page