Thyme
Sources & Integrations

IcebergSource

Polling connector for Apache Iceberg tables - schema-evolved, incremental reads.

IcebergSource reads from an Apache Iceberg table. Iceberg's snapshot model gives the connector incremental, append-aware reads with schema-evolution support out of the box.

Use case

  • Event tables produced by upstream batch or streaming Iceberg writers
  • Lakehouse architectures where Iceberg is the canonical landing zone
  • Schema-evolving event streams where you want column adds to flow through without rewriting the connector config

Example

from thyme.connectors import IcebergSource, source

@source(
    IcebergSource(table="transactions"),
    cursor="ts", every="1m", max_lateness="5m",
)
@dataset(index=True)
class Transaction:
    user_id: Field[str]      = field(key=True)
    amount:  Field[float]    = field()
    ts:      Field[datetime] = field(timestamp=True)

Parameters

ParameterRequiredDefault / env varDescription
tableYes-Table name
catalogNoTHYME_ICEBERG_CATALOGIceberg catalog name
databaseNoTHYME_ICEBERG_DATABASEDatabase / namespace

Authentication

Authentication depends on your catalog backend:

  • Glue catalog - uses the engine pod's AWS identity (irsa, instance role)
  • REST / Nessie / Polaris catalog - pass credentials via the THYME_ICEBERG_* env vars on the engine
  • HMS / Hive Metastore - Kerberos or LDAP credentials via env

Storage credentials (for the underlying object store) follow the same env-driven pattern as S3JsonSource.

Schema evolution

Iceberg-native schema evolution (add column, rename column, widen type) is honoured. New columns added to the source table appear as nullable fields in newly-arriving events. Removing a field from your @dataset while keeping it in the source table is safe - Thyme ignores untyped columns.

Limits

  • The cursor field must be a column the writer guarantees is non-decreasing. A wall-clock ts column written at insert time is the typical choice.
  • Iceberg row-level deletes (v2 delete files) are honoured by the connector but interact with cdc="upsert" semantics - verify behaviour against your specific writer.

On this page