Sources & Integrations
IcebergSource
Polling connector for Apache Iceberg tables - schema-evolved, incremental reads.
IcebergSource reads from an Apache Iceberg table. Iceberg's snapshot model gives the connector incremental, append-aware reads with schema-evolution support out of the box.
Use case
- Event tables produced by upstream batch or streaming Iceberg writers
- Lakehouse architectures where Iceberg is the canonical landing zone
- Schema-evolving event streams where you want column adds to flow through without rewriting the connector config
Example
from thyme.connectors import IcebergSource, source
@source(
IcebergSource(table="transactions"),
cursor="ts", every="1m", max_lateness="5m",
)
@dataset(index=True)
class Transaction:
user_id: Field[str] = field(key=True)
amount: Field[float] = field()
ts: Field[datetime] = field(timestamp=True)Parameters
| Parameter | Required | Default / env var | Description |
|---|---|---|---|
table | Yes | - | Table name |
catalog | No | THYME_ICEBERG_CATALOG | Iceberg catalog name |
database | No | THYME_ICEBERG_DATABASE | Database / namespace |
Authentication
Authentication depends on your catalog backend:
- Glue catalog - uses the engine pod's AWS identity (irsa, instance role)
- REST / Nessie / Polaris catalog - pass credentials via the
THYME_ICEBERG_*env vars on the engine - HMS / Hive Metastore - Kerberos or LDAP credentials via env
Storage credentials (for the underlying object store) follow the same env-driven pattern as S3JsonSource.
Schema evolution
Iceberg-native schema evolution (add column, rename column, widen type) is honoured. New columns added to the source table appear as nullable fields in newly-arriving events. Removing a field from your @dataset while keeping it in the source table is safe - Thyme ignores untyped columns.
Limits
- The cursor field must be a column the writer guarantees is non-decreasing. A wall-clock
tscolumn written at insert time is the typical choice. - Iceberg row-level deletes (
v2delete files) are honoured by the connector but interact withcdc="upsert"semantics - verify behaviour against your specific writer.