
S3JsonSource

Polling connector for JSON or JSONL files in an S3 bucket.

S3JsonSource reads JSON or JSONL files from an S3 prefix. New objects are picked up on each poll; existing objects are not re-read.

Use case

  • Event logs landed in S3 by another system (e.g. Firehose JSONL output)
  • Periodic exports from data warehouses dumped into S3 prefixes
  • Bridge to a system that already lands JSON files in S3

For Parquet or CSV in S3, prefer landing the data in Iceberg and using IcebergSource, which supports schema evolution and incremental reads natively.

Example

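from datetime import datetime

from thyme.connectors import S3JsonSource, source
# dataset, field, and Field must also be imported from Thyme;
# their import path is not shown on this page.

# Poll s3://my-data/events/ every 5 minutes, using "ts" as the cursor.
@source(
    S3JsonSource(bucket="my-data", prefix="events/"),
    cursor="ts", every="5m", max_lateness="1h",
)
@dataset(index=True)
class Event:
    user_id:   Field[str]      = field(key=True)
    action:    Field[str]      = field()
    ts:        Field[datetime] = field(timestamp=True)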

Parameters

Parameter   Required   Default / env var                Description
bucket      Yes        -                                S3 bucket name
prefix      No         ""                               Key prefix filter (per-dataset, not env-defaulted)
region      No         THYME_S3_REGION ("us-east-1")    AWS region
Authentication

Uses the engine pod's IAM identity (IRSA or an instance role). The role needs s3:ListBucket on the bucket and s3:GetObject on the objects under the prefix. For cross-account reads, configure a bucket policy on the source bucket, then either assume a role explicitly in your application code or grant the engine's role read access directly.
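
The two permissions attach to different resource types, which is easy to get wrong. The sketch below builds a read-only IAM policy document for a bucket and prefix. The function name read_only_policy and the example bucket and prefix are illustrative; narrowing s3:ListBucket with an s3:prefix condition is optional and shown here as one reasonable choice, not a Thyme requirement.

```python
import json

def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document allowing list and read under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so the resource is the
                # bucket ARN; the s3:prefix condition narrows what can be listed.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # GetObject is an object-level action, so the resource is an
                # object ARN pattern under the prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }

policy = read_only_policy("my-data", "events/")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the engine's role. For cross-account reads, the source bucket's policy would additionally need to name that role as a principal.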

File formats

The connector handles both:

  • JSON - one JSON object per file
  • JSONL - one JSON object per line, multiple records per file

Each record's fields are mapped to the dataset's typed fields. The dataset's timestamp=True field must be present and parseable as ISO-8601, epoch seconds, or a datetime.
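
To make the two file shapes and the accepted timestamp forms concrete, here is a rough sketch of equivalent parsing logic. It is not the connector's implementation: parse_records and parse_timestamp are hypothetical helpers, and choosing the format by file extension is an assumption made only for this example.

```python
import json
from datetime import datetime, timezone

def parse_records(body: str, key: str) -> list:
    """Return the records in a file body.

    Assumption for this sketch: keys ending in ".jsonl" hold one object
    per line; anything else holds a single JSON object.
    """
    if key.endswith(".jsonl"):
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    return [json.loads(body)]

def parse_timestamp(value) -> datetime:
    """Accept a datetime, epoch seconds, or an ISO-8601 string."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

jsonl_body = (
    '{"user_id": "u1", "action": "click", "ts": "2026-03-15T10:00:00Z"}\n'
    '{"user_id": "u2", "action": "view", "ts": 1773568800}\n'
)
records = parse_records(jsonl_body, "events/2026/03/15/part-0001.jsonl")
timestamps = [parse_timestamp(r["ts"]) for r in records]
```

Both records in the sample carry the same instant, once as an ISO-8601 string and once as epoch seconds, so the two parsed timestamps compare equal.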

Limits

  • Files within the prefix should be append-only. Updating an existing object after it has been read is not detected.
  • New files are surfaced only when their key sorts lexicographically after the last seen key, so design your prefix layout accordingly (e.g. events/2026/03/15/...).
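
The ordering rule has a practical consequence for key design: date components should be zero-padded. The sketch below mimics the rule with plain string comparison. new_keys is a hypothetical helper and the keys are made up, but the comparison is the same lexicographic order described above.

```python
def new_keys(listed: list, last_seen: str) -> list:
    """Keys that sort strictly after the last seen key, in sorted order."""
    return sorted(k for k in listed if k > last_seen)

# Zero-padded layout: day 10 sorts after day 09, so the new file is found.
padded = new_keys(
    ["events/2026/03/10/a.jsonl"],
    last_seen="events/2026/03/09/a.jsonl",
)

# Unpadded layout: "10" sorts before "9" as text, so the new file is skipped.
unpadded = new_keys(
    ["events/2026/3/10/a.jsonl"],
    last_seen="events/2026/3/9/a.jsonl",
)

print(padded)
print(unpadded)
```

The same reasoning applies to sequence numbers in file names: part-0010 sorts after part-0009, while part-10 sorts before part-9.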
