Feature Versioning & Schema Evolution

How to evolve datasets, pipelines, and featuresets without breaking online consumers or training pipelines.

Feature definitions evolve. You add a column, change a window, switch an aggregation, or re-key a dataset. Thyme's versioning model is designed so most evolutions are non-breaking - but a few changes still require coordination. This guide is the playbook.

The version field

Every @dataset and @pipeline carries an integer version:

@dataset(version=1, index=True)
class UserOrderStats: ...

@pipeline(version=1)
@inputs(Order)
def compute_stats(cls, orders): ...

Featuresets are name-keyed - they don't carry a version field. Featureset evolution is governed by the underlying datasets and extractor definitions.

Non-breaking changes

These can be made on the same version and committed without coordination:

  • Add an output field to a dataset. Newly arriving events populate the field; existing aggregated state is unaffected, and backfills produce values starting from the next pipeline tick.
  • Add a new featureset referencing existing datasets.
  • Add a new extractor to an existing featureset.
  • Add a new aggregation to an existing pipeline.
  • Tighten an @expectations rule on a dataset (violations are observational, not blocking).

In all of these, online consumers continue to receive the existing fields without disruption while the new field/feature/extractor backfills.

Breaking changes - bump the version

Bump the version integer on the dataset (or pipeline) whenever you change the meaning or shape of existing fields. Examples:

  • Re-keying a dataset (field(key=True) moves to a different column)
  • Changing an existing aggregation's window (window="7d" → window="14d")
  • Switching an aggregation operator (Sum → Avg for the same output field)
  • Removing or renaming an existing field
  • Switching a source's cdc mode (append → upsert)
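
The distinction between additive and breaking changes can be checked mechanically before a commit. The following is a minimal sketch in plain Python, not part of Thyme: FieldSpec and needs_version_bump are hypothetical names, and the schema description is an assumed stand-in for whatever your dataset definitions contain. It covers re-keying, window changes, operator changes, and removed or renamed fields; it does not model a source's cdc mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one dataset field (not a Thyme type)."""
    name: str
    is_key: bool = False
    aggregation: Optional[str] = None  # e.g. "Sum" or "Avg"; None for plain fields
    window: Optional[str] = None       # e.g. "7d"


def needs_version_bump(old: list, new: list) -> list:
    """Return the reasons a change is breaking; an empty list means additive."""
    reasons = []
    old_keys = {f.name for f in old if f.is_key}
    new_keys = {f.name for f in new if f.is_key}
    if old_keys != new_keys:
        reasons.append(f"dataset re-keyed: {sorted(old_keys)} -> {sorted(new_keys)}")

    new_by_name = {f.name: f for f in new}
    for f in old:
        g = new_by_name.get(f.name)
        if g is None:
            reasons.append(f"field '{f.name}' removed or renamed")
            continue
        if f.aggregation != g.aggregation:
            reasons.append(f"operator changed on '{f.name}'")
        if f.window != g.window:
            reasons.append(f"window changed on '{f.name}'")
    return reasons


# Example schemas; field names are made up for illustration.
v1 = [
    FieldSpec("user_id", is_key=True),
    FieldSpec("total_spend", aggregation="Sum", window="7d"),
]
added_field = v1 + [FieldSpec("order_count", aggregation="Count", window="7d")]
wider_window = [v1[0], FieldSpec("total_spend", aggregation="Sum", window="14d")]

print(needs_version_bump(v1, added_field))   # additive: no reasons reported
print(needs_version_bump(v1, wider_window))  # breaking: window changed
```

Fields that exist only in the new schema are deliberately ignored by the per-field loop, which matches the rule that adding an output field is non-breaking.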

When you bump the version:

  1. Old and new versions coexist during the migration window. State for the old version remains queryable while the new version backfills.
  2. Update featureset references (feature(ref=...)) to point at the new version.
  3. Once your model code reads only the new version, retire the old one with a follow-up commit.

This pattern lets you migrate without a flag day - old serving paths keep working until the cutover.

Renaming a feature without breaking consumers

To rename is_suspicious to is_high_risk:

  1. Add the new feature alongside the old one. Both extractors run; both values are exposed.
  2. Migrate consumers (model code, dashboards) to read the new feature.
  3. Delete the old feature once no consumer references it.

Featuresets are name-keyed end-to-end (decorator, REST API, UI), so the same pattern works for any feature in any featureset.
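
One way to keep the old and new features identical during step 1 is to have both names delegate to a single shared function. The sketch below uses plain Python functions only; the Thyme extractor wiring is omitted, and the risk rule and its inputs (refund_count, order_count) are invented for illustration.

```python
def _risk_flag(refund_count: int, order_count: int) -> bool:
    """Shared logic (invented for this example): more than half of orders refunded."""
    return order_count > 0 and refund_count / order_count > 0.5


def is_suspicious(refund_count: int, order_count: int) -> bool:
    """Old name. Keep until no consumer references it, then delete."""
    return _risk_flag(refund_count, order_count)


def is_high_risk(refund_count: int, order_count: int) -> bool:
    """New name. Consumers migrate to this one."""
    return _risk_flag(refund_count, order_count)


# Both names return the same value for the same inputs while they coexist.
sample = [(3, 4), (1, 10), (0, 0)]
for refunds, orders in sample:
    assert is_suspicious(refunds, orders) == is_high_risk(refunds, orders)
```

Because the two functions cannot diverge, consumers can be migrated one at a time without comparing outputs between the old and new names.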

Migrating a source

Switching a source connector (e.g. moving from polling Postgres to streaming Kafka via Debezium) is a structural change. The recommended sequence:

  1. Add the new source on a new dataset. Don't disturb the existing dataset.
  2. Define a new pipeline that reads from the new dataset.
  3. Switch consumers (featuresets) to reference the new pipeline's output dataset.
  4. Retire the old dataset/pipeline/source after consumers move.

The lineage UI's blast-radius mode (see Lineage) is helpful here - click the dataset and visually confirm what depends on it before retiring.
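
Conceptually, a blast radius is the set of everything reachable downstream of a node in the dependency graph. The sketch below illustrates that idea with a breadth-first traversal over a hand-written graph. It is not how the lineage UI is implemented and does not call any Thyme API; the node names orders_pg_source and UserRiskFeatures are hypothetical.

```python
from collections import deque


def downstream(graph: dict, start: str) -> set:
    """Return every node reachable from `start` by following dependency edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Edges point from a producer to the things that consume it.
lineage = {
    "orders_pg_source": ["Order"],
    "Order": ["compute_stats"],
    "compute_stats": ["UserOrderStats"],
    "UserOrderStats": ["UserRiskFeatures"],
}

affected = downstream(lineage, "Order")
print(sorted(affected))
```

If the set is non-empty for the dataset you plan to retire, those dependents still need to move to the new dataset first.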

Backfill semantics

Bumping a version triggers a backfill on commit for polling sources. The engine reads from the source's beginning (or from the configured backfill range) and rebuilds aggregated state for the new version.

Backfills run alongside live ingestion - they don't pause online queries. While a backfill is running, point-in-time queries against timestamps within the backfill window return the value the new version will have once the backfill completes.

For streaming sources, backfill on commit is not currently supported - rely on the live ingestion path for streaming-sourced datasets.

To summarize, the workflow for any change:

  1. Sketch the change. Is it additive (new field/feature) or structural (different meaning)?
  2. If additive: modify in place, commit, verify the new field/feature appears in queries.
  3. If structural: bump version, dual-run, migrate consumers, retire old version.
  4. Always verify in the lineage UI before committing structural changes - confirm the blast radius matches what you expect.
