Expectations

Data quality validation for datasets: observational checks that never drop records.

Expectations are declarative data quality checks attached to datasets. Violations are observational: they emit metrics and logs but never drop records. This lets you catch data quality issues early without blocking ingestion during a production incident.
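To make "observational" concrete, here is a minimal pure-Python sketch of the idea. It is not Thyme's implementation: `observe_batch` and its violation counter are hypothetical names used only for illustration. The point is that every record comes back out, whether or not it fails the check.

```python
from typing import Callable, Iterable


def observe_batch(
    records: Iterable[dict],
    check: Callable[[dict], bool],
) -> tuple[list[dict], int]:
    """Run a check over records without filtering them (hypothetical helper).

    Returns every input record unchanged, plus a violation count that a real
    system would report as a metric or log line.
    """
    kept = []
    violations = 0
    for record in records:
        if not check(record):
            violations += 1  # observed and counted, nothing else happens
        kept.append(record)  # the record is kept either way
    return kept, violations


orders = [{"amount": 12.5}, {"amount": -3.0}, {"amount": 40.0}]
kept, violations = observe_batch(orders, lambda r: r["amount"] > 0)
```

Here `kept` still contains all three orders, and the negative amount shows up only in the violation count.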

Usage

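from datetime import datetime

# dataset, field, and Field are assumed to be imported from Thyme's dataset API;
# that import is not shown in the source.
from thyme.expectations import (
    expectations,
    expect_column_values_to_be_between,
    expect_column_values_to_not_be_null,
)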

@dataset(version=1)
class Order:
    user_id: Field[str] = field(key=True)
    amount: Field[float] = field()
    timestamp: Field[datetime] = field(timestamp=True)

    @expectations
    def get_expectations(cls):
        return [
            expect_column_values_to_be_between(
                column="amount", min_value=0.01, max_value=50_000,
            ),
            expect_column_values_to_not_be_null(column="user_id"),
        ]

The @expectations decorator marks a dataset method that returns a list of expectation dicts. Expectations are evaluated by the engine with no per-record Python overhead.

Expectation functions

expect_column_values_to_be_between

Checks that a column's values fall within a range.

expect_column_values_to_be_between(column="amount", min_value=0.01, max_value=50_000)

| Parameter | Type | Default | Description |
|---|---|---|---|
| column | str | required | Column name to check |
| min_value | numeric or None | None | Minimum allowed value (inclusive). None = no lower bound. |
| max_value | numeric or None | None | Maximum allowed value (inclusive). None = no upper bound. |
| mostly | float | 1.0 | Fraction of rows that must pass (0.0 to 1.0) |

expect_column_values_to_not_be_null

Checks that a column has no null values.

expect_column_values_to_not_be_null(column="user_id")

| Parameter | Type | Default | Description |
|---|---|---|---|
| column | str | required | Column name to check |
| mostly | float | 1.0 | Fraction of rows that must pass |

expect_column_values_to_be_in_set

Checks that a column's values belong to a predefined set.

expect_column_values_to_be_in_set(column="status", values=["pending", "completed", "cancelled"])

| Parameter | Type | Default | Description |
|---|---|---|---|
| column | str | required | Column name to check |
| values | list | required | Allowed values |
| mostly | float | 1.0 | Fraction of rows that must pass |
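When a set-membership expectation reports violations, it usually helps to see which values fell outside the allowed set. The sketch below shows one way to tally them in plain Python. `unexpected_values` is a hypothetical helper, not part of Thyme, and the sample statuses are made up for illustration.

```python
from collections import Counter


def unexpected_values(column_values, allowed):
    """Count values that are not in the allowed set (hypothetical helper)."""
    allowed_set = set(allowed)  # set lookup keeps each membership test cheap
    return Counter(v for v in column_values if v not in allowed_set)


statuses = ["pending", "completed", "refunded", "cancelled", "refunded"]
unexpected = unexpected_values(statuses, ["pending", "completed", "cancelled"])
```

In this sample, "refunded" is the only value outside the allowed set, and it appears twice.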

expect_column_values_to_be_of_type

Checks that a column's values match an expected type.

expect_column_values_to_be_of_type(column="amount", type_name="float")

| Parameter | Type | Default | Description |
|---|---|---|---|
| column | str | required | Column name to check |
| type_name | str | required | Expected type name (e.g., "str", "float", "int") |
| mostly | float | 1.0 | Fraction of rows that must pass |
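As a rough mental model of a type check driven by a type name, the sketch below maps the example names from the table to Python types. Both the `TYPE_NAMES` mapping and the strict matching rules are assumptions made for illustration; the source does not define how the engine resolves type names or whether, for example, an integer counts as a float.

```python
# Hypothetical mapping from type_name strings to Python types. The source lists
# "str", "float", and "int" as examples but does not define the mapping.
TYPE_NAMES = {"str": str, "float": float, "int": int}


def value_matches_type(value, type_name: str) -> bool:
    """Strictly check a value against a named type (illustrative only)."""
    expected = TYPE_NAMES[type_name]  # raises KeyError for unknown names
    # bool is a subclass of int in Python, so it is excluded explicitly here.
    if isinstance(value, bool):
        return False
    return isinstance(value, expected)


float_ok = value_matches_type(9.99, "float")
string_as_float = value_matches_type("9.99", "float")
bool_as_int = value_matches_type(True, "int")
```

A numeric string such as "9.99" fails the "float" check in this sketch, which is the kind of upstream serialization problem a type expectation is meant to surface.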

The mostly parameter

All expectation functions accept a mostly parameter (default 1.0). It sets the minimum fraction of rows that must pass, so up to 1 - mostly of the rows can violate the expectation without triggering a violation event.

# At least 95% of amounts must be between 0.01 and 50,000
expect_column_values_to_be_between(
    column="amount", min_value=0.01, max_value=50_000, mostly=0.95,
)

Setting mostly=0.95 means the expectation passes as long as at least 95% of rows in the batch satisfy the constraint.
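The threshold arithmetic can be worked through with a small pure-Python sketch. `passes_with_mostly` is a hypothetical helper, not Thyme's API, and treating an empty batch as passing is an assumption the source does not confirm.

```python
def passes_with_mostly(results: list[bool], mostly: float = 1.0) -> bool:
    """Return True if the passing fraction of a batch is at least `mostly`."""
    if not 0.0 <= mostly <= 1.0:
        raise ValueError("mostly must be between 0.0 and 1.0")
    if not results:
        return True  # assumption: an empty batch has nothing to violate
    passing_fraction = sum(results) / len(results)  # True counts as 1
    return passing_fraction >= mostly


# 96 of 100 rows satisfy the constraint: 0.96 >= 0.95, so the batch passes.
batch = [True] * 96 + [False] * 4
lenient = passes_with_mostly(batch, mostly=0.95)

# With the default mostly=1.0, a single failing row is enough to fail.
strict = passes_with_mostly(batch)
```

The same batch passes at mostly=0.95 and fails at the default of 1.0, which is why the default behaves as an "every row must pass" check.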
