Assertions
Assert the profiling statistic result
Assertions are PipeRider's data testing solution. An assertion checks whether the profiling result fulfills a certain rule. There are two types of assertions:
- PipeRider assertions
- dbt assertions
PipeRider assertions check the profiling result of each run.
Assertion files are located in .piperider/assertions/
Assertion files are YAML files and are named according to the data source table name:
<table>.yml
If you opted to generate 'recommended assertions' by running piperider generate-assertions, the assertion file names are prefixed with 'recommended_': recommended_<table>.yml
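The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names classify_assertion_files and list_assertion_files are hypothetical and not part of PipeRider. Only the directory location, the .yml extension, and the recommended_ prefix come from the text above.

```python
from pathlib import Path

RECOMMENDED_PREFIX = "recommended_"


def classify_assertion_files(file_names):
    """Split assertion file names into (recommended, other) table names.

    Hypothetical helper for illustration; not part of PipeRider.
    Files that do not end in .yml are ignored.
    """
    recommended, other = [], []
    for name in sorted(file_names):
        if not name.endswith(".yml"):
            continue
        stem = name[: -len(".yml")]  # file name without the extension
        if stem.startswith(RECOMMENDED_PREFIX):
            recommended.append(stem[len(RECOMMENDED_PREFIX):])
        else:
            other.append(stem)
    return recommended, other


def list_assertion_files(assertions_dir=".piperider/assertions"):
    """Classify the files found directly inside the assertions directory."""
    directory = Path(assertions_dir)
    if not directory.is_dir():
        return [], []
    return classify_assertion_files(p.name for p in directory.iterdir())
```

Calling list_assertion_files() from the project root would return the table names that have recommended assertion files and those that have hand-written ones.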
The following is an excerpt of an assertions file for a movie database table:
# Auto-generated by PipeRider based on table "movies"
movies: # Table Name
  # Test Cases for Table
  tests:
    - metric: row_count
      assert:
        gte: 8961
      tags:
        - RECOMMENDED
  columns:
    title: # Column Name
      # Test Cases for Column
      tests:
        - name: assert_column_schema_type
          assert:
            schema_type: VARCHAR
          tags:
            - RECOMMENDED
        - name: assert_column_not_null
          tags:
            - RECOMMENDED
Profile assertions are the most common way to define an assertion. They let you assert whether a profiling statistic fulfills a certain rule.
- Description: Profiling-based assertions assert the value of a profiling field.
- Assert:
  - gte: the value should be greater than or equal to
  - gt: the value should be greater than
  - lte: the value should be less than or equal to
  - lt: the value should be less than
  - eq: the value should be equal to
  - ne: the value should not be equal to
The row count should be <= 1000000
world_city:
  tests:
    - metric: row_count
      assert:
        lte: 1000000
The missing percentage should be <= 0.01
world_city:
  columns:
    country_code:
      tests:
        - metric: nulls_p
          assert:
            lte: 0.01
The median should be between [10, 20]
world_city:
  columns:
    country_code:
      tests:
        - metric: p50
          assert:
            gte: 10
            lte: 20
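To make the operator semantics concrete, here is a minimal Python sketch of how a profiling metric could be checked against an assert block. It is not PipeRider's implementation. The function name check_metric is hypothetical, and the sketch assumes that several operators under one assert are combined with a logical AND, as the median example suggests.

```python
import operator

# Operator keywords from the list above, mapped to Python comparisons.
COMPARATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_metric(value, assert_rules):
    """Return True when `value` satisfies every rule in `assert_rules`.

    Assumption: all rules must hold (logical AND).
    """
    return all(
        COMPARATORS[op](value, bound) for op, bound in assert_rules.items()
    )


# The median example: p50 must lie in [10, 20].
median_rules = {"gte": 10, "lte": 20}
print(check_metric(15, median_rules))  # True
print(check_metric(25, median_rules))  # False
```

An unknown operator keyword raises a KeyError in this sketch, which keeps typos in a rule from passing silently.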
Basic assertions are high-level assertions that check whether a column is not null or unique, or whether the column values (rather than a profiling statistic) fulfill a certain rule.
- Description: Assert that the column value is within the given range or set.
- Assert:
  - gte: the value should be greater than or equal to
  - gt: the value should be greater than
  - lte: the value should be less than or equal to
  - lt: the value should be less than
  - in: the value should belong to the set
The value should be between [0,10000)
world_city:
  columns:
    population:
      tests:
        - name: assert_column_value
          assert:
            gte: 0
            lt: 10000
The value of a datetime type column should be >= '2022-01-01'
world_city:
  columns:
    create_at:
      tests:
        - name: assert_column_value
          assert:
            gte: '2022-01-01'
The value of the column should belong to ["male", "female"] set
TITANIC:
  columns:
    Sex:
      tests:
        - name: assert_column_value
          assert:
            in: ["male", "female"]
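Unlike profile assertions, the examples above apply to each value in a column. The following Python sketch illustrates that difference by returning the values that would violate an assert block. The function name failing_values is hypothetical, and the sketch assumes that all rules must hold for a value and that null values are skipped. PipeRider's actual behavior may differ.

```python
# Per-value checks for the operators listed for assert_column_value.
VALUE_CHECKS = {
    "gte": lambda value, bound: value >= bound,
    "gt": lambda value, bound: value > bound,
    "lte": lambda value, bound: value <= bound,
    "lt": lambda value, bound: value < bound,
    "in": lambda value, allowed: value in allowed,
}


def failing_values(values, assert_rules):
    """Return the non-null values that violate at least one rule.

    Assumptions: rules are combined with AND, and None (null) is ignored.
    """
    return [
        value
        for value in values
        if value is not None
        and not all(
            VALUE_CHECKS[op](value, bound) for op, bound in assert_rules.items()
        )
    ]


# Half-open range [0, 10000): 10000 itself is rejected.
print(failing_values([0, 9999, 10000, -1], {"gte": 0, "lt": 10000}))
# Set membership, as in the TITANIC example.
print(failing_values(["male", "female", "unknown"], {"in": ["male", "female"]}))
```

An empty result means that every checked value satisfies the assertion.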
PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest dbt run results are then included in the run report.
dbt build # or dbt test
piperider run --dbt-run-results