Assertions (deprecated)
Assert the profiling statistic result
PipeRider Assertions have been deprecated since v0.25.0. Please replace assertions with the equivalent testing methods offered by dbt tests.
Assertions are the data testing solution in PipeRider. An assertion checks whether the profiling result fulfills a certain rule. There are two types of assertions:
- PipeRider assertions
- dbt assertions
A PipeRider assertion checks the profiling result of each run.
Assertion files are located in .piperider/assertions/
Assertion files are YAML files named after the data source table: <table>.yml
If you opted to generate 'recommended assertions' with piperider generate-assertions, the assertion file names are prefixed with 'recommended_': recommended_<table>.yml
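The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper `assertion_file_path` and its `recommended` flag are hypothetical names, not part of PipeRider's API.

```python
from pathlib import PurePosixPath

# Directory named in the text above.
ASSERTIONS_DIR = PurePosixPath(".piperider/assertions")


def assertion_file_path(table: str, recommended: bool = False) -> str:
    """Return the expected assertion file path for a table (hypothetical helper)."""
    prefix = "recommended_" if recommended else ""
    return str(ASSERTIONS_DIR / f"{prefix}{table}.yml")


print(assertion_file_path("movies"))
print(assertion_file_path("movies", recommended=True))
```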
The following is an excerpt of an assertions file for a movie database table:
# Auto-generated by PipeRider based on table "movies"
movies: # Table Name
  # Test Cases for Table
  tests:
  - metric: row_count
    assert:
      gte: 8961
    tags:
    - RECOMMENDED
  columns:
    title: # Column Name
      # Test Cases for Column
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: VARCHAR
        tags:
        - RECOMMENDED
      - name: assert_column_not_null
        tags:
        - RECOMMENDED
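To make the nesting of the excerpt easier to see, here is a minimal Python sketch that walks the same structure. The dict mirrors what a YAML loader would typically produce for the excerpt, and `list_tests` is a hypothetical helper written for this illustration; it is not how PipeRider itself reads the file.

```python
# The excerpt above, written as the dict a YAML loader would typically produce.
ASSERTIONS = {
    "movies": {
        "tests": [
            {"metric": "row_count", "assert": {"gte": 8961}, "tags": ["RECOMMENDED"]},
        ],
        "columns": {
            "title": {
                "tests": [
                    {
                        "name": "assert_column_schema_type",
                        "assert": {"schema_type": "VARCHAR"},
                        "tags": ["RECOMMENDED"],
                    },
                    {"name": "assert_column_not_null", "tags": ["RECOMMENDED"]},
                ]
            }
        },
    }
}


def list_tests(assertions: dict) -> list:
    """Flatten table-level and column-level tests into (table, column, label) tuples."""
    found = []
    for table, spec in assertions.items():
        # Table-level tests sit directly under the table name.
        for test in spec.get("tests", []):
            found.append((table, None, test.get("metric") or test.get("name")))
        # Column-level tests sit under columns -> <column name> -> tests.
        for column, col_spec in spec.get("columns", {}).items():
            for test in col_spec.get("tests", []):
                found.append((table, column, test.get("metric") or test.get("name")))
    return found


for entry in list_tests(ASSERTIONS):
    print(entry)
```

Table-level tests are reported with `None` in the column position, which keeps the two levels distinguishable in one flat list.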
Profile assertions are the most common way to define an assertion. They check whether a profiling statistic fulfills a certain rule.
- Description: Profiling-based assertions assert the value of a profiling field.
- Assert:
  - gte: the value should be greater than or equal to
  - gt: the value should be greater than
  - lte: the value should be less than or equal to
  - lt: the value should be less than
  - eq: the value should be equal to
  - ne: the value should not be equal to
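The operator keywords can be read as ordinary comparisons. The following Python sketch shows one way to evaluate them against a single profiling value. It assumes that when several operators appear under one assert, all of them must hold (as a range check would require); `check_metric` and the sample values are hypothetical and not taken from PipeRider's code.

```python
import operator

# Operator keywords from the list above mapped to Python comparisons.
COMPARATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_metric(value, rules: dict) -> bool:
    """Return True when `value` satisfies every rule in an `assert` mapping."""
    return all(COMPARATORS[name](value, expected) for name, expected in rules.items())


# Assumed profiling statistics, for illustration only.
print(check_metric(8961, {"gte": 8961}))         # row count meets the minimum
print(check_metric(0.05, {"lte": 0.01}))         # too many missing values
print(check_metric(15, {"gte": 10, "lte": 20}))  # value inside a closed range
```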
The row count should be <= 1000000
world_city:
  tests:
  - metric: row_count
    assert:
      lte: 1000000
The missing percentage should be <= 0.01
world_city:
  columns:
    country_code:
      tests:
      - metric: nulls_p
        assert:
          lte: 0.01
The median should be between [10, 20]
world_city:
  columns:
    country_code:
      tests:
      - metric: p50
        assert:
          gte: 10
          lte: 20
Basic assertions are high-level assertions that check whether a column is not null or unique, or whether the column values (rather than a profiling statistic) fulfill a certain rule.
- Description: Assert that the column value is within the given range or set.
- Assert:
  - gte: the value should be greater than or equal to
  - gt: the value should be greater than
  - lte: the value should be less than or equal to
  - lt: the value should be less than
  - in: the value should belong to the set
The value should be between [0,10000)
world_city:
  columns:
    population:
      tests:
      - name: assert_column_value
        assert:
          gte: 0
          lt: 10000
The value of a datetime type column should be >= '2022-01-01'
world_city:
  columns:
    create_at:
      tests:
      - name: assert_column_value
        assert:
          gte: '2022-01-01'
The value of the column should belong to ["male", "female"] set
TITANIC:
  columns:
    Sex:
      tests:
      - name: assert_column_value
        assert:
          in: ["male", "female"]
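Unlike profile assertions, these rules apply to the column values themselves. The sketch below illustrates that difference in Python by checking each value in a column against the rules and collecting the offenders. `failing_values` and the sample data are hypothetical; the sketch assumes non-null values and does not reflect how PipeRider evaluates the rules internally.

```python
import operator
from datetime import date

# Operator keywords for column value assertions, as listed above.
VALUE_RULES = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "in": lambda value, allowed: value in allowed,
}


def failing_values(values, rules: dict) -> list:
    """Return the values that break at least one rule (an empty list means pass)."""
    return [
        value
        for value in values
        if not all(VALUE_RULES[name](value, expected) for name, expected in rules.items())
    ]


# Half-open range [0, 10000): 10000 itself is rejected by `lt`.
print(failing_values([0, 512, 9999, 10000], {"gte": 0, "lt": 10000}))

# Date lower bound, using date objects so the comparison is chronological.
print(failing_values([date(2021, 12, 31), date(2022, 1, 1)], {"gte": date(2022, 1, 1)}))

# Set membership.
print(failing_values(["male", "female", "unknown"], {"in": ["male", "female"]}))
```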
PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest run results are then included in the run report.
dbt build # or dbt test
piperider run --dbt-run-results
From version 0.26.0, dbt test results are included by default and it is not necessary to use the --dbt-run-results option.
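For readers who have not looked at dbt's run results, the following Python sketch shows the kind of information they carry. It parses a small, made-up sample shaped like dbt's target/run_results.json artifact and counts the statuses of test nodes. The project name `demo`, the node names, and the helper `summarize_tests` are invented for illustration, and PipeRider's own handling of this file may differ.

```python
import json
from collections import Counter

# Made-up sample shaped like dbt's target/run_results.json (heavily trimmed).
SAMPLE_RUN_RESULTS = """
{
  "results": [
    {"unique_id": "model.demo.world_city", "status": "success"},
    {"unique_id": "test.demo.not_null_world_city_country_code", "status": "pass"},
    {"unique_id": "test.demo.unique_world_city_id", "status": "fail"}
  ]
}
"""


def summarize_tests(run_results: dict) -> Counter:
    """Count result statuses for nodes whose unique_id starts with 'test.'."""
    return Counter(
        result["status"]
        for result in run_results.get("results", [])
        if result.get("unique_id", "").startswith("test.")
    )


summary = summarize_tests(json.loads(SAMPLE_RUN_RESULTS))
print(dict(summary))
```

Model results are skipped here because only test outcomes are relevant to assertions.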