Assertions (deprecated)
Assert the profiling statistic result
PipeRider assertions have been deprecated since v0.25.0. Please replace them with the equivalent testing methods offered by dbt tests.
Assertions are the data testing solution in PipeRider. They check whether the profiling result fulfills a certain rule. There are two types of assertions: PipeRider assertions and dbt assertions.
PipeRider Assertions
PipeRider assertions check the profiling result of each run.
Assertion files
Assertion files are located in .piperider/assertions/
File naming convention
Assertion files are YAML files and are named according to the data source table name:
<table>.yml
If you generated recommended assertions with piperider generate-assertions, the assertion file names are prefixed with 'recommended_':
recommended_<table>.yml
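As a rough illustration of the naming convention, the sketch below builds the expected assertion file path for a table. The helper `assertion_file_path` and its `recommended` flag are hypothetical and exist only for this example; they are not part of PipeRider.

```python
from pathlib import PurePosixPath

# Directory that holds assertion files, relative to the project root.
ASSERTIONS_DIR = PurePosixPath(".piperider/assertions")


def assertion_file_path(table: str, recommended: bool = False) -> PurePosixPath:
    """Hypothetical helper: build the assertion file path for a table.

    Recommended assertions generated by `piperider generate-assertions`
    use the 'recommended_' prefix; other files are named '<table>.yml'.
    """
    prefix = "recommended_" if recommended else ""
    return ASSERTIONS_DIR / f"{prefix}{table}.yml"


print(assertion_file_path("movies"))                    # .piperider/assertions/movies.yml
print(assertion_file_path("movies", recommended=True))  # .piperider/assertions/recommended_movies.yml
```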
Example assertion file
The following is an excerpt of an assertion file for a movie database table:
# Auto-generated by PipeRider based on table "movies"
movies: # Table Name
  # Test Cases for Table
  tests:
  - metric: row_count
    assert:
      gte: 8961
    tags:
    - RECOMMENDED
  columns:
    title: # Column Name
      # Test Cases for Column
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: VARCHAR
        tags:
        - RECOMMENDED
      - name: assert_column_not_null
        tags:
        - RECOMMENDED
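To make the two levels of the file explicit (table-level tests versus column-level tests), the sketch below writes the example file out as the nested dicts and lists a YAML loader would produce, then lists every test case it contains. It assumes the file has already been parsed (for example with a standard YAML library); `list_test_cases` is a hypothetical helper, not a PipeRider API.

```python
# The example file above as nested dicts/lists, i.e. what a YAML loader
# would return for it (assumption: parsing has already happened).
movies_assertions = {
    "movies": {
        "tests": [
            {"metric": "row_count", "assert": {"gte": 8961}, "tags": ["RECOMMENDED"]},
        ],
        "columns": {
            "title": {
                "tests": [
                    {"name": "assert_column_schema_type",
                     "assert": {"schema_type": "VARCHAR"},
                     "tags": ["RECOMMENDED"]},
                    {"name": "assert_column_not_null", "tags": ["RECOMMENDED"]},
                ],
            },
        },
    },
}


def list_test_cases(doc: dict) -> list[tuple[str, str]]:
    """Return (scope, check) pairs: table-level tests, then column-level tests."""
    cases = []
    for table, spec in doc.items():
        for test in spec.get("tests", []):
            # Profile assertions are keyed by 'metric'; basic/schema ones by 'name'.
            cases.append((table, test.get("metric") or test.get("name")))
        for column, col_spec in spec.get("columns", {}).items():
            for test in col_spec.get("tests", []):
                cases.append((f"{table}.{column}", test.get("metric") or test.get("name")))
    return cases


for scope, check in list_test_cases(movies_assertions):
    print(scope, check)
```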
Profile Assertions
Profile assertions are the most common way to define an assertion. They check whether a profiling statistic fulfills a certain rule.
Assertion expressions
Description: Profiling-based assertions assert the value of a profiling field.
Metric: the profiling field to check (e.g. row_count, nulls_p, p50)
Assert:
gte: the value should be greater than or equal to the given value
gt: the value should be greater than the given value
lte: the value should be less than or equal to the given value
lt: the value should be less than the given value
eq: the value should be equal to the given value
ne: the value should not be equal to the given value
The row count should be <= 1000000
world_city:
  tests:
  - metric: row_count
    assert:
      lte: 1000000
The missing percentage should be <= 0.01
world_city:
  columns:
    country_code:
      tests:
      - metric: nulls_p
        assert:
          lte: 0.01
The median should be between [10, 20]
world_city:
  columns:
    country_code:
      tests:
      - metric: p50
        assert:
          gte: 10
          lte: 20
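The following minimal Python sketch shows how a profile assertion could be evaluated once profiling has produced the metric value. It assumes that every operator listed under assert must hold, which is what lets gte plus lte express the median range above. `check_profile_metric` and the statistics are hypothetical (the numbers are made up), and this is not PipeRider's implementation.

```python
import operator

# Operators allowed under `assert:` in a profile assertion.
OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_profile_metric(stats: dict, metric: str, assertion: dict) -> bool:
    """Return True if stats[metric] satisfies every operator in `assertion`.

    Every listed operator must hold, so `gte` + `lte` forms a closed range.
    """
    value = stats[metric]
    return all(OPERATORS[op](value, expected) for op, expected in assertion.items())


# Made-up statistics; table- and column-level values share one dict for brevity.
stats = {"row_count": 4000, "nulls_p": 0.0, "p50": 15}

print(check_profile_metric(stats, "row_count", {"lte": 1000000}))  # True
print(check_profile_metric(stats, "p50", {"gte": 10, "lte": 20}))   # True
print(check_profile_metric(stats, "nulls_p", {"gt": 0.01}))         # False
```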
Basic Assertions
Basic assertions are high-level assertions that check whether a column is not null or unique, and whether the column values (rather than a profiling statistic) fulfill a certain rule.
assert_column_unique
Description: The values of the column must be unique.
world_city:
  columns:
    country_code:
      tests:
      - name: assert_column_unique
        tags:
        - dialing code
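Conceptually, a uniqueness check compares the number of values with the number of distinct values, much like comparing COUNT(col) with COUNT(DISTINCT col) in SQL. The sketch below does this over an in-memory list. It assumes nulls are ignored, as with a SQL UNIQUE constraint; PipeRider's own null handling is not stated here, and `assert_column_unique` is a hypothetical Python function named after the assertion.

```python
def assert_column_unique(values: list) -> bool:
    """Hypothetical check: True if no non-null value appears more than once.

    Assumption: nulls (None) are ignored, as with a SQL UNIQUE constraint.
    """
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


print(assert_column_unique(["1", "31", "44", None]))  # True
print(assert_column_unique(["1", "31", "1"]))         # False
```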
assert_column_not_null
Description: The values of the column must not be null.
world_city:
  columns:
    name:
      tests:
      - name: assert_column_not_null
        tags:
        - city name
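A not-null assertion passes only when the number of null values is zero. The sketch below counts nulls in an in-memory list, which also tells you how many rows would fail. The functions and sample city names are hypothetical illustrations, not PipeRider code.

```python
def count_nulls(values: list) -> int:
    """Count null (None) entries in a column."""
    return sum(1 for v in values if v is None)


def assert_column_not_null(values: list) -> bool:
    """Hypothetical check: the assertion passes only when no value is null."""
    return count_nulls(values) == 0


cities = ["Kabul", "Amsterdam", None]
print(count_nulls(cities))                 # 1
print(assert_column_not_null(cities))      # False
print(assert_column_not_null(cities[:2]))  # True
```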
assert_column_value
Description: Assert that the column values are within the given range or belong to the given set.
Assert:
gte: the value should be greater than or equal to the given value
gt: the value should be greater than the given value
lte: the value should be less than or equal to the given value
lt: the value should be less than the given value
in: the value should belong to the given set
The value should be between [0,10000)
world_city:
  columns:
    population:
      tests:
      - name: assert_column_value
        assert:
          gte: 0
          lt: 10000
The value of a datetime type column should be >= '2022-01-01'
world_city:
  columns:
    create_at:
      tests:
      - name: assert_column_value
        assert:
          gte: '2022-01-01'
The value of the column should belong to the set ["male", "female"]
TITANIC:
  columns:
    Sex:
      tests:
      - name: assert_column_value
        assert:
          in: ["male", "female"]
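Unlike profile assertions, assert_column_value applies its conditions to each value in the column. The sketch below returns the values that violate the conditions, assuming all listed conditions must hold for every value. Two further assumptions are made for illustration: nulls are skipped (assert_column_not_null covers them), and datetime bounds are compared as parsed dates rather than raw strings. `failing_values` is a hypothetical helper, not PipeRider's implementation.

```python
import datetime
import operator

# Range operators accepted by assert_column_value; `in` is handled separately.
RANGE_OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def value_ok(value, assertion: dict) -> bool:
    """True if a single value satisfies every condition in `assertion`."""
    for op, expected in assertion.items():
        if op == "in":
            if value not in expected:
                return False
        elif not RANGE_OPERATORS[op](value, expected):
            return False
    return True


def failing_values(values: list, assertion: dict) -> list:
    """Return the non-null values that violate the assertion (nulls skipped)."""
    return [v for v in values if v is not None and not value_ok(v, assertion)]


print(failing_values([0, 5000, 10000], {"gte": 0, "lt": 10000}))
# [10000]
print(failing_values(["male", "female", "unknown"], {"in": ["male", "female"]}))
# ['unknown']

# For a datetime column, compare against a parsed date, not the raw string.
cutoff = datetime.date.fromisoformat("2022-01-01")
print(failing_values([datetime.date(2021, 12, 31), datetime.date(2022, 3, 1)], {"gte": cutoff}))
# [datetime.date(2021, 12, 31)]
```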
Schema Assertions
assert_column_exist
Description: The column must exist.
world_city: # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_exist
        tags:
        - dialing code
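An existence check reduces to a membership test on the table's column names. The sketch below assumes names are compared exactly (case-sensitive); whether PipeRider normalizes identifier case is not covered by this page. The column list for world_city and the function `column_exists` are hypothetical.

```python
def column_exists(table_columns: list[str], column: str) -> bool:
    """Hypothetical check: True if `column` is among the table's columns.

    Assumption: exact, case-sensitive comparison of column names.
    """
    return column in table_columns


# Hypothetical schema of the world_city table.
world_city_columns = ["id", "name", "country_code", "district", "population"]
print(column_exists(world_city_columns, "country_code"))  # True
print(column_exists(world_city_columns, "dialing_code"))  # False
```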
assert_column_type
Description: The type of the column must match the specified type.
Assert:
type: one of numeric, string, datetime
world_city:
  columns:
    name:
      tests:
      - name: assert_column_type
        assert:
          type: string
        tags:
        - city name
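assert_column_type compares a generic type (numeric, string, datetime) rather than the raw schema type, so some mapping from data-source types to generic types is implied. The sketch below uses a deliberately simplified, hypothetical mapping to show the idea; it is an assumption for illustration and not PipeRider's actual type table. Only the three generic types listed above are mapped, and everything else falls back to 'other'.

```python
# Simplified, hypothetical mapping from data-source schema types to the
# generic types used by assert_column_type. Not PipeRider's actual table.
GENERIC_TYPES = {
    "CHAR": "string",
    "VARCHAR": "string",
    "TEXT": "string",
    "DECIMAL": "numeric",
    "FLOAT": "numeric",
    "DATE": "datetime",
    "TIMESTAMP": "datetime",
}


def generic_type(schema_type: str) -> str:
    """Map a schema type such as 'VARCHAR(128)' to a generic type."""
    # Strip length/precision, e.g. 'VARCHAR(128)' -> 'VARCHAR'.
    base = schema_type.split("(", 1)[0].strip().upper()
    # Types outside the simplified mapping fall back to 'other' in this sketch.
    return GENERIC_TYPES.get(base, "other")


def assert_column_type(schema_type: str, expected: str) -> bool:
    """Hypothetical check: the column's generic type equals the asserted type."""
    return generic_type(schema_type) == expected


print(assert_column_type("VARCHAR(128)", "string"))  # True
print(assert_column_type("DATE", "numeric"))         # False
```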
assert_column_schema_type
Description: The column schema type must match the specified schema type.
Assert:
schema_type: the schema type in the data source (e.g. TEXT, DATE, VARCHAR(128), ...)
world_city:
  columns:
    name:
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: TEXT
assert_column_in_types
Description: The type of the column must be one of the listed types.
Assert:
types: a list of one or more of [string, integer, numeric, datetime, boolean, other]
world_city: # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_in_types
        assert:
          types: [string]
        tags:
        - dialing code
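assert_column_in_types is a membership test of the column's generic type against the listed types. The sketch below assumes the generic type is already known (for example from profiling) and, as a design choice of this example only, rejects type names outside the documented list so that typos in the assertion file surface immediately. `assert_column_in_types` here is a hypothetical Python function and the column types are made up.

```python
# Generic types accepted by assert_column_in_types, as listed above.
KNOWN_TYPES = {"string", "integer", "numeric", "datetime", "boolean", "other"}


def assert_column_in_types(column_type: str, types: list[str]) -> bool:
    """Hypothetical check: the column's generic type is one of `types`."""
    unknown = set(types) - KNOWN_TYPES
    if unknown:
        # Reject typos in the assertion file instead of silently failing.
        raise ValueError(f"unsupported types: {sorted(unknown)}")
    return column_type in types


# Made-up generic types for two world_city columns.
column_types = {"country_code": "string", "population": "integer"}
print(assert_column_in_types(column_types["country_code"], ["string"]))  # True
print(assert_column_in_types(column_types["population"], ["string"]))    # False
```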
dbt Assertions
PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest dbt run results are then included in the run report.
dbt build # or dbt test
piperider run --dbt-run-results
Since version 0.26.0, dbt test results are included by default, so the --dbt-run-results option is no longer necessary.