PipeRider

Assertions (deprecated)

Assert the profiling statistic result
Assertions have been deprecated since v0.25.0 because their functionality overlaps too much with dbt test. Please replace assertions with the equivalent functionality offered by dbt test.
Assertions are the data testing solution in PipeRider. An assertion checks whether the profiling result fulfills a certain rule. There are two types of assertions:
  • PipeRider assertions
  • DBT assertions

PipeRider Assertions

PipeRider assertions check the profiling result of each run.

Assertion files

Assertion files are located in .piperider/assertions/

File naming convention

Assertion files are YAML files and are named according to the data source table name:
<table>.yml
If you generate recommended assertions with piperider generate-assertions, the assertion file names are prefixed with 'recommended_':
recommended_<table>.yml
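The naming rule above can be sketched in Python. This is an illustrative helper and not part of PipeRider: the function name assertion_file_path and its recommended parameter are hypothetical, and only the directory and the 'recommended_' prefix come from the text.

```python
from pathlib import PurePosixPath

# Directory named in this document; everything else here is illustrative.
ASSERTIONS_DIR = PurePosixPath(".piperider/assertions")


def assertion_file_path(table: str, recommended: bool = False) -> PurePosixPath:
    """Return the expected assertion file path for a table (hypothetical helper)."""
    prefix = "recommended_" if recommended else ""
    return ASSERTIONS_DIR / f"{prefix}{table}.yml"


print(assertion_file_path("movies"))
# .piperider/assertions/movies.yml
print(assertion_file_path("movies", recommended=True))
# .piperider/assertions/recommended_movies.yml
```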

Example assertion file

The following is an excerpt of an assertion file for a movie database table:
# Auto-generated by PipeRider based on table "movies"
movies: # Table Name
  # Test Cases for Table
  tests:
    - metric: row_count
      assert:
        gte: 8961
      tags:
        - RECOMMENDED
  columns:
    title: # Column Name
      # Test Cases for Column
      tests:
        - name: assert_column_schema_type
          assert:
            schema_type: VARCHAR
          tags:
            - RECOMMENDED
        - name: assert_column_not_null
          tags:
            - RECOMMENDED

Profile Assertions

Profile assertions are the most common way to define an assertion. They assert whether a profiling statistic fulfills a certain rule.
Assertion expressions
Description: Profile assertions assert the value of a profiling field.
  • Metric: the profile field defined in the profiling result
  • Assert:
    • gte: the value should be greater than or equal to
    • gt: the value should be greater than
    • lte: the value should be less than or equal to
    • lt: the value should be less than
    • eq: the value should be equal to
    • ne: the value should not be equal to
The row count should be <= 1000000
world_city:
  tests:
    - metric: row_count
      assert:
        lte: 1000000
The missing percentage should be <= 0.01
world_city:
  columns:
    country_code:
      tests:
        - metric: nulls_p
          assert:
            lte: 0.01
The median should be between [10, 20]
world_city:
  columns:
    country_code:
      tests:
        - metric: p50
          assert:
            gte: 10
            lte: 20
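One way to read the assert block is as a conjunction of comparisons against a single profiling value, as the median example suggests. The following is a minimal Python sketch of that reading, not PipeRider's implementation: the names OPERATORS and check_metric are hypothetical.

```python
import operator

# Comparison keys from the list above, mapped to Python operators.
OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_metric(value, expression: dict) -> bool:
    """Return True when the metric value satisfies every key in the expression."""
    return all(OPERATORS[key](value, bound) for key, bound in expression.items())


# Median (p50) of 15 against the range [10, 20]
print(check_metric(15, {"gte": 10, "lte": 20}))  # True
# Missing percentage (nulls_p) of 0.05 against a 0.01 limit
print(check_metric(0.05, {"lte": 0.01}))         # False
```

Treating several keys as a logical AND keeps the range example consistent: gte: 10 together with lte: 20 passes only for values inside [10, 20].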

Basic Assertions

Basic assertions are high-level assertions that check whether a column is not null or unique, or whether the column values (rather than a profiling statistic) fulfill a certain rule.
assert_column_unique
  • Description: The values of column must be unique.
  • Assert: None
  • Tags:
world_city:
  columns:
    country_code:
      tests:
        - name: assert_column_unique
          tags:
            - dialing code
assert_column_not_null
  • Description: The values of the column must not be null.
  • Assert: None
  • Tags:
world_city:
  columns:
    name:
      tests:
        - name: assert_column_not_null
          tags:
            - city name
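The two checks above can be illustrated on an in-memory list of column values. This is a rough Python sketch, not how PipeRider evaluates them: the function names are hypothetical, and ignoring nulls in the uniqueness check is an assumption that this document does not confirm.

```python
def column_is_unique(values) -> bool:
    """True when no value appears more than once.

    Assumption: None (null) values are ignored for the uniqueness check.
    """
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def column_is_not_null(values) -> bool:
    """True when the column contains no None (null) values."""
    return all(v is not None for v in values)


print(column_is_unique(["TW", "JP", "US"]))    # True
print(column_is_unique(["TW", "JP", "TW"]))    # False
print(column_is_not_null(["Taipei", None]))    # False
```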
assert_column_value
  • Description: The column values must fall within the specified range or set.
  • Assert:
    • gte: the value should be greater than or equal to
    • gt: the value should be greater than
    • lte: the value should be less than or equal to
    • lt: the value should be less than
    • in: the value should belong to the set
The value should be between [0,10000)
world_city:
  columns:
    population:
      tests:
        - name: assert_column_value
          assert:
            gte: 0
            lt: 10000
The value of a datetime type column should be >= '2022-01-01'
world_city:
  columns:
    create_at:
      tests:
        - name: assert_column_value
          assert:
            gte: '2022-01-01'
The value of the column should belong to ["male", "female"] set
TITANIC:
  columns:
    Sex:
      tests:
        - name: assert_column_value
          assert:
            in: ["male", "female"]
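Unlike a profile assertion, assert_column_value applies to the individual values of a column. The Python sketch below illustrates that difference under stated assumptions: the names value_ok and column_values_ok are hypothetical, skipping nulls is an assumption, and this is not PipeRider's actual evaluation logic.

```python
import operator

# Range keys listed for assert_column_value, mapped to Python operators.
RANGE_OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def value_ok(value, expression: dict) -> bool:
    """Check one value against every key in the assert expression."""
    for key, expected in expression.items():
        if key == "in":
            if value not in expected:
                return False
        elif not RANGE_OPERATORS[key](value, expected):
            return False
    return True


def column_values_ok(values, expression: dict) -> bool:
    """True when every non-null value passes. Assumption: nulls are skipped."""
    return all(value_ok(v, expression) for v in values if v is not None)


print(column_values_ok([0, 523, 9999], {"gte": 0, "lt": 10000}))          # True
print(column_values_ok([0, 10000], {"gte": 0, "lt": 10000}))              # False
print(column_values_ok(["male", "female"], {"in": ["male", "female"]}))   # True
```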

Schema Assertions

assert_column_exist
  • Description: The column must exist.
  • Assert: None
  • Tags:
world_city: # Table Name
  columns:
    country_code:
      tests:
        - name: assert_column_exist
          tags:
            - dialing code
assert_column_type
  • Description: The type of the column must match the specified type.
  • Assert:
    • type: numeric, string, datetime
  • Tags:
world_city:
  columns:
    name:
      tests:
        - name: assert_column_type
          assert:
            type: string
          tags:
            - city name
assert_column_schema_type
  • Description: The column schema type must match the specified schema type.
  • Assert:
    • schema_type: the schema type in the data source (e.g. TEXT, DATE, VARCHAR(128), ...)
world_city:
  columns:
    name:
      tests:
        - name: assert_column_schema_type
          assert:
            schema_type: TEXT
assert_column_in_types
  • Description: The type of the column must be contained in the list.
  • Assert:
    • types: [string, integer, numeric, datetime, boolean, other]
  • Tags:
world_city: # Table Name
  columns:
    country_code:
      tests:
        - name: assert_column_in_types
          assert:
            types: [string]
          tags:
            - dialing code

DBT Assertions

PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest dbt run results are then included in the run report.
dbt build #or dbt test
piperider run --dbt-run-results
From version 0.26.0, dbt test results are included by default, and it is not necessary to use the --dbt-run-results option.
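The version rule can be captured in a small Python helper that builds the command line. This is a sketch only: piperider_run_command is a hypothetical name, and the version is assumed to be passed in as a (major, minor, patch) tuple rather than detected automatically.

```python
def piperider_run_command(version: tuple) -> list:
    """Build the 'piperider run' command for a given PipeRider version.

    Hypothetical helper: the version is supplied by the caller as a tuple.
    """
    command = ["piperider", "run"]
    # Before 0.26.0 the dbt test results must be requested explicitly.
    if version < (0, 26, 0):
        command.append("--dbt-run-results")
    return command


print(" ".join(piperider_run_command((0, 25, 0))))  # piperider run --dbt-run-results
print(" ".join(piperider_run_command((0, 26, 0))))  # piperider run
```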