PipeRider

Assertions

Assert on the profiling results
Assertions are PipeRider's data testing solution. An assertion checks whether a profiling result satisfies a given rule. There are two types of assertions:
  • PipeRider assertions
  • DBT assertions

PipeRider Assertions

PipeRider assertions check the profiling result of each run.

Assertion files

Assertion files are located in .piperider/assertions/

File naming convention

Assertion files are YAML files and are named according to the data source table name:
<table>.yml
If you generate 'recommended assertions' with piperider generate-assertions, the assertion file names are prefixed with 'recommended_':
recommended_<table>.yml
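As a rough illustration of the naming convention, the sketch below builds the expected path of an assertion file for a given table. The helper name `assertion_file_path` and its `recommended` flag are hypothetical and not part of PipeRider; only the directory and the two file name patterns come from the text above.

```python
from pathlib import PurePosixPath

# Directory named in the section above.
ASSERTIONS_DIR = PurePosixPath(".piperider/assertions")


def assertion_file_path(table: str, recommended: bool = False) -> PurePosixPath:
    """Return the conventional assertion file path for a table.

    Hypothetical helper for illustration only; not a PipeRider API.
    """
    prefix = "recommended_" if recommended else ""
    return ASSERTIONS_DIR / f"{prefix}{table}.yml"


print(assertion_file_path("movies"))
# .piperider/assertions/movies.yml
print(assertion_file_path("movies", recommended=True))
# .piperider/assertions/recommended_movies.yml
```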

Example assertion file

The following is an excerpt of an assertion file for a movie database table:
# Auto-generated by PipeRider based on table "movies"
movies: # Table Name
  # Test Cases for Table
  tests:
  - metric: row_count
    assert:
      gte: 8961
    tags:
    - RECOMMENDED
  columns:
    title: # Column Name
      # Test Cases for Column
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: VARCHAR
        tags:
        - RECOMMENDED
      - name: assert_column_not_null
        tags:
        - RECOMMENDED

Profile Assertions

Profile assertions are the most common way to define an assertion. They check whether a profiling statistic satisfies a given rule.
Assertion expressions
  • Description: Profile assertions check the value of a profiling field.
  • Metric: The profile field defined in profiling
  • Assert:
    • gte: the value should be greater than or equal to
    • gt: the value should be greater than
    • lte: the value should be less than or equal to
    • lt: the value should be less than
    • eq: the value should be equal to
    • ne: the value should not be equal to
The row count should be <= 1000000
world_city:
  tests:
  - metric: row_count
    assert:
      lte: 1000000
The missing percentage should be <= 0.01
world_city:
  columns:
    country_code:
      tests:
      - metric: nulls_p
        assert:
          lte: 0.01
The median should be between [10, 20]
world_city:
  columns:
    country_code:
      tests:
      - metric: p50
        assert:
          gte: 10
          lte: 20
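To make the semantics of the assert operators concrete, here is a minimal Python sketch that evaluates an `assert:` mapping against a single metric value. It is not PipeRider's implementation: the function name `check_metric` and the sample `profile` values are made up for illustration, and it assumes that multiple operators in one `assert:` block must all hold, as the median example suggests.

```python
import operator

# Operator keys listed above, mapped to Python comparison functions.
OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
    "ne": operator.ne,
}


def check_metric(value, conditions):
    """Return True if the metric value satisfies every condition.

    `conditions` mirrors an `assert:` mapping, e.g. {"gte": 10, "lte": 20}.
    Assumption: all conditions are combined with a logical AND.
    """
    for key, expected in conditions.items():
        if key not in OPERATORS:
            raise ValueError(f"unsupported operator: {key}")
        if not OPERATORS[key](value, expected):
            return False
    return True


# Hypothetical profiling statistics, invented for this example.
profile = {"row_count": 4079, "nulls_p": 0.0, "p50": 15}

print(check_metric(profile["row_count"], {"lte": 1000000}))  # True
print(check_metric(profile["nulls_p"], {"lte": 0.01}))  # True
print(check_metric(profile["p50"], {"gte": 10, "lte": 20}))  # True
```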

Basic Assertions

Basic assertions are high-level assertions that check whether a column is not null or unique, or whether the column values (rather than a profiling statistic) satisfy a given rule.
assert_column_unique
  • Description: The values of column must be unique.
  • Assert: None
  • Tags:
world_city:
  columns:
    country_code:
      tests:
      - name: assert_column_unique
        tags:
        - dialing code
assert_column_not_null
  • Description: The values of the column must not be null.
  • Assert: None
  • Tags:
world_city:
  columns:
    name:
      tests:
      - name: assert_column_not_null
        tags:
        - city name
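The two checks above can be pictured with a small Python sketch over an in-memory list of column values. This is only an illustration of the rules, not how PipeRider evaluates them; the function names are hypothetical, and ignoring nulls in the uniqueness check is an assumption of this sketch.

```python
def column_is_not_null(values):
    """True if no value in the column is null (None)."""
    return all(v is not None for v in values)


def column_is_unique(values):
    """True if no non-null value appears more than once.

    Assumption: nulls are ignored when checking uniqueness.
    """
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


# Made-up sample data for illustration.
names = ["Taipei", "Tokyo", None]
codes = ["TWN", "JPN", "JPN"]

print(column_is_not_null(names))  # False
print(column_is_unique(names))  # True
print(column_is_unique(codes))  # False
```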
assert_column_value
  • Description: The column values must fall within the specified range or set.
  • Assert:
    • gte: the value should be greater than or equal to
    • gt: the value should be greater than
    • lte: the value should be less than or equal to
    • lt: the value should be less than
    • in: the value should belong to the set
The value should be between [0,10000)
world_city:
  columns:
    population:
      tests:
      - name: assert_column_value
        assert:
          gte: 0
          lt: 10000
The value of a datetime type column should be >= '2022-01-01'
world_city:
  columns:
    create_at:
      tests:
      - name: assert_column_value
        assert:
          gte: '2022-01-01'
The value of the column should belong to ["male", "female"] set
TITANIC:
  columns:
    Sex:
      tests:
      - name: assert_column_value
        assert:
          in: ["male", "female"]
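Unlike profile assertions, assert_column_value applies to individual column values. The Python sketch below shows one way to read the three examples above: it returns the values that violate an `assert:` mapping. The name `failing_values` is hypothetical, skipping nulls is an assumption of this sketch, and the code does not reflect PipeRider's actual implementation.

```python
import operator
from datetime import date

# Operators listed for assert_column_value, mapped to Python checks.
CHECKS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "in": lambda value, allowed: value in allowed,
}


def failing_values(values, conditions):
    """Return the non-null values that violate at least one condition.

    Assumption: null values are skipped rather than counted as failures.
    """
    return [
        v
        for v in values
        if v is not None
        and not all(CHECKS[key](v, expected) for key, expected in conditions.items())
    ]


# Half-open range [0, 10000)
print(failing_values([0, 9999, 10000, -1], {"gte": 0, "lt": 10000}))
# [10000, -1]

# Date lower bound
print(failing_values([date(2021, 12, 31), date(2022, 1, 1)], {"gte": date(2022, 1, 1)}))
# [datetime.date(2021, 12, 31)]

# Set membership
print(failing_values(["male", "female", "unknown"], {"in": ["male", "female"]}))
# ['unknown']
```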

Schema Assertions

assert_column_exist
  • Description: The column must exist.
  • Assert: None
  • Tags:
world_city: # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_exist
        tags:
        - dialing code
assert_column_type
  • Description: The type of the column must match the specified type.
  • Assert:
    • type: numeric, string, datetime
  • Tags:
world_city:
  columns:
    name:
      tests:
      - name: assert_column_type
        assert:
          type: string
        tags:
        - city name
assert_column_schema_type
  • Description: The schema type of the column must match the specified schema type.
  • Assert:
    • schema_type: the schema type in the data source (e.g. TEXT, DATE, VARCHAR(128), ...)
world_city:
  columns:
    name:
      tests:
      - name: assert_column_schema_type
        assert:
          schema_type: TEXT
assert_column_in_types
  • Description: The type of the column must be contained in the list.
  • Assert:
    • types: [string, integer, numeric, datetime, boolean, other]
  • Tags:
world_city: # Table Name
  columns:
    country_code:
      tests:
      - name: assert_column_in_types
        assert:
          types: [string]
        tags:
        - dialing code
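The difference between the four schema assertions can be sketched in Python as follows. The `SCHEMA` dictionary is a made-up stand-in for a profiled table (a generic type plus the data source's schema type per column), and `check_schema` is a hypothetical helper; neither reflects PipeRider's internal data model.

```python
# Hypothetical profiled schema, invented for this example.
SCHEMA = {
    "name": {"type": "string", "schema_type": "TEXT"},
    "country_code": {"type": "string", "schema_type": "VARCHAR(3)"},
    "population": {"type": "integer", "schema_type": "INTEGER"},
}


def check_schema(schema, column, test_name, conditions=None):
    """Evaluate one schema assertion against the hypothetical schema."""
    conditions = conditions or {}
    if test_name == "assert_column_exist":
        return column in schema
    info = schema.get(column)
    if info is None:
        # A missing column cannot satisfy any type assertion.
        return False
    if test_name == "assert_column_type":
        return info["type"] == conditions["type"]
    if test_name == "assert_column_schema_type":
        return info["schema_type"] == conditions["schema_type"]
    if test_name == "assert_column_in_types":
        return info["type"] in conditions["types"]
    raise ValueError(f"unknown schema assertion: {test_name}")


print(check_schema(SCHEMA, "country_code", "assert_column_exist"))  # True
print(check_schema(SCHEMA, "name", "assert_column_type", {"type": "string"}))  # True
print(check_schema(SCHEMA, "name", "assert_column_schema_type", {"schema_type": "TEXT"}))  # True
print(check_schema(SCHEMA, "population", "assert_column_in_types", {"types": ["string"]}))  # False
```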

DBT Assertions

PipeRider can also integrate dbt test results. To do so, run piperider with the --dbt-run-results option; the latest dbt run results are then included in the run report.
dbt build  # or dbt test
piperider run --dbt-run-results