Run
A run is a single invocation of PipeRider on a dbt project. It produces the observed results for the project, such as profiling statistics, metric query results, and test results (assertions).
To execute a run, use the run command:
piperider run
It will:
- Connect to the data warehouse using the default target in your dbt profile
- Profile all table models in the dbt project and produce the schema information and profiling statistics
- Query the dbt metrics
- Assert on the profiling statistics to check whether the values satisfy certain rules
As with a dbt target, you can change the connection that PipeRider uses. In PipeRider, this is called a data source. You can select the data source with the --datasource option:
piperider run --datasource <datasource>
Data sources are explicitly defined in config.yml. If you run PipeRider on a dbt project, the data sources are derived automatically from the dbt profile settings. Use the following command to list the currently available data sources:
piperider config list-datasource
By default, PipeRider profiles all the table models in your dbt project. However, profiling is an expensive and time-consuming operation, so PipeRider provides several mechanisms to control which models are profiled.
Profile one model
The simplest way is to use the --table <model> option:
piperider run --table <your-model-name>
Use dbt list output to select models
dbt list --select <selector> | piperider run --dbt-list
Use tag to mark selected models
# .piperider/config.yml
dataSources: []
dbt:
  projectDir: .
  tag: 'piperider'
Once dbt.tag is set, PipeRider profiles only the models with the specified tag:
# models/staging/stg_payments.sql
{{ config(
    tags=["piperider"]
) }}
select ...
dbt build
piperider run --dbt-run-results
PipeRider can also integrate the dbt run results. With the --dbt-run-results option, the run will:
- Run only the executed models
- Integrate the dbt test results
If dbt.tag is configured in the PipeRider config, PipeRider runs only the tagged models that were executed in the latest dbt run.
A use case for the dbt run results integration is to leverage dbt state and deferral: dbt runs only the new and changed models, and PipeRider then profiles only those models.
# Run only the new and changed models
dbt run \
  --select result:<status>+ state:modified+ \
  --defer \
  --state ./<dbt-artifact-path>

# Run only the models executed by the latest dbt run
piperider run --dbt-run-results
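The two steps above can also be chained from a script so that PipeRider runs only when dbt succeeds. The following is a minimal sketch using Python's standard subprocess module. The function names build_commands and run_changed_models are hypothetical, and the selector and state path are passed in by the caller rather than guessed; the commands themselves are the ones shown above.

```python
import subprocess


def build_commands(select, state_dir):
    """Return the dbt and PipeRider commands as argument lists.

    `select` is a space-separated dbt selector string and `state_dir` is
    the path to the dbt artifacts. Both are supplied by the caller.
    """
    dbt_cmd = ["dbt", "run", "--select", *select.split(), "--defer", "--state", state_dir]
    piperider_cmd = ["piperider", "run", "--dbt-run-results"]
    return [dbt_cmd, piperider_cmd]


def run_changed_models(select, state_dir):
    """Run dbt on the selected models, then profile what dbt executed."""
    for cmd in build_commands(select, state_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so PipeRider is not started if the dbt run fails.
        subprocess.run(cmd, check=True)
```

A call such as run_changed_models("state:modified+", "./<dbt-artifact-path>") would then execute both steps in order, with the placeholder replaced by your own artifact path.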
After a run completes, two artifacts are generated in the output directory:
- JSON run result (run.json)
- HTML report (index.html)
The default output directory is .piperider/outputs/<datasource>-<datetime>/. For ease of use, the latest run is also symlinked at .piperider/outputs/latest.
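As a minimal sketch of consuming these artifacts from a script, the following Python reads run.json through the latest symlink. Only the paths come from the description above; the helper name load_latest_run is hypothetical, and no assumption is made about the internal structure of the JSON beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_latest_run(project_dir="."):
    """Load the JSON run result of the most recent run.

    Hypothetical helper: it relies only on the .piperider/outputs/latest
    link and the run.json file name described above.
    """
    run_file = Path(project_dir) / ".piperider" / "outputs" / "latest" / "run.json"
    if not run_file.is_file():
        raise FileNotFoundError(f"No run result at {run_file}; execute 'piperider run' first")
    with run_file.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    try:
        run = load_latest_run()
    except FileNotFoundError as err:
        print(err)
    else:
        # The schema of run.json is not documented here, so only the
        # top-level keys (or the top-level type) are inspected.
        print(sorted(run) if isinstance(run, dict) else type(run).__name__)
```

Reading through the latest link avoids having to know the <datasource>-<datetime> directory name of the most recent run.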
Use the --output option to change the output directory:
piperider run --output /tmp/myrun
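When runs are launched from a script, the options described on this page can be assembled programmatically. The sketch below builds the argument list for piperider run; the function piperider_run_args is a hypothetical helper, it uses only the flags named above, and it does not verify that the CLI accepts every combination of them.

```python
import shlex


def piperider_run_args(datasource=None, table=None, output=None, dbt_run_results=False):
    """Build the argument list for `piperider run` from the options on this page."""
    args = ["piperider", "run"]
    if datasource:
        args += ["--datasource", datasource]
    if table:
        args += ["--table", table]
    if output:
        args += ["--output", output]
    if dbt_run_results:
        args.append("--dbt-run-results")
    return args


# Show the resulting command line without executing it.
print(shlex.join(piperider_run_args(table="stg_payments", output="/tmp/myrun")))
# prints: piperider run --table stg_payments --output /tmp/myrun
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with paths or model names.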
PipeRider is more than a profiler. In a dbt project, especially one built for analytics, it is common to define several metrics for visualization (e.g. revenue, active users). PipeRider can query these metrics and visualize them in the run report. For more details, see Metrics.
PipeRider also provides a testing mechanism to assert that the profiling statistics satisfy certain rules. For more details, see Assertions.