PipeRider

Run

"Run" is a single execution of PipeRider on a dbt project. It records the observed results of the dbt project, such as table schemas, profiling statistics, and metric query results. These results are the basis for comparison.
The run command performs the following steps:
  • Collect metadata: Collects the column names and types for all models, sources, and seeds.
  • Profile statistics: Computes statistics for a model, seed, or source and its columns, including row counts, null values, sum, average, and text length, to give insight into the data distribution of the model. Profiling is disabled by default because it can be resource-intensive, so you need to enable it manually. Please see the profiling document.
  • Query metrics: A metric is a computation over a specific column within a time interval, typically calculated daily or monthly; daily revenue is a simple example. Metric queries let you run basic queries on a dbt metric, such as retrieving the daily reports for the past 30 days or the monthly reports for the last 12 months. As with profiling, you need to enable metric queries manually. Please see the metric document.

Execute a run

To execute a run, use the run command:
piperider run
After a run finishes, two artifacts are generated under the output directory:
  • JSON run result (run.json)
  • HTML report (index.html)
The default output directory is .piperider/outputs/<datasource>-<datetime>/. For ease of use, the latest run is also symlinked at .piperider/outputs/latest.
Use the --output option to change the output directory:
piperider run --output /tmp/myrun
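As a minimal sketch, the two artifacts described above can be checked from a script after a run. The function name check_run_artifacts is hypothetical and not part of PipeRider; it only assumes the file names and the latest symlink mentioned in this section.

```shell
# Hypothetical helper (not part of PipeRider): verify that an output
# directory contains run.json and index.html.
# With no argument it checks the "latest" symlink described above.
check_run_artifacts() {
    dir="${1:-.piperider/outputs/latest}"
    rc=0
    for f in run.json index.html; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $dir/$f"
        else
            echo "missing: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, after piperider run --output /tmp/myrun, calling check_run_artifacts /tmp/myrun returns a non-zero status if either file is absent, which is convenient in CI scripts.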

Run with profiling statistics

To profile a model, source, or seed, add the piperider tag to your resource. Then, check if the resource is configured correctly.
--- models/staging/stg_customers.sql
+{{ config(
+ tags=["piperider"]
+)}}
select ...
The following command lists the model you just modified:
dbt list -s tag:piperider
Afterwards, when running piperider run, all models with the piperider tag will be profiled by default.
For more detail, please see Profiling.

Run with metric queries

In a dbt project, especially one built for analytics, it's common to define several metrics for visualization (e.g. revenue, active users). PipeRider can query these metrics and visualize them in the run report.
To enable a metric query, there are two steps:
  1. Define a dbt metric
  2. Add the piperider tag to this metric
Here is an example metric:
metrics:
  - name: active_users
    label: Active Users
    model: ref('stg_events')
    description: "The active user"
    calculation_method: count_distinct
    expression: user_id
    timestamp: event_time
    time_grains: [day, week, month, year]
+   tags: ['piperider']
For more detail, please see Metrics.

Run with selection

By default, PipeRider profiles models, sources, and seeds that have the piperider tag, and queries metrics that have the piperider tag. However, you can use additional options to select the specific resources to process.
Use dbt list to select resources
dbt list is a dbt subcommand that lets you select dbt resources using node selection.
dbt list --select <selector> | piperider run --dbt-list
Select a model by file path:
dbt list -s models/customers.sql | piperider run --dbt-list
Select resources with the tag piperider-dev:
dbt list -s 'tag:piperider-dev' | piperider run --dbt-list
Select the resource to profile
You can also use --table to profile a specific resource.
piperider run --table <resource_name>
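When a selector is new or complex, it can help to look at what dbt list returns before handing it to PipeRider. The sketch below is one way to do that, assuming that redirecting a saved list into standard input is equivalent to the piped form shown above. The function name run_selected is hypothetical, and skipping the run on an empty selection is a design choice of this sketch, not documented PipeRider behavior.

```shell
# Hypothetical helper (not part of PipeRider): preview a dbt selection,
# then pass the same list to PipeRider on standard input.
run_selected() {
    selector="$1"
    list_file="$(mktemp)" || return 1

    # Save the selection so it can be inspected and reused.
    if ! dbt list --select "$selector" > "$list_file"; then
        rm -f "$list_file"
        return 1
    fi

    # Skip the run when the selector matched nothing.
    if [ ! -s "$list_file" ]; then
        echo "no resources matched: $selector" >&2
        rm -f "$list_file"
        return 1
    fi

    cat "$list_file"                          # show what will be processed
    if piperider run --dbt-list < "$list_file"; then
        rc=0
    else
        rc=$?
    fi
    rm -f "$list_file"
    return "$rc"
}
```

For example, run_selected 'tag:piperider-dev' prints the matched resources and then runs PipeRider on exactly that list.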

Advanced: Select data source (The dbt target)

Just like a dbt target, you can change the connection target that PipeRider runs against. In PipeRider, this is called a data source. You can change the data source with the --datasource option:
piperider run --datasource <dbt-target>
Data sources are explicitly defined in config.yml. If you run PipeRider on a dbt project, the data sources are automatically derived from the dbt profile settings. Use the following command to check the currently available data sources:
piperider config list-datasource
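If you want one run per data source, a small loop around the --datasource option is enough. This is only a sketch: the function name run_each_datasource is hypothetical, and the data source names are supplied by hand (for example, copied from the output of the command above) rather than parsed automatically.

```shell
# Hypothetical helper (not part of PipeRider): run PipeRider once for each
# data source name passed as an argument, stopping at the first failure.
run_each_datasource() {
    for ds in "$@"; do
        echo "== running against data source: $ds =="
        piperider run --datasource "$ds" || return 1
    done
}
```

For example, run_each_datasource dev prod runs twice, where dev and prod are assumed data source names. Since the default output directory includes the data source name, each run is written to its own directory.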