Run
"Run" is a single execution of PipeRider on a dbt project. It records observed results of the project, such as table schemas, profiling statistics, and metric query results, which serve as the basis for comparison.
The run command executes the following steps:
Collect metadata: Collects the column names and types of all models, sources, and seeds.
Profile statistics: Computes statistics for a model, seed, or source and its columns, such as row counts, null values, sum, average, and text length, to give insight into the data distribution of the model. Profiling is disabled by default because it can be resource-intensive, so you must enable it explicitly. Please see the profiling document.
Query metrics: A metric is a computation over a specific column within a time interval, typically on a daily or monthly basis; daily revenue is a simple example. Metric queries let you run basic queries on a dbt metric, such as retrieving the daily values for the past 30 days or the monthly values for the last 12 months. As with profiling, metric queries must be enabled explicitly. Please see the metric document.
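To make the profile statistics concrete, here is a small Python sketch that computes a few of the statistics named above (row count, null count, sum, average, and average text length) for an in-memory column. It only illustrates what the numbers mean; it is not how PipeRider computes them (PipeRider profiles tables in your data warehouse), and the helper name profile_column is hypothetical.

```python
from statistics import mean
from typing import Any, Optional, Sequence


def profile_column(values: Sequence[Optional[Any]]) -> dict:
    """Compute a few basic profile statistics for one column.

    Hypothetical helper for illustration only; None stands for SQL NULL.
    """
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
    }
    # Numeric statistics (bool is excluded because it subclasses int)
    numbers = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numbers:
        stats["sum"] = sum(numbers)
        stats["avg"] = mean(numbers)
    # Text statistics: average length of the non-null strings
    texts = [v for v in non_null if isinstance(v, str)]
    if texts:
        stats["avg_text_length"] = mean(len(t) for t in texts)
    return stats


if __name__ == "__main__":
    print(profile_column([10, None, 20, 30]))
    print(profile_column(["ab", "abcd", None]))
```

Nulls are counted before any aggregate is taken, mirroring how SQL aggregates such as SUM and AVG ignore NULL values.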
Execute a run
To execute a run, use the piperider run command.
After a run finishes, two artifacts are generated in the output directory:
JSON run result (run.json)
HTML report (index.html)
The default output directory is .piperider/outputs/<datasource>-<datetime>/. For convenience, the latest run is also symlinked to .piperider/outputs/latest.
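If you post-process results in a script, the latest symlink gives a stable path to the newest run.json. This is a minimal Python sketch assuming the default layout described above; the function name load_latest_run is hypothetical, and because the schema of run.json is not described on this page, it returns the parsed JSON as-is.

```python
import json
from pathlib import Path
from typing import Any, Tuple


def load_latest_run(project_dir: str = ".") -> Tuple[Path, Any]:
    """Return (resolved run directory, parsed run.json) for the latest run.

    Assumes the default layout: .piperider/outputs/latest is a symlink
    to the newest <datasource>-<datetime> output directory.
    """
    latest = Path(project_dir) / ".piperider" / "outputs" / "latest"
    run_json = latest / "run.json"
    if not run_json.is_file():
        raise FileNotFoundError(f"no run.json found under {latest}")
    # resolve() follows the symlink, revealing which run directory is the latest
    run_dir = latest.resolve()
    with run_json.open(encoding="utf-8") as f:
        return run_dir, json.load(f)
```

Adapt the path if you write results elsewhere with a custom output directory.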
You can use the --output option to change the output directory.
Run with profiling statistics
To profile a model, source, or seed, add the piperider tag to the resource. Then check that the resource is configured correctly: selecting resources by the piperider tag with dbt list should list the resource you just modified.
Afterwards, piperider run profiles all models, sources, and seeds that have the piperider tag by default.
For more details, see Profiling.
Run with metric queries
In a dbt project, especially one built for analytics, it is common to define several metrics for visualization (e.g., revenue or active users). PipeRider can query these metrics and visualize them in the run report.
Enabling a metric query takes two steps:
Define a dbt metric.
Add the piperider tag to the metric.
For more details, see Metrics.
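Conceptually, a daily metric is an aggregation grouped by a day-level time grain. This Python sketch computes a daily-revenue-style series for the last N days from in-memory (date, amount) rows, to show the shape of what a daily metric query returns. It is not how PipeRider or dbt evaluate metrics (they query the warehouse), and daily_metric is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def daily_metric(
    rows: Iterable[Tuple[date, float]], end: date, days: int = 30
) -> Dict[date, float]:
    """Sum amounts per day over the `days` days ending at `end` (inclusive).

    Days without any rows are reported as 0.0, so the result always has
    exactly `days` entries, ordered from oldest to newest.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    start = end - timedelta(days=days - 1)
    totals: Dict[date, float] = defaultdict(float)
    for day, amount in rows:
        if start <= day <= end:  # ignore rows outside the window
            totals[day] += amount
    series: Dict[date, float] = {}
    for i in range(days):
        day = start + timedelta(days=i)
        series[day] = totals.get(day, 0.0)
    return series
```

Filling empty days with zero keeps the series continuous, which is what a chart of "the past 30 days" usually expects; a monthly metric would group by month instead of by day.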
Run with selection
By default, PipeRider profiles the models, sources, and seeds that have the piperider tag, and queries the metrics that have the piperider tag. You can also use additional options to select exactly which resources should be processed.
Use dbt list to select resources
dbt list is a dbt subcommand that lists the resources matched by dbt's node selection syntax.
For example, you can select a model by its file path, or select the resources that have the tag piperider-dev.
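To preview which resources a selection expression matches, you can call dbt list from a script. This Python sketch wraps dbt list --select with the two selection methods mentioned above (path: and tag:). The helper names and the example model path are hypothetical, and how the selected resources are then passed to PipeRider is outside the scope of this sketch.

```python
import shlex
import subprocess
from typing import List


def dbt_list_argv(selector: str) -> List[str]:
    """Build a `dbt list` invocation for one node-selection expression."""
    return ["dbt", "list", "--select", selector]


def dbt_list(selector: str, project_dir: str = ".") -> List[str]:
    """Run `dbt list` in the project directory and return non-empty stdout lines.

    Requires dbt to be installed. Depending on the dbt version, stdout may
    also contain log lines, so treat the result as raw output.
    """
    result = subprocess.run(
        dbt_list_argv(selector),
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical model path; replace it with a file from your own project.
    for selector in ("path:models/staging/stg_orders.sql", "tag:piperider-dev"):
        print(shlex.join(dbt_list_argv(selector)))
```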
Select the resource to profile
You can also use the --table option to profile a specific resource.
Advanced: Select data source (The dbt target)
Just as with a dbt target, you can change the connection target that PipeRider runs against. In PipeRider, this is called a data source. You can change the data source with the --datasource option.
Data sources are explicitly defined in config.yml. If you run PipeRider on a dbt project, the data sources are automatically derived from the dbt profile settings. You can use the command to check the currently available data sources.
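The three piperider run options covered on this page (--output, --table, and --datasource) can be assembled into a command line from a script. This is a minimal Python sketch assuming the options can be passed together in one invocation (this page does not say which combinations are accepted); piperider_run_argv and the example values are hypothetical.

```python
import shlex
from typing import List, Optional


def piperider_run_argv(
    output: Optional[str] = None,
    table: Optional[str] = None,
    datasource: Optional[str] = None,
) -> List[str]:
    """Build a `piperider run` command line from the options on this page.

    Options left as None are omitted, so PipeRider uses its defaults for them.
    """
    argv = ["piperider", "run"]
    if output is not None:
        argv += ["--output", output]
    if table is not None:
        argv += ["--table", table]
    if datasource is not None:
        argv += ["--datasource", datasource]
    return argv


if __name__ == "__main__":
    # "prod" is a hypothetical data source name; check which ones are available first.
    print(shlex.join(piperider_run_argv(output="reports/prod-run", datasource="prod")))
```

To execute the command, pass the list to subprocess.run(argv, check=True) with PipeRider installed. Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.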