AWS S3 + GitHub CI
This how-to shows how to generate PipeRider reports, save them in AWS S3, and compare the latest report with a previously saved report in S3, all from GitHub CI.
A PipeRider profiling report gives you a periodic overview of your data. Integrating PipeRider into your CI workflow gives you an auto-generated data profiling report on every run. You can also save the report of each run in AWS S3 and produce a comparison report of the latest two runs. Adding a Slack incoming webhook to the workflow lets you receive these reports in Slack in real time.
This how-to walks through the following scenario:
1. Generate the latest PipeRider run report and upload it to S3
2. Download a previously saved report from S3
3. Compare the two reports and upload the comparison report to S3
4. Push a notification to Slack with links to the latest run report and the comparison report
Prepare an S3 bucket. Enable ACLs under the Permissions tab and Static website hosting under the Properties tab.
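If you prefer the command line to the console, the two bucket settings above can be sketched with the aws-cli roughly as follows. This is a minimal sketch, not the original author's setup: the function name configure_bucket is hypothetical, BucketOwnerPreferred is assumed to be the object-ownership mode you want when enabling ACLs, and index.html is assumed to be the index document of the uploaded reports.

```shell
# Hypothetical helper: apply the two bucket settings described above.
configure_bucket() {  # usage: configure_bucket BUCKET_NAME
  bucket="$1"
  # Enable ACLs on the bucket (assumption: BucketOwnerPreferred ownership mode).
  aws s3api put-bucket-ownership-controls --bucket "$bucket" \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerPreferred}]'
  # Enable static website hosting (assumption: index.html is the index document).
  aws s3 website "s3://$bucket/" --index-document index.html
}
```

Call it as configure_bucket followed by your bucket name. Uploading with a public-read ACL later on also requires that the bucket's Block Public Access settings allow public ACLs.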
Prepare a user account for the aws-cli and save the generated key pair.
You can create your own, or just replace DOC-EXAMPLE-BUCKET1 with your bucket name in the following content and import it.
Prepare your repository with the following configurations and files.
- AWS_ACCESS_KEY_ID: the access key ID for the aws-cli
- AWS_SECRET_ACCESS_KEY: the secret access key for the aws-cli
- AWS_DEFAULT_REGION: the default AWS region
- PIPERIDER_BUCKET_NAME: the S3 bucket name
- SLACK_INCOMING_WEBHOOK: the webhook URL. Install the Incoming WebHooks integration in Slack and create a configuration that specifies the channel notifications are sent to; this gives you the URL.
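A script that depends on these values can fail early when one is missing. The following is a minimal sketch, assuming the five values above are exposed to the job as environment variables with the same names; the function name check_required_env is hypothetical.

```shell
# Hypothetical helper: fail if any of the five required variables is unset or empty.
check_required_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION \
              PIPERIDER_BUCKET_NAME SLACK_INCOMING_WEBHOOK; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "Missing required variables:$missing" >&2
    return 1
  fi
}
```

Calling check_required_env at the top of the script stops the run before any report is generated or uploaded with incomplete credentials.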
A workflow YAML file is required for GitHub CI. In your repo, create the necessary directories and the file .github/workflows/piperider.yml. In this file, we define that a push to the main branch triggers the workflow.
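As a rough illustration, the directories and the workflow file could be created from a shell as shown here. This is a sketch under assumptions, not the original workflow: the function name write_workflow, the job name, the checkout action version, and the script name run.sh are all hypothetical, and the five values are assumed to be stored as repository secrets.

```shell
# Hypothetical helper: write a minimal workflow file under the given repo root.
write_workflow() {  # usage: write_workflow [REPO_ROOT]
  repo_root="${1:-.}"
  mkdir -p "$repo_root/.github/workflows"
  # Quoted 'EOF' keeps the ${{ ... }} expressions literal for GitHub to expand.
  cat > "$repo_root/.github/workflows/piperider.yml" <<'EOF'
name: PipeRider
on:
  push:
    branches:
      - main
jobs:
  piperider:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install PipeRider and check tools
        run: |
          pip install piperider
          aws --version
          curl --version
      - name: get-started project
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
          PIPERIDER_BUCKET_NAME: ${{ secrets.PIPERIDER_BUCKET_NAME }}
          SLACK_INCOMING_WEBHOOK: ${{ secrets.SLACK_INCOMING_WEBHOOK }}
        # Assumption: the scenario script is saved as run.sh at the repo root.
        run: bash run.sh
EOF
}
```

Run write_workflow from the repository root, then commit the generated file.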
There are two major steps in the workflow:
Step: Install PipeRider and check tools
We need PipeRider, aws-cli, and curl. The Ubuntu runner provided by GitHub already has aws-cli and curl installed, so installing PipeRider is the only requirement in this step.
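The tool check in this step can be sketched as a small shell function. The function name require_tools is hypothetical, and installing PipeRider with pip is an assumption about how the package is obtained.

```shell
# Hypothetical helper: verify that every named tool is available on PATH.
require_tools() {  # usage: require_tools TOOL...
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done
}

# In the workflow step this might be used as:
#   pip install piperider
#   require_tools piperider aws curl
```

Failing here gives a clearer error than a "command not found" halfway through the report script.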
Step: get-started project
Create the file with the following script and put it at the root of the repo. It will run through the scenario.
- decide the output path and create the necessary directory/sub-directory
  - decide the name of the output directory by the current datetime
  - create the directory/sub-directory
- fetch the name of the previous report from S3
- generate a single latest report
  - -o specifies where to save the copy of the generated report
- upload the latest report to S3
- download the previous report from S3 to the default location
- make a comparison
  - --last compares the latest two reports: the one we generated and the one we downloaded from S3
  - -o saves a copy of the comparison report where you specify
- upload the comparison report to S3
- publish and notify
  - form two URLs for the latest run report and the comparison report
  - upload reports to S3 by aws s3 sync with --acl public-read for public accessibility
  - push a notification containing links to the two reports hosted in S3 to Slack
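The list above can be sketched as a shell script along the following lines. This is an illustration under assumptions, not the original script: the subcommands piperider run and piperider compare-reports, the S3 prefixes outputs and comparisons, the local reports directory, the .piperider/outputs default location, the index.html file name, and all function names are assumed. Whether --last picks up the downloaded report depends on how PipeRider orders reports in its default location, so verify that against your PipeRider version.

```shell
#!/usr/bin/env bash
# Sketch only: subcommands, prefixes, and paths below are assumptions.

# Name of this run's output directory, taken from the current UTC datetime.
report_name() { date -u +%Y%m%d-%H%M%S; }

# Read `aws s3 ls` output (prefix lines look like "PRE name/") and print the
# newest prefix name, relying on datetime names sorting chronologically.
latest_prefix() {
  awk '$1 == "PRE" { sub(/\/$/, "", $2); print $2 }' | sort | tail -n 1
}

# Public URL of an object on the bucket's static website endpoint.
# Some regions use "s3-website.<region>" (dot) instead of "s3-website-<region>".
report_url() {  # usage: report_url BUCKET REGION KEY
  echo "http://$1.s3-website-$2.amazonaws.com/$3"
}

# JSON body for the Slack incoming webhook.
slack_payload() {  # usage: slack_payload RUN_URL COMPARISON_URL
  printf '{"text": "Latest report: %s\\nComparison report: %s"}' "$1" "$2"
}

main() {
  set -eu
  bucket="$PIPERIDER_BUCKET_NAME"
  name="$(report_name)"
  run_dir="reports/outputs/$name"
  cmp_dir="reports/comparisons/$name"
  mkdir -p "$run_dir" "$cmp_dir"

  # Look up the previous report before the new one is uploaded.
  previous="$(aws s3 ls "s3://$bucket/outputs/" | latest_prefix)"

  # Generate the latest report and upload the copy saved by -o.
  piperider run -o "$run_dir"
  aws s3 sync "$run_dir" "s3://$bucket/outputs/$name" --acl public-read

  # Download the previous report to the default location, then compare.
  if [ -n "$previous" ]; then
    aws s3 sync "s3://$bucket/outputs/$previous" ".piperider/outputs/$previous"
    piperider compare-reports --last -o "$cmp_dir"
    aws s3 sync "$cmp_dir" "s3://$bucket/comparisons/$name" --acl public-read
  fi

  # Form the two URLs and push the notification to Slack.
  run_url="$(report_url "$bucket" "$AWS_DEFAULT_REGION" "outputs/$name/index.html")"
  cmp_url="$(report_url "$bucket" "$AWS_DEFAULT_REGION" "comparisons/$name/index.html")"
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$run_url" "$cmp_url")" "$SLACK_INCOMING_WEBHOOK"
}
```

To use the sketch as a script, add a final line that calls main. On the very first run there is no previous report, so the comparison is skipped and the comparison link in the notification will not resolve yet.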
Push a commit to the main branch of your repository to trigger the workflow, then check the S3 bucket for the uploaded reports.