Pipelines

Introduction

Pipelines are the unit of work in Bedrock. Each pipeline requires a Git repository, which holds the source code and the bedrock.hcl configuration file.

A pipeline can be created via either the UI or the API. A pipeline's configuration, such as its base image, library dependencies, and run command, is encapsulated in the bedrock.hcl file in the same repository.

With configuration as code under version control, pipeline runs are reproducible and well suited to scheduling via the Bedrock API.

Pipelines have a maximum run time of 72 hours. This prevents pipelines from hogging resources and running up potentially large cloud service provider bills. If this is too short for you, please contact our team at support@basis-ai.com to enable higher limits for your account.

Types of pipelines

There are two types of pipelines: training pipelines and batch scoring pipelines.

Training pipeline

A training pipeline is meant to output a Model Version after each run.

Configuration

This type of pipeline is configured using the train stanza in bedrock.hcl.
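As a minimal sketch, a train stanza might look like the following; the field names (image, install, script, resources) are illustrative and may not match the exact schema:

    version = "1.0"

    train {
        step train {
            image = "python:3.9"
            install = ["pip3 install -r requirements.txt"]
            script = [{sh = ["python3 train.py"]}]
            resources {
                cpu = "0.5"
                memory = "1G"
            }
        }
    }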

In addition to a Git repository as input, each training pipeline has a target Model that holds the results of its runs.

Storing model binary

A special directory, /artefact, is mounted at runtime of a training pipeline. This folder is meant to store the model binary generated by each training run. Its contents are compressed and uploaded to the corresponding cloud provider's blob storage service (e.g. S3, GCS).
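For example, a training script can serialize its model into /artefact at the end of a run. The sketch below assumes a scikit-learn model and joblib for serialization; any format works as long as the file ends up in /artefact:

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Stand-in for your actual training code.
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=200).fit(X, y)

    # Files written to /artefact are compressed and uploaded to the
    # cloud provider's blob storage when the run completes.
    joblib.dump(model, "/artefact/model.pkl")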

The compressed binary can be downloaded from the pipeline's corresponding Model page. It is also accessible via the API.

Batch scoring pipeline

A batch scoring pipeline uses a Model Version to run batch inference on new data.

Configuration

This type of pipeline is configured using the batch_score stanza of bedrock.hcl.
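As a sketch, a batch_score stanza has the same shape as the train stanza; the field names are again illustrative rather than the exact schema:

    batch_score {
        step score {
            image = "python:3.9"
            install = ["pip3 install -r requirements.txt"]
            script = [{sh = ["python3 batch_score.py"]}]
            resources {
                cpu = "0.5"
                memory = "1G"
            }
        }
    }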

When running a batch scoring pipeline, the ID of the Model Version to use for inference is required. You can specify it via the API or the UI.

Retrieving model binary

The Model Version ID specified when running a batch scoring pipeline is used to download and decompress the model binary into the /artefact directory, much like how the training pipeline stores it.
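A scoring script can therefore load the model as a local file. This sketch assumes the training run saved the model with joblib under the (hypothetical) name model.pkl, as in the training example above:

    import joblib

    # Bedrock has already downloaded and decompressed the selected
    # Model Version into /artefact before this script starts.
    model = joblib.load("/artefact/model.pkl")

    def score(rows):
        # Run batch inference on a batch of feature rows.
        return model.predict(rows)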

Storing inference results

Inference results can be written to BigQuery or other data storage. For example, the BigQuery Python client library can be used to write to BigQuery.

Credentials for setting up data storage can be passed in via the pipeline's Secrets.
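As a sketch, assuming Secrets are exposed to the run as environment variables and that one of them holds a GCP service account key (the secret name SVC_ACCOUNT_KEY and the destination table below are hypothetical), inference results in a pandas DataFrame could be written to BigQuery like this:

    import json
    import os

    import pandas as pd
    from google.cloud import bigquery
    from google.oauth2 import service_account

    # Hypothetical secret: a service account key JSON passed in via
    # the pipeline's Secrets.
    key_info = json.loads(os.environ["SVC_ACCOUNT_KEY"])
    credentials = service_account.Credentials.from_service_account_info(key_info)
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # Example inference results; replace with the real model output.
    results = pd.DataFrame({"customer_id": [1, 2, 3], "score": [0.12, 0.87, 0.45]})

    # Hypothetical destination table.
    client.load_table_from_dataframe(
        results, "my-project.my_dataset.predictions"
    ).result()  # block until the load job finishes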