The aim of this tutorial is to perform a standardized infrastructure benchmark in the Seqera Platform. By the end of this tutorial you will have run a number of pipelines and collected performance metrics that you can use to evaluate your infrastructure.
This repository provides YAML templates to set up and run infrastructure benchmarks on the Seqera Platform. These templates are designed to streamline the process of creating and executing standardized benchmarks across different computing environments.
Note: You will need to customize these templates for your specific infrastructure:
- Compute environment: Modify the configurations to match your setup (buckets, compute regions, networking, etc.).
- Pipeline: Update the pipeline configurations with the details of the workflow you will benchmark (URL, revision, profile, parameters, etc.).
These customizations ensure the benchmarks accurately reflect your infrastructure's performance for the workflow you will assess.
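One way to keep these customizations separate from the shared templates is to write the site-specific values as $NAME placeholders and fill them in from your environment before running. The sketch below is hypothetical. The template excerpt, the placeholder names (COMPUTE_ENV_NAME, WORKSPACE, WORK_DIR_BUCKET) and the helper render_template are illustrative assumptions, not the repository's actual files. It uses Python's standard string.Template, and it fails loudly if a placeholder has not been filled in. seqerakit may already support environment variables in its YAML files, so check its documentation before adopting this approach.

```python
from string import Template
from typing import Mapping

# Hypothetical excerpt of a compute environment template. The field names
# and placeholder names are illustrative assumptions, not the repository's files.
TEMPLATE = """\
compute-envs:
  - name: '$COMPUTE_ENV_NAME'
    workspace: '$WORKSPACE'
    work-dir: 's3://$WORK_DIR_BUCKET/work'
"""


def render_template(text: str, values: Mapping[str, str]) -> str:
    """Fill $NAME placeholders in text, raising KeyError if any value is missing."""
    # Template.substitute (unlike safe_substitute) raises KeyError on a
    # missing key, so an un-customized placeholder cannot slip through.
    return Template(text).substitute(values)


if __name__ == "__main__":
    # In practice you might pass dict(os.environ) after exporting the
    # variables in your shell; explicit example values are used here.
    print(render_template(TEMPLATE, {
        "COMPUTE_ENV_NAME": "benchmark_ce",
        "WORKSPACE": "my_org/my_workspace",
        "WORK_DIR_BUCKET": "my-work-bucket",
    }))
```

Raising on missing values is a deliberate choice here. A benchmark that silently ran against a placeholder bucket name would be harder to diagnose than an immediate error.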
This tutorial is split into five main components, which you should complete in order:
- Introduction to using Seqerakit and setting up your environment
- Set up compute environments
- Set up pipelines for benchmarking
- Run benchmarks
- Generate benchmarking reports
Before starting this tutorial, ensure you have the following prerequisites in place:
- Access to a Seqera Platform instance with:
  - A workspace
  - Maintain user permissions or higher within the workspace
  - An access token for the Seqera Platform CLI
- Software dependencies installed:
  Before continuing with the tutorial, refer to the installation guide to make sure you have all of the required software dependencies and have established connectivity to the Seqera Platform via the seqerakit command-line interface.
- AWS resources, data and configurations:
  - AWS credentials set up in the Seqera Platform workspace
  - Correct IAM permissions for Batch Forge (if you are using it)
  - An S3 bucket for the Nextflow work directory
  - An S3 bucket for saving workflow outputs
  - An S3 bucket containing the input samplesheet (or the samplesheet uploaded to the workspace as a Dataset)
  - Split Cost Allocation tracking set up in your AWS account, with the required tags activated (see this guide)
    Note: Make sure that the taskHash label has also been activated. The guide was recently amended to include this label, so that task costs can be retrieved for each unique task hash without relying on the task names themselves.
- If you are using private repositories, your GitHub (or other VCS provider) credentials added to the Seqera Platform workspace
- Familiarity with:
  - Basic YAML file format
  - Environment variables
  - Linux command line and common shell operations
  - Seqera Platform and its features
After ensuring all these prerequisites are met, you'll be ready to proceed with the tutorial steps for setting up and running infrastructure benchmarks on the Seqera Platform.
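Some of these prerequisites can be checked from a script before you start. The sketch below is a minimal, hypothetical pre-flight check. It assumes that the tw (Seqera Platform CLI) and seqerakit executables should be on your PATH, and that your access token is exported as TOWER_ACCESS_TOKEN, which is the variable the Seqera Platform CLI reads. If you use a self-hosted Seqera Platform Enterprise instance, you will likely also need TOWER_API_ENDPOINT. Adjust both lists to your setup. The script cannot verify AWS permissions or workspace roles.

```python
import os
import shutil

# Assumed requirements for this tutorial; adjust to match your setup.
# TOWER_ACCESS_TOKEN is read by the Seqera Platform CLI (tw) for authentication.
REQUIRED_COMMANDS = ("tw", "seqerakit")
REQUIRED_ENV_VARS = ("TOWER_ACCESS_TOKEN",)


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for cmd in REQUIRED_COMMANDS:
        # shutil.which returns None when the executable is not on PATH.
        if which(cmd) is None:
            problems.append(f"command not found on PATH: {cmd}")
    for var in REQUIRED_ENV_VARS:
        # Treat an empty value the same as an unset variable.
        if not env.get(var):
            problems.append(f"environment variable not set: {var}")
    return problems


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print("MISSING:", problem)
    if not problems:
        print("All checked prerequisites found.")
```

The env and which parameters exist only so the check can be exercised without touching your real environment. By default it inspects os.environ and your PATH.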
We will perform this analysis in an automated manner using seqerakit, a Python package for configuring Seqera Platform resources as infrastructure as code.
seqerakit is a Python wrapper for the Seqera Platform CLI. It reads a simple YAML configuration file and automates the creation of all of the entities in Seqera Platform, end to end. For a demonstration of seqerakit in action, watch the 'Automation on the Seqera Platform' talk by Harshil Patel at the Nextflow Summit, Barcelona 2023.
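seqerakit is normally run from the shell with a YAML file as its argument. If you want to drive several configuration files from a script, in the order this tutorial follows, a sketch like the one below could wrap the calls. It assumes the seqerakit command is on your PATH and that it accepts the --dryrun flag. seqerakit documents this flag for printing the tw commands it would run without executing them, but confirm it with seqerakit --help for your installed version. The helper names seqerakit_command and run_in_order are hypothetical.

```python
import subprocess


def seqerakit_command(yaml_file, dryrun=False):
    """Build the argument list for a single seqerakit run on one YAML file."""
    cmd = ["seqerakit"]
    if dryrun:
        # Assumed flag: print the tw commands instead of executing them.
        cmd.append("--dryrun")
    cmd.append(yaml_file)
    return cmd


def run_in_order(yaml_files, dryrun=True):
    """Run seqerakit on each file in turn, stopping at the first failure."""
    for yaml_file in yaml_files:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later stages do not run against a half-configured workspace.
        subprocess.run(seqerakit_command(yaml_file, dryrun=dryrun), check=True)
```

For example, run_in_order(["compute-envs.yml", "pipelines.yml"]) would preview both stages with --dryrun, and passing dryrun=False would apply them. The file names are placeholders for the templates you customized. Defaulting to a dry run means nothing is created in your workspace until you opt in.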
Useful resources:
- Seqera website
- nf-core website
- Seqera Platform docs
- Seqera Platform API
- Seqera Platform CLI
- seqerakit
If you have further questions, comments, or suggestions, please don't hesitate to reach out to us by:
- Contacting your Account Executive
- Contacting us at [email protected]