03 - Cloud & Cloud-Native Engineering

Cloud Computing

  • Computing & Storage (IaaS)
    • Amazon EC2 - A web service that provides secure, resizable compute capacity in the cloud
    • Amazon EBS - An easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud
    • Azure Virtual Machines - A service to provision Windows and Linux virtual machines in seconds
    • Azure Disk Storage - A high-performance, durable block storage for Azure Virtual Machines
    • Google Cloud Compute Engine - A customizable compute service that lets you create and run virtual machines on Google's infrastructure
  • Networking
    • Amazon VPC - A service that lets you launch AWS resources in a logically isolated virtual network that you define
    • Amazon ELB - A service that automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions
    • Azure Virtual Network - The fundamental building block for your private network in Azure
    • Azure Load Balancer - A service that allows you to distribute traffic to your backend virtual machines
    • Google Cloud VPC - A virtual version of a physical network that is implemented inside of Google's production network by using Andromeda
    • Cloud Load Balancing - A fully distributed, software-defined, managed service for all your traffic
  • Application Hosting Platform (PaaS)
    • Azure App Service - An HTTP-based service for hosting web applications, REST APIs, and mobile back ends
    • AWS Elastic Beanstalk - An easy-to-use service for deploying and scaling web applications and services
    • Google Cloud App Engine - A fully managed, serverless platform for developing and hosting web applications at scale
  • Cloud Emulators
    • LocalStack - A fully functional local cloud stack to develop and test your cloud and serverless apps offline

Infrastructure as Code (IaC)

  • Infrastructure as Code
    • HashiCorp Terraform - An infrastructure as code tool that lets you build, change, and version infrastructure safely and efficiently
    • OpenTofu - An open-source alternative to Terraform
    • Pulumi - An infrastructure as code platform that allows you to use familiar programming languages and tools to build, deploy, and manage cloud infrastructure
  • Configuration Management & Automation
    • Ansible - An open source IT automation engine that automates provisioning, configuration management, application deployment, orchestration, and many other IT processes
    • cloud-init - The standard for customising cloud instances
  • Image Building
    • HashiCorp Packer - A tool for creating identical machine images for multiple platforms from a single source configuration
  • Terraform/OpenTofu Ecosystem
    • Terraform/OpenTofu Provider: Core Functions - A Terraform/OpenTofu provider for performing core functions
    • Terragrunt - A thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state
    • Terratest - A Go library that provides patterns and helper functions for testing infrastructure
    • Atmos - A universal tool for DevOps and Cloud Engineering that orchestrates workflows and simplifies the management of infrastructure
    • GitLab-managed Terraform/OpenTofu state - A feature that allows you to store your Terraform state files in GitLab
    • tf.libsonnet - A collection of Jsonnet libraries for generating Terraform code
    • terraform-docs - A utility to generate documentation from Terraform modules in various output formats
    • Terraformer - A CLI tool to generate terraform files from existing infrastructure
  • Vendor-specific Tools
    • AWS CloudFormation - A service that helps you model and set up your Amazon Web Services resources
    • AWS CDK - An open source software development framework to define your cloud application resources using familiar programming languages
    • AWS SAM - An open-source framework for building serverless applications
    • Azure Resource Manager - The deployment and management service for Azure
      • Bicep language - A domain-specific language (DSL) that uses declarative syntax to deploy Azure resources

Version Control System

  • Distributed Version Control - A form of version control where the complete codebase, including its full history, is mirrored on every developer's computer
    • Git - A free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
      • local repository, remote repository
      • branch, tag, worktree
      • push, pull, fetch, rebase, reset, stash
      • staging, commit
    • git lfs - An open source Git extension for versioning large files
    • Informative git prompt for bash and fish - A bash prompt that displays information about the current git repository
    • lazygit - A simple terminal UI for git commands
    • Git Interactive Rebase Tool - An improved sequence editor for Git
    • BFG Repo-Cleaner - A simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history
    • git filter-repo - A versatile tool for rewriting history
    • degit - Straightforward project scaffolding
    • Git Lint - A command line interface for linting Git commits by ensuring you maintain a clean, easy to read, debuggable, and maintainable project history
    • git cliff - A highly customizable changelog generator
    • pre-commit - A framework for managing and maintaining multi-language pre-commit hooks
    • TortoiseGit - A Windows Shell Interface to Git, based on TortoiseSVN
  • Git hosting services
    • GitLab SCM - The single source of truth for collaborating on code and projects
    • Gitea - A painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
    • Codeberg - A community-led effort that provides Git hosting and other services for free and open source projects
    • Forgejo - A self-hosted lightweight software forge
    • Soft Serve - A tasty, self-hostable Git server for the command line
    • Azure Repos - A set of version control tools that you can use to manage your code
    • GitHub - The AI-powered developer platform to build, scale, and deliver secure software
  • Practices
    • Trunk Based Development - A source-control branching model in which developers collaborate on code in a single branch called 'trunk' and resist any pressure to create other long-lived development branches by employing documented techniques
  • Conventions
    • keep a changelog - A file which contains a curated, chronologically ordered list of notable changes for each version of a project
    • Conventional Commits - A lightweight convention on top of commit messages
    • Semantic Versioning - A simple set of rules and requirements that dictate how version numbers are assigned and incremented
      • semver - A semantic versioner for npm
  • AI commit tools
    • OpenCommit - Auto-generate meaningful commits in a second
    • AI Commits - A CLI that writes your git commit messages for you with AI
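
The Conventional Commits and Semantic Versioning conventions listed above are often combined: the commit type hints at which part of the version number should change. The following is a minimal sketch of that idea, not the behaviour of any specific tool. The helper names `bump_kind` and `next_version` are hypothetical, the header pattern is a simplification of the Conventional Commits grammar (lowercase types only), and pre-release tags and the special rules for 0.y.z versions are ignored.

```python
import re
from typing import List

# Simplified Conventional Commits header: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)


def bump_kind(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for one commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return "none"
    # A "!" before the colon or a "BREAKING CHANGE:" footer marks a breaking change.
    has_footer = any(line.startswith("BREAKING CHANGE:") for line in lines[1:])
    if match["breaking"] or has_footer:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return "none"


def next_version(current: str, messages: List[str]) -> str:
    """Apply the largest bump implied by the messages to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    kinds = {bump_kind(message) for message in messages}
    if "major" in kinds:
        return f"{major + 1}.0.0"
    if "minor" in kinds:
        return f"{major}.{minor + 1}.0"
    if "patch" in kinds:
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, `next_version("1.4.2", ["feat(api): add endpoint", "fix: typo"])` yields `"1.5.0"` under these assumptions, because a new feature outranks a fix.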

Containerization

Standards & Utilities

  • Containerization - A form of operating-system-level virtualization
  • Linux Distros for Containers
    • Alpine Linux - A security-oriented, lightweight Linux distribution based on musl libc and busybox
      • apk-tools - A package manager originally built for Alpine Linux
    • Flatcar Container Linux - An immutable Linux distribution for containers
  • Utilities in Containers
    • busybox - A single small executable that combines tiny versions of many common UNIX utilities
  • The Open Container Initiative (OCI) - An open governance structure for the express purpose of creating open industry standards around container formats and runtimes
  • Containers for Development
    • Development Containers - An open specification for enriching containers with development-specific settings, tools, and configuration

Runtimes & Tools

  • Container Engines
    • Docker Engine - An open source containerization technology for building and containerizing your applications
      • docker-compose - A tool for defining and running multi-container Docker applications
    • containerd - An industry-standard container runtime with an emphasis on simplicity, robustness and portability
      • nerdctl - A Docker-compatible CLI for containerd
      • ctr - An unsupported debug and administrative client for interacting with the containerd daemon
    • podman - A powerful container engine for building, managing, and running containers and pods
  • Image Building Tools
    • Docker Build - A part of the Docker Engine that automates the process of creating a Docker image from a Dockerfile and a context
    • buildah - A tool that facilitates building Open Container Initiative (OCI) container images
    • Kaniko - A tool to build container images from a Dockerfile, inside a container or Kubernetes cluster
  • Image Inspection & Management Tools
    • skopeo - A command line utility that performs various operations on container images and image repositories
    • dive - A tool for exploring a docker image, layer contents, and discovering ways to shrink the size of your Docker/OCI image
  • TUI & Helper Tools
    • lazydocker - A terminal UI for both docker and docker-compose
  • Local Environment Provisioners
    • Colima - A tool that provides container runtimes on macOS (and Linux) with minimal setup

Registries

  • Container Registries
    • GitLab Container Registry - A secure and private registry for Docker images
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Amazon ECR - A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts
    • Azure Container Registry - A private registry for managing container images and related artifacts
    • Harbor - An open source registry that secures artifacts with policies and role-based access control

WebAssembly

  • Standards
    • WebAssembly - A binary instruction format for a stack-based virtual machine
    • WebAssembly System Interface (WASI) - A modular system interface for WebAssembly
    • WASIX - The long term stabilization and support of the existing WASI ABI plus additional non-invasive syscall extensions
  • WebAssembly Runtimes
    • wazero - The only zero dependency WebAssembly runtime written in Go
    • Wasmtime - A fast and secure runtime for WebAssembly
    • Wasmer - A blazing fast and secure WebAssembly runtime that enables incredibly lightweight containers to run anywhere
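
To make "binary instruction format" concrete: every WebAssembly module begins with an 8-byte header, the magic bytes `\0asm` followed by a little-endian 32-bit version number (currently 1). The sketch below only checks that header; the function name `wasm_version` is hypothetical, and nothing beyond the header is parsed or validated.

```python
import struct
from typing import Optional

WASM_MAGIC = b"\x00asm"  # the four bytes 0x00 0x61 0x73 0x6D


def wasm_version(data: bytes) -> Optional[int]:
    """Return the binary format version if data starts with a Wasm header, else None."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return None
    # The version is a 4-byte little-endian unsigned integer right after the magic.
    (version,) = struct.unpack("<I", data[4:8])
    return version


# The smallest module is just the header with no sections after it.
EMPTY_MODULE = WASM_MAGIC + struct.pack("<I", 1)
```

A runtime such as those listed above performs far more validation than this; the check is only useful as a quick file-type sniff.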

Kubernetes

  • Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications
  • Architecture
    • Control plane node
      • kube-apiserver - Exposes the Kubernetes API and serves as the front end of the control plane
      • kube-scheduler - Assigns newly created Pods to nodes
      • kube-controller-manager - Runs the controller processes that reconcile the cluster's actual state with its desired state
    • Compute node
      • kubelet - Watches the API server for Pods assigned to its node and makes sure their containers are running
      • cAdvisor - Collects resource usage metrics about the containers running on its node
      • kube-proxy - Watches the API server for Service and endpoint changes in order to keep the node's network rules up to date
      • container runtime - Manages container images and runs containers on its node
  • Interface Standards
    • CNI (Container Networking Interface)
      • Calico - A networking and security solution that enables Kubernetes workloads and non-Kubernetes/legacy workloads to communicate seamlessly and securely
      • Cilium - An open source, cloud native solution for providing, securing, and observing network connectivity between workloads, fueled by the revolutionary Kernel technology eBPF
    • CSI (Container Storage Interface)
    • CRI (Container Runtime Interface)
      • cri-o - An implementation of the Kubernetes CRI (Container Runtime Interface) to enable using OCI (Open Container Initiative) compatible runtimes
      • cri-tools - A set of tools for CRI
  • Workloads - The objects you use to manage and run your containers on the cluster
    • Pod
      • assignment - The process of constraining a Pod so that it is restricted to run on particular nodes, or to prefer to run on particular nodes
      • taint and toleration - A mechanism that allows you to ensure that pods are not placed on inappropriate nodes
      • lifecycle - The lifecycle of a Pod
      • liveness probe - A probe the kubelet uses to know when to restart a container
      • requests and limits
      • eviction
    • Deployment, ReplicaSet, StatefulSet, DaemonSet
  • Services, Load Balancing & Networking
    • Kubernetes network model - A set of fundamental requirements and principles for networking in a Kubernetes cluster
    • Service, Ingress, Ingress Controllers
  • Storage - A powerful volume subsystem with an API that abstracts how storage is provided and consumed
    • PersistentVolume, PVC, StorageClass
  • Configuration - A range of mechanisms that let you inject configuration data into the Pods that run your applications
    • Secret, ConfigMap
  • Security & Policy
    • Kubernetes RBAC - A method of regulating access to computer or network resources based on the roles of individual users within an enterprise
    • PodDisruptionBudget - An object that limits the number of concurrent disruptions that your application experiences, allowing for high availability
    • Security context - A definition of privilege and access control settings for a Pod or Container
  • Autoscaling
    • HPA - The component that automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
    • Cluster Autoscaler - A tool that automatically adjusts the size of the Kubernetes cluster
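
The HPA entry above can be made concrete with the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up, and no change is made when that ratio is close enough to 1. The sketch below assumes the documented default tolerance of 0.1. The function name `desired_replicas` is hypothetical, and the real controller additionally clamps to the configured minimum and maximum replicas, handles missing metrics, and applies stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Approximate the HPA rule: ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Skip scaling when the observed value is within the tolerance of the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For instance, 4 replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the sketch returns 6.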

Kubernetes Ecosystem

  • Application Packaging & Configuration
    • Helm - The package manager for Kubernetes
    • Kustomize - A standalone tool to customize Kubernetes objects through a kustomization file
  • Developer Workflow Tools
    • Skaffold - A command line tool that facilitates continuous development for container-based applications
  • Platform Extensions
    • kube-fencing - A solution for fencing the nodes of stateful applications in Kubernetes
    • KubeVirt - A virtual machine management add-on for Kubernetes
  • Operator & Controller Development
    • Kubebuilder - A framework for building Kubernetes APIs using custom resource definitions (CRDs)
  • CLI Plugin Management
    • Krew - The plugin manager for kubectl command-line tool
      • kubectl-node-shell - A kubectl plugin to run a root shell on a node
      • kubectl-tree - A kubectl plugin to explore ownership relationships between Kubernetes objects
      • kubectl-pod-inspect - A kubectl plugin to view pod and container status at a glance
      • kubepug - A pre-flight checking tool for Kubernetes APIs
      • rakkess - A kubectl plugin to show an access matrix for all available resources
      • ketall - A kubectl plugin to get all resources
  • Resource Optimization
    • Goldilocks - A utility that can help you identify a starting point for resource requests and limits
  • Vendor-specific Tools
    • eksctl - The official CLI for Amazon EKS
  • Dashboards
    • Kubernetes Lens IDE - The Kubernetes IDE
    • k9s - A terminal based UI to interact with your Kubernetes cluster
    • KDash - A simple terminal dashboard for Kubernetes built with Rust
    • k1s - A minimalistic Kubernetes dashboard
    • Seabird - The native desktop app that simplifies working with Kubernetes
    • Headlamp - A user-friendly Kubernetes UI focused on extensibility
  • Local K8s
    • Minikube - A tool that lets you run Kubernetes locally
    • Kind - A tool for running local Kubernetes clusters using Docker container “nodes”
  • K8s Operators
    • Prometheus Operator - The operator that creates/configures/manages Prometheus clusters atop Kubernetes
      • kube-prometheus - A collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring
    • OpenTelemetry Operator - An implementation of a Kubernetes Operator for OpenTelemetry
    • Elastic Cloud on Kubernetes (ECK) - The official operator for the Elastic Stack on Kubernetes
    • Rook - An open source cloud-native storage orchestrator for Kubernetes

Cloud-Native Runtimes & Patterns

Cloud-Native Computing

  • Concepts
    • Serverless Computing - A cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers
  • Container as a Service (CaaS)
  • Function as a Service (FaaS)
    • AWS Lambda - A serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers
    • Azure Functions - An event-driven, serverless compute platform that helps you develop more efficiently using the programming language of your choice
    • Google Cloud Run Functions - A serverless execution environment for building and connecting cloud services

Cloud-Native Infrastructure

  • App Runtimes & Scaling
    • KEDA (Kubernetes Event-driven Autoscaling) - A single-purpose and lightweight component that can be added into any cluster to provide event-driven scale for any container running in the environment
    • Dapr (Distributed Application Runtime) - A portable, event-driven runtime that makes it easy for any developer to build resilient, stateless, and stateful applications that run on the cloud and edge and embraces the diversity of languages and developer frameworks
  • Serverless Computing
    • OpenFaaS - A framework that makes it easy for developers to deploy event-driven functions and microservices to Kubernetes
    • Knative - A Kubernetes-based platform to build, deploy, and manage modern serverless workloads
  • Service Mesh & Discovery
    • Istio - An open source service mesh that layers transparently onto existing distributed applications
      • Kiali - The service mesh observability and configuration tool for Istio
    • Linkerd - An ultralight, security-first service mesh for Kubernetes
    • HashiCorp Consul - A service networking solution to connect and secure services across any runtime platform and public or private cloud
    • Traefik Mesh - A straight-forward, easy to configure, and non-invasive service mesh
  • Edge Proxies & Ingress
    • Envoy Proxy - An open source edge and service proxy
    • Traefik proxy - A leading modern open source reverse proxy and ingress controller

CI/CD & GitOps

  • Continuous Delivery Tools
    • Jenkins - An open source automation server which enables developers around the world to reliably build, test, and deploy their software
    • GitLab CI/CD - A part of GitLab that you can use to automate the builds, integration, and verification of your source code
    • GitHub Actions - A feature that makes it easy to automate all your software workflows
    • Concourse CI - An automation system written in Go
    • Azure Pipelines - A cloud service that you can use to automatically build and test your code project and make it available to other users
  • Terraform Integration
    • Atlantis - A self-hosted golang application that listens for Terraform pull request events via webhooks
  • Private Package Registries
    • GitLab Package Registry - A feature that allows you to publish and share packages for a variety of supported package managers
    • GitHub Packages - A software package hosting service that allows you to host your software packages privately or publicly
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Azure Artifacts - A service that enables you to create and share Maven, npm, NuGet, and Python package feeds from public and private sources
  • GitOps Style CD
    • ArgoCD - A declarative, GitOps continuous delivery tool for Kubernetes
    • FluxCD - A tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to configuration when there is new code to deploy
  • Cloud-Native Application Delivery
    • Open Application Model - A specification for describing applications so that they can be deployed and managed across any platform
    • KubeVela - A modern software delivery platform that makes deploying and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable
    • Flagger - A progressive delivery tool that automates the release process for applications running on Kubernetes

SRE (Site Reliability Engineering)

  • Site Reliability Engineering - A discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems
  • Ishikawa diagram - A causal diagram created by Kaoru Ishikawa that shows the potential causes of a specific event

Fleet Management & Operations

  • Fleet Management
    • AWS Systems Manager - A secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments
    • Azure Automation - A cloud-based automation and configuration service that supports consistent management across your Azure and non-Azure environments
  • Backup
    • Vendor-specific Tools
      • AWS Backup - A fully managed service that centralizes and automates data protection across AWS services, in the cloud, and on premises
      • Azure Backup - A service that provides simple, secure, and cost-effective solutions to back up your data and recover it from the Microsoft Azure cloud
    • K8s-specific Tools
      • Velero - An open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes
    • Generic
      • Restic - A fast, secure, efficient backup program
  • Runbook Automation
    • Rundeck - An open source automation platform that helps you automate routine operational procedures in data center or cloud environments
    • SaltStack - A Python-based, open-source software for event-driven IT automation, remote task execution, and configuration management

System Observability

  • Concepts
    • Observability - A measure of how well internal states of a system can be inferred from knowledge of its external outputs
  • Instrumentation Libraries
    • OpenTelemetry - A vendor-neutral open source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs
    • Micrometer - A metrics instrumentation library for JVM-based applications
  • Tools
    • Uptime Kuma - An easy-to-use self-hosted monitoring tool

Telemetry Shipment

  • Data Shippers
    • Prometheus exporters - The services that expose Prometheus metrics
      • node-exporter - An exporter for hardware and OS metrics exposed by *NIX kernels
      • blackbox-exporter - A tool that allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP and gRPC
    • Grafana Alloy - An open source OpenTelemetry collector with built-in Prometheus pipelines and support for metrics, logs, traces, and profiles
    • Fluent Bit - A super fast, lightweight, and highly scalable logging, metrics, and traces processor and forwarder
    • Fluentd - An open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data
    • Filebeat - A lightweight shipper for forwarding and centralizing log data
    • Logstash - An open source server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash"
    • Telegraf - An open source server agent that helps you collect metrics from your stacks, sensors, and systems
    • Metricbeat - A lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from services running on the server
    • rsyslog - The rocket-fast system for log processing
  • Vendor-specific Tools
    • Azure Monitor Agent - The agent that collects monitoring data from the guest operating system of Azure and hybrid virtual machines
    • CloudWatch Agent - The agent you can use to collect both system-level metrics and log files from Amazon EC2 instances and on-premises servers
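
To illustrate what "exposing Prometheus metrics" means for the exporters listed above: an exporter serves plain text over HTTP in the Prometheus text exposition format, with `# HELP` and `# TYPE` comment lines followed by one sample per line. The sketch below only renders that text. The function name `render_metric` and the metric name in the usage note are hypothetical, label values are assumed to need no escaping, and a real exporter would normally use an official client library rather than hand-built strings.

```python
from typing import Dict, List, Tuple


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        # Labels are written as key="value" pairs inside braces; sorted for stable output.
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render_metric("demo_cpu_load", "Hypothetical load gauge.", "gauge", [({"host": "a"}, 0.25)])` produces three lines that a Prometheus server could scrape if they were served on a `/metrics` endpoint.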

Telemetry Collection

  • Datastore and Alerting Tools
    • Prometheus - An open-source systems monitoring and alerting toolkit
      • PromQL - The Prometheus Query Language
      • promtool - The command line utility for the Prometheus server
    • Alertmanager - A tool that handles alerts sent by client applications such as the Prometheus server
      • amtool - A cli tool for interacting with the Alertmanager API
    • InfluxDB - A time series database built from the ground up to handle high write and query loads
      • InfluxQL - An SQL-like query language for interacting with data in InfluxDB
      • influx cli - The command line interface for InfluxDB 2.0
    • Grafana Mimir - An open source, horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
    • Grafana Loki - A horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus
      • LogQL - The query language for Loki
      • LogCLI - The command line interface for Loki
    • Grafana Tempo - An open source, easy-to-use and high-scale distributed tracing backend
      • TraceQL - A query language designed for selecting traces
    • Elasticsearch - An open source distributed, RESTful search and analytics engine, scalable data store, and vector database
      • Elastic Common Schema - An open source specification, developed with support from the Elastic user community
      • Ingest pipelines - A feature that lets you perform common transformations on your data before indexing
      • Dissect and Grok - The processors that let you extract structured fields out of a single text field
    • Graphite - A highly scalable real-time graphing system
    • Grafana Alerting - A feature that allows you to create and manage alerts for your data
    • OpenObserve - An open-source observability platform designed for modern applications
  • Vendor-specific Tools
    • Azure Monitor - A comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments
      • Kusto Query Language - A powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical models, and more
      • App Insights - A feature of Azure Monitor that provides an extensible Application Performance Management (APM) service for developers and DevOps professionals
    • AWS CloudWatch - A monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers
  • Visualization Tools
    • Grafana - The open source data visualization and monitoring solution
      • Grafonnet - A Jsonnet library for generating Grafana dashboards
    • Kibana - A free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack

Chaos Engineering

  • Concepts
    • Chaos Engineering - The practice of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production
    • Principles of chaos engineering - The principles that define the practice of chaos engineering
  • Chaos Engineering Tools
    • Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures
    • Litmus - A cloud-native chaos engineering framework for Kubernetes
    • Chaos Mesh - A cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments
    • Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing
    • kubeinvaders - A gamified chaos engineering tool for Kubernetes

FinOps

  • Concepts
    • FinOps principles - The cultural practice of bringing financial accountability to the variable spend model of cloud
  • FinOps Tools
    • FinOps toolkit - A collection of tools, resources, and best practices for implementing FinOps in your organization
    • AWS Cost Explorer - A tool that enables you to view and analyze your costs and usage
    • OpenCost - The open source solution for monitoring Kubernetes spend
    • Karpenter - A flexible, high-performance Kubernetes cluster autoscaler
    • Cloud Custodian - A rules engine for managing public cloud accounts and resources