03 - Cloud & Cloud-Native Computing

Cloud Computing

Computing & Storage (IaaS)

  • Amazon EC2 - A web service that provides secure, resizable compute capacity in the cloud
  • Amazon EBS - An easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud
  • Azure Virtual Machines - A service to provision Windows and Linux virtual machines in seconds
  • Azure Disk Storage - A high-performance, durable block storage service for Azure Virtual Machines
  • Google Cloud Compute Engine - A customizable compute service that lets you create and run virtual machines on Google's infrastructure

Networking

  • Amazon VPC - A service that lets you launch AWS resources in a logically isolated virtual network that you define
  • Amazon ELB - A service that automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions
  • Azure Virtual Network - The fundamental building block for your private network in Azure
  • Azure Load Balancer - A service that allows you to distribute traffic to your backend virtual machines
  • Azure Application Gateway - A platform-managed, scalable, and highly available application delivery controller as a service
  • Google Cloud VPC - A virtual version of a physical network that is implemented inside of Google's production network by using Andromeda
  • Cloud Load Balancing - A fully distributed, software-defined, managed service for all your traffic
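
The load balancers listed above all spread incoming traffic across a set of backend targets. As a rough illustration of the idea, the sketch below uses round-robin selection, which is only one of several strategies these services may apply. The backend addresses are hypothetical and the code is not tied to any provider's API.

```python
from collections import Counter
from itertools import cycle


def round_robin(targets):
    """Return an iterator that yields targets in a fixed, repeating order."""
    if not targets:
        raise ValueError("at least one target is required")
    return cycle(targets)


# Hypothetical backend addresses, for illustration only.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
picker = round_robin(backends)

# Route nine requests and count how many each backend receives.
assignments = [next(picker) for _ in range(9)]
counts = Counter(assignments)
```

With three backends and nine requests, each backend ends up with three requests. Managed load balancers typically combine a selection strategy like this with health checks, which the sketch leaves out.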

Application Hosting Platform (PaaS)

  • Azure App Service - An HTTP-based service for hosting web applications, REST APIs, and mobile back ends
  • AWS Elastic Beanstalk - An easy-to-use service for deploying and scaling web applications and services
  • Google Cloud App Engine - A fully managed, serverless platform for developing and hosting web applications at scale

Command Line Interfaces

  • AWS CLI - A unified tool to manage your AWS services
  • Azure CLI - A cross-platform command-line tool for managing Azure resources with interactive commands or scripts
  • Azure Developer CLI (azd) - An open-source tool that accelerates your path from a local development environment to Azure
  • Google Cloud CLI (gcloud) - A set of tools to create and manage Google Cloud resources and services

Cloud Emulators

  • LocalStack - A fully functional local cloud stack to develop and test your cloud and serverless apps offline

Cloud Architecture Frameworks

  • Azure Architecture Center - A set of guidance, patterns, and best practices for building secure, high-performing, resilient, and efficient infrastructure on Azure
  • Azure Well-Architected Framework - A set of quality-driven tenets, architectural decision points, and review tools intended to help solution architects build a technical foundation for their workloads

Configuration as Code

Infrastructure as Code (IaC)

  • HashiCorp Terraform - An infrastructure as code tool that lets you build, change, and version infrastructure safely and efficiently
  • OpenTofu - An open-source, community-driven fork of Terraform that provides a stable, drop-in replacement for building and managing infrastructure
  • Pulumi - An infrastructure as code platform that allows you to use familiar programming languages and tools to build, deploy, and manage cloud infrastructure
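
The tools above share a declarative model: you describe the desired infrastructure, and the tool works out which changes are needed to reach it. The sketch below shows that idea in its simplest form, as a diff between a desired and an actual state. It is a conceptual illustration only, not how Terraform, OpenTofu, or Pulumi are implemented, and the resource names and the `plan` helper are hypothetical.

```python
def plan(desired, actual):
    """Compare two {name: config} mappings and list the changes needed."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name
        for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical resources, for illustration only.
desired = {
    "vm-web": {"size": "small"},
    "vm-db": {"size": "large"},
}
actual = {
    "vm-web": {"size": "medium"},
    "vm-old": {"size": "small"},
}

changes = plan(desired, actual)
```

Here `vm-db` would be created, `vm-web` updated, and `vm-old` deleted. Real tools also track dependencies between resources and record state between runs, which this sketch ignores.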

Configuration Management & Automation

  • Ansible - An open source IT automation engine that automates provisioning, configuration management, application deployment, orchestration, and many other IT processes
  • cloud-init - The standard for customising cloud instances

Image Building

  • HashiCorp Packer - A tool for creating identical machine images for multiple platforms from a single source configuration

Ecosystem & Vendor Tools

  • Terraform/OpenTofu Ecosystem
    • Terraform/OpenTofu Provider: Core Functions - A Terraform/OpenTofu provider for performing core functions
    • Terragrunt - A thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state
    • Terratest - A Go library that provides patterns and helper functions for testing infrastructure
    • Atmos - A universal tool for DevOps and Cloud Engineering that orchestrates workflows and simplifies the management of infrastructure
    • GitLab-managed Terraform/OpenTofu state - A feature that allows you to store your Terraform state files in GitLab
    • tf.libsonnet - A collection of Jsonnet libraries for generating Terraform code
    • terraform-docs - A utility to generate documentation from Terraform modules in various output formats
    • Terraformer - A CLI tool to generate Terraform files from existing infrastructure
  • Vendor-specific Tools
    • AWS CloudFormation - A service that helps you model and set up your Amazon Web Services resources
    • AWS CDK - An open source software development framework to define your cloud application resources using familiar programming languages
    • AWS SAM - An open-source framework for building serverless applications
    • Azure Resource Manager - The deployment and management service for Azure
      • Bicep language - A domain-specific language (DSL) that uses declarative syntax to deploy Azure resources
      • Azure Resource Graph - A powerful management tool to query, explore, and analyze your cloud resources at scale

Containerization

Fundamentals

  • Linux Distros for Containers
    • Alpine Linux - A security-oriented, lightweight Linux distribution based on musl libc and busybox
      • apk-tools - A package manager originally built for Alpine Linux
    • Fedora CoreOS - An automatically updating, minimal operating system for running containerized workloads securely and at scale
    • Flatcar Container Linux - An immutable Linux distribution for containers
  • Utilities in Containers
    • busybox - A single small executable that combines tiny versions of many common UNIX utilities
  • Standards
    • The Open Container Initiative (OCI) - An open governance structure for the express purpose of creating open industry standards around container formats and runtimes
    • Compose Specification - A developer-focused standard for defining cloud and platform agnostic container-based applications
    • Development Containers - An open specification for enriching containers with development-specific settings, tools, and configuration

Engines & Runtimes

  • Container Engines
    • Docker Engine - An open source containerization technology for building and containerizing your applications
      • Docker Rootless mode - A feature that allows the Docker daemon and containers to run as a non-root user, mitigating potential vulnerabilities
    • podman - A powerful container engine for building, managing, and running containers and pods
      • podman-static - Alpine-based container images and statically linked (rootless) binaries for Linux
  • Container Runtimes
    • containerd - An industry-standard container runtime with an emphasis on simplicity, robustness and portability
      • nerdctl - A Docker-compatible CLI for containerd
      • ctr - An unsupported debug and administrative client for interacting with the containerd daemon
    • CRI-O - An implementation of the Kubernetes CRI (Container Runtime Interface) to enable using OCI (Open Container Initiative) compatible runtimes
  • OCI Runtimes
    • runc - A CLI tool for spawning and running containers according to the OCI specification
    • crun - A fast and lightweight fully featured OCI runtime and C library for running containers

Image Management

  • Image Building Tools
    • Docker Build - A part of the Docker Engine that automates the process of creating a Docker image from a Dockerfile and a context
    • buildah - A tool that facilitates building Open Container Initiative (OCI) container images
    • podman build - A command that constructs OCI-compatible container images by interpreting instructions from a Containerfile or Dockerfile, leveraging Buildah for the underlying operations
    • Kaniko - A tool to build container images from a Dockerfile, inside a container or Kubernetes cluster
  • Image Inspection & Management Tools
    • skopeo - A command line utility that performs various operations on container images and image repositories
    • dive - A tool for exploring a Docker image, layer contents, and discovering ways to shrink the size of your Docker/OCI image
    • regclient - A suite of command-line tools (regctl, regsync, regbot) for managing and inspecting OCI registries and images, supporting advanced features like multi-platform images and mirroring
  • Container Registries
    • GitLab Container Registry - A secure and private registry for Docker images
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Project Quay - An open-source, container-native image registry designed for building, organizing, distributing, and deploying containers
    • Docker Hub - A cloud-based registry service that allows developers and teams to store, share, and distribute Docker container images
    • Amazon ECR - A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts
    • Azure Container Registry - A private registry for managing container images and related artifacts
    • Harbor - An open source registry that secures artifacts with policies and role-based access control

Environment & Management

  • Container Management Tools
    • Podman Desktop - A free and open source tool for developers to work with containers and Kubernetes, simplifying container management, streamlining Kubernetes workflows, and easing the transition from local development to production
    • lazydocker - A terminal UI for both docker and docker-compose
    • Docker Compose - A tool for defining and running multi-container Docker applications
  • Local Environment Provisioners (for Mac)
    • Colima - A tool that provides container runtimes on macOS (and Linux) with minimal setup
    • Lima - A tool that launches Linux virtual machines with automatic file sharing and port forwarding

WebAssembly

  • Standards
    • WebAssembly - A binary instruction format for a stack-based virtual machine
    • WebAssembly System Interface (WASI) - A modular system interface for WebAssembly
    • WASIX - The long term stabilization and support of the existing WASI ABI plus additional non-invasive syscall extensions
  • Runtimes
    • wazero - The only zero dependency WebAssembly runtime written in Go
    • Wasmtime - A fast and secure runtime for WebAssembly
    • Wasmer - A blazing fast and secure WebAssembly runtime that enables incredibly lightweight containers to run anywhere

Kubernetes

  • Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications
  • Master node
    • kube-apiserver - Responsible for API services
    • kube-scheduler - Responsible for scheduling
    • kube-controller-manager - Responsible for running the controller processes that reconcile the cluster's actual state with its desired state
  • Compute node
    • kubelet - Watches the API server for pods assigned to that node and makes sure they are running
    • cAdvisor - Collects metrics about the pods running on that node
    • kube-proxy - Watches the API server for Pod and Service changes to keep the node's network rules up to date
    • container runtime - Manages container images and runs containers on that node
  • Interface Standards
    • CNI (Container Networking Interface)
    • CSI (Container Storage Interface)
    • CRI (Container Runtime Interface)

Core Concepts & Components

  • K8s Internals
    • Workloads - The objects you use to manage and run your containers on the cluster
    • Pod
      • assignment - The process of constraining a Pod so that it is restricted to run on particular nodes, or to prefer to run on particular nodes
      • taint and toleration - A mechanism that allows you to ensure that pods are not placed on inappropriate nodes
      • lifecycle - The phases a Pod moves through, from Pending to Running to Succeeded or Failed
      • liveness probe - A probe the kubelet uses to know when to restart a container
      • requests and limits
      • eviction
    • Deployment, ReplicaSet, StatefulSet, DaemonSet
    • Kubernetes network model - A set of fundamental requirements and principles for networking in a Kubernetes cluster
      • Service, Ingress, Ingress Controllers
    • Storage - A powerful volume subsystem with an API that abstracts how storage is provided and consumed
      • PersistentVolume, PVC, StorageClass
    • Configuration - A range of mechanisms that let you inject configuration data into the Pods that run your applications
      • Secret, ConfigMap
    • Security & Policy
      • Kubernetes RBAC - A method of regulating access to computer or network resources based on the roles of individual users within an enterprise
      • PodDisruptionBudget - An object that limits the number of concurrent disruptions that your application experiences, allowing for high availability
      • Security context - A definition of privilege and access control settings for a Pod or Container
  • Autoscaling
    • HPA - The component that automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
    • Cluster Autoscaler - A tool that automatically adjusts the size of the Kubernetes cluster
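
The HPA entry above scales on observed CPU utilization. The Kubernetes documentation describes the core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with scaling skipped when the ratio is close enough to 1. The sketch below reproduces only that arithmetic. The function name is hypothetical, the 0.1 tolerance is assumed to match the documented default, and the real controller also handles missing metrics, unready Pods, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Apply the basic HPA ratio formula to a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# CPU utilization in percent against a 100% target (illustrative values).
scale_up = desired_replicas(4, 200, 100)
scale_down = desired_replicas(4, 50, 100)
unchanged = desired_replicas(4, 105, 100)
```

With these inputs, four replicas at twice the target become eight, four replicas at half the target become two, and a 5% overshoot leaves the count at four.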

Operations & Management

  • K8s Operators
    • Prometheus Operator - The operator that creates/configures/manages Prometheus clusters atop Kubernetes
      • kube-prometheus - A collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring
    • OpenTelemetry Operator - An implementation of a Kubernetes Operator for OpenTelemetry
    • Elastic Cloud on Kubernetes (ECK) - The official operator for the Elastic Stack on Kubernetes
    • Rook - An open source cloud-native storage orchestrator for Kubernetes
  • Dashboards
    • Kubernetes Lens IDE - The Kubernetes IDE
    • k9s - A terminal based UI to interact with your Kubernetes cluster
    • KDash - A simple terminal dashboard for Kubernetes built with Rust
    • Seabird - The native desktop app that simplifies working with Kubernetes
    • Headlamp - A user-friendly Kubernetes UI focused on extensibility

CLI & Local Environments

  • CLI Plugin Management
    • Krew - The plugin manager for kubectl command-line tool
      • kubectl-node-shell - A kubectl plugin to run a root shell on a node
      • kubectl-tree - A kubectl plugin to explore ownership relationships between Kubernetes objects
      • kubectl-pod-inspect - A kubectl plugin to view pod and container status at a glance
      • kubepug - A pre-flight checking tool for Kubernetes APIs
      • rakkess - A kubectl plugin to show an access matrix for all available resources
      • ketall - A kubectl plugin to get all resources
  • Local K8s Tools
    • Minikube - A tool that lets you run Kubernetes locally
    • Kind - A tool for running local Kubernetes clusters using Docker container “nodes”

Ecosystem & Extensions

  • Application Packaging & Configuration
    • Helm - The package manager for Kubernetes
    • Kustomize - A standalone tool to customize Kubernetes objects through a kustomization file
    • Artifact Hub - A web-based application designed to facilitate the finding, installing, and publishing of Cloud Native packages and configurations
  • Cloud Resource Management
    • Crossplane - A cloud-native framework for platform engineering that enables users to build their own APIs and services with control planes, extending Kubernetes to manage any resource anywhere
  • Developer Workflow Tools
    • Skaffold - A command line tool that facilitates continuous development for container-based applications
  • Platform Extensions
    • kube-fencing - A fencing solution for the nodes of stateful applications in Kubernetes
    • KubeVirt - A virtual machine management add-on for Kubernetes
  • Operator & Controller Development
    • Kubebuilder - A framework for building Kubernetes APIs using custom resource definitions (CRDs)
  • Resource Optimization
    • Goldilocks - A utility that can help you identify a starting point for resource requests and limits
  • Vendor-specific Tools
    • eksctl - The official CLI for Amazon EKS

Cloud-Native Computing

  • Serverless Computing - A cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers

Container as a Service (CaaS)

  • Managed Kubernetes
  • Simplified Container Hosting
    • Amazon Elastic Container Service - A fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications
    • AWS Fargate - A serverless compute engine for containers that works with both ECS and EKS
    • AWS App Runner - A fully managed service that makes it easy for developers to quickly deploy containerized web applications and APIs, at scale and with no prior infrastructure experience required
    • Azure Container Apps - A fully managed serverless container service built on Kubernetes, integrating KEDA, Dapr, and Envoy for microservices and event-driven workloads
    • Google Cloud Run - A managed compute platform that lets you run containers that are automatically scaled

Function as a Service (FaaS)

  • AWS Lambda - A serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers
  • Azure Functions - An event-driven, serverless compute platform that helps you develop more efficiently using the programming language of your choice
  • Google Cloud Run Functions - A serverless execution environment for building and connecting cloud services

Advanced Runtimes & Isolation

  • Sandboxed Runtimes
    • Kata Containers - An open-source project building a standard implementation of lightweight virtual machines that feel and perform like containers, but provide the workload isolation and security of virtual machines
    • gVisor - A Linux-compatible sandbox that implements the Linux kernel and its network stack, intercepting system calls to protect the host from containerized applications
    • libkrun - A dynamic library providing virtualization-based process isolation capabilities
    • Cloud Hypervisor - An open source Virtual Machine Monitor (VMM) implemented in Rust that focuses on running modern, cloud workloads, with minimal hardware emulation
    • Firecracker - An open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services
    • QEMU microvm - A minimalist machine type without PCI or ACPI support, designed for short-lived guests, and optimized for both boot time and footprint
    • Docker Sandboxes - Isolated, disposable environments designed to run AI coding agents in lightweight microVMs for enhanced security and system protection
  • Virtualization & Container Storage
    • virtiofs - A shared file system that lets virtual machines access a directory tree on the host
  • Image Services & Distribution
    • Nydus - A powerful open-source filesystem solution that forms a high-efficiency image distribution system for cloud-native workloads, such as container images and software packages

Cloud-Native Infrastructure

  • App Runtimes & Scaling
    • KEDA (Kubernetes Event-driven Autoscaling) - A single-purpose and lightweight component that can be added into any cluster to provide event-driven scale for any container running in the environment
    • Dapr (Distributed Application Runtime) - A portable, event-driven runtime that makes it easy for any developer to build resilient, stateless, and stateful applications that run on the cloud and edge and embraces the diversity of languages and developer frameworks
    • V8 isolates - An independent instance of the engine with its own heap and its own garbage collector
  • Serverless Computing
    • OpenFaaS - A framework that makes it easy for developers to deploy event-driven functions and microservices to Kubernetes
    • Knative - A Kubernetes-based platform to build, deploy, and manage modern serverless workloads
  • Service Mesh & Discovery
    • Istio - An open source service mesh that layers transparently onto existing distributed applications
      • Kiali - The service mesh observability and configuration tool for Istio
    • Linkerd - An ultralight, security-first service mesh for Kubernetes
    • HashiCorp Consul - A service networking solution to connect and secure services across any runtime platform and public or private cloud
    • Traefik Mesh - A straightforward, easy to configure, and non-invasive service mesh
  • Edge Proxies & Ingress
    • Envoy Proxy - An open source edge and service proxy
    • Traefik Proxy - A leading modern open source reverse proxy and ingress controller
  • Cloud-Native Networking
    • Project Calico - An open-source project that provides secure network connectivity, network security, and observability for containers, virtual machines, and native host-based workloads
    • Cilium - An open-source project that provides networking, security, and observability for cloud-native environments

CI/CD & GitOps

Delivery & Deployment

  • Continuous Delivery Tools
    • Jenkins - An open source automation server which enables developers around the world to reliably build, test, and deploy their software
    • GitLab CI/CD - A part of GitLab that you can use to automate the builds, integration, and verification of your source code
    • GitHub Actions - A feature that makes it easy to automate all your software workflows
    • Azure Pipelines - A cloud service that you can use to automatically build and test your code project and make it available to other users
  • Application Deployment
    • Kamal - A tool to deploy web apps anywhere

GitOps & Cloud-Native

  • GitOps Style CD
    • ArgoCD - A declarative, GitOps continuous delivery tool for Kubernetes
    • FluxCD - A tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to configuration when there is new code to deploy
  • Cloud-Native Application Delivery
    • Open Application Model - A specification for describing applications so that they can be deployed and managed across any platform
    • KubeVela - A modern software delivery platform that makes deploying and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable
    • Flagger - A progressive delivery tool that automates the release process for applications running on Kubernetes

Integrations & Registries

  • Terraform Integration
    • Atlantis - A self-hosted Go application that listens for Terraform pull request events via webhooks
  • Private Package Registries
    • GitLab Package Registry - A feature that allows you to publish and share packages for a variety of supported package managers
    • GitHub Packages - A software package hosting service that allows you to host your software packages privately or publicly
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Azure Artifacts - A service that enables you to create and share Maven, npm, NuGet, and Python package feeds from public and private sources

System Observability

Instrumentation & Platforms

  • Concepts
    • Observability - A measure of how well internal states of a system can be inferred from knowledge of its external outputs
  • Instrumentation Libraries
    • OpenTelemetry - A vendor-neutral open source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs
    • Micrometer - A metrics instrumentation library for JVM-based applications
  • Monitoring Tools
    • Uptime Kuma - An easy-to-use self-hosted monitoring tool
  • Managed Platforms
    • Azure Monitor - A comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments
      • Kusto Query Language - A powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical models, and more
      • App Insights - A feature of Azure Monitor that provides an extensible Application Performance Management (APM) service for developers and DevOps professionals
    • AWS CloudWatch - A monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers
    • Datadog - The integrated platform for monitoring & security

Telemetry Shipment

  • Data Shippers
    • Prometheus exporters - The services that expose Prometheus metrics
      • node-exporter - An exporter for hardware and OS metrics exposed by *NIX kernels
      • blackbox-exporter - A tool that allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP and gRPC
    • Grafana Alloy - An open source OpenTelemetry collector with built-in Prometheus pipelines and support for metrics, logs, traces, and profiles
    • Fluent Bit - A super fast, lightweight, and highly scalable logging, metrics, and traces processor and forwarder
    • Fluentd - An open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data
    • Filebeat - A lightweight shipper for forwarding and centralizing log data
    • Logstash - An open source server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash"
    • Telegraf - An open source server agent that helps you collect metrics from your stacks, sensors, and systems
    • Metricbeat - A lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from services running on the server
    • rsyslog - The rocket-fast system for log processing
  • Vendor-specific Tools
    • Azure Monitor Agent - The agent that collects monitoring data from the guest operating system of Azure and hybrid virtual machines
    • CloudWatch Agent - The agent you can use to collect both system-level metrics and log files from Amazon EC2 instances and on-premises servers
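
Prometheus exporters, listed above, expose metrics as plain text that the Prometheus server scrapes over HTTP. The sketch below renders one metric family in that text exposition format, with `# HELP` and `# TYPE` lines followed by one sample per label set. The metric name, labels, and the `render_metric` helper are hypothetical. A real exporter would normally use an official client library, which also handles label escaping and serving the endpoint.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict.
    Label values are not escaped here, so keep them free of quotes,
    backslashes, and newlines.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical gauge with two label sets, for illustration only.
text = render_metric(
    "demo_queue_depth",
    "Number of jobs waiting in the queue.",
    "gauge",
    [({"queue": "email"}, 3), ({"queue": "billing"}, 0)],
)
```

The resulting `text` is what an exporter would return from its metrics endpoint for this one metric family.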

Telemetry Collection & Storage

  • Datastore and Alerting Tools
    • Prometheus - An open-source systems monitoring and alerting toolkit
      • PromQL - The Prometheus Query Language
      • promtool - The command line utility for the Prometheus server
    • Alertmanager - A tool that handles alerts sent by client applications such as the Prometheus server
      • amtool - A cli tool for interacting with the Alertmanager API
    • InfluxDB - A time series database built from the ground up to handle high write and query loads
      • InfluxQL - An SQL-like query language for interacting with data in InfluxDB
      • influx cli - The command line interface for InfluxDB 2.0
    • Grafana Mimir - An open source, horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
    • Grafana Loki - A horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus
      • LogQL - The query language for Loki
      • LogCLI - The command line interface for Loki
    • Grafana Tempo - An open source, easy-to-use and high-scale distributed tracing backend
      • TraceQL - A query language designed for selecting traces
    • Elasticsearch - An open source distributed, RESTful search and analytics engine, scalable data store, and vector database
      • Elastic Common Schema - An open source specification, developed with support from the Elastic user community
      • Ingest pipelines - A feature that lets you perform common transformations on your data before indexing
      • Dissect and Grok - The processors that let you extract structured fields out of a single text field
    • Graphite - A highly scalable real-time graphing system
    • Grafana Alerting - A feature that allows you to create and manage alerts for your data
    • OpenObserve - An open-source observability platform designed for modern applications

Visualization

  • Visualization Tools
    • Grafana - The open source data visualization and monitoring solution
      • Grafonnet - A Jsonnet library for generating Grafana dashboards
    • Kibana - A free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack

SRE (Site Reliability Engineering)

  • Site Reliability Engineering - A discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems
  • Ishikawa diagram - A causal diagram created by Kaoru Ishikawa that shows the potential causes of a specific event

Fleet Management & Operations

  • Fleet Management
    • AWS Systems Manager - A secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments
    • Azure Automation - A cloud-based automation and configuration service that supports consistent management across your Azure and non-Azure environments
  • Backup
    • Vendor-specific Tools
      • AWS Backup - A fully managed service that centralizes and automates data protection across AWS services, in the cloud, and on premises
      • Azure Backup - A service that provides simple, secure, and cost-effective solutions to back up your data and recover it from the Microsoft Azure cloud
    • K8s-specific Tools
      • Velero - An open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes
    • Generic
      • Barman - A disaster recovery solution for PostgreSQL databases, designed to ensure business continuity by simplifying online hot backups
      • Restic - A fast, secure, efficient backup program
  • Runbook Automation
    • Rundeck - An open source automation platform that helps you automate routine operational procedures in data center or cloud environments
    • SaltStack - A Python-based, open-source software for event-driven IT automation, remote task execution, and configuration management
  • AIOps & Autonomous Agents
    • Azure SRE Agent - An AI-powered service designed to automate Site Reliability Engineering practices by monitoring, diagnosing, and helping resolve incidents

Chaos Engineering

  • Concepts
    • Chaos Engineering - The practice of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production
    • Principles of chaos engineering - The principles that define the practice of chaos engineering
  • Chaos Engineering Tools
    • Litmus - A cloud-native chaos engineering framework for Kubernetes
    • Chaos Mesh - A cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments
    • Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing
    • kubeinvaders - A gamified chaos engineering tool for Kubernetes

FinOps

  • Concepts
    • FinOps principles - The cultural practice of bringing financial accountability to the variable spend model of cloud
  • FinOps Tools
    • FinOps toolkit - A collection of tools, resources, and best practices for implementing FinOps in your organization
    • AWS Cost Explorer - A tool that enables you to view and analyze your costs and usage
    • OpenCost - The open source solution for monitoring Kubernetes spend
    • Karpenter - A flexible, high-performance Kubernetes cluster autoscaler
    • Cloud Custodian - A rules engine for managing public cloud accounts and resources