200 - System Administration and SRE
200 - Operating Systems, Networking, and Modern Infrastructure
Note: Please see also Class 103 - Concurrency and Parallelism.
200 - Core OS Concepts
- Core Concepts
- System call - The programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed
- Protection ring - A mechanism to protect data and functionality from faults and malicious behavior
- Daemon - A computer program that runs as a background process, rather than being under the direct control of an interactive user
- Environment variable - A named variable whose value is set outside the program, typically through functionality built into the operating system or a microservice
- POSIX standard - A family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems
- Process Management
- Process - The instance of a computer program that is being executed by one or more threads
- Thread - The smallest sequence of programmed instructions that can be managed independently by a scheduler
- Scheduling - The action of assigning resources to perform tasks
- Context switch - The process of storing the state of a process or thread, so that it can be restored and resume execution at a later point
- Interrupt - A request for the processor to interrupt currently executing code, so that the event can be processed in a timely manner
- Inter-Process Communication (IPC)
- Pipes
- Anonymous pipe - A simplex FIFO communication channel that may be used for one-way interprocess communication
- Named pipe - An extension of the traditional pipe concept on Unix and Unix-like systems that is given a file-system name, so that unrelated processes can open it for inter-process communication
- Shared memory - A memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies
- Signal - An asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred
- Unix domain socket - A data communications endpoint for exchanging data between processes executing on the same host operating system
- Memory Management
- Virtual memory - A memory management technique that provides an idealized abstraction of the storage resources that are actually available on a given machine
- Memory paging - A memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory
- Page fault - A type of exception raised by computer hardware when a running program accesses a memory page that is not currently mapped by the memory management unit into the virtual address space of a process
- Resident set size (RSS) - The portion of memory occupied by a process that is held in main memory
- Working set size (WSS) - The amount of memory a process needs to keep working over a given interval, that is, the set of pages it has recently referenced
- Page cache - A transparent cache, kept in otherwise unused main memory, for pages originating from secondary storage such as a disk
- Storage Management
- Disk partitioning - The creation of one or more regions on a secondary storage device, so that each region can be managed separately
- Loop device - A pseudo-device that makes a file accessible as a block device
- File system - A method and data structure that the operating system uses to control how data is stored and retrieved
- Journaling file system - A file system that keeps a journal, a circular log of changes that have not yet been committed to the main part of the file system
- Path - A string that names a file or directory and specifies its unique location in a file system
- Glob pattern - A pattern that specifies sets of filenames with wildcard characters
- File handle/descriptor - A unique identifier for a file or other input/output resource, such as a pipe or network socket
- Symbolic link - A term for any file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution
- Permissions - A feature of many modern file systems which control the ability of the users of a computer to view, change, navigate, and execute the contents of the file system
- Setuid - A Unix access rights flag that allows users to run an executable with the permissions of the executable's owner or group
- Sticky bit - A Unix access rights flag that, when set on a directory, lets only an item's owner, the directory's owner, or root rename or delete items within it
- Inode - A data structure in a Unix-style file system that describes a file-system object such as a file or a directory
- RAID - A data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both
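The permission, setuid, and sticky-bit entries above can be illustrated with a minimal sketch that decodes a Unix mode value using Python's standard `stat` module. The helper name `describe_mode` and the two sample modes are assumptions made for illustration; they are not taken from this list.

```python
import stat


def describe_mode(mode: int) -> dict:
    """Decode a Unix st_mode value into a few readable facts."""
    return {
        # ls-style string such as '-rwxr-xr-x'
        "symbolic": stat.filemode(mode),
        # Special permission bits discussed in the list above
        "setuid": bool(mode & stat.S_ISUID),
        "setgid": bool(mode & stat.S_ISGID),
        "sticky": bool(mode & stat.S_ISVTX),
        "is_dir": stat.S_ISDIR(mode),
    }


# Illustrative modes: a setuid executable and a world-writable sticky directory
setuid_binary = describe_mode(stat.S_IFREG | 0o4755)
shared_tmp = describe_mode(stat.S_IFDIR | 0o1777)

print(setuid_binary["symbolic"], setuid_binary["setuid"])
print(shared_tmp["symbolic"], shared_tmp["sticky"])
```

To inspect a real file, the same helper can be fed `os.stat(path).st_mode`.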
201 - Base Network Concepts & Protocols
Note: Please see also 220 - Domain Name System and Email, 300 - Web and API Style Standards, and 630 - PKI and Secure Communications
- The OSI Model - A conceptual model that provides a common basis for the coordination of standards development for the purpose of systems interconnection
- The Internet - The global system of interconnected computer networks that uses the Internet protocol suite to communicate between networks and devices
- IP - The network layer communications protocol in the Internet protocol suite
- Link-local address - A network address that is valid only for communications within the network segment or the broadcast domain that the host is connected to
- IP-multicast - A method of sending Internet Protocol datagrams to a group of interested receivers in a single transmission
- DHCP - A network management protocol used on Internet Protocol networks for automatically assigning IP addresses and other communication parameters to devices connected to the network
- ICMP - A supporting protocol in the Internet protocol suite, used by network devices to send error messages and operational information
- NAT - A method of mapping an IP address space into another by modifying network address information in the IP header of packets while they are in transit across a traffic routing device
- IPv6 - The most recent version of the Internet Protocol, the communications protocol that provides an identification and location system for computers on networks and routes traffic across the Internet
- Unique local address - An IPv6 address in the address block fc00::/7
- DHCPv6 - A network protocol for configuring Internet Protocol version 6 hosts with IP addresses, IP prefixes and other configuration data required to operate in an IPv6 network
- ICMPv6 - The implementation of the Internet Control Message Protocol for Internet Protocol version 6
- NAT64 - An IPv6 transition mechanism that facilitates communication between IPv6 and IPv4 hosts
- NDP - A protocol in the Internet protocol suite used with Internet Protocol Version 6 for address resolution, router discovery, and neighbor reachability detection
- Routing table - A data table stored in a router or a network host that lists the routes to particular network destinations
- CIDR - A method for allocating IP addresses and for IP routing based on variable-length prefixes instead of fixed address classes
- Network socket - A software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network
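- TCP - A main protocol of the Internet protocol suite that provides reliable, ordered, and error-checked delivery of a stream of bytes
- UDP - A core member of the Internet protocol suite that sends connectionless datagrams without delivery guarantees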
- QUIC - A UDP-based, stream-multiplexing, encrypted transport protocol
- Ethernet - A family of wired computer networking technologies
- ARP - A communication protocol used for discovering the link layer address, such as a MAC address, associated with a given internet layer address
- MAC address - A unique identifier assigned to a network interface controller for use as a network address in communications within a network segment
- VLAN - A broadcast domain that is partitioned and isolated in a computer network at the data link layer
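The CIDR, routing table, link-local, and unique local address entries above can be tied together with a short sketch using Python's standard `ipaddress` module. The route list, the next-hop names, and the `lookup` helper are hypothetical; the sketch only shows the longest-prefix-match idea and is not a description of how any particular router implements it.

```python
import ipaddress

# Hypothetical routing table: (CIDR prefix, next hop name)
ROUTES = [
    ("0.0.0.0/0", "default-gw"),
    ("10.0.0.0/8", "core"),
    ("10.1.0.0/16", "site-a"),
    ("10.1.2.0/24", "rack-7"),
]


def lookup(destination: str, routes=ROUTES) -> str:
    """Return the next hop of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, hop))
    # Longest prefix wins; the default route (/0) matches any IPv4 address
    return max(matches)[1]


print(lookup("10.1.2.3"))
print(lookup("10.200.0.1"))
print(lookup("8.8.8.8"))

# A /24 leaves 8 host bits, so it spans 256 addresses
print(ipaddress.ip_network("10.1.2.0/24").num_addresses)

# Address classes mentioned in the list
print(ipaddress.ip_address("169.254.10.20").is_link_local)
print(ipaddress.ip_address("fe80::1").is_link_local)
print(ipaddress.ip_address("fd00::1") in ipaddress.ip_network("fc00::/7"))
```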
202 - Linux Kernel Features
Note: Please see also 210 - Linux Administration
- The Linux Kernel - The main component of a Linux operating system and the core interface between a computer's hardware and its processes
- Threads
- Pthreads - POSIX threads, a parallel execution model that exists independently of any programming language
- Filesystems
- ext4 - The default file system for many major Linux distributions
- xfs - A high-performance journaling file system created by Silicon Graphics, Inc
- overlayfs - A union mount filesystem implementation for Linux
- proc(5) - A virtual filesystem that provides an interface to kernel data structures
- sysfs(5) - A virtual filesystem that exports information about various kernel subsystems, hardware devices, and associated device drivers
- Container Support
- cgroups - A Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored
- namespaces - A kernel feature that wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource
- lxc/rootfs - The userspace interface for the Linux kernel containment features
- nsenter - A command that executes a program in the namespaces of other processes
- FUSE (Filesystem in Userspace) - An interface for userspace programs to export a filesystem to the Linux kernel
- s3fs - A FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem
- eBPF (Extended Berkeley Packet Filter) - A technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel
203 - Virtualization
- Virtualization - The act of creating a virtual version of something, including virtual computer hardware platforms, storage devices, and computer network resources
- Type-1 Hypervisors
- KVM - A full virtualization solution for Linux on x86 hardware containing virtualization extensions
- Hyper-V - A hardware virtualization product from Microsoft
- Proxmox VE - A complete, open-source server management platform for enterprise virtualization
- Type-2 Hypervisors
- VirtualBox - A powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use
- QEMU - A generic and open source machine emulator and virtualizer
- Virtualization Management
- libvirt - A toolkit to manage virtualization platforms
- CPU Emulators
- QEMU - A generic and open source machine emulator and virtualizer
204 - Applied & Distributed Systems
- Distributed Computing - A field of computer science that studies distributed systems
- Single point of failure - A part of a system that, if it fails, will stop the entire system from working
- Fault tolerance - The property that enables a system to continue operating properly in the event of the failure of some of its components
- Load balancing - The process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient
- Fallacies of distributed computing - A set of assertions describing false assumptions that programmers new to distributed applications invariably make
- Byzantine fault - A condition of a distributed system, where components may fail and there is imperfect information about whether a component has failed
- Consensus - A fault-tolerant mechanism that is used in distributed systems to achieve the necessary agreement on a single data value among distributed processes or systems
- Cloud Computing Services
- Amazon EC2 - A web service that provides secure, resizable compute capacity in the cloud
- Amazon EBS - An easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud
- Amazon VPC - A service that lets you launch AWS resources in a logically isolated virtual network that you define
- Amazon ELB - A service that automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions
- Azure Virtual Machines - A service to provision Windows and Linux virtual machines in seconds
- Azure Disk Storage - A high-performance, durable block storage for Azure Virtual Machines
- Azure Virtual Network - The fundamental building block for your private network in Azure
- Azure Load Balancer - A service that allows you to distribute traffic to your backend virtual machines
- Google Cloud Compute Engine - A service that delivers configurable virtual machines running in Google’s data centers with access to high-performance networking
- Cloud Emulators
- LocalStack - A fully functional local cloud stack to develop and test your cloud and serverless apps offline
205 - Computer Hardware
- CPU Architectures
210 - Linux Administration
210 - Linux Core Components
- Linux distros for hosts
- Ubuntu server - The standard platform for public clouds, on-premises, and IoT devices
- Debian - A complete Free Operating System
- Arch Linux - A simple, lightweight distribution
- linux-pam - A system of libraries that handle the authentication tasks of applications and services in a Linux system
- Systemd - A system and service manager for Linux operating systems
- journald - A system service that collects and stores logging data
- hostnamed - A system service that may be used to control the hostname and related machine metadata from user programs
- networkd - A system service that manages networks
- resolved - A system service that provides network name resolution to local applications
- timesyncd - A system service that may be used to synchronize the local system clock with a remote Network Time Protocol server
211 - Host Administration
- Basic Tools
- util-linux - A random collection of Linux utilities
- rsync - An open source utility that provides fast incremental file transfer
- Vixie Cron - An open source implementation of POSIX Cron
- logrotate - A utility that automates the rotation, compression, removal, and mailing of log files
- Syslog - A standard for message logging
- procps - A set of command line and full-screen utilities that provide information out of the pseudo-filesystem most commonly located at /proc
- ps - A command that displays information about a selection of the active processes
- top - A program that provides a dynamic real-time view of a running system
- free - A command that displays the total amount of free and used physical and swap memory in the system
- vmstat - A command that reports information about processes, memory, paging, block IO, traps, disks and cpu activity
- psmisc - A package of small utilities that use the proc file-system
- lsof - A command for LiSting Open Files
- sudo - A program that lets a system administrator delegate authority, giving certain users the ability to run some commands as root or another user
- shadow-utils - A package that includes the programs needed to convert UNIX password files to the shadow password format, plus programs for managing user and group accounts
- useradd - A low level utility for adding users
- strace - A diagnostic, debugging, and instructional userspace utility for Linux that traces the system calls and signals of a process
- inxi - A full featured system information script
- Monitors
- Monit - A small Open Source utility for managing and monitoring Unix systems
- atop - An ASCII full-screen performance monitor for Linux
- sysstat - A collection of performance monitoring tools for Linux
- iostat - A command used for monitoring system input/output device loading
- smem - A tool that can give numerous reports on memory usage on Linux systems
- Clock syncing
- NTP - A networking protocol for clock synchronization between computer systems over packet-switched, variable-latency data networks
- chrony - A versatile implementation of the Network Time Protocol
- pool.ntp.org - A big virtual cluster of timeservers providing reliable easy to use NTP service for millions of clients
- jc - A CLI tool and Python library that converts the output of popular command-line tools and file-types to JSON or Dictionaries
- aha - An ANSI HTML adapter that converts ANSI escape sequences in terminal output to HTML
- NO_COLOR - An environment variable to disable ANSI color in command-line software
- Modern Tools
- lsd - A rewrite of GNU ls with a lot of added features like colors, icons, tree-view, and more formatting options
- eza - A modern replacement for ls
- broot - A new way to see and navigate directory trees
- bat - A cat(1) clone with wings
- dust - A more intuitive version of du, written in Rust
- dua - A tool to view disk space usage and delete unwanted data, fast
- duf - A better 'df' alternative
- procs - A modern replacement for ps written in Rust
- htop - An interactive process viewer for Unix systems
- btop++ - A resource monitor for Linux, macOS, and FreeBSD
- glances - A cross-platform monitoring tool which aims to present a large amount of monitoring information through a curses or Web based interface
- neofetch - A command-line system information tool
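The NO_COLOR entry above describes a convention that command-line tools can honor with very little code. The sketch below assumes the commonly documented rule that color is suppressed when `NO_COLOR` is present and not an empty string, whatever its value; the function names `color_enabled` and `paint` are hypothetical.

```python
import os


def color_enabled(environ=None) -> bool:
    """Return False when NO_COLOR is set to any non-empty value."""
    env = os.environ if environ is None else environ
    return not env.get("NO_COLOR")


def paint(text: str, environ=None) -> str:
    """Wrap text in a red ANSI escape sequence unless color is disabled."""
    if color_enabled(environ):
        return f"\033[31m{text}\033[0m"
    return text


# Passing a dict instead of os.environ keeps the example deterministic
print(repr(paint("error", {})))
print(repr(paint("error", {"NO_COLOR": "1"})))
```

A real tool would usually also check whether its output is a terminal before emitting escape sequences; that check is left out here to keep the sketch short.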
212 - Network Administration
- Basic Tools
- iproute2 - A collection of utilities for controlling TCP/IP networking and traffic control in Linux
- net-tools (legacy) - A collection of programs for controlling the network subsystem of the Linux kernel
- traceroute - A computer network diagnostic tool for displaying the route and measuring transit delays of packets across an Internet Protocol network
- NetworkManager - A daemon that sits on top of libudev and other Linux kernel interfaces and provides a high-level interface for network configuration
- Ubuntu NetPlan - A network configuration abstraction renderer
- tcpdump - A powerful command-line packet analyzer
- wireshark - The world's foremost network protocol analyzer
- nmap - An open source tool for network exploration and security auditing
- ncat - A feature-packed networking utility which reads and writes data across networks from the command line
- Layer 5 Gateway
- SOCKS Proxy - An Internet protocol that exchanges network packets between a client and server through a proxy server
- Dante - A SOCKS server and SOCKS client, implementing RFC 1928 and related standards
- tun2socks - A SOCKS proxy for TCP and UDP, that handles all connections from a TUN device
- proxychains - A tool that forces any TCP connection made by a given application to go through a proxy such as Tor or any other SOCKS4, SOCKS5, or HTTP(S) proxy
213 - OS Package Management
- Package Management Tools
- dpkg - The base package management system for Debian
- apt - A command-line utility for installing, updating, removing, and otherwise managing deb packages on Ubuntu, Debian, and related Linux distributions
- Pacman - A utility which manages software packages in Linux
- Yay - An AUR Helper Written in Go
- Homebrew - The Missing Package Manager for macOS (or Linux)
- pipx - A tool to install and run Python applications in isolated environments
- Flatpak - A system for building, distributing, and running sandboxed desktop applications on Linux
- Snapcraft - A software packaging and deployment system developed by Canonical for operating systems that use the Linux kernel
- arkade - A portable marketplace for downloading your favourite devops CLIs and installing helm charts to your Kubernetes cluster
214 - File Sharing and Remote Access
- File Servers and Protocols
- SMB - A network communication protocol for providing shared access to files, printers, and serial ports between nodes on a network
- Samba - The standard Windows interoperability suite of programs for Linux and Unix
- FTP - A standard communication protocol used for the transfer of computer files from a server to a client on a computer network
- vsftpd - A GPL licensed FTP server for UNIX-like systems, including Linux
- SFTP - A network protocol that provides file access, file transfer, and file management over any reliable data stream
- Remote Access Servers and Protocols
- SSH - A cryptographic network protocol for operating network services securely over an unsecured network
- openssh - The premier connectivity tool for remote login with the SSH protocol
- RDP - A proprietary protocol developed by Microsoft which provides a user with a graphical interface to connect to another computer over a network connection
- xrdp - An open-source Remote Desktop Protocol server
- RFB - A simple protocol for remote access to graphical user interfaces
- Mosh - A replacement for interactive SSH terminals
220 - Domain Name System and Email
221 - Domain Name System
- Core Concepts & Protocols
- Domain Registration & Lookup
- IANA WHOIS Service - A service to look up the registration data of a domain name or IP address
- Registration Data Access Protocol (RDAP) - A computer network communications protocol that delivers registration data from Domain Name Registries and Regional Internet Registries
- Server & Resolver Implementations
- BIND (dnsutils) - A very flexible, full-featured DNS system
- dnsmasq - A lightweight, easy to configure DNS forwarder, DHCP and router advertisement server
- CoreDNS - A DNS server that chains plugins
- systemd-resolved - A system service that provides network name resolution to local applications
- mDNS Implementations
- Client Tools
- Cloud Services
- Amazon Route53 - A highly available and scalable cloud Domain Name System web service
- Google Cloud DNS - A high-performance, resilient, global Domain Name System service that publishes your domain names to the global DNS in a cost-effective way
222 - Email System
- Core Concepts & Protocols
- Email - A method of exchanging messages between people using electronic devices
- SMTP - A communication protocol for electronic mail transmission
- POP - An application-layer Internet standard protocol used by e-mail clients to retrieve e-mail from a mail server
- IMAP - An Internet standard protocol used by email clients to retrieve email messages from a mail server over a TCP/IP connection
- MIME - A standard that extends the format of email messages to support text in character sets other than ASCII
- Quoted-printable encoding - An encoding that uses printable ASCII characters to represent 8-bit data, so that it can be sent over a 7-bit data path
- Base64 - A group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation
- Mailbox Formats
- Unix Mbox
- Maildir
- Server Software (MTA/MDA)
- Postfix - A mail server that started life at IBM research as an alternative to the widely-used Sendmail program
- Maddy Mail Server - An all-in-one mail server that implements all functionality required to run a mail service
- IMAP
- Cyrus IMAP - A highly scalable enterprise mail system designed for use in small to large enterprise environments
- Dovecot - An open source IMAP and POP3 email server for Linux/UNIX-like systems
- Client Software & Utilities
- TUI Clients & Utilities
- mailutils - A set of libraries and utilities for handling electronic mail
- mail command - A command to send and receive mail
- Mutt - A small but very powerful text based program for reading and sending electronic mail under unix operating systems
- swaks - A featureful, flexible, scriptable, transaction-oriented SMTP test tool
- Pop - A library for sending emails from your terminal
- GNU sharutils - A set of utilities for creating and unpacking shell archives
- Libraries
- go-mail - A simple to use, yet feature rich mail library for Go
- GUI Clients
- Thunderbird - A free email application that’s easy to set up and customize
- Sylpheed - A simple, lightweight but featureful, and easy-to-use e-mail client
- Spam Test and Reputation
- mail-tester - A free online service that allows you to test your emails for Spam, Malformed Content and Mail Server Configuration problems
- Spamhaus Project - A non-profit organization that tracks spam and related cyber threats
- Cloud Services
- Amazon SES - A cost-effective, flexible, and scalable email service that enables developers to send mail from within any application
- Twilio SendGrid - A cloud-based email delivery service that helps businesses with email delivery
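The quoted-printable and Base64 entries in the email section describe two ways to carry 8-bit data over a 7-bit path. A minimal sketch with Python's standard `quopri` and `base64` modules shows the difference on one short UTF-8 string; the sample text is an arbitrary illustration.

```python
import base64
import quopri

# "é" is two bytes in UTF-8 (0xC3 0xA9), neither of which is 7-bit ASCII
payload = "Café menu".encode("utf-8")

# Quoted-printable keeps ASCII text readable and escapes other bytes as =XX
qp = quopri.encodestring(payload)

# Base64 re-encodes every 3 input bytes as 4 ASCII characters
b64 = base64.b64encode(payload)

print(qp)
print(b64)

# Both encodings are reversible
assert quopri.decodestring(qp) == payload
assert base64.b64decode(b64) == payload
```

Quoted-printable tends to suit mostly-ASCII text, while Base64 has a fixed overhead that suits binary attachments.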
230 - Linux Container and WebAssembly
230 - Container Standards and Utilities
- Containerization - A form of operating-system-level virtualization
- Linux Distros for Containers
- Alpine Linux - A security-oriented, lightweight Linux distribution based on musl libc and busybox
- apk-tools - A package manager originally built for Alpine Linux
- Flatcar Container Linux - An immutable Linux distribution for containers
- Utilities in Containers
- busybox - A single small executable that combines tiny versions of many common UNIX utilities
- The Open Container Initiative (OCI) - An open governance structure for the express purpose of creating open industry standards around container formats and runtimes
- Containers for Development
- Development Containers - An open specification for enriching containers with development-specific settings, tools, and configuration
231 - Container Runtimes and Tools
- Container Engines
- Docker Engine - An open source containerization technology for building and containerizing your applications
- docker-compose - A tool for defining and running multi-container Docker applications
- containerd - An industry-standard container runtime with an emphasis on simplicity, robustness and portability
- podman - A powerful container engine for building, managing, and running containers and pods
- Image Building Tools
- Docker Build - A part of the Docker Engine that automates the process of creating a Docker image from a Dockerfile and a context
- buildah - A tool that facilitates building Open Container Initiative (OCI) container images
- Kaniko - A tool to build container images from a Dockerfile, inside a container or Kubernetes cluster
- Image Inspection & Management Tools
- TUI & Helper Tools
- lazydocker - A terminal UI for both docker and docker-compose
- Local Environment Provisioners
- Colima - A tool that provides container runtimes on macOS (and Linux) with minimal setup
232 - Container Registries
- Container Registries
- GitLab Container Registry - A secure and private registry for Docker images
- Nexus Repository Manager 3 - A sophisticated repository manager
- Amazon ECR - A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts
- Azure Container Registry - A private registry for managing container images and related artifacts
- Harbor - An open source registry that secures artifacts with policies and role-based access control
234 - WebAssembly
- Standards
- WebAssembly - A binary instruction format for a stack-based virtual machine
- WebAssembly System Interface (WASI) - A modular system interface for WebAssembly
- WASIX - The long term stabilization and support of the existing WASI ABI plus additional non-invasive syscall extensions
- WebAssembly Runtimes
240 - Kubernetes Administration
240 - Core Kubernetes
- Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications
- Managed K8s Services
- GKE - A managed, production-ready environment for deploying containerized applications
- Azure Kubernetes Service - A managed container orchestration service based on the open source Kubernetes system
- AWS EKS - A managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers
- Architecture
- Master node
- kube-apiserver - Exposes the Kubernetes API and acts as the front end of the control plane
- kube-scheduler - Assigns newly created Pods to nodes
- kube-controller-manager - Runs the controller processes that move the cluster toward its desired state
- Compute node
- kubelet - Watches the API server for Pods assigned to its node and makes sure they are running
- cAdvisor - Collects metrics about Pods running on that particular node
- kube-proxy - Watches the API server for Pod and Service changes in order to keep the node's network rules up to date
- container runtime - Manages container images and runs containers on that node
- Interface Standards
- CNI (Container Networking Interface)
- Calico - A networking and security solution that enables Kubernetes workloads and non-Kubernetes/legacy workloads to communicate seamlessly and securely
- Cilium - An open source, cloud native solution for providing, securing, and observing network connectivity between workloads, built on the kernel technology eBPF
- CSI (Container Storage Interface)
- CRI (Container Runtime Interface)
- Workloads - The objects you use to manage and run your containers on the cluster
- Pod
- assignment - The process of constraining a Pod so that it is restricted to run on particular nodes, or to prefer to run on particular nodes
- taint and toleration - A mechanism that allows you to ensure that pods are not placed on inappropriate nodes
- lifecycle - The phases a Pod passes through, starting in Pending and moving to Running and then to Succeeded or Failed
- liveness probe - A probe the kubelet uses to know when to restart a container
- requests and limits
- eviction
- Deployment, ReplicaSet, StatefulSet, DaemonSet
- Services, Load Balancing & Networking
- Kubernetes network model - A set of fundamental requirements and principles for networking in a Kubernetes cluster
- Service, Ingress, Ingress Controllers
- Storage - A powerful volume subsystem with an API that abstracts how storage is provided and consumed
- PersistentVolume, PVC, StorageClass
- Configuration - A range of mechanisms that let you inject configuration data into the Pods that run your applications
- Secret, ConfigMap
- Security & Policy
- Kubernetes RBAC - A method of regulating access to computer or network resources based on the roles of individual users within an enterprise
- PodDisruptionBudget - An object that limits the number of concurrent disruptions that your application experiences, allowing for high availability
- Security context - A definition of privilege and access control settings for a Pod or Container
- Autoscaling
- HPA - The component that automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
- Cluster Autoscaler - A tool that automatically adjusts the size of the Kubernetes cluster
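The HPA entry above scales on an observed metric. The Kubernetes documentation describes the core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that formula and clamps the result to a replica range; the function name, parameter names, and default bounds are assumptions for illustration, and details of the real controller such as its tolerance band and stabilization window are deliberately left out.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound for this sketch
    max_replicas: int = 10,  # assumed upper bound for this sketch
) -> int:
    """Scale the replica count by the ratio of observed to target metric."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # Keep the result inside the configured replica range
    return max(min_replicas, min(max_replicas, raw))


# 4 Pods averaging 200m CPU against a 100m target: scale out
print(desired_replicas(4, 200, 100))
# 4 Pods averaging 50m CPU against a 100m target: scale in
print(desired_replicas(4, 50, 100))
# The raw result (24) exceeds the assumed maximum, so it is clamped
print(desired_replicas(8, 300, 100))
```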
241 - Kubernetes Ecosystem
- Application Packaging & Configuration
- Developer Workflow Tools
- Skaffold - A command line tool that facilitates continuous development for container-based applications
- Platform Extensions
- kube-fencing - A solution for fencing the nodes of stateful applications in Kubernetes
- KubeVirt - A virtual machine management add-on for Kubernetes
- Operator & Controller Development
- Kubebuilder - A framework for building Kubernetes APIs using custom resource definitions (CRDs)
- CLI Plugin Management
- Krew - The plugin manager for kubectl command-line tool
- kubectl-node-shell - A kubectl plugin to run a root shell on a node
- kubectl-tree - A kubectl plugin to explore ownership relationships between Kubernetes objects
- kubectl-pod-inspect - A kubectl plugin to view pod and container status at a glance
- kubepug - A pre-flight checking tool for Kubernetes APIs
- rakkess - A kubectl plugin to show an access matrix for all available resources
- ketall - A kubectl plugin to get all resources
- Resource Optimization
- Goldilocks - A utility that can help you identify a starting point for resource requests and limits
- Vendor-specific Tools
- eksctl - The official CLI for Amazon EKS
- Dashboards
- Kubernetes Lens IDE - The Kubernetes IDE
- k9s - A terminal based UI to interact with your Kubernetes cluster
- KDash - A simple terminal dashboard for Kubernetes built with Rust
- k1s - A minimalistic Kubernetes dashboard
- Seabird - The native desktop app that simplifies working with Kubernetes
- Headlamp - A user-friendly Kubernetes UI focused on extensibility
- Local K8s
- FaaS on K8s
- K8s Operators
- Prometheus Operator - The operator that creates/configures/manages Prometheus clusters atop Kubernetes
- kube-prometheus - A collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring
- OpenTelemetry Operator - An implementation of a Kubernetes Operator for OpenTelemetry
- Elastic Cloud on Kubernetes (ECK) - The official operator for the Elastic Stack on Kubernetes
- Rook - An open source cloud-native storage orchestrator for Kubernetes
250 - IaC, Continuous Delivery, and Operations
251 - Infrastructure and Configuration as Code
- Infrastructure as Code
- HashiCorp Terraform - An infrastructure as code tool that lets you build, change, and version infrastructure safely and efficiently
- OpenTofu - An open-source alternative to Terraform
- Pulumi - An infrastructure as code platform that allows you to use familiar programming languages and tools to build, deploy, and manage cloud infrastructure
- Configuration Management & Automation
- Ansible - An open source IT automation engine that automates provisioning, configuration management, application deployment, orchestration, and many other IT processes
- cloud-init - The standard for customising cloud instances
- Image Building
- HashiCorp Packer - A tool for creating identical machine images for multiple platforms from a single source configuration
- Terraform/OpenTofu Ecosystem
- Terraform/OpenTofu Provider: Core Functions - A Terraform/OpenTofu provider for performing core functions
- Terragrunt - A thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state
- Terratest - A Go library that provides patterns and helper functions for testing infrastructure
- Atmos - A universal tool for DevOps and Cloud Engineering that orchestrates workflows and simplifies the management of infrastructure
- GitLab-managed Terraform/OpenTofu state - A feature that allows you to store your Terraform state files in GitLab
- tf.libsonnet - A collection of Jsonnet libraries for generating Terraform code
- terraform-docs - A utility to generate documentation from Terraform modules in various output formats
- Terraformer - A CLI tool to generate Terraform files from existing infrastructure
- Vendor-specific Tools
- AWS CloudFormation - A service that helps you model and set up your Amazon Web Services resources
- AWS CDK - An open source software development framework to define your cloud application resources using familiar programming languages
- AWS SAM - An open-source framework for building serverless applications
- Azure Resource Manager - The deployment and management service for Azure
- Bicep language - A domain-specific language (DSL) that uses declarative syntax to deploy Azure resources
252 - Continuous Delivery
Note: Many package registries support multiple artifact types, including container images (see 232) and OS packages.
- Continuous Delivery Tools
- Jenkins - An open source automation server which enables developers around the world to reliably build, test, and deploy their software
- Blue Ocean for Jenkins Pipelines - A project that rethinks the user experience of Jenkins
- Python Jenkins - A Python wrapper for the Jenkins REST API
- GitLab CI/CD - A part of GitLab that you can use to automate the builds, integration, and verification of your source code
- GitHub Actions - A feature that makes it easy to automate all your software workflows
- Concourse CI - An automation system written in Go
- Azure Pipelines - A cloud service that you can use to automatically build and test your code project and make it available to other users
- GitOps Style CD
- Cloud Native Application Delivery
- Open Application Model - A specification for describing applications so that they can be deployed and managed across any platform
- KubeVela - A modern software delivery platform that makes deploying and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable
- Flagger - A progressive delivery tool that automates the release process for applications running on Kubernetes
- Terraform Integration
- Atlantis - A self-hosted Go application that listens for Terraform pull request events via webhooks
- Private Package Registries
- GitLab Package Registry - A feature that allows you to publish and share packages for a variety of supported package managers
- GitHub Packages - A software package hosting service that allows you to host your software packages privately or publicly
- Nexus Repository Manager 3 - A sophisticated repository manager
- Azure Artifacts - A service that enables you to create and share Maven, npm, NuGet, and Python package feeds from public and private sources
- Version Conventions
- Semantic Versioning - A simple set of rules and requirements that dictate how version numbers are assigned and incremented
- semver - A semantic versioner for npm
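The rules that Semantic Versioning sets for ordering versions can be sketched in a few lines of Python. The example below is an illustrative approximation, not the npm `semver` package: the function name `precedence_key` and the simplified regular expression are assumptions made for this sketch (for instance, it does not reject leading zeros). It follows the precedence rules of the specification: major, minor, and patch compare numerically, a pre-release sorts before its release, numeric pre-release identifiers sort before alphanumeric ones, and build metadata is ignored.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]
_SEMVER = re.compile(
    r"^([0-9]+)\.([0-9]+)\.([0-9]+)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+([0-9A-Za-z.-]+))?$"
)


def precedence_key(version: str):
    """Return a tuple that sorts versions by SemVer precedence.

    Hypothetical helper for illustration only.
    """
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    prerelease = match.group(4)
    if prerelease is None:
        # A normal release outranks every pre-release of the same version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as integers and rank below
        # alphanumeric ones, which compare in ASCII order.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


if __name__ == "__main__":
    versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-alpha", "1.0.0-beta.11", "1.0.0-beta.2"]
    print(sorted(versions, key=precedence_key))
```

Returning a sort key rather than a comparison function keeps the sketch usable with `sorted`, `max`, and `min` directly.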
253 - Fleet Management & Operations
Please see also the Security class.
- Fleet Management
- AWS Systems Manager - A secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments
- Azure Automation - A cloud-based automation and configuration service that supports consistent management across your Azure and non-Azure environments
- Azure Update Manager - A unified service to help manage and govern updates for all your machines
- Backup
- Vendor-specific Tools
- AWS Backup - A fully managed service that centralizes and automates data protection across AWS services, in the cloud, and on premises
- Azure Backup - A service that provides simple, secure, and cost-effective solutions to back up your data and recover it from the Microsoft Azure cloud
- K8s-specific Tools
- Velero - An open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes
- Generic
- Restic - A fast, secure, efficient backup program
- Runbook Automation
260 - System Testing, Chaos Engineering, and FinOps
261 - Performance & Load Testing
- Concepts
- Performance Testing - The practice of evaluating how a system performs in terms of responsiveness and stability under a particular workload
- Performance Testing Tools
- Grafana k6 - The open-source load testing tool that makes performance testing easy and productive for engineering teams
- Gatling - The load testing tool for programmers that helps engineering teams shift performance concerns left
- Apache JMeter - A pure Java application designed to load test functional behavior and measure performance
- ab - A tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server
- stress-ng - A tool that imposes configurable amounts of CPU, memory, I/O, and disk stress on the system
- sysbench - A scriptable multi-threaded benchmark tool based on LuaJIT
- fio - A tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user
- iPerf - The ultimate speed test tool for TCP, UDP and SCTP
- plow - A high-performance HTTP benchmarking tool
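Load testing tools such as those listed above typically report responsiveness as latency percentiles rather than averages, because a few slow requests can hide behind a healthy mean. The following Python sketch shows one common way to compute such a summary, using the nearest-rank percentile method. The function names `percentile` and `summarize` and the sample latencies are assumptions for illustration; individual tools may use different interpolation methods.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Summarize request latencies (in milliseconds) the way a load test report might."""
    return {
        "count": len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Made-up sample: nine fast requests and one slow outlier.
    print(summarize([12, 15, 11, 250, 14, 13, 16, 18, 17, 19]))
```

In the made-up sample, the median is 15 ms while p95 and p99 both land on the 250 ms outlier, which is the kind of tail behavior a mean of about 38 ms would obscure.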
262 - Chaos Engineering
- Concepts
- Chaos Engineering - The practice of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production
- Principles of chaos engineering - The principles that define the practice of chaos engineering
- Chaos Engineering Tools
- Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures
- Litmus - A cloud-native chaos engineering framework for Kubernetes
- Chaos Mesh - A cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments
- Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing
263 - FinOps
- Concepts
- FinOps principles - The cultural practice of bringing financial accountability to the variable spend model of cloud
- FinOps Tools
- FinOps toolkit - A collection of tools, resources, and best practices for implementing FinOps in your organization
- AWS Cost Explorer - A tool that enables you to view and analyze your costs and usage
- OpenCost - The open source solution for monitoring Kubernetes spend
- Karpenter - A flexible, high-performance Kubernetes cluster autoscaler
- Cloud Custodian - A rules engine for managing public cloud accounts and resources
270 - System Observability
270 - Common Concepts and Software
- Concepts
- Observability - A measure of how well internal states of a system can be inferred from knowledge of its external outputs
- Instrumentation Libraries
- OpenTelemetry - A vendor-neutral open source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs
- Micrometer - A metrics instrumentation library for JVM-based applications
- Tools
- Uptime Kuma - An easy-to-use self-hosted monitoring tool
271 - Telemetry Shipment
- Data Shippers
- Prometheus exporters - The services that expose Prometheus metrics
- node-exporter - An exporter for hardware and OS metrics exposed by *NIX kernels
- blackbox-exporter - A tool that allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP and gRPC
- Grafana Alloy - An open source OpenTelemetry collector with built-in Prometheus pipelines and support for metrics, logs, traces, and profiles
- Fluent Bit - A super fast, lightweight, and highly scalable logging, metrics, and traces processor and forwarder
- Fluentd - An open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data
- Filebeat - A lightweight shipper for forwarding and centralizing log data
- Logstash - An open source server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash"
- Telegraf - An open source server agent that helps you collect metrics from your stacks, sensors, and systems
- Metricbeat - A lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from services running on the server
- rsyslog - The rocket-fast system for log processing
- Vendor-specific Tools
- Azure Monitor Agent - The agent that collects monitoring data from the guest operating system of Azure and hybrid virtual machines
- CloudWatch Agent - The agent you can use to collect both system-level metrics and log files from Amazon EC2 instances and on-premises servers
272 - Telemetry Collection
- Datastore and Alerting Tools
- Prometheus - An open-source systems monitoring and alerting toolkit
- Alertmanager - A tool that handles alerts sent by client applications such as the Prometheus server
- amtool - A CLI tool for interacting with the Alertmanager API
- InfluxDB - A time series database built from the ground up to handle high write and query loads
- InfluxQL - An SQL-like query language for interacting with data in InfluxDB
- influx cli - The command line interface for InfluxDB 2.0
- Grafana Mimir - An open source, horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
- Grafana Loki - A horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus
- Grafana Tempo - An open source, easy-to-use and high-scale distributed tracing backend
- TraceQL - A query language designed for selecting traces
- Elasticsearch - An open source distributed, RESTful search and analytics engine, scalable data store, and vector database
- Elastic Common Schema - An open source specification, developed with support from the Elastic user community
- Ingest pipelines - A feature that lets you perform common transformations on your data before indexing
- Dissect and Grok - The processors that let you extract structured fields out of a single text field
- Graphite - A highly scalable real-time graphing system
- Grafana Alerting - A feature that allows you to create and manage alerts for your data
- OpenObserve - An open-source observability platform designed for modern applications
- Vendor-specific Tools
- Azure Monitor - A comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments
- Kusto Query Language - A powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical models, and more
- Application Insights - A feature of Azure Monitor that provides an extensible Application Performance Management (APM) service for developers and DevOps professionals
- AWS CloudWatch - A monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers
- Visualization Tools