200 - System Administration and SRE

Gemini says "This is an exceptionally well-classified and comprehensive list. The structure is logical, progressing from fundamental concepts to specialized and modern operational practices. It's clear, detailed, and reflects a current understanding of the System Administration and SRE landscape."

200 - Operating Systems, Networking, and Modern Infrastructure

Note: Please see also Class 103 - Concurrency and Parallelism.

200 - Core OS Concepts

  • Core Concepts
    • System call - The programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed
    • Protection ring - A mechanism to protect data and functionality from faults and malicious behavior
    • Daemon - A computer program that runs as a background process, rather than being under the direct control of an interactive user
    • Environment variable - A named variable whose value is set outside the program, typically through functionality built into the operating system or a microservice
    • POSIX standard - A family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems
  • Process Management
    • Process - The instance of a computer program that is being executed by one or more threads
      • Thread - The smallest sequence of programmed instructions that can be managed independently by a scheduler
      • Scheduling - The action of assigning resources to perform tasks
      • Context switch - The process of storing the state of a process or thread, so that it can be restored and resume execution at a later point
      • Interrupt - A request for the processor to interrupt currently executing code, so that the event can be processed in a timely manner
  • Inter-Process Communication (IPC)
    • Pipes
      • Anonymous pipe - A simplex FIFO communication channel that may be used for one-way interprocess communication
      • Named pipe - A pipe that appears as a file-system entry on Unix and Unix-like systems, so that unrelated processes can open it by name for inter-process communication
    • Shared memory - A memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies
    • Signal - An asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred
    • Unix domain socket - A data communications endpoint for exchanging data between processes executing on the same host operating system
  • Memory Management
    • Virtual memory - A memory management technique that provides an idealized abstraction of the storage resources that are actually available on a given machine
      • Memory paging - A memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory
      • Page fault - A type of exception raised by computer hardware when a running program accesses a memory page that is not currently mapped by the memory management unit into the virtual address space of a process
      • Resident set size (RSS) - The portion of memory occupied by a process that is held in main memory
      • Working set size (WSS) - The amount of memory a process actively references over a given interval; in Windows terminology, the set of pages in the virtual address space of the process that are currently resident in main memory
    • Page cache - A transparent cache, kept by the operating system in otherwise unused main memory, for pages that originate from secondary storage, so that future requests for that data can be served faster
  • Storage Management
    • Disk partitioning - The creation of one or more regions on a secondary storage device, so that each region can be managed separately
    • Loop device - A pseudo-device that makes a file accessible as a block device
    • File system - A method and data structure that the operating system uses to control how data is stored and retrieved
      • Journaling file system - A file system that keeps a journal, a circular log of changes that have not yet been committed to the main part of the file system
      • Path - A string that names a file or directory and specifies a unique location in a file system
      • Glob pattern - A pattern that specifies sets of filenames with wildcard characters
      • File handle/descriptor - A unique identifier for a file or other input/output resource, such as a pipe or network socket
      • Symbolic link - A term for any file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution
      • Permissions - A feature of many modern file systems that controls the ability of the users of a computer to view, change, navigate, and execute the contents of the file system
        • Setuid - A Unix access rights flag that allows users to run an executable with the permissions of the executable's owner or group
        • Sticky bit - A user ownership access right flag that can be assigned to files and directories on Unix-like systems
      • Inode - A data structure in a Unix-style file system that describes a file-system object such as a file or a directory
    • RAID - A data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both
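
The Permissions, Setuid, and Sticky bit entries above can be made concrete with a small sketch. It decodes the special permission bits from a numeric file mode using Python's standard `stat` module. The helper name `describe_mode` and the two sample modes are hypothetical and chosen only for illustration; they are not read from a real system.

```python
import stat


def describe_mode(mode: int) -> dict:
    """Summarize the special permission bits of an st_mode value."""
    return {
        "symbolic": stat.filemode(mode),        # ls-style string
        "setuid": bool(mode & stat.S_ISUID),    # run as the file's owner
        "setgid": bool(mode & stat.S_ISGID),    # run as the file's group
        "sticky": bool(mode & stat.S_ISVTX),    # restricted deletion flag
    }


# Hypothetical modes chosen for illustration, not read from a real system.
SETUID_BINARY = stat.S_IFREG | stat.S_ISUID | 0o755    # a setuid executable
SHARED_TMP_DIR = stat.S_IFDIR | stat.S_ISVTX | 0o777   # a world-writable directory

if __name__ == "__main__":
    print(describe_mode(SETUID_BINARY))
    print(describe_mode(SHARED_TMP_DIR))
```

To inspect a real file, the same helper could be given `os.stat(path).st_mode`.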

201 - Base Network Concepts & Protocols

Note: Please see also 220 - Domain Name System and Email, 300 - Web and API Style Standards, and 630 - PKI and Secure Communications

  • The OSI Model - A conceptual model that provides a common basis for the coordination of standards development for the purpose of systems interconnection
  • The Internet - The global system of interconnected computer networks that uses the Internet protocol suite to communicate between networks and devices
    • IP - The network layer communications protocol in the Internet protocol suite
      • Link-local address - A network address that is valid only for communications within the network segment or the broadcast domain that the host is connected to
      • IP-multicast - A method of sending Internet Protocol datagrams to a group of interested receivers in a single transmission
      • DHCP - A network management protocol used on Internet Protocol networks for automatically assigning IP addresses and other communication parameters to devices connected to the network
      • ICMP - A supporting protocol in the Internet protocol suite, used by network devices to send error messages and operational information
      • NAT - A method of mapping an IP address space into another by modifying network address information in the IP header of packets while they are in transit across a traffic routing device
      • IPv6 - The most recent version of the Internet Protocol, the communications protocol that provides an identification and location system for computers on networks and routes traffic across the Internet
        • Unique local address - An IPv6 address in the address block fc00::/7
        • DHCPv6 - A network protocol for configuring Internet Protocol version 6 hosts with IP addresses, IP prefixes and other configuration data required to operate in an IPv6 network
        • ICMPv6 - The implementation of the Internet Control Message Protocol for Internet Protocol version 6
        • NAT64 - An IPv6 transition mechanism that facilitates communication between IPv6 and IPv4 hosts
        • NDP - The Neighbor Discovery Protocol of Internet Protocol Version 6, which handles address resolution, router discovery, and duplicate address detection on a link
    • Routing table - A data table stored in a router or a network host that lists the routes to particular network destinations
      • CIDR - A method for allocating IP addresses and for IP routing
    • Network socket - A software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network
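    • TCP - One of the main protocols of the Internet protocol suite, providing reliable, ordered, and error-checked delivery of a byte stream
    • UDP - A core member of the Internet protocol suite that sends connectionless datagrams without delivery or ordering guarantees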
    • QUIC - A UDP-based, stream-multiplexing, encrypted transport protocol
  • Ethernet - A family of wired computer networking technologies
    • ARP - A communication protocol used for discovering the link layer address, such as a MAC address, associated with a given internet layer address
      • MAC address - A unique identifier assigned to a network interface controller for use as a network address in communications within a network segment
    • VLAN - A broadcast domain that is partitioned and isolated in a computer network at the data link layer
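
As a rough illustration of the Link-local address, Unique local address, CIDR, and Routing table entries above, the following sketch uses Python's standard `ipaddress` module. The function names `classify` and `lookup`, the sample routes, and the gateway labels are assumptions made for this example. The route selection is a simplified longest-prefix match, not a model of any particular router.

```python
import ipaddress

# The IPv6 unique local address block named in the list above.
ULA_BLOCK = ipaddress.ip_network("fc00::/7")


def classify(address: str) -> str:
    """Return a coarse label for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if ip.is_link_local:
        return "link-local"
    if ip.version == 6 and ip in ULA_BLOCK:
        return "unique-local"
    if ip.is_private:
        return "private"
    return "other"


def lookup(routes, destination: str):
    """Pick the next hop whose CIDR prefix is the most specific match."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical routing table: (CIDR prefix, next-hop label).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "gw-default"),
    (ipaddress.ip_network("10.0.0.0/8"), "gw-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "gw-b"),
]

if __name__ == "__main__":
    for addr in ("169.254.10.20", "fe80::1", "fd12:3456:789a::1", "8.8.8.8"):
        print(addr, classify(addr))
    for addr in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(addr, "->", lookup(ROUTES, addr))
```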

202 - Linux Kernel Features

Note: Please see also 210 - Linux Administration

  • The Linux Kernel - The main component of a Linux operating system and the core interface between a computer's hardware and its processes
    • Threads
      • Pthreads - An execution model that exists independently from a programming language, as well as a parallel execution model
    • Filesystems
      • ext4 - The default file system for many major Linux distributions
      • xfs - A high-performance journaling file system created by Silicon Graphics, Inc
      • overlayfs - A union mount filesystem implementation for Linux
      • proc(5) - A virtual filesystem that provides an interface to kernel data structures
      • sysfs(5) - A virtual filesystem that exports information about various kernel subsystems, hardware devices, and associated device drivers
    • Container Support
      • cgroups - A Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored
      • namespaces - A wrapper for a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource
      • lxc/rootfs - The userspace interface for the Linux kernel containment features
      • nsenter - A command that executes a program in the namespaces of other processes
    • FUSE (Filesystem in Userspace) - An interface for userspace programs to export a filesystem to the Linux kernel
      • s3fs - A FUSE filesystem that allows you to mount an Amazon S3 bucket as a local filesystem
    • eBPF (Extended Berkeley Packet Filter) - A revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context
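
The proc filesystem listed above exposes kernel data as plain text, which makes it easy to read from a script. The sketch below parses the `VmRSS` field (the resident set size, in kB) out of text in the `/proc/<pid>/status` format. The function name `parse_status_kb` and the sample text are hypothetical; on a non-Linux host the live read is simply skipped.

```python
import os


def parse_status_kb(text: str, field: str = "VmRSS"):
    """Return the integer kB value of a field from /proc/<pid>/status text."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key == field and rest.split():
            return int(rest.split()[0])
    return None  # field not present


# Hypothetical excerpt in the status-file layout, used for illustration only.
SAMPLE = "Name:\tpython3\nVmRSS:\t   10240 kB\nThreads:\t1\n"

if __name__ == "__main__":
    print("sample RSS (kB):", parse_status_kb(SAMPLE))
    status_path = "/proc/self/status"
    if os.path.exists(status_path):  # only available where procfs is mounted
        with open(status_path) as handle:
            print("own RSS (kB):", parse_status_kb(handle.read()))
```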

203 - Virtualization

  • Virtualization - The act of creating a virtual version of something, including virtual computer hardware platforms, storage devices, and computer network resources
  • Type-1 Hypervisors
    • KVM - A full virtualization solution for Linux on x86 hardware containing virtualization extensions
    • Hyper-V - A hardware virtualization product from Microsoft
    • Proxmox VE - A complete, open-source server management platform for enterprise virtualization
  • Type-2 Hypervisors
    • VirtualBox - A powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use
    • QEMU - A generic and open source machine emulator and virtualizer
  • Virtualization Management
    • libvirt - A toolkit to manage virtualization platforms
  • CPU Emulators
    • QEMU - A generic and open source machine emulator and virtualizer

204 - Applied & Distributed Systems

  • Distributed Computing - A field of computer science that studies distributed systems
    • Single point of failure - A part of a system that, if it fails, will stop the entire system from working
    • Fault tolerance - The property that enables a system to continue operating properly in the event of the failure of some of its components
    • Load balancing - The process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient
    • Fallacies of distributed computing - A set of assertions describing false assumptions that programmers new to distributed applications invariably make
    • Byzantine fault - A condition of a distributed system, where components may fail and there is imperfect information about whether a component has failed
      • Consensus - A fault-tolerant mechanism that is used in distributed systems to achieve the necessary agreement on a single data value among distributed processes or systems
  • Cloud Computing Services
    • Amazon EC2 - A web service that provides secure, resizable compute capacity in the cloud
      • Amazon EBS - An easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud
    • Amazon VPC - A service that lets you launch AWS resources in a logically isolated virtual network that you define
    • Amazon ELB - A service that automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions
    • Azure Virtual Machines - A service to provision Windows and Linux virtual machines in seconds
      • Azure Disk Storage - A high-performance, durable block storage for Azure Virtual Machines
    • Azure Virtual Network - The fundamental building block for your private network in Azure
    • Azure Load Balancer - A service that allows you to distribute traffic to your backend virtual machines
    • Google Cloud Compute Engine - A service that delivers configurable virtual machines running in Google’s data centers with access to high-performance networking
  • Cloud Emulators
    • LocalStack - A fully functional local cloud stack to develop and test your cloud and serverless apps offline

205 - Computer Hardware

  • CPU Architectures
    • x86-64 - A 64-bit version of the x86 instruction set
    • ARM64 - The 64-bit extension of the ARM architecture family

210 - Linux Administration

210 - Linux Core Components

  • Linux distros for hosts
    • Ubuntu server - The standard platform for public clouds, on-premises, and IoT devices
    • Debian - A complete Free Operating System
    • Arch Linux - A simple, lightweight distribution
  • linux-pam - A system of libraries that handle the authentication tasks of applications and services in a Linux system
  • Systemd - A system and service manager for Linux operating systems
    • journald - A system service that collects and stores logging data
    • hostnamed - A system service that may be used to control the hostname and related machine metadata from user programs
    • networkd - A system service that manages networks
    • resolved - A system service that provides network name resolution to local applications
    • timesyncd - A system service that may be used to synchronize the local system clock with a remote Network Time Protocol server

211 - Host Administration

  • Basic Tools
    • util-linux - A random collection of Linux utilities
      • lsblk - A command that lists information about all available or the specified block devices
      • lsns - A command that lists information about all the currently accessible namespaces or about the given namespace
      • swapon - A command used to specify devices on which paging and swapping are to take place
    • rsync - An open source utility that provides fast incremental file transfer
    • Vixie Cron - An open source implementation of POSIX Cron
    • logrotate - A utility that allows for the automatic rotation, compression, removal, and mailing of log files
    • Syslog - A standard for message logging
    • procps - A set of command line and full-screen utilities that provide information out of the pseudo-filesystem most commonly located at /proc
      • ps - A command that displays information about a selection of the active processes
      • top - A program that provides a dynamic real-time view of a running system
      • free - A command that displays the total amount of free and used physical and swap memory in the system
      • vmstat - A command that reports information about processes, memory, paging, block IO, traps, disks and cpu activity
    • psmisc - A package of small utilities that use the proc file-system
      • pstree - A command that shows running processes as a tree
      • killall - A command that sends a signal to all processes running any of the specified commands
    • lsof - A command for LiSting Open Files
    • sudo - A program that allows a system administrator to delegate authority by giving certain users the ability to run some commands as root or another user
    • shadow-utils - A package that includes the programs needed for converting UNIX password files to the shadow password format, plus programs for managing user and group accounts
      • useradd - A low level utility for adding users
    • strace - A diagnostic, debugging and instructional userspace utility for Linux
    • inxi - A full featured system information script
    • Monitors
      • Monit - A small Open Source utility for managing and monitoring Unix systems
      • atop - An ASCII full-screen performance monitor for Linux
      • sysstat - A collection of performance monitoring tools for Linux
        • iostat - A command used for monitoring system input/output device loading
      • smem - A tool that can give numerous reports on memory usage on Linux systems
    • Clock syncing
      • NTP - A networking protocol for clock synchronization between computer systems over packet-switched, variable-latency data networks
      • chrony - A versatile implementation of the Network Time Protocol
      • pool.ntp.org - A big virtual cluster of timeservers providing reliable easy to use NTP service for millions of clients
    • jc - A CLI tool and Python library that converts the output of popular command-line tools and file-types to JSON or Dictionaries
    • aha - An Ansi HTML Adapter
      • NO_COLOR - An environment variable to disable ANSI color in command-line software
  • Modern Tools
    • lsd - A rewrite of GNU ls with a lot of added features like colors, icons, tree-view, and more formatting options
    • eza - A modern replacement for ls
    • broot - A new way to see and navigate directory trees
    • bat - A cat(1) clone with wings
    • dust - A more intuitive version of du in rust
    • dua - A tool to view disk space usage and delete unwanted data, fast
    • duf - A better 'df' alternative
    • procs - A modern replacement for ps written in Rust
    • htop - An interactive process viewer for Unix systems
    • btop++ - A resource monitor for Linux, macOS, and FreeBSD
    • glances - A cross-platform monitoring tool which aims to present a large amount of monitoring information through a curses or Web based interface
    • neofetch - A command-line system information tool
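
The NO_COLOR entry above describes a convention that command-line programs can honor with a few lines of code. The sketch below follows the published convention that color is disabled when `NO_COLOR` is present and not empty. The helper names `color_enabled` and `paint` are hypothetical.

```python
import os

RED = "\x1b[31m"   # ANSI escape sequence for red foreground
RESET = "\x1b[0m"  # ANSI escape sequence to reset attributes


def color_enabled(env=None) -> bool:
    """Color stays on unless NO_COLOR is set to a non-empty value."""
    env = os.environ if env is None else env
    return not env.get("NO_COLOR")


def paint(text: str, env=None) -> str:
    """Wrap text in red only when color output is allowed."""
    return f"{RED}{text}{RESET}" if color_enabled(env) else text


if __name__ == "__main__":
    print(paint("error: disk almost full"))
```

Passing the environment as a parameter keeps the check easy to test without modifying the real process environment.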

212 - Network Administration

  • Basic Tools
    • iproute2 - A collection of utilities for controlling TCP / IP networking and traffic control in Linux
      • ip - The main command to show / manipulate routing, network devices, interfaces and tunnels
      • ss - A utility to investigate sockets
    • net-tools (legacy) - A collection of programs for controlling the network subsystem of the Linux kernel
      • ifconfig - A command used to configure a network interface
      • netstat - A command that prints network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
    • traceroute - A computer network diagnostic tool for displaying the route and measuring transit delays of packets across an Internet Protocol network
    • NetworkManager - A daemon that sits on top of libudev and other Linux kernel interfaces and provides a high-level interface for network configuration
    • Ubuntu NetPlan - A network configuration abstraction renderer
    • tcpdump - A powerful command-line packet analyzer
    • wireshark - The world's foremost network protocol analyzer
    • nmap - An open source tool for network exploration and security auditing
      • ncat - A feature-packed networking utility which reads and writes data across networks from the command line
  • Layer 5 Gateway
    • SOCKS Proxy - An Internet protocol that exchanges network packets between a client and server through a proxy server
      • Dante - A SOCKS server and SOCKS client, implementing RFC 1928 and related standards
      • tun2socks - A SOCKS proxy for TCP and UDP, that handles all connections from a TUN device
      • proxychains - A tool that forces any TCP connection made by a given application to go through a proxy such as Tor or any other SOCKS4, SOCKS5, or HTTP(S) proxy

213 - OS Package Management

  • Package Management Tools
    • dpkg - The base package management system for Debian
      • apt - A command-line utility for installing, updating, removing, and otherwise managing deb packages on Ubuntu, Debian, and related Linux distributions
    • Pacman - A utility which manages software packages in Linux
      • Yay - An AUR Helper Written in Go
    • Homebrew - The Missing Package Manager for macOS (or Linux)
    • pipx - A tool to install and run Python applications in isolated environments
    • Flatpak - A system for building, distributing, and running sandboxed desktop applications on Linux
    • Snapcraft - A software packaging and deployment system developed by Canonical for operating systems that use the Linux kernel
    • arkade - A portable marketplace for downloading your favourite devops CLIs and installing helm charts to your Kubernetes cluster

214 - File Sharing and Remote Access

  • File Servers and Protocols
    • SMB - A network communication protocol for providing shared access to files, printers, and serial ports between nodes on a network
      • Samba - The standard Windows interoperability suite of programs for Linux and Unix
    • FTP - A standard communication protocol used for the transfer of computer files from a server to a client on a computer network
      • vsftpd - A GPL licensed FTP server for UNIX-like systems, including Linux
    • SFTP - A network protocol that provides file access, file transfer, and file management over any reliable data stream
  • Remote Access Servers and Protocols
    • SSH - A cryptographic network protocol for operating network services securely over an unsecured network
      • openssh - The premier connectivity tool for remote login with the SSH protocol
    • RDP - A proprietary protocol developed by Microsoft which provides a user with a graphical interface to connect to another computer over a network connection
      • xrdp - An open-source Remote Desktop Protocol server
    • RFB - A simple protocol for remote access to graphical user interfaces
      • x11vnc - A VNC server for X11
      • TightVNC - A free remote desktop application
    • Mosh - A replacement for interactive SSH terminals

220 - Domain Name System and Email

221 - Domain Name System

  • Core Concepts & Protocols
    • DNS - The hierarchical and decentralized naming system used to identify computers, services, and other resources reachable through the Internet or other Internet Protocol networks
    • mDNS - A protocol that resolves hostnames to IP addresses within small networks that do not include a local name server
  • Domain Registration & Lookup
  • Server & Resolver Implementations
    • BIND (dnsutils) - A very flexible, full-featured DNS system
    • dnsmasq - A lightweight, easy to configure DNS forwarder, DHCP and router advertisement server
    • CoreDNS - A DNS server that chains plugins
    • systemd-resolved - A system service that provides network name resolution to local applications
    • mDNS Implementations
      • Avahi - A system which facilitates service discovery on a local network via the mDNS/DNS-SD protocol suite
      • Bonjour - Apple's implementation of zero-configuration networking
  • Client Tools
    • Part of BIND
      • dig - A flexible tool for interrogating DNS name servers
      • nslookup - A program to query Internet domain name servers
    • dog - A command-line DNS client
    • Doggo - A modern command-line DNS client (like dig) written in Go
  • Cloud Services
    • Amazon Route53 - A highly available and scalable cloud Domain Name System web service
    • Google Cloud DNS - A high-performance, resilient, global Domain Name System service that publishes your domain names to the global DNS in a cost-effective way

222 - Email System

  • Core Concepts & Protocols
    • Email - A method of exchanging messages between people using electronic devices
    • SMTP - A communication protocol for electronic mail transmission
    • POP - An application-layer Internet standard protocol used by e-mail clients to retrieve e-mail from a mail server
    • IMAP - An Internet standard protocol used by email clients to retrieve email messages from a mail server over a TCP/IP connection
    • MIME - A standard that extends the format of email messages to support text in character sets other than ASCII
      • Quoted-printable encoding - An encoding that uses printable ASCII characters to represent 8-bit data, so that it can be sent using a 7-bit data path
      • Base64 - A group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation
  • Mailbox Formats
    • Unix Mbox
    • Maildir
  • Server Software (MTA/MDA)
    • Postfix - A mail server that started life at IBM research as an alternative to the widely-used Sendmail program
    • Maddy Mail Server - An all-in-one mail server that implements all functionality required to run a mail service
    • IMAP
      • Cyrus IMAP - A highly scalable enterprise mail system designed for use in small to large enterprise environments
      • Dovecot - An open source IMAP and POP3 email server for Linux/UNIX-like systems
  • Client Software & Utilities
    • TUI Clients & Utilities
      • mailutils - A set of libraries and utilities for handling electronic mail
      • Mutt - A small but very powerful text based program for reading and sending electronic mail under unix operating systems
      • swaks - A featureful, flexible, scriptable, transaction-oriented SMTP test tool
      • Pop - A library for sending emails from your terminal
      • GNU sharutils - A set of utilities for creating and unpacking shell archives
    • Libraries
      • go-mail - A simple to use, yet feature rich mail library for Go
    • GUI Clients
      • Thunderbird - A free email application that’s easy to set up and customize
      • Sylpheed - A simple, lightweight but featureful, and easy-to-use e-mail client
  • Spam Test and Reputation
    • mail-tester - A free online service that allows you to test your emails for Spam, Malformed Content and Mail Server Configuration problems
    • Spamhaus Project - A non-profit organization that tracks spam and related cyber threats
  • Cloud Services
    • Amazon SES - A cost-effective, flexible, and scalable email service that enables developers to send mail from within any application
    • Twilio SendGrid - A cloud-based email delivery service that helps businesses with email delivery
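
To make the Quoted-printable encoding and Base64 entries above more tangible, this sketch encodes the same non-ASCII text both ways with Python's standard `quopri` and `base64` modules. The function name `encode_both` and the sample word are chosen for illustration only.

```python
import base64
import quopri


def encode_both(text: str):
    """Return (quoted-printable, base64) encodings of UTF-8 text."""
    raw = text.encode("utf-8")
    return quopri.encodestring(raw), base64.b64encode(raw)


if __name__ == "__main__":
    qp, b64 = encode_both("caf\u00e9")
    # Quoted-printable keeps ASCII readable and escapes other bytes as =XX.
    print(qp)
    # Base64 re-encodes every byte, so the result is not human-readable.
    print(b64)
    # Both encodings are reversible.
    print(quopri.decodestring(qp).decode("utf-8"))
    print(base64.b64decode(b64).decode("utf-8"))
```

Quoted-printable tends to suit mostly-ASCII text, while Base64 is the usual choice for binary attachments.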

230 - Linux Container and WebAssembly

230 - Container Standards and Utilities

  • Containerization - A form of operating-system-level virtualization
  • Linux Distros for Containers
    • Alpine Linux - A security-oriented, lightweight Linux distribution based on musl libc and busybox
      • apk-tools - A package manager originally built for Alpine Linux
    • Flatcar Container Linux - An immutable Linux distribution for containers
  • Utilities in Containers
    • busybox - A single small executable that combines tiny versions of many common UNIX utilities
  • The Open Container Initiative (OCI) - An open governance structure for the express purpose of creating open industry standards around container formats and runtimes
  • Containers for Development
    • Development Containers - An open specification for enriching containers with development-specific settings, tools, and configuration

231 - Container Runtimes and Tools

  • Container Engines
    • Docker Engine - An open source containerization technology for building and containerizing your applications
      • docker-compose - A tool for defining and running multi-container Docker applications
    • containerd - An industry-standard container runtime with an emphasis on simplicity, robustness and portability
      • nerdctl - A Docker-compatible CLI for containerd
      • ctr - An unsupported debug and administrative client for interacting with the containerd daemon
    • podman - A powerful container engine for building, managing, and running containers and pods
  • Image Building Tools
    • Docker Build - A part of the Docker Engine that automates the process of creating a Docker image from a Dockerfile and a context
    • buildah - A tool that facilitates building Open Container Initiative (OCI) container images
    • Kaniko - A tool to build container images from a Dockerfile, inside a container or Kubernetes cluster
  • Image Inspection & Management Tools
    • skopeo - A command line utility that performs various operations on container images and image repositories
    • dive - A tool for exploring a docker image, layer contents, and discovering ways to shrink the size of your Docker/OCI image
  • TUI & Helper Tools
    • lazydocker - A terminal UI for both docker and docker-compose
  • Local Environment Provisioners
    • Colima - A tool that provides container runtimes on macOS (and Linux) with minimal setup

232 - Container Registries

  • Container Registries
    • GitLab Container Registry - A secure and private registry for Docker images
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Amazon ECR - A fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts
    • Azure Container Registry - A private registry for managing container images and related artifacts
    • Harbor - An open source registry that secures artifacts with policies and role-based access control

234 - WebAssembly

  • Standards
    • WebAssembly - A binary instruction format for a stack-based virtual machine
    • WebAssembly System Interface (WASI) - A modular system interface for WebAssembly
    • WASIX - The long term stabilization and support of the existing WASI ABI plus additional non-invasive syscall extensions
  • WebAssembly Runtimes
    • wazero - The only zero dependency WebAssembly runtime written in Go
    • Wasmtime - A fast and secure runtime for WebAssembly
    • Wasmer - A blazing fast and secure WebAssembly runtime that enables incredibly lightweight containers to run anywhere
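
Since WebAssembly is described above as a binary instruction format, a quick way to see that format is to check a module's 8-byte header: the magic bytes `\0asm` followed by a little-endian version number. The sketch below does only that and does not validate the rest of a module. The function name `wasm_version` is hypothetical.

```python
import struct

WASM_MAGIC = b"\x00asm"  # the first four bytes of every WebAssembly binary


def wasm_version(data: bytes):
    """Return the binary format version, or None if the header is not Wasm."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        return None
    return struct.unpack("<I", data[4:8])[0]


# The smallest valid module: just the header, with no sections.
EMPTY_MODULE = WASM_MAGIC + b"\x01\x00\x00\x00"

if __name__ == "__main__":
    print(wasm_version(EMPTY_MODULE))
    print(wasm_version(b"\x7fELF\x02\x01\x01\x00"))
```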

240 - Kubernetes Administration

240 - Core Kubernetes

  • Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications
  • Managed K8s Services
    • GKE - A managed, production-ready environment for deploying containerized applications
    • Azure Kubernetes Service - A managed container orchestration service based on the open source Kubernetes system
    • AWS EKS - A managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers
  • Architecture
    • Master node
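      • kube-apiserver - Exposes the Kubernetes API and acts as the front end of the control plane
      • kube-scheduler - Assigns newly created Pods to nodes
      • kube-controller-manager - Runs the controller processes that reconcile the cluster's actual state with its desired state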
    • Compute node
      • kubelet - Watches the API server for Pods assigned to its node and makes sure they are running
      • cAdvisor - Collects metrics about the containers running on that particular node
      • kube-proxy - Watches the API server for Pod and Service changes in order to keep the node's network rules up to date
      • container runtime - Responsible for managing container images and running containers on that node
  • Interface Standards
    • CNI (Container Networking Interface)
      • Calico - A networking and security solution that enables Kubernetes workloads and non-Kubernetes/legacy workloads to communicate seamlessly and securely
      • Cilium - An open source, cloud native solution for providing, securing, and observing network connectivity between workloads, fueled by the revolutionary Kernel technology eBPF
    • CSI (Container Storage Interface)
    • CRI (Container Runtime Interface)
      • cri-o - An implementation of the Kubernetes CRI (Container Runtime Interface) to enable using OCI (Open Container Initiative) compatible runtimes
      • cri-tools - A set of tools for CRI
  • Workloads - The objects you use to manage and run your containers on the cluster
    • Pod
      • assignment - The process of constraining a Pod so that it is restricted to run on particular nodes, or to prefer to run on particular nodes
      • taint and toleration - A mechanism that allows you to ensure that pods are not placed on inappropriate nodes
      • lifecycle - The lifecycle of a Pod
      • liveness probe - A probe the kubelet uses to know when to restart a container
      • requests and limits
      • eviction
    • Deployment, ReplicaSet, StatefulSet, DaemonSet
  • Services, Load Balancing & Networking
    • Kubernetes network model - A set of fundamental requirements and principles for networking in a Kubernetes cluster
    • Service, Ingress, Ingress Controllers
  • Storage - A powerful volume subsystem with an API that abstracts how storage is provided and consumed
    • PersistentVolume, PVC, StorageClass
  • Configuration - A range of mechanisms that let you inject configuration data into the Pods that run your applications
    • Secret, ConfigMap
  • Security & Policy
    • Kubernetes RBAC - A method of regulating access to computer or network resources based on the roles of individual users within an enterprise
    • PodDisruptionBudget - An object that limits the number of concurrent disruptions that your application experiences, allowing for high availability
    • Security context - A definition of privilege and access control settings for a Pod or Container
  • Autoscaling
    • HPA (Horizontal Pod Autoscaler) - The component that automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed metrics such as CPU utilization
    • Cluster Autoscaler - A tool that automatically adjusts the size of the Kubernetes cluster
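
As a rough illustration of the HPA entry above, the Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with scaling skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below only reproduces that arithmetic; the function and parameter names are illustrative assumptions, and min/max replica bounds, stabilization windows, and unready-Pod handling are deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Approximate the HPA scaling rule (illustrative helper, not a Kubernetes API).

    current_metric and target_metric use the same unit, for example
    average CPU utilization in percent.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 Pods at 200% of the target metric scale out; 105% stays put.
    print(desired_replicas(4, 200, 100))
    print(desired_replicas(4, 105, 100))
```

A real HorizontalPodAutoscaler additionally clamps the result between its configured minimum and maximum replica counts.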

241 - Kubernetes Ecosystem

  • Application Packaging & Configuration
    • Helm - The package manager for Kubernetes
    • Kustomize - A standalone tool to customize Kubernetes objects through a kustomization file
  • Developer Workflow Tools
    • Skaffold - A command line tool that facilitates continuous development for container-based applications
  • Platform Extensions
    • kube-fencing - A solution for fencing the nodes of stateful applications in Kubernetes
    • KubeVirt - A virtual machine management add-on for Kubernetes
  • Operator & Controller Development
    • Kubebuilder - A framework for building Kubernetes APIs using custom resource definitions (CRDs)
  • CLI Plugin Management
    • Krew - The plugin manager for the kubectl command-line tool
      • kubectl-node-shell - A kubectl plugin to run a root shell on a node
      • kubectl-tree - A kubectl plugin to explore ownership relationships between Kubernetes objects
      • kubectl-pod-inspect - A kubectl plugin to view pod and container status at a glance
      • kubepug - A pre-flight checking tool for Kubernetes APIs
      • rakkess - A kubectl plugin to show an access matrix for all available resources
      • ketall - A kubectl plugin to get all resources
  • Resource Optimization
    • Goldilocks - A utility that can help you identify a starting point for resource requests and limits
  • Vendor-specific Tools
    • eksctl - The official CLI for Amazon EKS
  • Dashboards
    • Kubernetes Lens IDE - The Kubernetes IDE
    • k9s - A terminal based UI to interact with your Kubernetes cluster
    • KDash - A simple terminal dashboard for Kubernetes built with Rust
    • k1s - A minimalistic Kubernetes dashboard
    • Seabird - The native desktop app that simplifies working with Kubernetes
    • Headlamp - A user-friendly Kubernetes UI focused on extensibility
  • Local K8s
    • Minikube - A tool that lets you run Kubernetes locally
    • Kind - A tool for running local Kubernetes clusters using Docker container “nodes”
  • FaaS on K8s
    • OpenFaaS - A framework that makes it easy for developers to deploy event-driven functions and microservices to Kubernetes
    • Knative - A Kubernetes-based platform to build, deploy, and manage modern serverless workloads
  • K8s Operators
    • Prometheus Operator - The operator that creates/configures/manages Prometheus clusters atop Kubernetes
      • kube-prometheus - A collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring
    • OpenTelemetry Operator - An implementation of a Kubernetes Operator for OpenTelemetry
    • Elastic Cloud on Kubernetes (ECK) - The official operator for the Elastic Stack on Kubernetes
    • Rook - An open source cloud-native storage orchestrator for Kubernetes

250 - IaC, Continuous Delivery, and Operations

251 - Infrastructure and Configuration as Code

  • Infrastructure as Code
    • HashiCorp Terraform - An infrastructure as code tool that lets you build, change, and version infrastructure safely and efficiently
    • OpenTofu - An open-source alternative to Terraform
    • Pulumi - An infrastructure as code platform that allows you to use familiar programming languages and tools to build, deploy, and manage cloud infrastructure
  • Configuration Management & Automation
    • Ansible - An open source IT automation engine that automates provisioning, configuration management, application deployment, orchestration, and many other IT processes
    • cloud-init - The standard for customising cloud instances
  • Image Building
    • HashiCorp Packer - A tool for creating identical machine images for multiple platforms from a single source configuration
  • Terraform/OpenTofu Ecosystem
    • Terraform/OpenTofu Provider: Core Functions - A Terraform/OpenTofu provider for performing core functions
    • Terragrunt - A thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state
    • Terratest - A Go library that provides patterns and helper functions for testing infrastructure
    • Atmos - A universal tool for DevOps and Cloud Engineering that orchestrates workflows and simplifies the management of infrastructure
    • GitLab-managed Terraform/OpenTofu state - A feature that allows you to store your Terraform state files in GitLab
    • tf.libsonnet - A collection of Jsonnet libraries for generating Terraform code
    • terraform-docs - A utility to generate documentation from Terraform modules in various output formats
    • Terraformer - A CLI tool to generate Terraform files from existing infrastructure
  • Vendor-specific Tools
    • AWS CloudFormation - A service that helps you model and set up your Amazon Web Services resources
    • AWS CDK - An open source software development framework to define your cloud application resources using familiar programming languages
    • AWS SAM - An open-source framework for building serverless applications
    • Azure Resource Manager - The deployment and management service for Azure
      • Bicep language - A domain-specific language (DSL) that uses declarative syntax to deploy Azure resources

252 - Continuous Delivery

Note: Many package registries support multiple artifact types, including container images (see 232) and OS packages.

  • Continuous Delivery Tools
    • Jenkins - An open source automation server which enables developers around the world to reliably build, test, and deploy their software
    • GitLab CI/CD - A part of GitLab that you can use to automate the builds, integration, and verification of your source code
    • GitHub Actions - A feature that makes it easy to automate all your software workflows
    • Concourse CI - An automation system written in Go
    • Azure Pipelines - A cloud service that you can use to automatically build and test your code project and make it available to other users
  • GitOps Style CD
    • ArgoCD - A declarative, GitOps continuous delivery tool for Kubernetes
    • FluxCD - A tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to configuration when there is new code to deploy
  • Cloud Native Application Delivery
    • Open Application Model - A specification for describing applications so that they can be deployed and managed across any platform
    • KubeVela - A modern software delivery platform that makes deploying and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable
    • Flagger - A progressive delivery tool that automates the release process for applications running on Kubernetes
  • Terraform Integration
    • Atlantis - A self-hosted Go application that listens for Terraform pull request events via webhooks
  • Private Package Registries
    • GitLab Package Registry - A feature that allows you to publish and share packages for a variety of supported package managers
    • GitHub Packages - A software package hosting service that allows you to host your software packages privately or publicly
    • Nexus Repository Manager 3 - A sophisticated repository manager
    • Azure Artifacts - A service that enables you to create and share Maven, npm, NuGet, and Python package feeds from public and private sources
  • Version Conventions
    • Semantic Versioning - A simple set of rules and requirements that dictate how version numbers are assigned and incremented
      • semver - A semantic versioner for npm
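
To make the Semantic Versioning entry above concrete, here is a minimal sketch of the precedence rules from the Semantic Versioning 2.0.0 specification: MAJOR, MINOR, and PATCH compare numerically, a pre-release sorts below its normal version, and build metadata is ignored. The function name `semver_key` is an illustrative assumption, and the sketch performs no input validation, so it is not a replacement for a maintained library.

```python
def semver_key(version):
    """Return a sort key that follows SemVer 2.0.0 precedence (illustrative helper)."""
    # Build metadata (after "+") does not affect precedence.
    version = version.split("+", 1)[0]
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))

    if not prerelease:
        # A normal version ranks above any of its pre-releases.
        pre_key = (1,)
    else:
        identifiers = []
        for ident in prerelease.split("."):
            if ident.isdigit():
                # Numeric identifiers compare numerically and rank below alphanumeric ones.
                identifiers.append((0, int(ident), ""))
            else:
                # Alphanumeric identifiers compare lexically in ASCII order.
                identifiers.append((1, 0, ident))
        # With equal leading identifiers, the shorter tuple sorts first.
        pre_key = (0, *identifiers)

    return (major, minor, patch, pre_key)


if __name__ == "__main__":
    versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-alpha", "1.0.0-beta.11", "1.0.0-beta.2"]
    print(sorted(versions, key=semver_key))
```

Sorting with this key orders `1.0.0-beta.2` before `1.0.0-beta.11`, which a plain string sort would get wrong.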

253 - Fleet Management & Operations

Please see also the Security class.

  • Fleet Management
    • AWS Systems Manager - A secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments
    • Azure Automation - A cloud-based automation and configuration service that supports consistent management across your Azure and non-Azure environments
  • Backup
    • Vendor-specific Tools
      • AWS Backup - A fully managed service that centralizes and automates data protection across AWS services, in the cloud, and on premises
      • Azure Backup - A service that provides simple, secure, and cost-effective solutions to back up your data and recover it from the Microsoft Azure cloud
    • K8s-specific Tools
      • Velero - An open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes
    • Generic
      • Restic - A fast, secure, efficient backup program
  • Runbook Automation
    • Rundeck - An open source automation platform that helps you automate routine operational procedures in data center or cloud environments
    • SaltStack - A Python-based, open-source software for event-driven IT automation, remote task execution, and configuration management

260 - System Testing, Chaos Engineering, and FinOps

261 - Performance & Load Testing

  • Concepts
    • Performance Testing - The practice of evaluating how a system performs in terms of responsiveness and stability under a particular workload
  • Performance Testing Tools
    • Grafana k6 - The open-source load testing tool that makes performance testing easy and productive for engineering teams
    • Gatling - The load testing tool for programmers that helps engineering teams shift performance concerns left
    • Apache JMeter - A pure Java application designed to load test functional behavior and measure performance
    • ab - A tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server
    • stress-ng - A tool that imposes configurable amounts of CPU, memory, I/O, and disk stress on the system
    • sysbench - A scriptable multi-threaded benchmark tool based on LuaJIT
    • fio - A tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user
    • iPerf - The ultimate speed test tool for TCP, UDP and SCTP
    • plow - A high-performance HTTP benchmarking tool

262 - Chaos Engineering

  • Concepts
    • Chaos Engineering - The practice of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production
    • Principles of chaos engineering - The principles that define the practice of chaos engineering
  • Chaos Engineering Tools
    • Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures
    • Litmus - A cloud-native chaos engineering framework for Kubernetes
    • Chaos Mesh - A cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments
    • Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing

263 - FinOps

  • Concepts
    • FinOps principles - The cultural practice of bringing financial accountability to the variable spend model of cloud
  • FinOps Tools
    • FinOps toolkit - A collection of tools, resources, and best practices for implementing FinOps in your organization
    • AWS Cost Explorer - A tool that enables you to view and analyze your costs and usage
    • OpenCost - The open source solution for monitoring Kubernetes spend
    • Karpenter - A flexible, high-performance Kubernetes cluster autoscaler
    • Cloud Custodian - A rules engine for managing public cloud accounts and resources

270 - System Observability

270 - Common Concepts and Software

  • Concepts
    • Observability - A measure of how well internal states of a system can be inferred from knowledge of its external outputs
  • Instrumentation Libraries
    • OpenTelemetry - A vendor-neutral open source Observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs
    • Micrometer - A metrics instrumentation library for JVM-based applications
  • Tools
    • Uptime Kuma - An easy-to-use self-hosted monitoring tool

271 - Telemetry Shipment

  • Data Shippers
    • Prometheus exporters - The services that expose Prometheus metrics
      • node-exporter - An exporter for hardware and OS metrics exposed by *NIX kernels
      • blackbox-exporter - A tool that allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP and gRPC
    • Grafana Alloy - An open source OpenTelemetry collector with built-in Prometheus pipelines and support for metrics, logs, traces, and profiles
    • Fluent Bit - A super fast, lightweight, and highly scalable logging, metrics, and traces processor and forwarder
    • Fluentd - An open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data
    • Filebeat - A lightweight shipper for forwarding and centralizing log data
    • Logstash - An open source server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash"
    • Telegraf - An open source server agent that helps you collect metrics from your stacks, sensors, and systems
    • Metricbeat - A lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from services running on the server
    • rsyslog - The rocket-fast system for log processing
  • Vendor-specific Tools
    • Azure Monitor Agent - The agent that collects monitoring data from the guest operating system of Azure and hybrid virtual machines
    • CloudWatch agent - The agent you can use to collect both system-level metrics and log files from Amazon EC2 instances and on-premises servers

272 - Telemetry Collection

  • Datastore and Alerting Tools
    • Prometheus - An open-source systems monitoring and alerting toolkit
      • PromQL - The Prometheus Query Language
      • promtool - The command line utility for the Prometheus server
    • Alertmanager - A tool that handles alerts sent by client applications such as the Prometheus server
      • amtool - A cli tool for interacting with the Alertmanager API
    • InfluxDB - A time series database built from the ground up to handle high write and query loads
      • InfluxQL - An SQL-like query language for interacting with data in InfluxDB
      • influx cli - The command line interface for InfluxDB 2.0
    • Grafana Mimir - An open source, horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
    • Grafana Loki - A horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus
      • LogQL - The query language for Loki
      • LogCLI - The command line interface for Loki
    • Grafana Tempo - An open source, easy-to-use and high-scale distributed tracing backend
      • TraceQL - A query language designed for selecting traces
    • Elasticsearch - An open source distributed, RESTful search and analytics engine, scalable data store, and vector database
      • Elastic Common Schema - An open source specification, developed with support from the Elastic user community
      • Ingest pipelines - A feature that lets you perform common transformations on your data before indexing
      • Dissect and Grok - The processors that let you extract structured fields out of a single text field
    • Graphite - A highly scalable real-time graphing system
    • Grafana Alerting - A feature that allows you to create and manage alerts for your data
    • OpenObserve - An open-source observability platform designed for modern applications
  • Vendor-specific Tools
    • Azure Monitor - A comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments
      • Kusto Query Language - A powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical models, and more
      • Application Insights - A feature of Azure Monitor that provides an extensible Application Performance Management (APM) service for developers and DevOps professionals
    • AWS CloudWatch - A monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers
  • Visualization Tools
    • Grafana - The open source data visualization and monitoring solution
      • Grafonnet - A Jsonnet library for generating Grafana dashboards
    • Kibana - A free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack