Big data - Data sets that are too large or complex to be dealt with by traditional data-processing application software
Data model - An abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities
Data orientation - A perspective of data that emphasizes the data itself, rather than the applications that use the data
DIKW pyramid - A class of models representing purported structural and/or functional relationships between data, information, knowledge, and wisdom
Garbage in, garbage out - The principle in computer science and information and communications technology that the quality of a system's output is determined by the quality of its input
Data cleansing - The process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database
Concurrency control - The mechanisms that ensure concurrent operations produce correct results while completing as quickly as possible
CRUD operations - The four basic operations of persistent storage: create, read, update, and delete
Shard - A horizontal partition of data in a database or search engine
ETL - Extract, transform, load: a three-phase process in which data is extracted from an input source, transformed, and loaded into an output data container
ELT - Extract, load, transform: a data integration process in which raw data is moved from a source system to a destination resource, such as a data warehouse, and then transformed there for use
Distributed computing - A field of computer science that studies distributed systems, in which components on different networked computers coordinate by passing messages
Single point of failure - A part of a system that, if it fails, will stop the entire system from working
Fault tolerance - The property that enables a system to continue operating properly in the event of the failure of some of its components
Load balancing - The process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient
Fallacies of distributed computing - A set of assertions describing false assumptions that programmers new to distributed applications invariably make
Byzantine fault - A condition of a distributed system, where components may fail and there is imperfect information about whether a component has failed
Consensus - A fault-tolerant mechanism by which distributed processes or systems agree on a single data value
CAP theorem - A theorem stating that any distributed data store can provide at most two of the following three guarantees: consistency, availability, and partition tolerance
BASE properties - A database design approach (basically available, soft state, eventually consistent) that prioritizes availability over strict consistency
Algebra - A branch of mathematics that deals with abstract systems, known as algebraic structures, and the manipulation of expressions within those systems