Scalability is the property of a system to handle a growing amount of work by adding resources to the system.
In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a package delivery system is scalable because more packages can be delivered by adding more delivery vehicles. However, if all packages had to first pass through a single warehouse for sorting, the system would not be as scalable, because one warehouse can handle only a limited number of packages.
In computing, scalability is a characteristic of computers, networks, algorithms, networking protocols, programs and applications. An example is a search engine, which must support increasing numbers of users, and the number of topics it indexes. Webscale is a computer architectural approach that brings the capabilities of large-scale cloud computing companies into enterprise data centers.
In distributed systems, there are several definitions according to the authors, some considering the concepts of scalability a sub-part of elasticity, others as being distinct.
In mathematics, scalability mostly refers to closure under scalar multiplication.
In industrial engineering and manufacturing ..
The Incident Command System (ICS) is used by emergency response agencies in the United States. ICS can scale resource coordination from a single-engine roadside brushfire to an interstate wildfire. The first resource on scene establishes command, with authority to order resources and delegate responsibility (managing five to seven officers, who will again delegate to up to seven, and on as the incident grows). As an incident expands, more senior officers assume command.
Scalability can be measured over multiple dimensions, such as:
Resources fall into two broad categories: horizontal and vertical.
Scaling horizontally (out/in) means adding more nodes to (or removing nodes from) a system, such as adding a new computer to a distributed software application. An example might involve scaling out from one web server to three. High-performance computing applications, such as seismic analysis and biotechnology, scale workloads horizontally to support tasks that once would have required expensive supercomputers. Other workloads, such as large social networks, exceed the capacity of the largest supercomputer and can only be handled by scalable systems. Exploiting this scalability requires software for efficient resource management and maintenance.
Scaling vertically (up/down) means adding resources to (or removing resources from) a single node, typically involving the addition of CPUs, memory or storage to a single computer.
Larger numbers of elements increases management complexity, more sophisticated programming to allocate tasks among resources and handle issues such as throughput and latency across nodes, while some applications do not scale horizontally.
Network function virtualization defines these terms differently: scaling out/in is the ability to scale by adding/removing resource instances (e.g., virtual machine), whereas scaling up/down is the ability to scale by changing allocated resources (e.g., memory/CPU/storage capacity).
Scalability for databases requires that the database system be able to perform additional work given greater hardware resources, such as additional servers, processors, memory and storage. Workloads have continued to grow and demands on databases have followed suit.
Algorithmic innovations have include row-level locking and table and index partitioning. Architectural innovations include shared-nothing and shared-everything architectures for managing multi-server configurations.
In the context of scale-out data storage, scalability is defined as the maximum storage cluster size which guarantees full data consistency, meaning there is only ever one valid version of stored data in the whole cluster, independently from the number of redundant physical data copies. Clusters which provide "lazy" redundancy by updating copies in an asynchronous fashion are called 'eventually consistent'. This type of scale-out design is suitable when availability and responsiveness are rated higher than consistency, which is true for many web file-hosting services or web caches (if you want the latest version, wait some seconds for it to propagate). For all classical transaction-oriented applications, this design should be avoided.
Many open-source and even commercial scale-out storage clusters, especially those built on top of standard PC hardware and networks, provide eventual consistency only. Idem some NoSQL databases like CouchDB and others mentioned above. Write operations invalidate other copies, but often don't wait for their acknowledgements. Read operations typically don't check every redundant copy prior to answering, potentially missing the preceding write operation. The large amount of metadata signal traffic would require specialized hardware and short distances to be handled with acceptable performance (i.e., act like a non-clustered storage device or database).
Whenever strong data consistency is expected, look for these indicators:
Indicators for eventually consistent designs (not suitable for transactional applications!) are:
It is often advised to focus system design on hardware scalability rather than on capacity. It is typically cheaper to add a new node to a system in order to achieve improved performance than to partake in performance tuning to improve the capacity that each node can handle. But this approach can have diminishing returns (as discussed in performance engineering). For example: suppose 70% of a program can be sped up if parallelized and run on multiple CPUs instead of one. If is the fraction of a calculation that is sequential, and is the fraction that can be parallelized, the maximum speedup that can be achieved by using P processors is given according to Amdahl's Law:
Substituting the value for this example, using 4 processors gives
Doubling the computing power to 8 processors gives
Doubling the processing power has only sped up the process by roughly one-fifth. If the whole problem was parallelizable, the speed would also double. Therefore, throwing in more hardware is not necessarily the optimal approach.
High performance computing has two common notions of scalability: