Data binning, also called discrete binning or bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall into a given small interval, a bin, are replaced by a value representative of that interval, often the central value. It is a form of quantization.
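The replacement of values by a bin's central value can be sketched as follows. This is a minimal illustration that assumes fixed-width bins; the function name `bin_to_center`, the `origin` parameter, and the sample readings are hypothetical and chosen only for this example.

```python
import math

def bin_to_center(values, width, origin=0.0):
    """Replace each value with the centre of the fixed-width bin it falls in.

    Assumption: bins are [origin + k*width, origin + (k+1)*width) for integer k.
    """
    centers = []
    for v in values:
        index = math.floor((v - origin) / width)  # index of the bin containing v
        centers.append(origin + (index + 0.5) * width)  # midpoint of that bin
    return centers

# Hypothetical noisy readings clustered around two levels.
readings = [1.1, 1.2, 1.4, 2.6, 2.9]
smoothed = bin_to_center(readings, width=0.5)
print(smoothed)
```

Values that fall in the same bin become indistinguishable after this step, which is the sense in which binning is a form of quantization.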
Statistical data binning is a way to group a number of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together). It can also be used in multivariate statistics, binning in several dimensions at once.
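The age example can be sketched as below, assuming whole-number ages and five-year intervals that start at multiples of five. The helper `age_group` and the sample ages are hypothetical.

```python
from collections import Counter

def age_group(age, width=5):
    """Return a label such as '30-34' for the interval containing an integer age."""
    lower = (age // width) * width  # lower edge of the interval
    return f"{lower}-{lower + width - 1}"

# Hypothetical sample of ages.
ages = [23, 25, 27, 31, 34, 34, 41]
counts = Counter(age_group(a) for a in ages)
print(dict(counts))
```

Binning in several dimensions works the same way, with a tuple of per-dimension interval labels used as the key instead of a single label.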
In the context of image processing, binning is the procedure of combining a cluster of adjacent pixels into a single pixel. For example, in 2×2 binning, a square of 4 pixels becomes a single larger pixel, reducing the overall number of pixels by a factor of four.
This aggregation, although associated with loss of information, reduces the amount of data to be processed, which facilitates analysis. Binning may also reduce the impact of read noise on the processed image, at the cost of a lower resolution.
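A 2×2 pixel binning step might look like the following sketch. It assumes the image is a list of rows whose dimensions are divisible by the binning factor, and that the pixels of each block are summed; averaging is an equally plausible choice, and the text does not specify which one a given system uses. The name `bin_pixels` is hypothetical.

```python
def bin_pixels(image, factor=2):
    """Combine each factor x factor block of pixels into one pixel by summing.

    Assumption: the image height and width are multiples of `factor`.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by the binning factor")
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]

# Hypothetical 4x4 image; 2x2 binning reduces it to 2x2.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
binned = bin_pixels(image)
print(binned)
```

The 16 input pixels become 4 output pixels, each carrying the combined signal of its block.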
Data binning may be used when a collection of data profiles from mass spectrometry (MS) or nuclear magnetic resonance (NMR) experiments is subjected to pattern recognition analysis, where small instrumental shifts in the spectral dimension could otherwise be falsely interpreted as representing different components. A straightforward way to cope with this problem is to use binning techniques in which the spectrum is reduced in resolution to a sufficient degree to ensure that a given peak remains in its bin despite small spectral shifts between analyses. For example, in NMR the chemical shift axis may be discretized and coarsely binned, and in MS the measured mass values may be rounded to integer atomic mass unit values. Also, several digital camera systems incorporate an automatic pixel binning function to improve image contrast.
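The MS case can be illustrated with a small sketch that rounds each peak position to the nearest integer and sums the intensities that land in the same bin. The peak lists, the `(position, intensity)` tuple layout, and the name `bin_spectrum` are assumptions made for this example only.

```python
from collections import defaultdict

def bin_spectrum(peaks):
    """Sum peak intensities into integer-valued bins.

    Assumption: each peak is a (position, intensity) pair. Python's round()
    sends exact .5 ties to the nearest even integer, which does not matter
    for the values used here.
    """
    binned = defaultdict(float)
    for position, intensity in peaks:
        binned[round(position)] += intensity
    return dict(binned)

# Two hypothetical runs of the same sample with small instrumental shifts.
run_a = [(100.02, 5.0), (150.98, 2.0)]
run_b = [(99.97, 4.0), (151.03, 3.0)]

binned_a = bin_spectrum(run_a)
binned_b = bin_spectrum(run_b)
print(sorted(binned_a), sorted(binned_b))
```

After binning, both runs report peaks in the same two bins, so a pattern recognition step would no longer treat the shifted peaks as different components.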
Binning is also used in machine learning to speed up the decision-tree boosting method for supervised classification and regression in algorithms such as Microsoft's LightGBM and scikit-learn's Histogram-based Gradient Boosting Classification Tree.
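The idea behind the speed-up can be sketched as follows: each continuous feature is mapped to a small number of integer bin codes, so a tree learner only needs to consider the bin boundaries as split candidates rather than every distinct value. This is a simplified illustration, not the actual implementation in LightGBM or scikit-learn; the quantile-based choice of edges and the names `quantile_edges` and `to_bin_indices` are assumptions.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Pick n_bins - 1 thresholds so that bins hold roughly equal numbers of samples."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]

def to_bin_indices(values, edges):
    """Map each value to the index of its bin (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]

# Hypothetical feature column with 8 samples, reduced to 4 bins.
feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.5, 0.2, 0.7]
edges = quantile_edges(feature, n_bins=4)
codes = to_bin_indices(feature, edges)
print(edges, codes)
```

With 4 bins there are only 3 candidate split thresholds for this feature, regardless of how many samples or distinct values the column contains.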