Manifold hypothesis

Summary

The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space.[1][2][3][4] As a consequence, many data sets that initially appear to require many variables to describe can actually be described by a comparatively small number of variables, analogous to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features.

A visualization of 30,000,000 integers coloured by prime divisibility. This visualization was created using uniform manifold approximation and projection (UMAP), a nonlinear dimensionality reduction method.

The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many dimensionality reduction techniques, such as manifold sculpting, manifold alignment, and manifold regularization, assume that data lies along a low-dimensional submanifold.
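The gap between ambient and intrinsic dimension can be illustrated with a small synthetic sketch. It is not taken from the cited sources: the helix, the 50-dimensional ambient space, the 95% variance threshold, and the function names `helix_in_high_dim`, `components_needed`, and `local_dimension` are all assumptions chosen for illustration. The data have a single latent coordinate `t`, yet each sample is stored as 50 numbers. Counting principal components over the whole data set and then over a small neighbourhood gives a rough idea of how a curved low-dimensional structure looks flat locally.

```python
import numpy as np


def helix_in_high_dim(n_points=2000, ambient_dim=50, seed=0):
    """Sample a 1-D curve (a helix) and embed it isometrically in ambient_dim dimensions."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 4.0 * np.pi, size=n_points)  # the single latent coordinate
    helix = np.column_stack([np.cos(t), np.sin(t), t / (2.0 * np.pi)])
    # Orthonormal columns from a QR factorisation rotate the helix into the larger space.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 3)))
    return helix @ basis.T


def components_needed(points, threshold=0.95):
    """Number of principal components needed to explain `threshold` of the variance."""
    centered = points - points.mean(axis=0)
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)


def local_dimension(points, index, n_neighbors=20, threshold=0.95):
    """Apply the same count to the nearest neighbours of one point (local PCA)."""
    distances = np.linalg.norm(points - points[index], axis=1)
    neighbours = points[np.argsort(distances)[:n_neighbors]]
    return components_needed(neighbours, threshold)


points = helix_in_high_dim()
print("ambient dimension:", points.shape[1])
print("components needed globally:", components_needed(points))
print("components needed near one point:", local_dimension(points, index=0))
```

Under these assumptions the local count should be smaller than the global one, which in turn is far smaller than the ambient dimension. Linear PCA is used here only because it is simple; it is a stand-in for, not an instance of, the nonlinear methods named above.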

The major implications of this hypothesis are that:

  • Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds).
  • Within one of these manifolds, it’s always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold.
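The interpolation claim can be made concrete with a toy manifold. The following is a minimal sketch, assuming the manifold is the unit circle in the plane with the angle as its latent coordinate; the function names are hypothetical. A straight line between two samples in the ambient space cuts through the interior of the circle, whereas interpolating the latent angle yields a continuous path whose points all stay on the manifold.

```python
import numpy as np


def decode(theta):
    """Map latent angles to points on the unit circle (the toy manifold)."""
    theta = np.atleast_1d(theta)
    return np.column_stack([np.cos(theta), np.sin(theta)])


def interpolate_ambient(x0, x1, steps):
    """Straight-line interpolation between two points in the ambient space."""
    w = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - w) * x0 + w * x1


def interpolate_latent(theta0, theta1, steps):
    """Interpolate the latent coordinate, then map back onto the manifold."""
    return decode(np.linspace(theta0, theta1, steps))


def distance_from_manifold(points):
    """Distance of each point from the unit circle."""
    return np.abs(np.linalg.norm(points, axis=1) - 1.0)


theta0, theta1 = 0.0, 2.0 * np.pi / 3.0
x0, x1 = decode(theta0)[0], decode(theta1)[0]

ambient_path = interpolate_ambient(x0, x1, steps=5)
latent_path = interpolate_latent(theta0, theta1, steps=5)

print("max distance from manifold, ambient path:", distance_from_manifold(ambient_path).max())
print("max distance from manifold, latent path:", distance_from_manifold(latent_path).max())
```

In a learned model the role of `decode` would be played by a trained decoder or generator, and the latent coordinates would be learned rather than known in advance; the circle is used here only because its geometry can be checked by hand.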

The ability to interpolate between samples is the key to generalization in deep learning.[5]

The information geometry of statistical manifolds

An empirically motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning, under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry, thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding, and variational Bayesian methods.

The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric.[6] In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis:

  1. We can sample large amounts of data from the underlying generative process.
  2. Machine learning experiments are reproducible, so the statistics of the generating process exhibit stationarity.

In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket.[7]
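For reference, the Fisher information metric invoked in this section has a standard definition, restated here in notation that is an assumption of this sketch rather than that of the cited source: a family of distributions p(x | θ) with parameters θ = (θ^1, ..., θ^n) serving as coordinates on the statistical manifold. The uniqueness property is usually attributed to Chentsov's theorem, which characterizes this metric, up to a constant factor, as the one invariant under sufficient statistics.

```latex
% Fisher information metric on a statistical manifold with coordinates \theta
g_{jk}(\theta)
  = \mathbb{E}_{x \sim p(\cdot \mid \theta)}
    \left[
      \frac{\partial \log p(x \mid \theta)}{\partial \theta^{j}}
      \,
      \frac{\partial \log p(x \mid \theta)}{\partial \theta^{k}}
    \right]

% Worked example: Bernoulli family p(x \mid \theta) = \theta^{x} (1 - \theta)^{1 - x}, x \in \{0, 1\}
% \partial_\theta \log p = x/\theta - (1 - x)/(1 - \theta), so
g(\theta)
  = \theta \cdot \frac{1}{\theta^{2}} + (1 - \theta) \cdot \frac{1}{(1 - \theta)^{2}}
  = \frac{1}{\theta (1 - \theta)}
```

In the Bernoulli example the metric grows without bound as θ approaches 0 or 1, so equal steps in θ correspond to larger statistical distances near the boundary of the parameter space.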

The Tower of Babel Paradox

 
"Babel", M. C. Escher

How do global codes emerge from local codes, given that a global encoding is necessary for the emergence of synchronization in large neural networks?

A fundamental challenge facing any biologically plausible manifold learning algorithm is Romain Brette's Tower of Babel Paradox[8] for the efficient coding hypothesis:

The efficient coding hypothesis stipulates that neurons encode signals into spike trains in an efficient way, that is, it uses a code such that all redundancy is removed from the original message while preserving information, in the sense that the encoded message can be mapped back to the original message (Barlow, 1961; Simoncelli, 2003). This implies that with a perfectly efficient code, encoded messages are indistinguishable from random. Since the code is determined on the statistics of the inputs and only the encoded messages are communicated, a code is efficient to the extent that it is not understandable by the receiver. This is the paradox of the efficient code.

In the neural coding metaphor, the code is private and specific to each neuron. If we follow this metaphor, this means that all neurons speak a different language, a language that allows expressing concepts very concisely but that no one else can understand. Thus, according to the coding metaphor, the brain is a Tower of Babel.

The predictive and efficient coding theses predict that each neuron has its own efficient code derived via maximum entropy inference. We may think of these local codes as distinct languages. Brette then surmises that, since each neuron's language is indistinguishable from random relative to its neighbors, a global encoding should be impossible.
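The claim that an efficient code produces messages that look random can be loosely illustrated with an ordinary compressor. The sketch below is an analogy only, and makes several assumptions: DEFLATE (via Python's `zlib`) stands in for an efficient code even though it is neither perfectly efficient nor a model of neural coding, the word list is a hypothetical redundant source, and the byte-level entropy of the output is used as a crude proxy for "indistinguishable from random". The encoded message can still be mapped back to the original exactly, but only by a receiver that knows the code.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte (at most 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical redundant source: words drawn at random from a tiny vocabulary.
vocabulary = ["spike", "train", "neuron", "code", "signal", "input", "noise", "rate"]
rng = random.Random(0)
message = " ".join(rng.choices(vocabulary, k=20000)).encode("ascii")

encoded = zlib.compress(message, 9)   # remove redundancy
decoded = zlib.decompress(encoded)    # the receiver needs the same code to invert it

print("round trip exact:", decoded == message)
print("length in bytes:", len(message), "->", len(encoded))
print(
    "entropy in bits per byte:",
    round(byte_entropy(message), 2),
    "->",
    round(byte_entropy(encoded), 2),
)
```

The original message uses only a handful of distinct byte values, so its byte entropy is low; the compressed stream spreads its bytes much more evenly, which is the sense in which a better code looks more like noise to anyone who does not share it.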

References

  1. ^ Gorban, A. N.; Tyukin, I. Y. (2018). "Blessing of dimensionality: mathematical foundations of the statistical physics of data". Phil. Trans. R. Soc. A. 376 (2118): 20170237. Bibcode:2018RSPTA.37670237G. doi:10.1098/rsta.2017.0237. PMC 5869543. PMID 29555807.
  2. ^ Cayton, L. (2005). Algorithms for manifold learning (PDF) (Technical report). University of California at San Diego. p. 1. 12(1–17).
  3. ^ Fefferman, Charles; Mitter, Sanjoy; Narayanan, Hariharan (2016-02-09). "Testing the manifold hypothesis". Journal of the American Mathematical Society. 29 (4): 983–1049. arXiv:1310.0425. doi:10.1090/jams/852. S2CID 50258911.
  4. ^ Olah, Christopher (2014). "Blog: Neural Networks, Manifolds, and Topology".
  5. ^ Chollet, Francois (2021). Deep Learning with Python (2nd ed.). Manning. pp. 128–129. ISBN 9781617296864.
  6. ^ Caticha, Ariel (2015). Geometry from Information Geometry. MaxEnt 2015, the 35th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering. arXiv:1512.09076.
  7. ^ Kirchhoff, Michael; Parr, Thomas; Palacios, Ensor; Friston, Karl; Kiverstein, Julian (2018). "The Markov blankets of life: autonomy, active inference and the free energy principle". J. R. Soc. Interface. 15 (138): 20170792. doi:10.1098/rsif.2017.0792. PMC 5805980. PMID 29343629.
  8. ^ Brette, Romain (7 December 2017). "The Paradox of the Efficient Code and the Neural Tower of Babel" (Blog post). Retrieved from http://romainbrette.fr/what-is-computational-neuroscience-xxvii-the-paradox-of-the-efficient-code-and-the-neural-tower-of-babel/

Further reading

  • Brown, Bradley C. A.; Caterini, Anthony L.; Ross, Brendan Leigh; Cresswell, Jesse C.; Loaiza-Ganem, Gabriel (2023). The Union of Manifolds Hypothesis and its Implications for Deep Generative Modelling. The Eleventh International Conference on Learning Representations. arXiv:2207.02862.
  • Lee, Yonghyeon (2023). A Geometric Perspective on Autoencoders. arXiv:2309.08247.