In Bayesian statistics, Laplace approximation can refer to either approximating the posterior normalizing constant with Laplace's method or approximating the posterior distribution with a Gaussian centered at the maximum a posteriori estimate.
Suppose the function has a unique global maximum at x0. Let be a constant and consider the following two functions:
Note that x0 will be the global maximum of and as well. Now observe:
As M increases, the ratio for will grow exponentially, while the ratio for does not change. Thus, significant contributions to the integral of this function will come only from points x in a neighbourhood of x0, which can then be estimated.
To state and motivate the method, we need several assumptions. We will assume that x0 is not an endpoint of the interval of integration, that the values cannot be very close to unless x is close to x0, and that
We can expand around x0 by Taylor's theorem,
where (see: big O notation).
Since has a global maximum at x0, and since x0 is not an endpoint, it is a stationary point, so the derivative of vanishes at x0. Therefore, the function may be approximated to quadratic order
for x close to x0 (recall ). The assumptions ensure the accuracy of the approximation
(see the picture on the right). This latter integral is a Gaussian integral if the limits of integration go from −∞ to +∞ (which can be assumed because the exponential decays very fast away from x0), and thus it can be calculated. We find
A generalization of this method and extension to arbitrary precision is provided by Fog (2008).
Suppose is a twice continuously differentiable function on and there exists a unique point such that:
This method relies on 4 basic concepts such as
Based on these four concepts, we can derive the relative error of this Laplace's method.
Laplace's approximation is sometimes written as
where is positive.
Importantly, the accuracy of the approximation depends on the variable of integration, that is, on what stays in and what goes into .
In the multivariate case where is a -dimensional vector and is a scalar function of , Laplace's approximation is usually written as:
In extensions of Laplace's method, complex analysis, and in particular Cauchy's integral formula, is used to find a contour of steepest descent for an (asymptotically with large M) equivalent integral, expressed as a line integral. In particular, if no point x0 where the derivative of vanishes exists on the real line, it may be necessary to deform the integration contour to an optimal one, where the above analysis will be possible. Again the main idea is to reduce, at least asymptotically, the calculation of the given integral to that of a simpler integral that can be explicitly evaluated. See the book of Erdelyi (1956) for a simple discussion (where the method is termed steepest descents).
The appropriate formulation for the complex z-plane is
for a path passing through the saddle point at z0. Note the explicit appearance of a minus sign to indicate the direction of the second derivative: one must not take the modulus. Also note that if the integrand is meromorphic, one may have to add residues corresponding to poles traversed while deforming the contour (see for example section 3 of Okounkov's paper Symmetric functions and random partitions).
An extension of the steepest descent method is the so-called nonlinear stationary phase/steepest descent method. Here, instead of integrals, one needs to evaluate asymptotically solutions of Riemann–Hilbert factorization problems.
Given a contour C in the complex sphere, a function defined on that contour and a special point, say infinity, one seeks a function M holomorphic away from the contour C, with prescribed jump across C, and with a given normalization at infinity. If and hence M are matrices rather than scalars this is a problem that in general does not admit an explicit solution.
An asymptotic evaluation is then possible along the lines of the linear stationary phase/steepest descent method. The idea is to reduce asymptotically the solution of the given Riemann–Hilbert problem to that of a simpler, explicitly solvable, Riemann–Hilbert problem. Cauchy's theorem is used to justify deformations of the jump contour.
The nonlinear stationary phase was introduced by Deift and Zhou in 1993, based on earlier work of Its. A (properly speaking) nonlinear steepest descent method was introduced by Kamvissis, K. McLaughlin and P. Miller in 2003, based on previous work of Lax, Levermore, Deift, Venakides and Zhou. As in the linear case, "steepest descent contours" solve a min-max problem. In the nonlinear case they turn out to be "S-curves" (defined in a different context back in the 80s by Stahl, Gonchar and Rakhmanov).
In the generalization, evaluation of the integral is considered equivalent to finding the norm of the distribution with density
Denoting the cumulative distribution , if there is a diffeomorphic Gaussian distribution with density
the norm is given by
and the corresponding diffeomorphism is
where denotes cumulative standard normal distribution function.
In general, any distribution diffeomorphic to the Gaussian distribution has density
and the median-point is mapped to the median of the Gaussian distribution. Matching the logarithm of the density functions and their derivatives at the median point up to a given order yields a system of equations that determine the approximate values of and .
The approximation was introduced in 2019 by D. Makogon and C. Morais Smith primarily in the context of partition function evaluation for a system of interacting fermions.
For complex integrals in the form:
with we make the substitution t = iu and the change of variable to get the bilateral Laplace transform:
Laplace's method can be used to derive Stirling's approximation
for a large integer N.
From the definition of the Gamma function, we have
Now we change variables, letting so that Plug these values back in to obtain
This integral has the form necessary for Laplace's method with
which is twice-differentiable:
The maximum of lies at z0 = 1, and the second derivative of has the value −1 at this point. Therefore, we obtain
This article incorporates material from saddle point approximation on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.