
## Summary

In mathematics, specifically differential calculus, the inverse function theorem gives a sufficient condition for a function to be invertible in a neighborhood of a point in its domain: namely, that its derivative is continuous and non-zero at the point. The theorem also gives a formula for the derivative of the inverse function. In multivariable calculus, this theorem can be generalized to any continuously differentiable, vector-valued function whose Jacobian determinant is nonzero at a point in its domain, giving a formula for the Jacobian matrix of the inverse. There are also versions of the inverse function theorem for complex holomorphic functions, for differentiable maps between manifolds, for differentiable functions between Banach spaces, and so forth.

## Statement

For functions of a single variable, the theorem states that if $f$ is a continuously differentiable function with nonzero derivative at the point $a$, then $f$ is invertible in a neighborhood of $a$, the inverse is continuously differentiable, and the derivative of the inverse function at $b=f(a)$ is the reciprocal of the derivative of $f$ at $a$:

${\bigl (}f^{-1}{\bigr )}'(b)={\frac {1}{f'(a)}}={\frac {1}{f'(f^{-1}(b))}}.$
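This formula is easy to check numerically. The following sketch (with $f=\exp$, a choice made here purely for illustration, since its inverse $\log$ is available in closed form) compares a finite-difference estimate of $(f^{-1})'(b)$ against $1/f'(a)$:

```python
import math

# Illustration of (f^{-1})'(b) = 1/f'(a), with f = exp and f^{-1} = log.
a = 1.3
b = math.exp(a)          # b = f(a)

# Derivative of the inverse at b, via a central finite difference.
h = 1e-6
inv_deriv_numeric = (math.log(b + h) - math.log(b - h)) / (2 * h)

# The theorem's formula: 1 / f'(a) = 1 / e^a.
inv_deriv_formula = 1.0 / math.exp(a)

assert abs(inv_deriv_numeric - inv_deriv_formula) < 1e-8
```

Here the two quantities agree to roughly the accuracy of the finite difference, as the theorem predicts.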

An alternate version, which assumes that $f$ is continuous and injective near $a$, and differentiable at $a$ with a non-zero derivative, also yields the conclusion that $f$ is invertible near $a$, with an inverse that is likewise continuous and injective, and for which the above formula applies as well.

As a corollary, if $f$ is $k$ times continuously differentiable, with nonzero derivative at the point $a$, then $f$ is invertible in a neighborhood of $a$ and the inverse is also $k$ times continuously differentiable. Here $k$ is a positive integer or $\infty$.

For functions of more than one variable, the theorem states that if F is a continuously differentiable function from an open set of $\mathbb {R} ^{n}$  into $\mathbb {R} ^{n}$ , and the total derivative is invertible at a point p (that is, the Jacobian determinant of F at p is non-zero), then F is invertible near p: an inverse function to F is defined on some neighborhood of $q=F(p)$ . Writing $F=(F_{1},\ldots ,F_{n})$ , this means that the system of n equations $y_{i}=F_{i}(x_{1},\dots ,x_{n})$  has a unique solution for $x_{1},\dots ,x_{n}$  in terms of $y_{1},\dots ,y_{n}$ , provided that we restrict x and y to small enough neighborhoods of p and q, respectively.

Finally, the theorem says that the inverse function $F^{-1}$  is continuously differentiable, and its Jacobian derivative at $q=F(p)$  is the matrix inverse of the Jacobian of F at p:

$J_{F^{-1}}(q)=[J_{F}(p)]^{-1}.$

The hard part of the theorem is the existence and differentiability of $F^{-1}$ . Assuming this, the inverse derivative formula follows from the chain rule applied to $F^{-1}\circ F={\text{id}}$ :
$I=J_{F^{-1}\circ F}(p)\ =\ J_{F^{-1}}(F(p))\cdot J_{F}(p)\ =\ J_{F^{-1}}(q)\cdot J_{F}(p).$

## Example

Consider the vector-valued function $F:\mathbb {R} ^{2}\to \mathbb {R} ^{2}\!$  defined by:

$F(x,y)={\begin{bmatrix}{e^{x}\cos y}\\{e^{x}\sin y}\\\end{bmatrix}}.$

The Jacobian matrix is:

$J_{F}(x,y)={\begin{bmatrix}{e^{x}\cos y}&{-e^{x}\sin y}\\{e^{x}\sin y}&{e^{x}\cos y}\\\end{bmatrix}}$

with Jacobian determinant:

$\det J_{F}(x,y)=e^{2x}\cos ^{2}y+e^{2x}\sin ^{2}y=e^{2x}.\,\!$

The determinant $e^{2x}\!$  is nonzero everywhere. Thus the theorem guarantees that, for every point p in $\mathbb {R} ^{2}\!$ , there exists a neighborhood about p over which F is invertible. This does not mean F is invertible over its entire domain: in this case F is not even injective since it is periodic: $F(x,y)=F(x,y+2\pi )\!$ .
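For this particular $F$ a local inverse can be written down explicitly: on the strip $y\in (-\pi ,\pi )$ one local inverse is $F^{-1}(u,v)=({\tfrac {1}{2}}\ln(u^{2}+v^{2}),\operatorname {atan2} (v,u))$ (a standard formula, not stated above, used here only to illustrate the identity $J_{F^{-1}}(q)=[J_{F}(p)]^{-1}$ numerically):

```python
import math

def F(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def F_inv(u, v):
    # A local inverse of F, valid for y in (-pi, pi).
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

p = (0.3, 0.7)
q = F(*p)

# Round trip through the local inverse recovers p.
x, y = F_inv(*q)
assert abs(x - p[0]) < 1e-12 and abs(y - p[1]) < 1e-12

# Jacobian of F at p, and its matrix inverse (2x2 cofactor formula).
ex, cy, sy = math.exp(p[0]), math.cos(p[1]), math.sin(p[1])
JF = [[ex * cy, -ex * sy], [ex * sy, ex * cy]]
det = JF[0][0] * JF[1][1] - JF[0][1] * JF[1][0]
JF_inv = [[JF[1][1] / det, -JF[0][1] / det],
          [-JF[1][0] / det, JF[0][0] / det]]

# Jacobian of F_inv at q, by central finite differences.
h = 1e-6
J_num = [[(F_inv(q[0] + h, q[1])[i] - F_inv(q[0] - h, q[1])[i]) / (2 * h),
          (F_inv(q[0], q[1] + h)[i] - F_inv(q[0], q[1] - h)[i]) / (2 * h)]
         for i in range(2)]

# Check J_{F^{-1}}(q) = [J_F(p)]^{-1} entrywise.
for i in range(2):
    for j in range(2):
        assert abs(J_num[i][j] - JF_inv[i][j]) < 1e-6
```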

## Counter-example

*Figure: the graph of $f(x)=x+2x^{2}\sin({\tfrac {1}{x}})$ stays inside a quadratic envelope around the line $y=x$, so $f'(0)=1$; nevertheless it has local maxima and minima accumulating at $x=0$, so it is not one-to-one on any surrounding interval.*

If one drops the assumption that the derivative is continuous, the function need no longer be invertible. For example, $f(x)=x+2x^{2}\sin({\tfrac {1}{x}})$ for $x\neq 0$ and $f(0)=0$ has the discontinuous derivative $f'\!(x)=1-2\cos({\tfrac {1}{x}})+4x\sin({\tfrac {1}{x}})$ for $x\neq 0$ and $f'\!(0)=1$, which vanishes arbitrarily close to $x=0$. These critical points are local max/min points of $f$, so $f$ is not one-to-one (and not invertible) on any interval containing $x=0$. Intuitively, the slope $f'\!(0)=1$ does not propagate to nearby points, where the slopes are governed by a weak but rapid oscillation.
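The sign changes of $f'$ near $0$ are easy to exhibit numerically: at $x_{n}=1/(2\pi n)$ we have $\cos(1/x_{n})=1$, so $f'(x_{n})\approx -1$, while at points where $\cos(1/x)=-1$ the derivative is $\approx 3$. A minimal check (the sample points are chosen here for illustration):

```python
import math

def f_prime(x):
    # Derivative of f(x) = x + 2*x**2*sin(1/x) away from 0; f'(0) = 1.
    return 1 - 2 * math.cos(1 / x) + 4 * x * math.sin(1 / x)

# cos(1/x) = 1 here, so f' is close to -1 < 0 ...
x_neg = 1 / (2 * math.pi * 100)
# ... while cos(1/x) = -1 here, so f' is close to +3 > 0.
x_pos = 1 / (math.pi * 201)

assert f_prime(x_neg) < 0
assert f_prime(x_pos) > 0
```

Since $f'$ changes sign arbitrarily close to $0$, $f$ cannot be monotone, hence not injective, on any interval containing $0$.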

## Methods of proof

As an important result, the inverse function theorem has been given numerous proofs. The proof most commonly seen in textbooks relies on the contraction mapping principle, also known as the Banach fixed-point theorem (which can also be used as the key step in the proof of existence and uniqueness of solutions to ordinary differential equations).

Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem (see Generalizations below).

An alternate proof in finite dimensions hinges on the extreme value theorem for functions on a compact set.

Yet another proof uses Newton's method, which has the advantage of providing an effective version of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.
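A minimal sketch of this idea: to evaluate the local inverse at $y$, run Newton's method on $x\mapsto f(x)-y$. The function $f(x)=x+x^{3}$ below is a hypothetical example chosen so that $f'>0$ everywhere:

```python
def f(x):
    return x + x ** 3          # f'(x) = 1 + 3x^2 > 0 everywhere

def f_prime(x):
    return 1 + 3 * x ** 2

def inverse_by_newton(y, x0=0.0, tol=1e-14, max_iter=50):
    """Solve f(x) = y by Newton's method, i.e. evaluate the local inverse at y."""
    x = x0
    for _ in range(max_iter):
        step = (f(x) - y) / f_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

x = inverse_by_newton(2.0)
assert abs(f(x) - 2.0) < 1e-12
assert abs(x - 1.0) < 1e-12    # f(1) = 2, so f^{-1}(2) = 1
```

The quantitative point is that a lower bound on $|f'|$ near the starting point controls both the convergence of the iteration and the radius on which the inverse exists.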

### A proof using successive approximation

The inverse function theorem states that if $f$  is a C1 vector-valued function on an open set $U$ , then $\det f^{\prime }(a)\neq 0$  if and only if there is a C1 vector-valued function $g$  defined near $b=f(a)$  with $g(f(x))=x$  near $a$  and $f(g(y))=y$  near $b$ . This was first established by Picard and Goursat using an iterative scheme: the basic idea is to prove a fixed point theorem using the contraction mapping theorem. Taking derivatives, it follows that $g^{\prime }(y)=f^{\prime }(g(y))^{-1}$ .

The chain rule implies that the matrices $f^{\prime }(a)$ and $g^{\prime }(b)$ are inverses of each other. Continuity of $f$ and $g$ means that they are local homeomorphisms, each the inverse of the other. To prove existence, after an affine transformation it can be assumed that $f(0)=0$ and $f^{\prime }(0)=I$, so that $a=b=0$.

By the fundamental theorem of calculus if $u$  is a C1 function, ${\textstyle u(1)-u(0)=\int _{0}^{1}u^{\prime }(t)\,dt}$ , so that ${\textstyle \|u(1)-u(0)\|\leq \sup _{0\leq t\leq 1}\|u^{\prime }(t)\|}$ . Setting $u(t)=f(x+t(x^{\prime }-x))-x-t(x^{\prime }-x)$ , it follows that

$\|f(x)-f(x^{\prime })-x+x^{\prime }\|\leq \|x-x^{\prime }\|\,\sup _{0\leq t\leq 1}\|f^{\prime }(x+t(x^{\prime }-x))-I\|.$

Now choose $\delta >0$  so that ${\textstyle \|f'(x)-I\|<{1 \over 2}}$  for $\|x\|<\delta$ . Suppose that $\|y\|<\delta /2$  and define $x_{n}$  inductively by $x_{0}=0$  and $x_{n+1}=x_{n}+y-f(x_{n})$ . The assumptions show that if $\|x\|,\,\,\|x^{\prime }\|<\delta$  then

$\|f(x)-f(x^{\prime })-x+x^{\prime }\|\leq \|x-x^{\prime }\|/2$ .

In particular $f(x)=f(x^{\prime })$ implies $x=x^{\prime }$. In the inductive scheme, $\|x_{n}\|<\delta$ and $\|x_{n+1}-x_{n}\|<\delta /2^{n}$, so $(x_{n})$ is a Cauchy sequence converging to some $x$. By construction $f(x)=y$, as required.
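As a concrete scalar illustration of the scheme $x_{n+1}=x_{n}+y-f(x_{n})$, take the hypothetical function $f(x)=x+0.1x^{2}$, which satisfies the normalization $f(0)=0$, $f'(0)=1$, and $|f'(x)-1|<{\tfrac {1}{2}}$ for $|x|<\delta =2.5$:

```python
def f(x):
    # A C^1 function with f(0) = 0 and f'(0) = 1 (the normalized setting above).
    return x + 0.1 * x ** 2

# For |x| < delta = 2.5 we have |f'(x) - 1| = |0.2 x| < 1/2, so for
# |y| < delta/2 the iteration x_{n+1} = x_n + y - f(x_n) converges.
y = 0.05
x = 0.0
for _ in range(60):
    x = x + y - f(x)

assert abs(f(x) - y) < 1e-12   # the limit solves f(x) = y
```

Each step contracts the error by at least the factor $\tfrac{1}{2}$ guaranteed by the estimate above, so the sequence is Cauchy and its limit solves $f(x)=y$.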

To check that $g=f^{-1}$  is C1, write $g(y+k)=x+h$  so that $f(x+h)=f(x)+k$ . By the inequalities above, $\|h-k\|<\|h\|/2$  so that $\|h\|/2<\|k\|<2\|h\|$ . On the other hand if $A=f^{\prime }(x)$ , then $\|A-I\|<1/2$ . Using the geometric series for $B=I-A$ , it follows that $\|A^{-1}\|<2$ . But then

${\|g(y+k)-g(y)-f^{\prime }(g(y))^{-1}k\| \over \|k\|}={\|h-f^{\prime }(x)^{-1}[f(x+h)-f(x)]\| \over \|k\|}\leq 4{\|f(x+h)-f(x)-f^{\prime }(x)h\| \over \|h\|}$

tends to 0 as $k$  and $h$  tend to 0, proving that $g$  is C1 with $g^{\prime }(y)=f^{\prime }(g(y))^{-1}$ .

The proof above is presented for a finite-dimensional space, but applies equally well for Banach spaces. If an invertible function $f$  is Ck with $k>1$ , then so too is its inverse. This follows by induction using the fact that the map $F(A)=A^{-1}$  on operators is Ck for any $k$  (in the finite-dimensional case this is an elementary fact because the inverse of a matrix is given as the adjugate matrix divided by its determinant).  The method of proof here can be found in the books of Henri Cartan, Jean Dieudonné, Serge Lang, Roger Godement and Lars Hörmander.

### A proof using the contraction mapping principle

Here is a proof based on the contraction mapping theorem. Specifically, following T. Tao, it uses the following consequence of the contraction mapping theorem.

Lemma — Let $B(0,r)$  denote an open ball of radius r in $\mathbb {R} ^{n}$  with center 0. If $g:B(0,r)\to \mathbb {R} ^{n}$  is a map such that $g(0)=0$  and there exists a constant $0<c<1$  such that

$|g(y)-g(x)|\leq c|y-x|$

for all $x,y$  in $B(0,r)$ , then $f=I+g$  is injective on $B(0,r)$  and $B(0,(1-c)r)\subset f(B(0,r))\subset B(0,(1+c)r)$ .

(More generally, the statement remains true if $\mathbb {R} ^{n}$  is replaced by a Banach space.)

Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when $a=0,b=f(a)=0$  and $f'(0)=I$ . Let $g=f-I$ . The mean value inequality applied to $t\mapsto g(x+t(y-x))$  says:

$|g(y)-g(x)|\leq |y-x|\sup _{0\leq t\leq 1}\|g'(x+t(y-x))\|.$

Since $g'(0)=I-I=0$  and $g'$  is continuous, we can find an $r>0$  such that

$|g(y)-g(x)|\leq 2^{-1}|y-x|$

for all $x,y$  in $B(0,r)$ . The lemma above then says that $f=g+I$  is injective on $B(0,r)$  and that $B(0,r/2)\subset f(B(0,r))$ . Then

$f:U=B(0,r)\cap f^{-1}(B(0,r/2))\to V=B(0,r/2)$

is bijective and thus has an inverse. Next, we show that the inverse $f^{-1}$  is continuously differentiable (this part of the argument is the same as in the previous proof). This time, let $g=f^{-1}$  denote the inverse of $f$  and let $A=f'(x)$ . For $x=g(y)$ , write $g(y+k)=x+h$ , or equivalently $y+k=f(x+h)$ . By the earlier estimate, we have

$|h-k|=|f(x+h)-f(x)-h|\leq |h|/2$

and so $|h|/2\leq |k|$ . Writing $\|\cdot \|$  for the operator norm,

$|g(y+k)-g(y)-A^{-1}k|=|h-A^{-1}(f(x+h)-f(x))|\leq \|A^{-1}\||Ah-f(x+h)+f(x)|.$

As $k\to 0$ , we have $h\to 0$  and $|h|/|k|$  is bounded. Hence, $g$  is differentiable at $y$  with the derivative $g'(y)=f'(g(y))^{-1}$ . Also, $g'$  is the same as the composition $\iota \circ f'\circ g$  where $\iota :T\mapsto T^{-1}$ ; so $g'$  is continuous.

It remains to show the lemma. First, the map $f$  is injective on $B(0,r)$  since if $f(x)=f(y)$ , then $g(y)-g(x)=x-y$  and so

$|g(y)-g(x)|=|y-x|$ ,

which is a contradiction unless $y=x$ . Next we show that $f(B(0,r))\supset B(0,(1-c)r)$ : given a point $y$  in $B(0,(1-c)r)$ , this amounts to finding a fixed point of the map

$F:{\overline {B}}(0,r')\to {\overline {B}}(0,r'),\,x\mapsto y-g(x)$

where $0<r'<r$  is chosen so that $|y|\leq (1-c)r'$  and the bar denotes a closed ball. To find a fixed point, we use the contraction mapping theorem; checking that $F$  is a well-defined strict-contraction mapping of ${\overline {B}}(0,r')$  into itself is straightforward. Finally, we have $f(B(0,r))\subset B(0,(1+c)r)$ , since

$|f(x)|=|x+g(x)-g(0)|\leq (1+c)|x|.\square$

This proof is not substantially different from the previous one, since the proof of the contraction mapping theorem itself proceeds by successive approximation.

## Applications

### Implicit function theorem

The inverse function theorem can be used to solve a system of equations

${\begin{aligned}&f_{1}(x)=y_{1}\\&\quad \vdots \\&f_{m}(x)=y_{m},\end{aligned}}$

i.e., solving for $x=(x_{1},\dots ,x_{n})$  in terms of $y_{1},\dots ,y_{m}$ , provided the Jacobian matrix is invertible. The implicit function theorem allows one to solve a more general system of equations:

${\begin{aligned}&f_{1}(x,y)=0\\&\quad \vdots \\&f_{m}(x,y)=0\end{aligned}}$

for $y$  in terms of $x$ . Though more general, the theorem is actually a consequence of the inverse function theorem. First, the precise statement of the implicit function theorem is as follows:

• given a map $f:\mathbb {R} ^{n}\times \mathbb {R} ^{m}\to \mathbb {R} ^{m}$ , if $f(a,b)=0$ , $f$  is continuously differentiable in a neighborhood of $(a,b)$ , and the derivative of $y\mapsto f(a,y)$  at $b$  is invertible, then there exists a differentiable map $g:U\to V$  for some neighborhoods $U,V$  of $a,b$  such that $f(x,g(x))=0$  for all $x$  in $U$ .

To see this, consider the map $F(x,y)=(x,f(x,y))$ . By the inverse function theorem, $F:U\times V\to W$  has the inverse $G$  for some neighborhoods $U,V,W$ . We then have:

$(x,y)=F(G_{1}(x,y),G_{2}(x,y))=(G_{1}(x,y),f(G_{1}(x,y),G_{2}(x,y))),$

implying $x=G_{1}(x,y)$  and $y=f(x,G_{2}(x,y)).$  Thus $g(x)=G_{2}(x,0)$  has the required property. $\square$
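The statement can be illustrated numerically with the hypothetical example $f(x,y)=x^{2}+y^{2}-1$ (the unit circle) near the point $(a,b)=(0.6,0.8)$, where $\partial f/\partial y=2b\neq 0$; the implicitly defined $g$ is computed below by Newton's method in $y$:

```python
import math

def f(x, y):
    return x ** 2 + y ** 2 - 1.0   # f = 0 is the unit circle

# A point (a, b) on the circle with df/dy = 2b != 0.
a, b = 0.6, 0.8
assert abs(f(a, b)) < 1e-15

def g(x, y0=b, tol=1e-14):
    """Solve f(x, y) = 0 for y near b, by Newton's method in y."""
    y = y0
    for _ in range(50):
        y -= f(x, y) / (2.0 * y)   # df/dy = 2y
        if abs(f(x, y)) < tol:
            break
    return y

# The implicitly defined function satisfies f(x, g(x)) = 0 near x = a,
# and here it agrees with the explicit branch y = sqrt(1 - x^2).
for x in (0.55, 0.6, 0.65):
    assert abs(f(x, g(x))) < 1e-12
    assert abs(g(x) - math.sqrt(1.0 - x ** 2)) < 1e-10
```

The invertibility of $\partial f/\partial y$ at $(a,b)$ is exactly what makes the Newton step well defined and convergent near the point, mirroring the hypothesis of the theorem.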

### Giving a manifold structure

In differential geometry, the inverse function theorem is used to show that the pre-image of a regular value under a smooth map is a manifold. More generally, the theorem shows that, given a smooth map $f:P\to E$ , if $f$  is transversal to $i:M\hookrightarrow E$  a submanifold, then the pre-image $f^{-1}(M)\hookrightarrow P$  is a submanifold.

## Generalizations

### Manifolds

The inverse function theorem can be rephrased in terms of differentiable maps between differentiable manifolds. In this context the theorem states that for a differentiable map $F:M\to N$  (of class $C^{1}$ ), if the differential of $F$ ,

$dF_{p}:T_{p}M\to T_{F(p)}N$

is a linear isomorphism at a point $p$  in $M$  then there exists an open neighborhood $U$  of $p$  such that

$F|_{U}:U\to F(U)$

is a diffeomorphism. Note that this implies that the connected components of M and N containing p and F(p) have the same dimension, as is already directly implied from the assumption that dFp is an isomorphism. If the derivative of F is an isomorphism at all points p in M then the map F is a local diffeomorphism.

### Banach spaces

The inverse function theorem can also be generalized to differentiable maps between Banach spaces X and Y. Let U be an open neighbourhood of the origin in X and $F:U\to Y\!$  a continuously differentiable function, and assume that the Fréchet derivative $dF_{0}:X\to Y\!$  of F at 0 is a bounded linear isomorphism of X onto Y. Then there exists an open neighbourhood V of $F(0)\!$  in Y and a continuously differentiable map $G:V\to X\!$  such that $F(G(y))=y$  for all y in V. Moreover, $G(y)\!$  is the only sufficiently small solution x of the equation $F(x)=y\!$ .

### Banach manifolds

These two directions of generalization can be combined in the inverse function theorem for Banach manifolds.

### Constant rank theorem

The inverse function theorem (and the implicit function theorem) can be seen as a special case of the constant rank theorem, which states that a smooth map with constant rank near a point can be put in a particular normal form near that point. Specifically, if $F:M\to N$  has constant rank near a point $p\in M\!$ , then there are open neighborhoods U of p and V of $F(p)\!$  and there are diffeomorphisms $u:T_{p}M\to U\!$  and $v:T_{F(p)}N\to V\!$  such that $F(U)\subseteq V\!$  and such that the derivative $dF_{p}:T_{p}M\to T_{F(p)}N\!$  is equal to $v^{-1}\circ F\circ u\!$ . That is, F "looks like" its derivative near p. The set of points $p\in M$  such that the rank is constant in a neighbourhood of $p$  is an open dense subset of M; this is a consequence of semicontinuity of the rank function. Thus the constant rank theorem applies to a generic point of the domain.

When the derivative of F is injective (resp. surjective) at a point p, it is also injective (resp. surjective) in a neighborhood of p, and hence the rank of F is constant on that neighborhood, and the constant rank theorem applies.

### Holomorphic functions

If a holomorphic function F is defined from an open set U of $\mathbb {C} ^{n}\!$  into $\mathbb {C} ^{n}\!$ , and the Jacobian matrix of complex derivatives is invertible at a point p, then F is an invertible function near p. This follows immediately from the real multivariable version of the theorem. One can also show that the inverse function is again holomorphic.
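In one complex variable this can be checked concretely. Below, $f(z)=z^{2}$ near $p=1+0.5i$ is a hypothetical example: $f'(p)=2p\neq 0$, the principal square root is a local inverse near $q=f(p)$, and the inverse's derivative obeys the same reciprocal formula:

```python
import cmath

def f(z):
    return z * z              # holomorphic, f'(z) = 2z

p = 1 + 0.5j                  # f'(p) = 2p != 0
q = f(p)

# cmath.sqrt (the principal branch) is a local inverse of f near q.
assert abs(cmath.sqrt(q) - p) < 1e-12

# The inverse is again holomorphic, with (f^{-1})'(q) = 1/f'(p).
h = 1e-6
num = (cmath.sqrt(q + h) - cmath.sqrt(q - h)) / (2 * h)
assert abs(num - 1.0 / (2 * p)) < 1e-6
```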

### Polynomial functions

If it were true, the Jacobian conjecture would be a variant of the inverse function theorem for polynomials. It states that if a vector-valued polynomial function has a Jacobian determinant that is an invertible polynomial (that is, a nonzero constant), then it has an inverse that is also a polynomial function. It is unknown whether this is true or false, even in the case of two variables; it is a major open problem in the theory of polynomials.

### Selections

When $f:\mathbb {R} ^{n}\to \mathbb {R} ^{m}$  with $m\leq n$ , $f$  is $k$  times continuously differentiable, and the Jacobian $A=\nabla f({\overline {x}})$  at a point ${\overline {x}}$  is of rank $m$ , the inverse of $f$  may not be unique. However, there exists a local selection function $s$  such that $f(s(y))=y$  for all $y$  in a neighborhood of ${\overline {y}}=f({\overline {x}})$ , $s({\overline {y}})={\overline {x}}$ , $s$  is $k$  times continuously differentiable in this neighborhood, and $\nabla s({\overline {y}})=A^{T}(AA^{T})^{-1}$  ($\nabla s({\overline {y}})$  is the Moore–Penrose pseudoinverse of $A$ ).
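For a full-row-rank Jacobian, the formula $\nabla s({\overline {y}})=A^{T}(AA^{T})^{-1}$ produces a right inverse of $A$. A minimal check with a hypothetical $2\times 3$ matrix of rank $2$ (using plain lists, so only the $2\times 2$ cofactor inverse is needed):

```python
# A hypothetical 2x3 Jacobian of full row rank (m = 2 <= n = 3).
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

# Moore-Penrose pseudoinverse for full row rank: A^T (A A^T)^{-1}.
At = transpose(A)
AAt = matmul(A, At)                       # 2x2, invertible since rank(A) = 2
det = AAt[0][0] * AAt[1][1] - AAt[0][1] * AAt[1][0]
AAt_inv = [[AAt[1][1] / det, -AAt[0][1] / det],
           [-AAt[1][0] / det, AAt[0][0] / det]]
pinv = matmul(At, AAt_inv)                # 3x2

# pinv is a right inverse of A: A @ pinv = I_2, which is what makes
# the selection s differentiable with gradient pinv at y-bar.
I2 = matmul(A, pinv)
for i in range(2):
    for j in range(2):
        assert abs(I2[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```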