Channel capacity

## Summary

Channel capacity, in electrical engineering, computer science, and information theory, is the tight upper bound on the rate at which information can be reliably transmitted over a communication channel.

Following the terms of the noisy-channel coding theorem, the channel capacity of a given channel is the highest information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. [1][2]

Information theory, developed by Claude E. Shannon in 1948, defines the notion of channel capacity and provides a mathematical model by which it may be computed. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution. [3]

The notion of channel capacity has been central to the development of modern wireline and wireless communication systems, and novel error-correction coding mechanisms now achieve performance very close to the limits promised by channel capacity.

## Formal definition

The basic mathematical model for a communication system is the following:

${\displaystyle {\xrightarrow[{\text{Message}}]{W}}{\begin{array}{|c|}\hline {\text{Encoder}}\\f_{n}\\\hline \end{array}}{\xrightarrow[{\mathrm {Encoded \atop sequence} }]{X^{n}}}{\begin{array}{|c|}\hline {\text{Channel}}\\p(y|x)\\\hline \end{array}}{\xrightarrow[{\mathrm {Received \atop sequence} }]{Y^{n}}}{\begin{array}{|c|}\hline {\text{Decoder}}\\g_{n}\\\hline \end{array}}{\xrightarrow[{\mathrm {Estimated \atop message} }]{\hat {W}}}}$

where:

• ${\displaystyle W}$ is the message to be transmitted;
• ${\displaystyle X}$ is the channel input symbol (${\displaystyle X^{n}}$ is a sequence of ${\displaystyle n}$ symbols) taken in an alphabet ${\displaystyle {\mathcal {X}}}$;
• ${\displaystyle Y}$ is the channel output symbol (${\displaystyle Y^{n}}$ is a sequence of ${\displaystyle n}$ symbols) taken in an alphabet ${\displaystyle {\mathcal {Y}}}$;
• ${\displaystyle {\hat {W}}}$ is the estimate of the transmitted message;
• ${\displaystyle f_{n}}$ is the encoding function for a block of length ${\displaystyle n}$;
• ${\displaystyle p(y|x)=p_{Y|X}(y|x)}$ is the noisy channel, which is modeled by a conditional probability distribution; and,
• ${\displaystyle g_{n}}$ is the decoding function for a block of length ${\displaystyle n}$.

Let ${\displaystyle X}$ and ${\displaystyle Y}$ be modeled as random variables. Furthermore, let ${\displaystyle p_{Y|X}(y|x)}$ be the conditional probability distribution function of ${\displaystyle Y}$ given ${\displaystyle X}$, which is an inherent fixed property of the communication channel. Then the choice of the marginal distribution ${\displaystyle p_{X}(x)}$ completely determines the joint distribution ${\displaystyle p_{X,Y}(x,y)}$ due to the identity

${\displaystyle \ p_{X,Y}(x,y)=p_{Y|X}(y|x)\,p_{X}(x)}$

which, in turn, induces a mutual information ${\displaystyle I(X;Y)}$. The channel capacity is defined as

${\displaystyle \ C=\sup _{p_{X}(x)}I(X;Y)\,}$

where the supremum is taken over all possible choices of ${\displaystyle p_{X}(x)}$.
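As an illustration of this definition, the following minimal sketch evaluates ${\displaystyle I(X;Y)}$ for a discrete channel and approximates the supremum by a grid search over binary input distributions. The binary symmetric channel with crossover probability 0.1, the function names, and the grid resolution are all assumptions chosen for the example; the result is compared with the well-known closed form ${\displaystyle 1-H_{b}(p)}$ for that channel.

```python
import math


def mutual_information(p_x, channel):
    """I(X;Y) in bits, where channel[x][y] = p(y|x) and p_x[x] = p(x)."""
    n_out = len(channel[0])
    # Output marginal: p(y) = sum_x p(x) p(y|x)
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]  # p(x, y) = p(y|x) p(x)
            if joint > 0.0:
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def capacity_binary_input(channel, steps=1000):
    """Approximate sup over p_X = (q, 1 - q) by a grid search on q."""
    return max(mutual_information([k / steps, 1 - k / steps], channel)
               for k in range(steps + 1))


flip = 0.1  # assumed crossover probability
bsc = [[1 - flip, flip], [flip, 1 - flip]]
c_numeric = capacity_binary_input(bsc)
c_closed = 1 - binary_entropy(flip)
print(f"grid search: {c_numeric:.6f} bits, closed form: {c_closed:.6f} bits")
```

A grid search only works for very small input alphabets; it is used here because it mirrors the definition directly, not because it is an efficient way to compute capacity.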

Channel capacity is additive over independent channels.[4] This means that using two independent channels in combination provides the same theoretical capacity as using them separately. More formally, let ${\displaystyle p_{1}}$ and ${\displaystyle p_{2}}$ be two independent channels modelled as above, with ${\displaystyle p_{1}}$ having an input alphabet ${\displaystyle {\mathcal {X}}_{1}}$ and an output alphabet ${\displaystyle {\mathcal {Y}}_{1}}$, and likewise for ${\displaystyle p_{2}}$. We define the product channel ${\displaystyle p_{1}\times p_{2}}$ as ${\displaystyle \forall (x_{1},x_{2})\in ({\mathcal {X}}_{1},{\mathcal {X}}_{2}),\;(y_{1},y_{2})\in ({\mathcal {Y}}_{1},{\mathcal {Y}}_{2}),\;(p_{1}\times p_{2})((y_{1},y_{2})|(x_{1},x_{2}))=p_{1}(y_{1}|x_{1})p_{2}(y_{2}|x_{2})}$

This theorem states:

${\displaystyle C(p_{1}\times p_{2})=C(p_{1})+C(p_{2})}$

Proof

We first show that ${\displaystyle C(p_{1}\times p_{2})\geq C(p_{1})+C(p_{2})}$.

Let ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ be two independent random variables. Let ${\displaystyle Y_{1}}$ be a random variable corresponding to the output of ${\displaystyle X_{1}}$ through the channel ${\displaystyle p_{1}}$, and ${\displaystyle Y_{2}}$ for ${\displaystyle X_{2}}$ through ${\displaystyle p_{2}}$.

By definition ${\displaystyle C(p_{1}\times p_{2})=\sup _{p_{X_{1},X_{2}}}(I(X_{1},X_{2}:Y_{1},Y_{2}))}$.

Since ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ are independent, as well as ${\displaystyle p_{1}}$ and ${\displaystyle p_{2}}$, ${\displaystyle (X_{1},Y_{1})}$ is independent of ${\displaystyle (X_{2},Y_{2})}$. We can apply the following property of mutual information: ${\displaystyle I(X_{1},X_{2}:Y_{1},Y_{2})=I(X_{1}:Y_{1})+I(X_{2}:Y_{2})}$

For now we only need to find a distribution ${\displaystyle p_{X_{1},X_{2}}}$ such that ${\displaystyle I(X_{1},X_{2}:Y_{1},Y_{2})\geq I(X_{1}:Y_{1})+I(X_{2}:Y_{2})}$. In fact, ${\displaystyle \pi _{1}}$ and ${\displaystyle \pi _{2}}$, two probability distributions for ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ achieving ${\displaystyle C(p_{1})}$ and ${\displaystyle C(p_{2})}$, suffice:

${\displaystyle C(p_{1}\times p_{2})\geq I(X_{1},X_{2}:Y_{1},Y_{2})=I(X_{1}:Y_{1})+I(X_{2}:Y_{2})=C(p_{1})+C(p_{2})}$

i.e. ${\displaystyle C(p_{1}\times p_{2})\geq C(p_{1})+C(p_{2})}$

Now let us show that ${\displaystyle C(p_{1}\times p_{2})\leq C(p_{1})+C(p_{2})}$.

Let ${\displaystyle \pi _{12}}$ be some distribution for the channel ${\displaystyle p_{1}\times p_{2}}$ defining ${\displaystyle (X_{1},X_{2})}$ and the corresponding output ${\displaystyle (Y_{1},Y_{2})}$. Let ${\displaystyle {\mathcal {X}}_{1}}$ be the alphabet of ${\displaystyle X_{1}}$, ${\displaystyle {\mathcal {Y}}_{1}}$ for ${\displaystyle Y_{1}}$, and analogously ${\displaystyle {\mathcal {X}}_{2}}$ and ${\displaystyle {\mathcal {Y}}_{2}}$.

By definition of mutual information, we have

${\displaystyle {\begin{aligned}I(X_{1},X_{2}:Y_{1},Y_{2})&=H(Y_{1},Y_{2})-H(Y_{1},Y_{2}|X_{1},X_{2})\\&\leq H(Y_{1})+H(Y_{2})-H(Y_{1},Y_{2}|X_{1},X_{2})\end{aligned}}}$

Let us rewrite the last entropy term.

${\displaystyle H(Y_{1},Y_{2}|X_{1},X_{2})=\sum _{(x_{1},x_{2})\in {\mathcal {X}}_{1}\times {\mathcal {X}}_{2}}\mathbb {P} (X_{1},X_{2}=x_{1},x_{2})H(Y_{1},Y_{2}|X_{1},X_{2}=x_{1},x_{2})}$

By definition of the product channel, ${\displaystyle \mathbb {P} (Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2})=\mathbb {P} (Y_{1}=y_{1}|X_{1}=x_{1})\mathbb {P} (Y_{2}=y_{2}|X_{2}=x_{2})}$. For a given pair ${\displaystyle (x_{1},x_{2})}$, we can rewrite ${\displaystyle H(Y_{1},Y_{2}|X_{1},X_{2}=x_{1},x_{2})}$ as:

${\displaystyle {\begin{aligned}H(Y_{1},Y_{2}|X_{1},X_{2}=x_{1},x_{2})&=-\sum _{(y_{1},y_{2})\in {\mathcal {Y}}_{1}\times {\mathcal {Y}}_{2}}\mathbb {P} (Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2})\log(\mathbb {P} (Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2}))\\&=-\sum _{(y_{1},y_{2})\in {\mathcal {Y}}_{1}\times {\mathcal {Y}}_{2}}\mathbb {P} (Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2})[\log(\mathbb {P} (Y_{1}=y_{1}|X_{1}=x_{1}))+\log(\mathbb {P} (Y_{2}=y_{2}|X_{2}=x_{2}))]\\&=H(Y_{1}|X_{1}=x_{1})+H(Y_{2}|X_{2}=x_{2})\end{aligned}}}$

By summing this equality over all ${\displaystyle (x_{1},x_{2})}$, with each term weighted by ${\displaystyle \mathbb {P} (X_{1},X_{2}=x_{1},x_{2})}$, we obtain ${\displaystyle H(Y_{1},Y_{2}|X_{1},X_{2})=H(Y_{1}|X_{1})+H(Y_{2}|X_{2})}$.

We can now give an upper bound over mutual information:

${\displaystyle {\begin{aligned}I(X_{1},X_{2}:Y_{1},Y_{2})&\leq H(Y_{1})+H(Y_{2})-H(Y_{1}|X_{1})-H(Y_{2}|X_{2})\\&=I(X_{1}:Y_{1})+I(X_{2}:Y_{2})\end{aligned}}}$

This relation is preserved at the supremum. Therefore

${\displaystyle C(p_{1}\times p_{2})\leq C(p_{1})+C(p_{2})}$

Combining the two inequalities we proved, we obtain the result of the theorem:

${\displaystyle C(p_{1}\times p_{2})=C(p_{1})+C(p_{2})}$
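The two halves of the proof can be checked numerically on a small case. The sketch below is only an illustration under assumed parameters: it takes two binary symmetric channels with crossover probabilities 0.1 and 0.2 (for which the uniform input is known to achieve capacity), builds the product channel, and compares the mutual information under an independent input with that under a fully correlated input. All function names are hypothetical.

```python
import math


def mutual_information(p_x, channel):
    """I(X;Y) in bits, where channel[x][y] = p(y|x) and p_x[x] = p(x)."""
    n_out = len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]
            if joint > 0.0:
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total


def product_channel(p1, p2):
    """(p1 x p2)((y1,y2)|(x1,x2)) = p1(y1|x1) p2(y2|x2), pairs flattened row-major."""
    return [[a * b for a in row1 for b in row2] for row1 in p1 for row2 in p2]


def product_input(pi1, pi2):
    """Joint distribution of independent inputs, flattened in the same order."""
    return [a * b for a in pi1 for b in pi2]


def bsc(flip):
    """Binary symmetric channel with the given crossover probability."""
    return [[1 - flip, flip], [flip, 1 - flip]]


p1, p2 = bsc(0.1), bsc(0.2)          # assumed example channels
uniform = [0.5, 0.5]                 # capacity-achieving input for a BSC
c1 = mutual_information(uniform, p1)
c2 = mutual_information(uniform, p2)

joint_channel = product_channel(p1, p2)
i_independent = mutual_information(product_input(uniform, uniform), joint_channel)
# Fully correlated input: X1 = X2, uniform on {0, 1}
i_correlated = mutual_information([0.5, 0.0, 0.0, 0.5], joint_channel)

print(f"C(p1) + C(p2)       = {c1 + c2:.6f}")
print(f"independent inputs  = {i_independent:.6f}")
print(f"correlated inputs   = {i_correlated:.6f}")
```

The independent input reproduces the sum of the individual capacities, while the correlated input falls below it, as the upper-bound half of the proof predicts.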

## Shannon capacity of a graph

If G is an undirected graph, it can be used to define a communications channel in which the symbols are the graph vertices, and two codewords may be confused with each other if their symbols in each position are equal or adjacent. The computational complexity of finding the Shannon capacity of such a channel remains open, but it can be upper bounded by another important graph invariant, the Lovász number.[5]
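A standard illustration is the 5-cycle (pentagon), treated in Lovász's paper: no more than two single symbols can be pairwise non-confusable, yet five codewords of length two can be, which gives a capacity of at least ${\displaystyle {\sqrt {5}}}$ per symbol (Lovász showed this value is exact). The sketch below checks the confusability rule described above by brute force; the particular code ${\displaystyle \{(i,2i{\bmod {5}})\}}$ and the helper names are choices made for this example.

```python
import math
from itertools import combinations


def confusable_symbols(a, b, n=5):
    """Vertices of the cycle C_n are confusable when equal or adjacent."""
    return (a - b) % n in (0, 1, n - 1)


def confusable_words(u, v, n=5):
    """Codewords are confusable when every position is equal or adjacent."""
    return all(confusable_symbols(a, b, n) for a, b in zip(u, v))


def max_code_size(words, n=5):
    """Largest subset of `words` with no confusable pair (brute force)."""
    best = 0
    for size in range(1, len(words) + 1):
        for subset in combinations(words, size):
            if not any(confusable_words(u, v, n)
                       for u, v in combinations(subset, 2)):
                best = size
                break
    return best


# Single symbols: at most 2 pairwise non-confusable vertices in the pentagon.
single_letter = max_code_size([(v,) for v in range(5)])

# Length-2 code with 5 codewords: (0,0), (1,2), (2,4), (3,1), (4,3).
code = [(i, (2 * i) % 5) for i in range(5)]
pairwise_ok = not any(confusable_words(u, v) for u, v in combinations(code, 2))

per_symbol = math.sqrt(len(code))  # 5 codewords over 2 channel uses
print(single_letter, pairwise_ok, per_symbol)
```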

## Noisy-channel coding theorem

The noisy-channel coding theorem states that for any error probability ε > 0 and for any transmission rate R less than the channel capacity C, there is an encoding and decoding scheme transmitting data at rate R whose error probability is less than ε, for a sufficiently large block length. Conversely, for any rate greater than the channel capacity, the probability of error at the receiver cannot be made arbitrarily small: it remains bounded away from zero as the block length goes to infinity.

## Example application

An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon–Hartley theorem:

${\displaystyle C=B\log _{2}\left(1+{\frac {S}{N}}\right)\ }$

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are expressed in a linear power unit (like watts or volts²). Since S/N figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a linear power ratio of ${\displaystyle 10^{30/10}=10^{3}=1000}$.
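The dB conversion and the formula can be combined in a few lines. In this sketch the 3000 Hz bandwidth is an assumed value picked only for illustration, and the function names are hypothetical; the 30 dB figure is the one used in the text.

```python
import math


def db_to_linear(snr_db):
    """Convert a power ratio in decibels to a linear ratio."""
    return 10 ** (snr_db / 10)


def shannon_hartley(bandwidth_hz, snr_linear):
    """Capacity in bits per second: C = B log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)


snr = db_to_linear(30)                 # 30 dB -> 1000
capacity = shannon_hartley(3000, snr)  # assumed bandwidth of 3000 Hz
print(f"S/N = {snr:.0f}, C = {capacity:.0f} bit/s")
```

Under these assumptions the capacity comes out at roughly 29.9 kbit/s.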

## Channel capacity in wireless communications

This section[6] focuses on the single-antenna, point-to-point scenario. For channel capacity in systems with multiple antennas, see the article on MIMO.

### Bandlimited AWGN channel

Figure: AWGN channel capacity with the power-limited regime and bandwidth-limited regime indicated. Here, ${\displaystyle {\frac {\bar {P}}{N_{0}}}=1}$; B and C can be scaled proportionally for other values.

If the average received power is ${\displaystyle {\bar {P}}}$ [W], the total bandwidth is ${\displaystyle W}$ [Hz], and the noise power spectral density is ${\displaystyle N_{0}}$ [W/Hz], the AWGN channel capacity is

${\displaystyle C_{\text{AWGN}}=W\log _{2}\left(1+{\frac {\bar {P}}{N_{0}W}}\right)}$ [bits/s],

where ${\displaystyle {\frac {\bar {P}}{N_{0}W}}}$ is the received signal-to-noise ratio (SNR). This result is known as the Shannon–Hartley theorem.[7]

When the SNR is large (SNR ≫ 0 dB), the capacity ${\displaystyle C\approx W\log _{2}{\frac {\bar {P}}{N_{0}W}}}$ is logarithmic in power and approximately linear in bandwidth. This is called the bandwidth-limited regime.

When the SNR is small (SNR ≪ 0 dB), the capacity ${\displaystyle C\approx {\frac {\bar {P}}{N_{0}\ln 2}}}$ is linear in power but insensitive to bandwidth. This is called the power-limited regime.

The bandwidth-limited regime and power-limited regime are illustrated in the figure.
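The two approximations can be compared with the exact formula numerically. The sketch below fixes ${\displaystyle {\bar {P}}/N_{0}=1}$, as in the figure caption, and evaluates one very narrow and one very wide bandwidth; the specific bandwidth values and function names are assumptions made for the example.

```python
import math


def awgn_capacity(power, n0, bandwidth):
    """Exact capacity W log2(1 + P / (N0 W)) in bits per second."""
    return bandwidth * math.log2(1 + power / (n0 * bandwidth))


def high_snr_approx(power, n0, bandwidth):
    """Bandwidth-limited regime: W log2(P / (N0 W))."""
    return bandwidth * math.log2(power / (n0 * bandwidth))


def low_snr_approx(power, n0):
    """Power-limited regime: P / (N0 ln 2), independent of bandwidth."""
    return power / (n0 * math.log(2))


power, n0 = 1.0, 1.0   # P / N0 = 1, as in the figure caption
narrow = 1e-3          # assumed bandwidth giving SNR = 1000 (30 dB)
wide = 1e3             # assumed bandwidth giving SNR = 0.001 (-30 dB)

exact_narrow = awgn_capacity(power, n0, narrow)
exact_wide = awgn_capacity(power, n0, wide)
err_high = abs(high_snr_approx(power, n0, narrow) - exact_narrow) / exact_narrow
err_low = abs(low_snr_approx(power, n0) - exact_wide) / exact_wide
print(f"high-SNR relative error: {err_high:.2e}")
print(f"low-SNR relative error:  {err_low:.2e}")
```

In both cases the relative error of the approximation is well below one percent, and widening the band further moves the capacity only marginally towards the limit ${\displaystyle {\bar {P}}/(N_{0}\ln 2)}$.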

### Frequency-selective AWGN channel

The capacity of the frequency-selective channel is given by so-called water filling power allocation,

${\displaystyle C_{N_{c}}=\sum _{n=0}^{N_{c}-1}\log _{2}\left(1+{\frac {P_{n}^{*}|{\bar {h}}_{n}|^{2}}{N_{0}}}\right),}$

where ${\displaystyle P_{n}^{*}=\max \left\{\left({\frac {1}{\lambda }}-{\frac {N_{0}}{|{\bar {h}}_{n}|^{2}}}\right),0\right\}}$ and ${\displaystyle |{\bar {h}}_{n}|^{2}}$ is the gain of subchannel ${\displaystyle n}$, with ${\displaystyle \lambda }$ chosen to meet the power constraint.
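A minimal sketch of this allocation follows. It assumes the power constraint is a total budget, ${\displaystyle \sum _{n}P_{n}^{*}=P_{\text{total}}}$, and finds the water level ${\displaystyle 1/\lambda }$ by bisection; the subchannel gains, the budget, and the function name are hypothetical values chosen so that the result can be checked by hand.

```python
import math


def water_filling(gains, n0, total_power, iterations=100):
    """Return (powers, capacity) for subchannel gains |h_n|^2 > 0.

    The water level mu = 1/lambda is found by bisection so that the
    allocated powers max(mu - N0/|h_n|^2, 0) sum to total_power.
    """
    floors = [n0 / g for g in gains]          # N0 / |h_n|^2 for each subchannel
    lo, hi = min(floors), max(floors) + total_power
    for _ in range(iterations):
        mu = (lo + hi) / 2
        used = sum(max(mu - f, 0.0) for f in floors)
        if used > total_power:
            hi = mu                           # level too high, pour less
        else:
            lo = mu                           # level too low, pour more
    mu = (lo + hi) / 2
    powers = [max(mu - f, 0.0) for f in floors]
    capacity = sum(math.log2(1 + p * g / n0) for p, g in zip(powers, gains))
    return powers, capacity


# Assumed example: three subchannels, N0 = 1, total power 2.
powers, capacity = water_filling([1.0, 0.5, 0.1], n0=1.0, total_power=2.0)
print([round(p, 6) for p in powers], round(capacity, 4))
```

With these numbers the water level settles at 2.5, so the two strongest subchannels receive 1.5 and 0.5 units of power and the weakest, whose floor of 10 lies above the water level, receives none.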

### Slow-fading channel

In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity, because the maximum rate of reliable communication supported by the channel, ${\displaystyle \log _{2}(1+|h|^{2}SNR)}$, depends on the random channel gain ${\displaystyle |h|^{2}}$, which is unknown to the transmitter. If the transmitter encodes data at rate ${\displaystyle R}$ [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,

${\displaystyle p_{out}=\mathbb {P} (\log _{2}(1+|h|^{2}SNR)<R),}$

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in the strict sense is zero. However, it is possible to determine the largest value of ${\displaystyle R}$ such that the outage probability ${\displaystyle p_{out}}$ is less than ${\displaystyle \epsilon }$. This value is known as the ${\displaystyle \epsilon }$-outage capacity.
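To make the outage capacity concrete, one has to fix a fading distribution, which the text above leaves open. The sketch below assumes Rayleigh fading, so that ${\displaystyle |h|^{2}}$ is exponentially distributed with unit mean; under that assumption the probability that ${\displaystyle \log _{2}(1+|h|^{2}SNR)}$ falls below ${\displaystyle R}$ is ${\displaystyle 1-\exp(-(2^{R}-1)/SNR)}$, and it can be inverted in closed form. The SNR, the target ${\displaystyle \epsilon }$, and the function names are illustrative choices.

```python
import math


def outage_probability(rate, snr):
    """P(log2(1 + |h|^2 SNR) < rate) when |h|^2 ~ Exp(1) (Rayleigh fading)."""
    threshold = (2.0 ** rate - 1.0) / snr   # outage iff |h|^2 < threshold
    return 1.0 - math.exp(-threshold)


def outage_capacity(epsilon, snr):
    """Rate at which the outage probability reaches epsilon (bits/s/Hz)."""
    gain_quantile = -math.log(1.0 - epsilon)  # epsilon-quantile of Exp(1)
    return math.log2(1.0 + gain_quantile * snr)


snr = 100.0      # assumed SNR of 20 dB
epsilon = 0.01   # assumed outage target of 1 %
c_eps = outage_capacity(epsilon, snr)
c_awgn = math.log2(1.0 + snr)
print(f"{epsilon:.0%}-outage capacity: {c_eps:.3f} bits/s/Hz "
      f"(AWGN at the same SNR: {c_awgn:.3f})")
```

Under these assumptions the 1 % outage capacity is only about one bit/s/Hz, far below the AWGN capacity at the same SNR, which reflects how costly the deep-fade events are.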

### Fast-fading channel

In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of ${\displaystyle \mathbb {E} (\log _{2}(1+|h|^{2}SNR))}$ [bits/s/Hz] and it is meaningful to speak of this value as the capacity of the fast-fading channel.
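The expectation can be estimated by simple Monte Carlo averaging once a fading model is chosen. The sketch below again assumes Rayleigh fading (${\displaystyle |h|^{2}}$ exponential with unit mean), an SNR of 10, and a fixed random seed; these, like the function name, are assumptions for illustration rather than part of the result above.

```python
import math
import random


def ergodic_capacity_rayleigh(snr, n_samples=100_000, seed=0):
    """Estimate E[log2(1 + |h|^2 SNR)] with |h|^2 ~ Exp(1).

    Returns the estimated rate and the sample mean of the drawn gains.
    """
    rng = random.Random(seed)
    gains = [rng.expovariate(1.0) for _ in range(n_samples)]
    rate = sum(math.log2(1.0 + g * snr) for g in gains) / n_samples
    mean_gain = sum(gains) / n_samples
    return rate, mean_gain


snr = 10.0  # assumed linear SNR (10 dB)
rate, mean_gain = ergodic_capacity_rayleigh(snr)
# Capacity of a non-fading channel with the same average gain, for comparison.
awgn_same_mean = math.log2(1.0 + mean_gain * snr)
print(f"fast-fading estimate: {rate:.3f} bits/s/Hz, "
      f"AWGN with equal mean gain: {awgn_same_mean:.3f} bits/s/Hz")
```

Because the logarithm is concave, Jensen's inequality guarantees that the averaged rate does not exceed the capacity of a non-fading channel with the same mean gain, which is what the comparison shows.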

## External links

• "Transmission rate of a channel", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
• AWGN Channel Capacity with various constraints on the channel input (interactive demonstration)

## References

1. ^ Saleem Bhatti. "Channel capacity". Lecture notes for M.Sc. Data Communication Networks and Distributed Systems D51 -- Basic Communications and Networks. Archived from the original on 2007-08-21.
2. ^ Jim Lesurf. "Signals look like noise!". Information and Measurement, 2nd ed.
3. ^ Thomas M. Cover, Joy A. Thomas (2006). Elements of Information Theory. John Wiley & Sons, New York. ISBN 9781118585771.
4. ^ Cover, Thomas M.; Thomas, Joy A. (2006). "Chapter 7: Channel Capacity". Elements of Information Theory (Second ed.). Wiley-Interscience. pp. 206–207. ISBN 978-0-471-24195-9.
5. ^ Lovász, László (1979), "On the Shannon Capacity of a Graph", IEEE Transactions on Information Theory, IT-25 (1): 1–7, doi:10.1109/tit.1979.1055985.
6. ^ David Tse, Pramod Viswanath (2005), Fundamentals of Wireless Communication, Cambridge University Press, UK, ISBN 9780521845274
7. ^ The Handbook of Electrical Engineering. Research & Education Association. 1996. p. D-149. ISBN 9780878919819.