\documentstyle[psfig]{article}
\textheight=8.5in
\textwidth=6.5in
\oddsidemargin=0in
\topmargin=-0.65in
\begin{document}
\title{\bf EE 290Q Topics in Communication Networks\\
Lecture 8: Rare Events and Large Deviation Theory}
\author{Socrates Vamvakos}
\date{February 8, 1996}
\maketitle
\section{Introduction}
In the previous lectures we focused on worst-case analysis of communication
networks in a deterministic framework. However, due to the inherent
statistical nature of the sources, a statistical framework for the study
of communication networks may be more appropriate in certain settings.
Such an approach may make it possible to analyze designs that
allow for increased utilisation of the network resources.
Various techniques can be used for the statistical analysis of networks.
In this lecture we will focus on certain aspects of the Large Deviation
Theory, which is especially useful for estimating the probability of rare
events. Since in the analysis of communication networks the quantities
of interest are often probabilities of rare events (e.g. overflow of a
buffer at a network node), Large Deviation Theory can prove quite useful.
\section{Basic Theory and Results}
We will consider the following simple scenario: Assume that we have
a sequence of i.i.d. random variables $X_{1},X_{2},\ldots$ for
which $E[X_{i}]=\bar{\mu}$ ,
$g(r)\stackrel{\bigtriangleup}{=}E[e^{rX_{i}}]$
and $\Lambda(r)\stackrel{\bigtriangleup}{=}\log g(r)$, $i=1,2,\ldots$.
Moreover, we will assume without loss of generality that $X_{i}$ has a
probability density function which we will denote $p(x)$.
Let $S_{n}\stackrel{\bigtriangleup}{=}\sum_{i=1}^{n}X_{i}$ be the sum of
the $n$ first random variables.
We are interested in the probability $P(S_{n}>n\mu)$, where
$\mu>\bar{\mu}$, and more specifically
in the behaviour of this probability for large $n$.
From the Weak Law of Large Numbers (WLLN) we know that
$P(S_{n}>n\mu)\rightarrow 0$ as $n\rightarrow \infty$. Large
Deviation Theory allows us to estimate the {\it rate} of convergence
of this probability to zero.
We will first establish an upper bound to this probability using
the Markov Inequality: \\
Consider a random variable $Z\geq 0$ with expected value $E[Z]<\infty$.
Then we have: $P(Z>a)\leq \frac{E[Z]}{a}$ for any $a>0$.
To show this: Suppose that $f(x)$ is the density of $Z$. Then:
\begin{displaymath}
E[Z]=\int_{0}^{\infty}xf(x)dx \geq \int_{a}^{\infty}xf(x)dx
\geq a\cdot\int_{a}^{\infty}f(x)dx = a\cdot P(Z>a)
\end{displaymath}
from which the result follows.
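The Markov Inequality is easy to check numerically. The following sketch uses
an illustrative choice of $Z$ (an exponential random variable with unit mean,
not a distribution from the lecture) and compares the empirical tail
$P(Z>a)$ with the bound $E[Z]/a$:

```python
# Numerical check of the Markov Inequality P(Z > a) <= E[Z]/a.
# Illustrative choice: Z exponential with rate 1, so E[Z] = 1.
import random

random.seed(0)
n = 200_000
samples = [random.expovariate(1.0) for _ in range(n)]

for a in (1.0, 2.0, 4.0):
    tail = sum(z > a for z in samples) / n   # Monte Carlo estimate of P(Z > a)
    bound = 1.0 / a                          # E[Z]/a with E[Z] = 1
    assert tail <= bound
    print(f"a = {a}: P(Z > a) ~ {tail:.4f} <= E[Z]/a = {bound:.4f}")
```

Note that for an exponential $Z$ the true tail $e^{-a}$ decays much faster
than the $1/a$ bound; Markov's inequality is loose, which is why the
exponential-tilting argument below gives a far sharper estimate.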
We will now apply this result to bound $P(S_{n}>n\mu)$. Fix $r>0$ and
consider the random variable $Z=e^{rS_{n}}$. This is obviously a positive
random variable and we can therefore use the Markov Inequality.
\begin{eqnarray}
P(S_{n}>n\mu)&=&P(e^{rS_{n}}>e^{rn\mu}) \ \ \
({\it e^{x}\ a\ monotone\ function}) \nonumber \\
&\leq& \frac{E[e^{rS_{n}}]}{e^{rn\mu}} \ \ \
({\it Markov\ Inequality}) \nonumber \\
&=& \frac{g^{n}(r)}{e^{rn\mu}} \ \ \
({\it X_{i}\ i.i.d.}) \nonumber \\
&=&\exp[-n(r\mu - \Lambda(r))] \label{1}
\end{eqnarray}
Note that the assumption of the existence of the moment generating function
above is rather strong, since it implies the existence of all the moments
of the $X_{i}$'s, and thus it is not entirely surprising that we obtain
an exponential decay of $P(S_{n}>n\mu)$ in (\ref{1}).
Since (\ref{1}) holds for every $r>0$, we have the stronger inequality:
\begin{equation}
P(S_{n}>n\mu)\leq \exp[-n\sup_{r>0}(r\mu-\Lambda(r))] \label{2}
\end{equation}
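The bound (\ref{2}) can be checked numerically. The sketch below takes the
$X_{i}$ to be standard Gaussian (an illustrative choice), for which
$\Lambda(r)=r^{2}/2$ and $\sup_{r>0}(r\mu-\Lambda(r))=\mu^{2}/2$, and compares
a Monte Carlo estimate of $P(S_{n}>n\mu)$ with the bound:

```python
# Checking the Chernoff bound P(S_n > n*mu) <= exp(-n * sup_{r>0}(r*mu - Lambda(r)))
# for illustrative standard Gaussian X_i: Lambda(r) = r^2/2, supremum = mu^2/2.
import math
import random

random.seed(1)
n, mu, trials = 25, 0.5, 100_000

hits = 0
for _ in range(trials):
    s = sum(random.gauss(0.0, 1.0) for _ in range(n))
    hits += s > n * mu
estimate = hits / trials                    # Monte Carlo estimate of P(S_n > n*mu)

bound = math.exp(-n * mu**2 / 2)            # exp(-n * Lambda*(mu))
assert estimate <= bound
print(f"P(S_n > n mu) ~ {estimate:.5f} <= {bound:.5f}")
```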
We will now show that if $\mu>\bar{\mu}$, then
\begin{equation}
\Lambda^{*}(\mu)\stackrel{\bigtriangleup}{=}\sup_{r>0}(r\mu-\Lambda(r))>0.
\label{3}
\end{equation}
$\Lambda^{*}(\cdot)$ is called the {\it Legendre Transform} of
$\Lambda(\cdot)$.
To show (\ref{3}) we notice first that $\Lambda(\cdot)$ is a convex function,
so that $r\mu-\Lambda(r)$ is a concave function of $r$; its global
maximum is therefore achieved at the point $r=\tilde{r}$ which satisfies the
following relation:
\begin{equation}
\frac{\partial}{\partial r}[r\mu-\Lambda(r)]\mid_{r=\tilde{r}}
=\mu-\Lambda'(\tilde{r})=0 \label{4}
\end{equation}
From (\ref{4}) we conclude that in order to find $\Lambda^{*}(\mu)$
graphically, one should draw the tangent to the curve $\Lambda(r)$ with
slope equal to $\mu$. The x-axis coordinate of the point
at which the tangent touches
the curve $\Lambda(r)$ is the optimizing value $\tilde{r}$, while the
y-axis coordinate of the point at which the tangent
intersects the $y$-axis is the
negative of $\Lambda^{*}(\mu)$. The above facts are illustrated in the
following figure.
\begin{figure}[ht]
\centerline{\psfig{figure=fig1.ps,height=50mm}}
\caption{Graphical Interpretation of the Legendre Transform}
\end{figure}
The resulting curve for $\Lambda^{*}(\mu)$ is shown
in Figure 2. Note that $\Lambda^{*}(\bar{\mu})=0$ as seen
immediately from the previous figure. Also, since $\Lambda(r)$ is convex,
we obtain from Figure 1 that $-\Lambda^{*}(\mu)<0$, or
equivalently $\Lambda^{*}(\mu)>0$, for $\mu\neq\bar{\mu}$.
\begin{figure}[ht]
\centerline{\psfig{figure=fig2.ps,height=50mm}}
\caption{The Legendre Transform $\Lambda^{*}(\cdot)$}
\end{figure}
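The graphical construction above can also be carried out numerically. The
sketch below computes $\Lambda^{*}(\mu)$ by a crude grid search for
illustrative Gaussian $X_{i}\sim N(m,\sigma^{2})$, where
$\Lambda(r)=rm+r^{2}\sigma^{2}/2$ and the transform has the closed form
$(\mu-m)^{2}/2\sigma^{2}$, achieved at $\tilde{r}=(\mu-m)/\sigma^{2}$:

```python
# Computing Lambda*(mu) = sup_{r>0}(r*mu - Lambda(r)) by grid search for
# illustrative Gaussian X_i ~ N(m, sigma^2): Lambda(r) = r*m + r^2*sigma^2/2,
# closed form Lambda*(mu) = (mu - m)^2 / (2 sigma^2) at r~ = (mu - m)/sigma^2.
m, sigma = 0.0, 1.0

def Lambda(r):
    return r * m + 0.5 * (r * sigma) ** 2    # log moment generating function

def legendre(mu, r_max=10.0, steps=100_000):
    # crude grid search over r > 0; adequate since r*mu - Lambda(r) is concave
    return max((r_max * i / steps) * mu - Lambda(r_max * i / steps)
               for i in range(1, steps + 1))

mu = 0.8
exact = (mu - m) ** 2 / (2 * sigma ** 2)
approx = legendre(mu)
assert abs(approx - exact) < 1e-4
print(f"Lambda*({mu}) ~ {approx:.5f}, exact {exact:.5f}")
```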
To summarize, we have shown that $P(S_{n}>n\mu)\leq \exp[-n\Lambda^{*}(\mu)]$.
Now we will state the main result of Large Deviation Theory, which will
prove useful in the analysis of buffer systems: \\
\noindent{\bf Theorem}. {\it Under the conditions and definitions stated
above, the following relation holds: }
\begin{equation}
P(S_{n}>n\mu)=\exp[-n\Lambda^{*}(\mu)+o(n)], \label{5}
\end{equation}
{\it where $o(n)$ is a function of $n$ for which
$\lim_{n\rightarrow\infty}\frac{o(n)}{n}=0$.
Equivalently, we may write: }
\begin{equation}
\lim_{n\rightarrow\infty}\frac{1}{n}\log P(S_{n}>n\mu)=-\Lambda^{*}(\mu)
\label{6}
\end{equation}
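As a concrete illustration of the theorem (not part of the lecture material),
take the $X_{i}$ to be Bernoulli($p$); then for $\mu>p$ the supremum in
(\ref{3}) evaluates to
$\Lambda^{*}(\mu)=\mu\log\frac{\mu}{p}+(1-\mu)\log\frac{1-\mu}{1-p}$, and the
binomial tail can be computed exactly, so $-\frac{1}{n}\log P(S_{n}>n\mu)$
can be compared directly with $\Lambda^{*}(\mu)$:

```python
# For illustrative Bernoulli(p) X_i the rate function has the closed form
# Lambda*(mu) = mu*log(mu/p) + (1-mu)*log((1-mu)/(1-p)) for mu > p, and the
# binomial tail P(S_n > n*mu) can be computed exactly for comparison.
import math

p, mu, n = 0.3, 0.5, 200

rate_fn = mu * math.log(mu / p) + (1 - mu) * math.log((1 - mu) / (1 - p))

# exact tail P(S_n > n*mu) of a Binomial(n, p) random variable
tail = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
           for k in range(int(n * mu) + 1, n + 1))

empirical_rate = -math.log(tail) / n
assert empirical_rate >= rate_fn              # consistent with the upper bound
assert empirical_rate - rate_fn < 0.03        # the o(n)/n correction is small
print(f"-(1/n) log P = {empirical_rate:.4f}, Lambda*(mu) = {rate_fn:.4f}")
```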
\noindent{\bf Sketch of Proof}.
Recall that $p(\cdot)$ is the density of the $X_{i}$'s. We define a new density
(tilted distribution) by the relation
\begin{equation}
q(x)=\frac{e^{\tilde{r}x}}{g(\tilde{r})}p(x) \label{7}
\end{equation}
It is easy to check that indeed $\int q(x)dx=1$. In (\ref{7}) $\tilde{r}$
is such that $\mu=\Lambda'(\tilde{r})$, i.e. $\tilde{r}$ is the value of $r$
for which $\Lambda^{*}(\mu)$ is achieved in (\ref{3}).
If we now define a new sequence of
i.i.d. random variables $\tilde{X}_{i},\ i=1,2,\ldots$ with density
$q(\cdot)$, then we can easily see that $\mu=E[\tilde{X}_{i}]$.
Indeed: We have $\mu=\Lambda'(\tilde{r})$ and $\Lambda(r)=\log g(r)$ and
hence $\mu=\frac{g'(\tilde{r})}{g(\tilde{r})}$. \\
Moreover,
\begin{displaymath}
g'(r)=\frac{d}{dr}E[e^{rX_{1}}]=E[X_{1}e^{rX_{1}}]
=\int xe^{rx}p(x)dx
\end{displaymath}
Therefore:
\begin{equation}
\mu=\int \frac{xe^{\tilde{r}x}}{g(\tilde{r})}p(x)dx \ =
\int xq(x)dx=E[\tilde{X}_{1}]
\label{8}
\end{equation}
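The two properties of the tilted density used above, $\int q(x)dx=1$ and
$\int xq(x)dx=\mu$, can be verified numerically. The sketch below uses an
illustrative exponential density $p(x)=\lambda e^{-\lambda x}$, for which
$g(r)=\lambda/(\lambda-r)$ and $\Lambda'(r)=1/(\lambda-r)$ for $r<\lambda$:

```python
# Checking that the tilted density q(x) = e^{r x} p(x) / g(r) integrates to 1
# and has mean Lambda'(r).  Illustrative choice: exponential p(x) = lam e^{-lam x},
# with g(r) = lam/(lam - r) and Lambda'(r) = 1/(lam - r) for r < lam.
import math

lam, r = 1.0, 0.5
g = lam / (lam - r)                 # moment generating function at r
mu = 1.0 / (lam - r)                # Lambda'(r); the tilted mean should equal this

def q(x):
    return math.exp(r * x) * lam * math.exp(-lam * x) / g

# midpoint-rule integration on [0, 60]; the tail beyond 60 is negligible here
h = 0.001
xs = [(i + 0.5) * h for i in range(60_000)]
total = h * sum(q(x) for x in xs)           # should be ~1
mean = h * sum(x * q(x) for x in xs)        # should be ~mu

assert abs(total - 1.0) < 1e-3
assert abs(mean - mu) < 1e-2
print(f"integral of q = {total:.6f}, tilted mean = {mean:.4f} (mu = {mu:.4f})")
```

In this example the tilted density is again exponential, with rate
$\lambda-r$; tilting simply shifts the mean up to $\mu$, which is exactly the
role it plays in the proof.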
The probability of the event of interest is:
\begin{eqnarray}
P(S_{n}>n\mu)&=&\int_{x_{1}+\ldots+x_{n}>n\mu} p(x_{1})\ldots p(x_{n})
dx_{1}\ldots dx_{n} \nonumber \\
&=&\int_{x_{1}+\ldots+x_{n}>n\mu} g^{n}(\tilde{r})
e^{-\tilde{r}(x_{1}+\ldots+x_{n})} q(x_{1})\ldots q(x_{n})
dx_{1}\ldots dx_{n} \nonumber \\
&=&e^{n\Lambda(\tilde{r})} \int_{x_{1}+\ldots+x_{n}>n\mu}
e^{-\tilde{r}(x_{1}+\ldots+x_{n})} q(x_{1})\ldots q(x_{n})
dx_{1}\ldots dx_{n}
\label{9}
\end{eqnarray}
To proceed with the evaluation of the above integral, suppose that the
conditions of the Central Limit Theorem hold. Then, since the quantity
under the integral is nonnegative for all $(x_{1},\ldots,x_{n})$, we
get for any small $\varepsilon>0$:
\begin{eqnarray}
&&\int_{x_{1}+\ldots+x_{n}>n\mu} e^{-\tilde{r}(x_{1}+\ldots+x_{n})}
q(x_{1})\ldots q(x_{n}) dx_{1}\ldots dx_{n} \nonumber \\
&\geq& \int_{n\mu<x_{1}+\ldots+x_{n}<n(\mu+\varepsilon)}
e^{-\tilde{r}(x_{1}+\ldots+x_{n})}
q(x_{1})\ldots q(x_{n}) dx_{1}\ldots dx_{n} \nonumber \\
&\geq& e^{-\tilde{r}n(\mu+\varepsilon)}
P(n\mu<\tilde{X}_{1}+\cdots+\tilde{X}_{n}<n(\mu+\varepsilon))
\label{10}
\end{eqnarray}
In the last inequality we used the fact that $\tilde{r}>0$, which holds since
$\mu-\bar{\mu}>0$. If this is not
the case, then we can work with $(-X_{i})$ and $(-\tilde{X}_{i})$ and get
the same results.
Now, as we saw in (\ref{8}), $\mu$ is the expected value of the
$\tilde{X}_{i}$'s. Hence, from the Central Limit Theorem we have that
\begin{equation}
P(n\mu<\tilde{X}_{1}+\cdots+\tilde{X}_{n}<n(\mu+\varepsilon))\geq
\frac{1}{4} \ \ \ {\sf for\ large}\ n,
\label{12}
\end{equation}
since the probability on the left converges to $\frac{1}{2}$: the sum exceeds
its mean $n\mu$ with limiting probability $\frac{1}{2}$, while the upper limit
$n(\mu+\varepsilon)$ lies $\Theta(\sqrt{n})$ standard deviations above the
mean and is thus exceeded with vanishing probability.
Thus combining (\ref{9}),(\ref{10}) and (\ref{12}) we get:
\begin{eqnarray}
P(S_{n}>n\mu) \geq e^{-n(\tilde{r}\mu-\Lambda(\tilde{r}))}
\cdot \frac{1}{4} e^{-\tilde{r}n\varepsilon}=
e^{-n\Lambda^{*}(\mu)}\cdot \frac{1}{4} e^{-\tilde{r}n\varepsilon}
\label{13}
\end{eqnarray}
or equivalently:
\begin{eqnarray}
\frac{1}{n}\log P(S_{n}>n\mu) \geq -\Lambda^{*}(\mu)
+\frac{\log \frac{1}{4}}{n} - \tilde{r}\varepsilon
\rightarrow -\Lambda^{*}(\mu) - \tilde{r}\varepsilon\ \ \ {\sf as}\ \
n\rightarrow\infty
\label{14}
\end{eqnarray}
Since $\varepsilon$ can be taken arbitrarily small in (\ref{14}), from
relations (\ref{2}) and (\ref{14}) we obtain (\ref{6}).
\hfill $\Box$
From the above analysis we conclude that the quantity $e^{-n\Lambda^{*}(\mu)}$
is not only an upper bound on $P(S_{n}>n\mu)$ but also a good estimate
for large $n$.
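For Gaussian $X_{i}$ this estimate is easy to observe, since the tail
probability is known in closed form: with illustrative $X_{i}\sim N(0,1)$ we
have $P(S_{n}>n\mu)=Q(\sqrt{n}\mu)$, where $Q$ is the Gaussian tail function,
and $\Lambda^{*}(\mu)=\mu^{2}/2$. The sketch below shows
$-\frac{1}{n}\log P(S_{n}>n\mu)$ decreasing toward $\Lambda^{*}(\mu)$:

```python
# For illustrative X_i ~ N(0,1): P(S_n > n*mu) = Q(sqrt(n)*mu) exactly, with Q
# the Gaussian tail, so -(1/n) log P can be compared with Lambda*(mu) = mu^2/2.
import math

def gaussian_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # P(N(0,1) > z)

mu = 0.5
exact_rate = mu ** 2 / 2                         # Lambda*(mu) for N(0,1)

rates = []
for n in (100, 400, 1600):
    p = gaussian_tail(math.sqrt(n) * mu)         # S_n ~ N(0, n)
    rates.append(-math.log(p) / n)

assert rates[0] > rates[1] > rates[2] > exact_rate   # decreasing toward Lambda*
assert rates[2] - exact_rate < 0.005
print(f"rates = {rates}, Lambda*(mu) = {exact_rate}")
```

The gap between the computed rates and $\Lambda^{*}(\mu)$ is exactly the
$o(n)/n$ term of (\ref{5}), here of order $\frac{\log n}{n}$.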
\section{Estimation of Buffer Overflow Probability using LDT}
We will consider now an application of the
above results in the case of a single queue
as shown in the following figure:
\begin{figure}[ht]
\centerline{\psfig{figure=fig3.ps,width=70mm}}
\caption{Time-Slotted Buffer System}
\end{figure}
We consider a discrete-time (time-slotted) queue. At each time slot
$X_{t}$ units of information arrive at the queue (the $X_{t}$'s are i.i.d.
random variables) and up to $C$
units of information leave the queue. Suppose that we are given a buffer
of infinite size.
What is the probability that the contents of the buffer exceed a certain
level $B$? This is equivalent to the event that some cells experience a
delay greater than $\frac{B}{C}$. Notice here that the probability of the
above event is an upper bound to the overflow probability of a buffer
of size $B$ and turns out to be a good approximation for large $B$.
We assume for stability reasons that $E[X_{t}]=\bar{\mu}<C$. Let $W_{0}$
denote the steady-state buffer content; the quantity of interest is then
$P(W_{0}>B)$. The result is that for large $B$:
\begin{equation}
P(W_{0}>B)\approx \exp(-r^{*}B) \label{16}
\end{equation}
In order to define $r^{*}$, let $Y_{t}=X_{t}-C$ (which can obviously take on
negative values) and let $g(r)=E[e^{rY_{t}}]$, $\Lambda(r)=\log g(r)$.
Due to the stability assumption we have $E[Y_{t}]=\bar{\mu}-C<0$,
and thus the nonzero root of $\Lambda(r)$ is strictly positive.
This root is the above mentioned quantity $r^{*}$. Its graphical
interpretation is shown in the next figure.
\begin{figure}[ht]
\centerline{\psfig{figure=fig4.ps,height=50mm}}
\caption{Graphical Interpretation of $r^{*}$}
\end{figure}
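The computation of $r^{*}$ can be sketched numerically. Assuming, for
illustration, Gaussian net inputs $Y_{t}\sim N(\bar{\mu}-C,\sigma^{2})$, we
have $\Lambda(r)=r(\bar{\mu}-C)+r^{2}\sigma^{2}/2$ with nonzero root
$r^{*}=2(C-\bar{\mu})/\sigma^{2}$. The sketch finds the root by bisection and
checks the estimate (\ref{16}) against a simulation of the buffer content via
the recursion $W_{t+1}=\max(W_{t}+Y_{t},0)$ (the standard Lindley recursion
for this queue, not derived in these notes):

```python
# Finding r*, the nonzero root of Lambda(r), and checking P(W > B) ~ exp(-r* B).
# Illustrative Gaussian net input Y_t = X_t - C ~ N(mean_Y, sigma^2), mean_Y < 0:
# Lambda(r) = r*mean_Y + r^2*sigma^2/2, with nonzero root r* = -2*mean_Y/sigma^2.
import math
import random

mean_Y, sigma = -0.25, 1.0          # stability: E[Y_t] < 0

def Lambda(r):
    return r * mean_Y + 0.5 * (r * sigma) ** 2

# bisection: Lambda < 0 just above zero, Lambda > 0 beyond the root
lo, hi = 0.1, 4.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if Lambda(mid) < 0 else (lo, mid)
r_star = (lo + hi) / 2              # here r* = -2*mean_Y/sigma^2 = 0.5

# Lindley recursion W_{t+1} = max(W_t + Y_t, 0): long-run fraction of time
# the buffer content exceeds B, compared with the estimate exp(-r* B)
random.seed(2)
B, W, hits, T = 4.0, 0.0, 0, 1_000_000
for _ in range(T):
    W = max(W + random.gauss(mean_Y, sigma), 0.0)
    hits += W > B
empirical = hits / T

assert abs(r_star - 0.5) < 1e-6
assert empirical <= math.exp(-r_star * B)   # exp(-r* B) also upper-bounds P(W > B)
print(f"r* = {r_star:.4f}, P(W > B) ~ {empirical:.4f}, "
      f"exp(-r* B) = {math.exp(-r_star * B):.4f}")
```

The simulated exceedance probability sits somewhat below $e^{-r^{*}B}$: the
exponent $r^{*}$ is exact for large $B$, while the constant prefactor is not
captured by this first-order estimate.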
\section{Summary}
In this lecture we presented certain results of the Theory of Large
Deviations which will prove useful in the statistical analysis of
communication networks and especially in the calculation of
probabilities of rare events, such as buffer overflows. The presented
results will also provide the foundation for the development of the
theory of effective bandwidths, which has attracted a lot of interest
from researchers in the field.
\end{document}