Concentration of Sub-Exponential Random Variables

This post reviews the definition of sub-exponential random variables and discusses useful tail bounds (Bernstein) for this family.

From Sub-Gaussian to Sub-Exponential

Recall that a random variable $X$ with mean $\mu$ is sub-Gaussian with constant $\sigma$ if its moment generating function is bounded above as follows:

$$ \mathbb{E}[e^{\lambda (X - \mu)}] \leq e^{\lambda^2 \sigma^2/2} \quad \forall \; \lambda \in \mathbb{R}. \tag{1} $$

That proved useful because, if $X$ is sub-Gaussian, then the tail of its distribution is dominated by that of a Gaussian random variable with parameters $\mu$ and $\sigma^2$. In particular, we can apply the Chernoff bound to sub-Gaussian random variables and derive Hoeffding-type bounds on the tail distribution.

Sub-Gaussianness, however, is somewhat restrictive. It requires that the bound on the moment generating function hold for all values of $\lambda$! We can relax that requirement just a little bit and demand that the inequality hold only for small values of $\lambda$ around $0$ instead:

$$ \mathbb{E}[e^{\lambda (X - \mu)}] \leq e^{\lambda^2 \nu^2/2} \quad \forall \; \lvert \lambda \rvert \leq \frac{1}{b}. \tag{2} $$

A random variable $X$ with mean $\mu$ for which inequality (2) holds is said to be sub-exponential with parameters $(\nu, b)$ for some non-negative $b$ (with the convention $1/0 = \infty$).

Before we continue, it's worth mentioning that sub-exponentialness is preserved under summation of independent random variables, just as sub-Gaussianness is. In other words, if we have $n$ independent sub-exponential random variables $X_i$ with parameters $(\nu_i, b_i)$, then their sum $X = \sum_i X_i$ is sub-exponential:

$$ \mathbb{E}[e^{\lambda (X - \mathbb{E}[X])}] = \mathbb{E}\Big[\prod_i e^{\lambda (X_i - \mu_i)}\Big] = \prod_i \mathbb{E}[e^{\lambda (X_i - \mu_i)}] \leq \prod_i e^{\lambda^2\nu_i^2/2} = e^{\lambda^2 \sum_i \nu_i^2/2}, $$

which is valid when $\lvert \lambda \rvert \leq 1/\max_i b_i$. So its parameters are $\big((\sum_i \nu_i^2)^{1/2}, \max_i b_i\big)$.
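The parameter arithmetic above is mechanical enough to sketch in a few lines of Python (the function name `combine_subexp` is mine, chosen for illustration):

```python
import math

def combine_subexp(params):
    """Parameters (nu, b) of a sum of independent sub-exponential variables
    with parameters (nu_i, b_i): nu = sqrt(sum nu_i^2), b = max b_i."""
    nu = math.sqrt(sum(nu_i**2 for nu_i, _ in params))
    b = max(b_i for _, b_i in params)
    return nu, b

# Three variables with parameters (1, 0.5), (2, 1), and (2, 0.25):
print(combine_subexp([(1, 0.5), (2, 1), (2, 0.25)]))  # -> (3.0, 1)
```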

Equivalent Definitions

There are other ways of characterizing a sub-exponential random variable that are equivalent to the definition in (2). For example, if the moment generating function of $X$ is finite for small values of $\lambda$, then $X$ is indeed a sub-exponential variable! This, and other characterizations, are presented in the result below.

The following statements are equivalent for a centered random variable $X$:

i. There exist constants $\nu$ and $b$ such that inequality (2) holds.

ii. There exists a constant $c_0$ such that $\mathbb{E}[e^{\lambda X}]$ is finite for $\lvert \lambda \rvert \leq c_0$.

iii. There exist constants $c_1$ and $c_2$ such that $\mathbb{P}[\lvert X \rvert \geq t] \leq c_1 e^{-c_2 t}$ for all $t > 0$.

iv. The quantity $\sup_{k \geq 2} \Big( \frac{\lvert \mathbb{E}[X^k] \rvert}{k!} \Big)^{1/k}$ is finite.

Proof. (i) $\Leftrightarrow$ (ii) is straightforward.

(ii) $\Rightarrow$ (iii): Applying the Chernoff bound to $X$ gives:

$$ \mathbb{P}[X \geq t] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda t}} \Bigg\rvert_{\lambda = c_0/2} = \mathbb{E}[e^{c_0 X/2}] e^{-c_0 t/2}. $$

Repeating the argument for $-X$, then setting $c_1 = \mathbb{E}[e^{c_0 X/2}] + \mathbb{E}[e^{-c_0 X/2}]$ and $c_2 = c_0/2$, gives the desired result.

(iii) $\Rightarrow$ (ii): We start from the tail-integral formula for the expectation of a non-negative random variable, applied with a positive $\lambda$:

$$
\begin{aligned}
\mathbb{E}[e^{\lambda \lvert X \rvert}] &= \int_0^\infty \mathbb{P}[e^{\lambda \lvert X \rvert} \geq t] \, dt \leq 1 + \int_1^{\infty} \mathbb{P}\Big[\lvert X \rvert \geq \frac{\log t}{\lambda}\Big] \, dt \\
&\leq 1 + c_1 \int_1^\infty e^{-c_2 \log t / \lambda} \, dt = 1 + c_1 \int_1^\infty t^{-c_2/\lambda} \, dt \leq 1 + c_1,
\end{aligned}
$$

when $c_2/\lambda \geq 2$. Because $\mathbb{E}[e^{-\lambda X}] \leq \mathbb{E}[e^{\lvert \lambda \rvert \lvert X \rvert}]$ and $\mathbb{E}[e^{\lambda X}] \leq \mathbb{E}[e^{\lvert \lambda \rvert \lvert X \rvert}]$, we can conclude that $\mathbb{E}[e^{\lambda X}]$ is finite for $\lvert \lambda \rvert \leq c_2/2$.

(ii) $\Leftrightarrow$ (iv): We can argue using the power series expansion of the moment generating function:

$$ \mathbb{E}[e^{\lambda X}] = 1 + \sum_{k=2}^\infty \frac{\mathbb{E}[X^k]}{k!} \lambda^k. $$

Notice that the inverse of $\limsup_{k \to \infty} \big\lvert \frac{\mathbb{E}[X^k]}{k!} \big\rvert^{1/k}$ is the radius of convergence of the series. As a result, if (ii) holds, then the series converges for some $\lambda \neq 0$, so the quantity in (iv) must be finite. Conversely, if (iv) holds, then the series has a positive radius of convergence, so the moment generating function in (ii) is finite for $\lambda$ within that radius.

Bernstein’s Condition

An alternative way to characterize sub-exponential random variables is by bounding the polynomial moments of $X$ instead of its moment generating function. Concretely, let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. If the following condition, known as Bernstein's condition with parameter $b$, holds, then $X$ is a sub-exponential random variable:

$$ \Big\lvert \mathbb{E}\big[ (X - \mu)^k \big] \Big\rvert \leq \frac{1}{2} k! \sigma^2 b^{k-2} \quad \text{for } k = 3, 4, \ldots $$

Proof. Let's start from the definition of a sub-exponential random variable and expand the left-hand side of (2):

$$ \mathbb{E}[e^{\lambda (X - \mu)}] = \sum_{k=0}^\infty \frac{\lambda^k \mathbb{E}[(X - \mu)^k]}{k!} = 1 + \frac{\lambda^2 \sigma^2}{2} + \sum_{k=3}^\infty \frac{\lambda^{k} \mathbb{E}[(X - \mu)^{k}]}{k!}. $$

By Bernstein’s condition:

$$ \mathbb{E}[e^{\lambda (X - \mu)}] \leq 1 + \frac{\lambda^2\sigma^2}{2} + \sum_{k=3}^\infty \frac{1}{2} \lvert \lambda \rvert^k \sigma^2 b^{k-2} = 1 + \frac{\lambda^2\sigma^2}{2} + \frac{\lambda^2\sigma^2}{2} \sum_{k=3}^\infty (\lvert \lambda \rvert b)^{k-2}, $$

which, when $\lvert \lambda \rvert b < 1$, equals:

$$ 1 + \frac{\lambda^2\sigma^2}{2} + \frac{\lambda^2\sigma^2}{2} \Big(\frac{1}{1-\lvert \lambda \rvert b} - 1 \Big) \leq \exp\Big( \frac{\lambda^2 \sigma^2}{2(1 - \lvert \lambda \rvert b)} \Big) \leq e^{\lambda^2 (\sqrt{2}\sigma)^2 /2}, $$

where the last inequality holds when $\lvert \lambda \rvert < 1/(2b)$. That implies that $X$ is sub-exponential with parameters $(\sqrt{2}\sigma, 2b)$.

This is a neat result! For example, it is straightforward to see why Bernstein's condition holds for bounded random variables satisfying $\lvert X - \mu \rvert \leq b$: in that case, $\lvert \mathbb{E}[(X - \mu)^k] \rvert \leq b^{k-2} \, \mathbb{E}[(X - \mu)^2] = \sigma^2 b^{k-2} \leq \frac{1}{2} k! \sigma^2 b^{k-2}$ for $k \geq 3$. We proved previously that such random variables are sub-Gaussian. It turns out they are sub-exponential too!
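As a concrete check, here is a small numerical verification of Bernstein's condition for one bounded example, $X$ uniform on $[-1, 1]$ (so $\mu = 0$, $\sigma^2 = 1/3$, $b = 1$; odd moments vanish and $\mathbb{E}[X^k] = 1/(k+1)$ for even $k$):

```python
import math

# X uniform on [-1, 1]: mu = 0, sigma^2 = 1/3, |X - mu| <= b = 1.
sigma2, b = 1 / 3, 1.0
for k in range(3, 11):
    moment = 0.0 if k % 2 else 1 / (k + 1)   # E[X^k], computed exactly
    bound = 0.5 * math.factorial(k) * sigma2 * b ** (k - 2)
    assert abs(moment) <= bound, (k, moment, bound)
print("Bernstein's condition holds for k = 3, ..., 10")
```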

Concentration Inequalities

Why is this notion of sub-exponentialness useful anyway? Well, for one thing, the family of sub-exponential random variables includes sub-Gaussian random variables. But more importantly, limiting our focus to a neighborhood around $0$ enables us to derive more expressive bounds on the tail distribution of a sub-exponential $X$. To see why, let's first derive the basic tail bound and then discuss a more elaborate bound that contrasts the concentration of sub-Gaussian and sub-exponential variables.

Sub-Exponential Tail Bound

For a sub-exponential random variable $X$ with parameters $(\nu, b)$, the following holds:

$$ \mathbb{P}[X \geq \mu + t] \leq \begin{cases} e^{-t^2/2\nu^2} & \text{if } 0 \leq t \leq \frac{\nu^2}{b}, \\ e^{-t/2b} & \text{if } t > \frac{\nu^2}{b}. \end{cases} \tag{3} $$

Proof. For valid values of $\lambda$ (i.e., $\lvert \lambda \rvert \leq 1/b$), we have, by definition, that:

$$ \mathbb{P}[X - \mu \geq t] \leq \inf_{\lambda \in [0, 1/b]} \frac{\mathbb{E}[e^{\lambda (X - \mu)}]}{e^{\lambda t}} \leq \inf_{\lambda \in [0, 1/b]} \underbrace{e^{\lambda^2 \nu^2/2 - \lambda t}}_{f(\lambda, t)}. $$

When $t \leq \nu^2/b$, we can solve the unconstrained minimization problem and obtain:

$$ \frac{\partial}{\partial \lambda} f(\lambda, t) = 0 \Rightarrow \nu^2 \lambda - t = 0 \Rightarrow \lambda = \frac{t}{\nu^2}. $$

Substituting this value of $\lambda$ into $f(\lambda, t)$ gives the desired result, as in the case of a sub-Gaussian random variable. (Note that $t/\nu^2 \leq 1/b$ in this regime, so this choice of $\lambda$ is feasible.)

On the other hand, when $t > \nu^2/b$, $f(\lambda, t)$ reaches its minimum at the boundary point $\lambda = 1/b$. That's because the unconstrained minimizer $t/\nu^2$ now lies beyond $1/b$, so $f(\cdot, t)$ is monotonically decreasing on the whole interval $[0, 1/b]$. Evaluating $f(1/b, t)$ then leads to:

$$ f\Big(\frac{1}{b}, t\Big) = \exp\Big( \frac{\nu^2}{2b^2} - \frac{t}{b} \Big) \leq \exp\Big( \frac{t}{2b} - \frac{t}{b} \Big) = \exp\Big(-\frac{t}{2b} \Big). $$
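The two regimes of (3) are easy to encode. The sketch below (function name mine) evaluates the bound for a given $t$ and parameters $(\nu, b)$:

```python
import math

def subexp_tail_bound(t, nu, b):
    """Upper bound (3) on P[X >= mu + t] for a sub-exponential (nu, b) variable."""
    assert t >= 0 and nu > 0 and b > 0
    if t <= nu**2 / b:
        return math.exp(-t**2 / (2 * nu**2))   # Gaussian-like regime
    return math.exp(-t / (2 * b))              # heavier, exponential regime

# With nu = 1 and b = 0.5, the crossover sits at t = nu^2 / b = 2:
print(subexp_tail_bound(1.0, 1.0, 0.5))  # e^{-1/2}, Gaussian-like regime
print(subexp_tail_bound(4.0, 1.0, 0.5))  # e^{-4}, exponential regime
```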

Bernstein Tail Bound

We can do something even more interesting if we consider Bernstein's condition! If we stare at the proof above that Bernstein's condition suffices for a random variable to be sub-exponential, we notice an interesting result hiding in there. In particular, we derived in that proof the following bound when $\lvert \lambda \rvert b < 1$:

$$ \mathbb{E}[e^{\lambda (X - \mu)}] \leq \exp\Big( \frac{\lambda^2\sigma^2}{2(1 - \lvert \lambda \rvert b)} \Big). $$

Setting $\lambda = t/(bt + \sigma^2) \in [0, \frac{1}{b})$ in the Chernoff bound results in the following concentration inequality:

$$ \mathbb{P}[X - \mu \geq t] \leq \exp\Big( -\frac{t^2}{2(\sigma^2 + bt)} \Big). \tag{4} $$

Why is that significant? Consider a bounded random variable $X$ such that $\lvert X - \mu \rvert \leq b$. We argued above that $X$ is sub-exponential because it meets Bernstein's condition. The tail bound in (4) then depends not just on $b$, as the Hoeffding inequality would (as a consequence of $X$ being sub-Gaussian), but also on the variance of $X$!

Because $\sigma^2 \leq b^2$ for such a bounded random variable, the bound in (4) is never worse than the Hoeffding bound for suitably small values of $t$. In particular, when $\sigma^2 \ll b^2$, (4) offers a much sharper bound on the concentration of $X$.
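A quick numerical comparison illustrates the point. The two functions below transcribe the Hoeffding-type bound $e^{-t^2/2b^2}$ (assuming, as above, that a variable with $\lvert X - \mu \rvert \leq b$ is sub-Gaussian with parameter $b$) and the Bernstein bound (4):

```python
import math

def hoeffding_bound(t, b):
    # Tail bound from sub-Gaussianity with parameter b: exp(-t^2 / (2 b^2))
    return math.exp(-t**2 / (2 * b**2))

def bernstein_bound(t, sigma2, b):
    # Tail bound (4): exp(-t^2 / (2 (sigma^2 + b t))), variance-aware
    return math.exp(-t**2 / (2 * (sigma2 + b * t)))

# A bounded variable with range b = 1 but tiny variance sigma^2 = 0.01:
b, sigma2 = 1.0, 0.01
for t in (0.05, 0.1, 0.2):
    print(t, hoeffding_bound(t, b), bernstein_bound(t, sigma2, b))
```

For these values of $t$ the Bernstein bound is the smaller of the two, since $\sigma^2 + bt < b^2$.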

Example Sub-Exponential Variables

We have already discussed the case of bounded random variables. Indeed, such variables are both sub-Gaussian and sub-exponential. Let's now look at a random variable that's sub-exponential but not sub-Gaussian: the $\chi^2$ random variable.

Recall that if $Y \sim \mathcal{N}(0, 1)$ is a standard Gaussian random variable, then $X = Y^2$ is a $\chi^2$ random variable whose mean is $1$. Let's inspect its moment generating function:

$$ \mathbb{E}[e^{\lambda (X - 1)}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\lambda (y^2 - 1)} e^{-y^2/2} \, dy = \frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}}. $$

Obviously, when $\lambda \geq 1/2$ the moment generating function does not exist! When $\lvert \lambda \rvert \leq 1/4$, however, we can show that the moment generating function is bounded above by $e^{2\lambda^2}$. As such, $X$ is sub-exponential with parameters $(2, 4)$.
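The claimed bound on the moment generating function can be verified numerically on a grid of $\lambda$ values covering $[-1/4, 1/4]$:

```python
import math

# Check that e^{-lam} / sqrt(1 - 2 lam) <= e^{2 lam^2} for |lam| <= 1/4.
for i in range(-100, 101):
    lam = 0.25 * i / 100
    mgf = math.exp(-lam) / math.sqrt(1 - 2 * lam)
    assert mgf <= math.exp(2 * lam**2) + 1e-12, lam
print("chi-squared MGF bound verified on a grid of |lambda| <= 1/4")
```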

As another example, consider the sum of $n$ iid $\chi^2$ random variables $X_i$. Because each variable is itself sub-exponential with parameters $(2, 4)$, we know from before that their sum, too, is sub-exponential, with parameters $(2\sqrt{n}, 4)$. As a result, we can derive the following bound:

$$ \mathbb{P}\Big[\frac{1}{n}\sum_i X_i - 1 \geq t\Big] \leq e^{-nt^2/8}, \tag{5} $$

for all $t \in (0, 1)$, since in that range $nt \leq \nu^2/b = n$ and the Gaussian-like branch of (3) applies.
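A small Monte Carlo experiment (the sample sizes here are arbitrary choices) suggests how conservative the bound in (5) is in practice:

```python
import math
import random

random.seed(0)

# Empirical tail of the chi-squared sample mean vs. the bound exp(-n t^2 / 8).
n, t, trials = 50, 0.5, 5000
exceed = sum(
    1
    for _ in range(trials)
    if sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n - 1 >= t
)
empirical = exceed / trials
bound = math.exp(-n * t**2 / 8)
print(f"empirical tail: {empirical:.4f}, bound (5): {bound:.4f}")
```

The empirical tail probability should come in well below the bound, which is expected: (5) holds for every sub-exponential variable with these parameters, not just this one.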