CDF confidence bands - DKW inequality

$\renewcommand{\Pr}[1]{\text{Pr}\left[ #1 \right]} \def\eps{\varepsilon}$

Overview #

Let's say I have a well known distribution defined by a CDF $F(x)$. I also sample an empirical CDF $F_n(x)$ from a population. Given some confidence parameter, the DKW Inequality bounds how close a random ECDF $F_n(x)$ will be to the true CDF from which the samples are drawn.

Inequality #

The inequality itself states the probability that the Kolmogorov-Smirnov Statistic violates some bound (the K-S statistic is just $||F_n(x) - F(x)||_{\text{max}}$)

$\Pr{\sup_{x} \; \Bigl|F_n(x) - F(x)\Bigr| > \eps} \le 2 e^{-2 n \eps^2} \qquad \text{for every } \eps > 0.$

Note: there is an extended inequality for a multivariate $F$. The bound changes from $2 e^{(\cdots)}$ to $2k \, e^{(\cdots)}$, where $k$ is the vector dimension.

CDF Confidence Bands #

We can use the DKW Inequality to produce a "band" around our true CDF that must contain our empirical CDF with some confidence probability.

With a failure probability $\alpha$, our sampled ECDF will sit outside the interval

$\Bigl[ F(x) - \eps,\; F(x) + \eps \Bigr] \qquad \text{where } \eps = \sqrt{\frac{\ln \frac{2}{\alpha}}{2n}}$

Pros #

1. allows us to contain the entire CDF with a single band value.
2. non-parametric - requires no assumptions about the sample distribution.

Cons #

1. provides weak guarantees at the tails, tighter bounds around the median
2. pointwise approaches (what?) can provide tighter bounds at each individual value
3. non-parametric - provides weaker bounds than others with more assumptions.

Example Bounds #

$n$$\alpha$$\eps$
100.10.38702275602049496
100.010.5146997846583985
100.0010.6164779987778186
1000.10.12238734153404082
1000.010.16276236307187292
1000.0010.19494746035204052
10000.10.038702275602049495
10000.010.05146997846583985
10000.0010.06164779987778186
100000.10.012238734153404082
100000.010.016276236307187292
100000.0010.019494746035204052
200000.10.008654091913011426
200000.010.011509037065006824
200000.0010.013784867119002347