CDF confidence bands - DKW inequality
2022-03-08 ยท 2 min read
\[ \renewcommand{\Pr}[1]{\text{Pr}\left[ #1 \right]} \def\eps{\varepsilon} \]Overview #
Let's say I have a well known distribution defined by a CDF $F(x)$. I also sample an empirical CDF $F_n(x)$ from a population. Given some confidence parameter, the DKW Inequality bounds how close a random ECDF $F_n(x)$ will be to the true CDF from which the samples are drawn.
Inequality #
The inequality itself states the probability that the Kolmogorov-Smirnov Statistic violates some bound (the K-S statistic is just $||F_n(x) - F(x)||_{\text{max}}$)
\[ \Pr{\sup_{x} \; \Bigl|F_n(x) - F(x)\Bigr| > \eps} \le 2 e^{-2 n \eps^2} \qquad \text{for every } \eps > 0. \]Note: there is an extended inequality for a multivariate $F$. The bound changes from $2 e^{(\cdots)}$ to $2k \, e^{(\cdots)}$, where $k$ is the vector dimension.
CDF Confidence Bands #
We can use the DKW Inequality to produce a "band" around our true CDF that must contain our empirical CDF with some confidence probability.
With a failure probability $\alpha$, our sampled ECDF will sit outside the interval
\[ \Bigl[ F(x) - \eps,\; F(x) + \eps \Bigr] \qquad \text{where } \eps = \sqrt{\frac{\ln \frac{2}{\alpha}}{2n}} \]Pros #
- allows us to contain the entire CDF with a single band value.
- non-parametric - requires no assumptions about the sample distribution.
Cons #
- provides weak guarantees at the tails, tighter bounds around the median
- pointwise approaches (what?) can provide tighter bounds at each individual value
- non-parametric - provides weaker bounds than others with more assumptions.
Example Bounds #
$n$ | $\alpha$ | $\eps$ |
---|---|---|
10 | 0.1 | 0.38702275602049496 |
10 | 0.01 | 0.5146997846583985 |
10 | 0.001 | 0.6164779987778186 |
100 | 0.1 | 0.12238734153404082 |
100 | 0.01 | 0.16276236307187292 |
100 | 0.001 | 0.19494746035204052 |
1000 | 0.1 | 0.038702275602049495 |
1000 | 0.01 | 0.05146997846583985 |
1000 | 0.001 | 0.06164779987778186 |
10000 | 0.1 | 0.012238734153404082 |
10000 | 0.01 | 0.016276236307187292 |
10000 | 0.001 | 0.019494746035204052 |
20000 | 0.1 | 0.008654091913011426 |
20000 | 0.01 | 0.011509037065006824 |
20000 | 0.001 | 0.013784867119002347 |