# multinomial distribution

2022-03-08 ยท 3 min read

\[ \renewcommand{\Pr}[1]{\text{Pr}\left[ #1 \right]} \def\eps{\varepsilon} \def\E{\mathbb{E}} \]## Example #

You have a k-sided, weighted die. The weight each face $i = 1, \ldots, k$ is $p_i$. You roll the die $n$ times. The distribution of faces rolled will follow a **Multinomial Distribution**.

## Overview #

Property | Value |
---|---|

Parameters | $n > 0$ number of trials, $p_1, \ldots, p_k$ event probabilities, where $\sum_{i} p_i = n$ |

Support | $i \in \set{1, \ldots, k}, ; x_i \in \set{0, \ldots, k}, \text{ with } \sum_i x_i = n$ |

PMF | $\Pr{X = \set{x_1, \ldots, x_k}} = \frac{n!}{x_1! \cdots x_n!} p_1^{x_1} \cdots p_k^{x_k}$ |

Mean | $\E[X_i] = n \cdot p_i$ |

Variance | $Var(X_i) = n p_i (1 - p_i), ; Cov(X_i, X_j) = -n p_i p_j ; (i \ne j)$ |

## Statistical Inference #

### Equivalence Tests #

Ostrovski 2017 - Testing equivalence of multinomial distributions

Resin 2021 - A Simple Algorithm for Exact Multinomial Tests

### Goodness-of-fit Tests #

Ideally, we could use an exact test, where we check if the observed counts $X$ lie in some failure region (according to some criterion) with some confidence probability $p_{significance}$, for example. The criterion will claim the $X$ is not the same as our hypothesized distribution if $X$ lies in the failure region. We choose the size of our failure region based on our initial choice of confidence (greater required confidence implies greater sized failure region).

\[ p_{significance} = \sum_{X \;\in\; \text{region}} \text{multinomial-pmf}(X) \]Unfortunately, the multinomial CDF and PMF have an absurdly huge support for any non-trivial $n$ and $k$, and we need to calculate the CDF by just exhaustively computing the joint probabilities, which makes this approach computationally infeasible, since any non-trivial region is just too big. :'(

Instead, we need derived random variables, called *test statistics*, which are more mathematically convenient or more computationally tractable, and compare their observed values against the failure regions projected into the *test-statistic's distribution*.

G-test - multinomial goodness-of-fit

Wikiwand - Pearson's chi-squared test

Wikiwand - G-test / log-likelihood ratio test

#### Example Implementation #

Using `g_test`

defined in G-test - multinomial goodness-of-fit > Example Implementation.

```
use claim::{debug_assert_ge, debug_assert_le};
use ndarray::ArrayView1;
use statrs::distribution::{ChiSquared, ContinuousCDF};
/// A goodness-of-fit test between a hypothesized multinomial distribution, `p`,
/// and an experimentally observed distribution, `p_hat`, both represented as
/// dense PMFs. `n` is the number of samples taken to construct `p_hat`.
///
/// Returns a p-value, `Pr[G(x) >= p-value | H_0: x ~ p]`.
fn multinomial_test(n: usize, p: ArrayView1<f64>, p_hat: ArrayView1<f64>) -> f64 {
// want to compute the DOF (nnz of p)
let nnz = p.fold(0.0, |nnz, &x| nnz + if x > 0.0 { 1.0 } else { 0.0 });
let dof = nnz - 1.0;
debug_assert_le!(nnz, p.dim() as f64);
debug_assert_ge!(dof, 1.0);
// impossible to draw p_hat from p
if !is_pmf_subset(p_hat, p) {
return 0.0;
}
let g = g_test(n, p, p_hat);
let pvalue = 1.0 - chisq_cdf(dof, g);
println!(
"multinomial_test: n: {n}, |p|: {}, dof: {dof}, g: {g}, p-value: {pvalue}",
p.dim()
);
pvalue
}
```