# standard error

$\def\E{\mathbb{E}}$

The standard error of a statistic is the stddev of its sampling distribution, i.e., the distribution of the statistic's values over repeated finite samples of the same size.

Important: a standard error is always the standard error *of* some statistic, so read it as $\mathrm{stderr}(\phi)$, where $\phi$ is the statistic under study.

Example: probably the most common case is the standard error of the mean (SEM), written $\sigma_{\mu_x}$, in which the statistic $\phi$ is just the sample mean $\mu_x$.

• The sample mean is itself a random variable, with its own mean and variance, separate from the population mean and variance.
• Imagine we take $n$ samples from the population and record the sample mean. If we repeat this process multiple times, we'll end up with multiple observations of the sample mean. These sample means cluster around the true mean, and the cluster tightens around it as $n \rightarrow \infty$.
• Fortunately, we don't actually have to repeat the sampling process many times, since there is a mathematical relation between a single sample's statistics and the standard error of the mean.
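The repeated-sampling picture above can be sketched in a few lines of Python. This is only an illustration under assumed settings: a normal population with mean 5 and stddev 2, $n = 100$, and 2000 repetitions are all arbitrary choices, and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_means(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Draw `trials` independent samples of size `n`; return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed population for illustration: normal with mean 5 and stddev 2.
means = simulate_sample_means(n=100, trials=2000, mu=5.0, sigma=2.0)

center = statistics.fmean(means)   # the sample means cluster around the true mean
spread = statistics.stdev(means)   # their stddev is the standard error of the mean

print(f"mean of sample means:   {center:.3f}")
print(f"stddev of sample means: {spread:.3f}")
```

With these settings the spread should land near $2 / \sqrt{100} = 0.2$, which is the relation derived in the next section.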

## Exact Value #

If we know $\sigma$, the standard deviation of the population (which is rarely true), we can exactly compute $\sigma_{\mu_x}$:

Let $x_1, \ldots, x_n$ be i.i.d. samples with true standard deviation $\sigma$. Then the r.v. $\mu_x = \frac{1}{n} \sum_{i=1}^n x_i$ (the sample mean) has standard deviation:

$\sigma_{\mu_x} = \frac{\sigma}{\sqrt{n}}$

Intuitively, the fewer samples $n$ we take, the less accurate the sample mean $\mu_x$ becomes.
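As a quick worked example of the formula, here is a minimal sketch that evaluates $\sigma / \sqrt{n}$ for a few sample sizes. The value $\sigma = 2$ and the helper name `sem_exact` are assumptions made up for illustration.

```python
import math

def sem_exact(sigma, n):
    """Exact standard error of the mean for i.i.d. samples with known stddev `sigma`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)

# Assumed population stddev for illustration.
sigma = 2.0
for n in (4, 16, 64):
    print(f"n = {n:3d}  ->  SEM = {sem_exact(sigma, n)}")
```

Going from 64 samples down to 4 makes the SEM four times larger (0.25 vs. 1.0).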

### Derivation #

Let $\Sigma = \sum_{i=1}^n x_i$ be a r.v. for the sum of all i.i.d. samples. Since the samples are independent, their variances add, so the variance $Var(\Sigma)$ is

$Var(\Sigma) = Var(x_1) + \cdots + Var(x_n) = \sigma^2 + \cdots + \sigma^2 = n \cdot \sigma^2$

Recall that $\sigma$ is the true population stddev. Next, the sample mean in terms of $\Sigma$ is

$\mu_x = \frac{x_1 + \cdots + x_n}{n} = \frac{\Sigma}{n}$

which, since $Var(aX) = a^2 \, Var(X)$ for any constant $a$, has variance

$Var(\mu_x) = Var\left(\frac{\Sigma}{n}\right) = \frac{Var(\Sigma)}{n^2} = \frac{n \cdot \sigma^2}{n^2} = \frac{\sigma^2}{n}$

and stddev

$\sigma_{\mu_x} = \sqrt{Var(\mu_x)} = \frac{\sigma}{\sqrt{n}}$

which is our desired value, the standard error of the mean (SEM).
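The two key steps of the derivation, $Var(\Sigma) = n \cdot \sigma^2$ and $Var(\Sigma / n) = Var(\Sigma) / n^2$, can be sanity-checked numerically. The sketch below assumes fair six-sided die rolls as the i.i.d. samples (true variance $35/12$), with $n = 10$ and 20000 trials; these are illustrative choices, not part of the derivation.

```python
import random
import statistics

# Assumed setup for illustration: fair six-sided die rolls as the i.i.d. samples.
FACES = [1, 2, 3, 4, 5, 6]
SIGMA2 = statistics.pvariance(FACES)   # true population variance, 35/12

def simulate_sums(n, trials, seed=0):
    """Return `trials` observations of the sum of `n` independent die rolls."""
    rng = random.Random(seed)
    return [sum(rng.choice(FACES) for _ in range(n)) for _ in range(trials)]

n = 10
sums = simulate_sums(n, trials=20000)
means = [s / n for s in sums]          # the corresponding sample means

var_sum = statistics.pvariance(sums)   # should be close to n * sigma^2
var_mean = statistics.pvariance(means) # should be close to sigma^2 / n

print(f"Var(sum):  simulated {var_sum:.3f}, predicted {n * SIGMA2:.3f}")
print(f"Var(mean): simulated {var_mean:.4f}, predicted {SIGMA2 / n:.4f}")
```

The simulated values only approximate the predictions, since they come from a finite number of trials.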

## Estimate #

In the likely event we don't know the true population stddev $\sigma$, we typically use an estimate of $\sigma_{\mu_x}$, which we call $\hat{\sigma}_{\mu_x}$ or also $s_{\mu_x}$.

The typical estimate just replaces the true population stddev $\sigma$ with our observed sample stddev $\sigma_x$:

$\sigma_{\mu_x} \approx \hat{\sigma}_{\mu_x} = \frac{\sigma_x}{\sqrt{n}}$

Note: this only holds if the samples are i.i.d. (and hence uncorrelated). For correlated samples, e.g., draws from a Markov chain, the variances no longer simply add, and you need something like the Markov chain central limit theorem to compute the variance.
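A minimal sketch of the plug-in estimate, computed from a single sample. Two assumptions are made here that the text above does not fix: the sample stddev uses the $n - 1$ (Bessel-corrected) denominator, which is what `statistics.stdev` computes, and the data are simulated i.i.d. normal draws with $\sigma = 2$ so the estimate can be compared against the exact value. `sem_estimate` is a hypothetical helper name.

```python
import math
import random
import statistics

def sem_estimate(xs):
    """Estimated standard error of the mean: sample stddev / sqrt(n)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    # statistics.stdev divides by n - 1 (an assumption; the n-denominator
    # version would be statistics.pstdev).
    return statistics.stdev(xs) / math.sqrt(n)

# Assumed data for illustration: 400 i.i.d. normal draws with sigma = 2.
rng = random.Random(0)
sigma, n = 2.0, 400
xs = [rng.gauss(0.0, sigma) for _ in range(n)]

estimated = sem_estimate(xs)
exact = sigma / math.sqrt(n)   # 0.1, only computable because sigma is known here

print(f"estimated SEM: {estimated:.4f}")
print(f"exact SEM:     {exact:.4f}")
```

In practice only the estimated value is available. The exact value is shown here purely for comparison.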

## Implications #

As you can see, $\sigma_{\mu_x}$ scales with the inverse square root of the number of samples $n$, i.e., $\sigma_{\mu_x} \propto 1 / \sqrt{n}$.

• Roughly, if you take more samples, the dispersion of the sample mean gets tighter around the true mean.
• In other words, as we increase the number of samples, our sample mean will converge to the true mean.
• However, since this dispersion only shrinks as $1 / \sqrt{n}$, we need 4x the number of samples in order to halve the dispersion.
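The scaling above can be turned around to ask how many samples a target SEM requires: solving $\sigma / \sqrt{n} \le t$ for $n$ gives $n \ge (\sigma / t)^2$. The sketch below assumes $\sigma$ is known (or replaced by an estimate), and `samples_needed` is a hypothetical helper name.

```python
import math

def samples_needed(sigma, target_sem):
    """Smallest n with sigma / sqrt(n) <= target_sem, i.e. n >= (sigma / target_sem)^2."""
    if sigma <= 0 or target_sem <= 0:
        raise ValueError("sigma and target_sem must be positive")
    # Floating-point rounding could shift the result by one for some inputs.
    return math.ceil((sigma / target_sem) ** 2)

# Assumed population stddev for illustration.
sigma = 2.0
for target in (0.5, 0.25, 0.125):
    print(f"target SEM {target:5.3f}  ->  n = {samples_needed(sigma, target)}")
```

Each halving of the target SEM multiplies the required sample count by four (16, 64, 256 here).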