Review of Probability, Random Variables, and Distributions

Lecture 8

Definition of Variance

  • Def: The variance of X, $\mathrm{Var}(X)$, is:
    • If X is discrete

      $\sigma^2 = \mathrm{Var}(X) = E\big[(X-\mu_X)^2\big] = \sum_x (x-\mu_X)^2 f(x)$
    • If X is continuous

      $\sigma^2 = \mathrm{Var}(X) = E\big[(X-\mu_X)^2\big] = \int_{-\infty}^{+\infty} (x-\mu_X)^2 f(x)\,dx$
    • $\sigma$ is called the standard deviation

    • Shortcut formula: $\sigma^2 = E\big[(X-\mu_X)^2\big] = E(X^2 - 2\mu_X X + \mu_X^2) = E(X^2) - 2\mu_X E(X) + \mu_X^2 = E(X^2) - E^2(X)$

  • Properties (see the simulation sketch after this list):
    1. $\forall X$: $\mathrm{Var}(X) \ge 0$; $\mathrm{Var}(C) = 0$ for a constant $C$

       $\mathrm{Var}(X) = 0 \iff \exists C,\ P(X = C) = 1$

    2. $\mathrm{Var}(CX) = C^2\,\mathrm{Var}(X)$

    3. If X and Y are independent, then

       $E(XY) = E(X)E(Y)$

       $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

    4. If $X_1, X_2, \dots, X_n$ are mutually independent, $\mathrm{Var}\Big(\sum_{i=1}^n C_i X_i + b\Big) = \sum_{i=1}^n C_i^2\,\mathrm{Var}(X_i)$
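
A minimal Monte Carlo sketch of properties 2 and 3 (assuming NumPy is available; the distributions, constants, and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed for reproducibility
X = rng.exponential(scale=2.0, size=10**6)       # any distribution works here
Y = rng.normal(loc=1.0, scale=3.0, size=10**6)   # generated independently of X
C = 5.0

# Property 2: Var(CX) = C^2 Var(X)
print(np.var(C * X), C**2 * np.var(X))

# Property 3: for independent X and Y, Var(X + Y) = Var(X) + Var(Y)
print(np.var(X + Y), np.var(X) + np.var(Y))
```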

Covariance

  • Def: the covariance of X and Y is

    $\sigma_{XY} = \mathrm{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]$
  • Shortcut formula: $\sigma_{XY} = \mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$

  • If X and Y are independent, $\mathrm{Cov}(X,Y) = 0$

  • The converse need not hold! (see the counterexample sketch below)

    $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \pm 2\,\mathrm{Cov}(X,Y)$
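
A minimal sketch of a standard counterexample for the converse (assuming NumPy): for $X \sim N(0,1)$ and $Y = X^2$, $\mathrm{Cov}(X,Y) = E(X^3) - E(X)E(X^2) = 0$, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(10**6)   # symmetric about 0, so E(X^3) = 0
Y = X**2                         # fully determined by X, hence dependent on X

# sample covariance is near 0 even though X and Y are not independent
print(np.cov(X, Y)[0, 1])
```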

Correlation coefficient

  • Def: The correlation coefficient of X and Y is

    $\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X\,\sigma_Y}$
  • If X and Y are independent, $\rho_{XY} = 0$

  • Properties:

    1. $\forall X, Y:\ |\rho_{XY}| \le 1$

      pf: use $\mathrm{Var}(Y - tX) \ge 0$

    2. $|\rho_{XY}| = 1 \iff \exists\, a \ne 0,\ b$ s.t. $P(Y = aX + b) = 1$

    3. If $|\rho_{XY}| = 1$, X and Y are called completely linearly correlated

    4. If $\rho_{XY} = 0$, X and Y are called uncorrelated, meaning there is no "linear correlation" between X and Y.

    5. independent $\Rightarrow$ uncorrelated

  • $|\rho_{XY}|$ denotes the strength of the linear correlation between X and Y

  • $\rho_{XY} > 0$ means there is a positive linear correlation between X and Y:

    if X becomes larger, then Y tends to become larger

Lecture 9

Bernoulli Distribution

  • 0-1 distribution: $X \sim B(1,p)$

    $F(x) = \begin{cases} 0, & x < 0 \\ 1-p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$
  • $E(X) = p$, $\mathrm{Var}(X) = p - p^2 = pq$

  • Indicator: $\forall A \subset S$, $I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A \end{cases}$

  • It can be used everywhere, since $P(A) = E(I_A)$ for any event A

Binomial Distribution

  • Def: the number of successes X in n Bernoulli trials: $X \sim B(n,p)$
  • If $n = 1$, it reduces to the Bernoulli distribution
  • pmf: $f(x) = P(X = x) = b(x; n, p) = C_n^x\, p^x q^{n-x},\quad x = 0, 1, \dots, n$
  • Binomial theorem: $\sum_{x=0}^n b(x; n, p) = \sum_{x=0}^n C_n^x\, p^x q^{n-x} = (p+q)^n = 1$
  • $E(X) = np,\ \mathrm{Var}(X) = npq$ (verified numerically in the sketch below)
    • hint: $X_i = \begin{cases} 1, & \text{the } i\text{th trial succeeds} \\ 0, & \text{the } i\text{th trial fails} \end{cases}$
    • the $X_i$ are mutually independent with $X_i \sim B(1,p)$, and $X = \sum_{i=1}^n X_i$
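
A quick numerical check of $E(X) = np$ and $\mathrm{Var}(X) = npq$ via the hint above, building X as a sum of n independent 0-1 variables (assuming NumPy; n, p, and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 0.3
bernoulli = (rng.random((10**5, n)) < p).astype(int)  # rows of n Bernoulli trials
X = bernoulli.sum(axis=1)                             # X = X_1 + ... + X_n

print(X.mean(), n * p)              # E(X) = np
print(X.var(), n * p * (1 - p))     # Var(X) = npq
```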

Multinomial Distribution

  • Def: multinomial experiment: independent repeated trials, each with the same k possible outcomes
  • Def: multinomial distribution: the numbers $X_1, \dots, X_k$ of each outcome in n trials
  • Joint pmf: $f(x_1, x_2, \dots, x_k; p_1, p_2, \dots, p_k, n) = \dfrac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
  • Each marginal distribution is binomial

Lecture 10

Hypergeometric Distribution

  • Motivation: sampling without replacement
  • Def: X is the number of successes when
    1. n items are selected from N items without replacement;
    2. of the N items, k are successes and N-k are failures.

    $X \sim H(N, n, k)$
  • pmf:

    $f(x; N, n, k) = \dfrac{C_k^x\, C_{N-k}^{n-x}}{C_N^n},\quad \max(0,\ n-(N-k)) \le x \le \min(n, k)$
  • Relationship to binomial (compared numerically in the sketch below)

    • The binomial is the limit case of the hypergeometric as N approaches infinity
    • When N is large enough ($n/N$ is small): $f(x; N, n, k) \approx b(x; n, k/N)$
  • If X is hypergeometric with N, n and k, then

    $E(X) = \dfrac{nk}{N}$

    $\mathrm{Var}(X) = \dfrac{N-n}{N-1}\, n\, \dfrac{k}{N}\Big(1 - \dfrac{k}{N}\Big)$
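
A small check of the binomial approximation (assuming SciPy; the parameters are arbitrary). Note that scipy.stats.hypergeom takes arguments in the order (population size, successes in population, sample size), which maps to this section's (N, k, n):

```python
from scipy import stats

N, k, n = 5000, 1500, 10                 # N large, n/N small
x = range(n + 1)

hyper = stats.hypergeom.pmf(x, N, k, n)  # exact pmf, sampling without replacement
binom = stats.binom.pmf(x, n, k / N)     # approximation b(x; n, k/N)

print(abs(hyper - binom).max())          # tiny -> the approximation is good here
```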

Multivariate Hypergeometric

  • Def: N items are classified into k kinds ($a_1 + a_2 + \dots + a_k = N$); select n at random; $X_i$ is the number of the i-th kind selected

    $f(x_1, x_2, \dots, x_k; a_1, a_2, \dots, a_k, N, n) = \dfrac{C_{a_1}^{x_1} C_{a_2}^{x_2} \cdots C_{a_k}^{x_k}}{C_N^n}$
  • Each marginal is hypergeometric!

Geometric Distribution

  • Def: do Bernoulli trials until the first success; X is the number of trials: $X \sim G(p)$

  • pmf: $g(x; p) = q^{x-1} p,\quad x = 1, 2, 3, \dots$

  • Mean $E(X)$ and variance $\mathrm{Var}(X)$:

    $E(X) = \dfrac{1}{p},\quad \mathrm{Var}(X) = \dfrac{q}{p^2}$

Negative Binomial Distribution

  • Def: do Bernoulli trials until the k-th success; X is the number of trials: $X \sim NB(k,p)$

  • pmf:

    $b(x; k, p) = C_{x-1}^{k-1}\, q^{x-k} p^k,\quad x = k, k+1, k+2, \dots$

  • Mean $E(X)$ and variance $\mathrm{Var}(X)$:

    $E(X) = \dfrac{k}{p},\quad \mathrm{Var}(X) = \dfrac{kq}{p^2}$

Poisson Distribution

  • Def: the number of occurrences in a Poisson process

  • Derivation: Poisson theorem

    $\lim_{n \to \infty} C_n^x \Big(\dfrac{\lambda}{n}\Big)^x \Big(1 - \dfrac{\lambda}{n}\Big)^{n-x} = \dfrac{\lambda^x}{x!}\, e^{-\lambda}$

  • pmf:

    $p(x; \lambda) = \dfrac{\lambda^x}{x!}\, e^{-\lambda},\quad x = 0, 1, 2, \dots$

  • Expectation and variance:

    $X \sim P(\lambda) \Rightarrow E(X) = \lambda,\ \mathrm{Var}(X) = \lambda$

  • Relationship to binomial (compared numerically in the sketch below)

    • The Poisson distribution is the limit case of the binomial as n approaches infinity while np stays fixed
    • If n is large ($n \ge 50$) while p is small ($p \le 0.1$): $X \sim B(n,p) \approx P(np)$
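
A small check of the Poisson approximation under the rule of thumb above (assuming SciPy; n = 100 and p = 0.05 are arbitrary values with n large and p small):

```python
from scipy import stats

n, p = 100, 0.05
x = range(21)

binom = stats.binom.pmf(x, n, p)        # exact binomial pmf
pois = stats.poisson.pmf(x, n * p)      # Poisson approximation, lambda = np

print(abs(binom - pois).max())          # small -> B(n,p) is close to P(np)
```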

Lecture 11

Uniform Distribution

  • Def: X is uniformly distributed on $[a,b]$, written $X \sim U(a,b)$, if its density satisfies:

    $f(x) = \begin{cases} \dfrac{1}{b-a}, & x \in [a,b] \\ 0, & \text{elsewhere} \end{cases}$
  • cdf and probabilities follow by integrating the density

  • Expectations: $E(X) = \dfrac{a+b}{2},\quad \mathrm{Var}(X) = \dfrac{(b-a)^2}{12}$

Exponential Distribution

  • Def: X follows the exponential distribution, $X \sim e(\beta)$, if

    $f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$
  • cdf: $F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-x/\beta}, & x > 0 \end{cases}$

Gamma Distribution

Gamma Function

  • Def: Gamma function

    $\Gamma(\alpha) = \int_0^{+\infty} x^{\alpha-1} e^{-x}\,dx,\quad \alpha > 0$
  • Properties:

    $\Gamma(1) = 1,\quad \Gamma(0.5) = \sqrt{\pi},\quad \Gamma(\alpha+1) = \alpha\,\Gamma(\alpha),\quad \Gamma(n) = (n-1)!$

  • Def: the Gamma density is as follows, written $X \sim \Gamma(\alpha, \beta)$:

    $f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$

  • The exponential is a special case of the Gamma density: $X \sim e(\beta) = \Gamma(1, \beta)$

  • Expectations:

    $E(X) = \alpha\beta,\quad \mathrm{Var}(X) = \alpha\beta^2$

    $X \sim e(\beta) \Rightarrow E(X) = \beta,\ \mathrm{Var}(X) = \beta^2$

Normal Distribution

Standard Normal

  • Def: X is called standard normal if its density is

    $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2},\quad x \in (-\infty, +\infty)$
  • The cdf can be found from tables

    $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$

    $\Phi(0) = 0.5,\quad \Phi(-x) = 1 - \Phi(x)$

  • Expectations: if X is standard normal, written $X \sim N(0,1)$, then

    $E(X) = 0,\quad \mathrm{Var}(X) = 1$

  • Def: X is normal with parameters $\mu, \sigma^2$ if

    $X \sim N(\mu, \sigma^2) \iff \dfrac{X-\mu}{\sigma} \sim N(0,1)$
  • The cdf and density of $N(\mu, \sigma^2)$ are:

    $F(x) = P(X \le x) = P\Big(\dfrac{X-\mu}{\sigma} \le \dfrac{x-\mu}{\sigma}\Big) = \Phi\Big(\dfrac{x-\mu}{\sigma}\Big)$

    $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},\quad x \in (-\infty, +\infty)$

  • Expectations:

    $E(X) = \mu,\quad \mathrm{Var}(X) = \sigma^2$

  • pth quantile (computed numerically in the sketch below)

    • Def: for p in (0,1), the pth quantile $x_p$ of X satisfies $P(X \le x_p) = p$
    • Def: for p in (0,1), the critical value $c_p$ of X satisfies $P(X > c_p) = p$
    • $x_p = c_{1-p}$
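
A minimal sketch of the quantile/critical-value relation for the standard normal (assuming SciPy; p = 0.95 is an arbitrary choice):

```python
from scipy import stats

p = 0.95
x_p = stats.norm.ppf(p)             # pth quantile: P(X <= x_p) = p
c_p = stats.norm.isf(p)             # critical value: P(X > c_p) = p

print(x_p, stats.norm.isf(1 - p))   # x_p equals c_{1-p}
print(c_p, stats.norm.ppf(1 - p))   # equivalently, c_p equals x_{1-p}
```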

Lecture 12

Central Limit Theorem

  • Th (Lindeberg–Lévy): if $\{X_i\}$ is an iid sequence with

    $E(X_k) = \mu,\ \mathrm{Var}(X_k) = \sigma^2$, let $Y_n = \dfrac{\sum_{k=1}^n X_k - n\mu}{\sqrt{n}\,\sigma} = \dfrac{\frac{1}{n}\sum_{k=1}^n X_k - \mu}{\sigma/\sqrt{n}}$
  • Then (illustrated by the simulation below)

    $\lim_{n \to +\infty} P(Y_n \le x) = \Phi(x)$, so approximately $\sum_{k=1}^n X_k \sim N(n\mu,\, n\sigma^2)$ and $\dfrac{1}{n}\sum_{k=1}^n X_k \sim N\Big(\mu,\, \dfrac{\sigma^2}{n}\Big)$
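
A minimal CLT simulation (assuming NumPy and SciPy; the U(0,1) population and the sizes are arbitrary): standardized sums of iid uniforms should have probabilities close to $\Phi$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 10**5
mu, sigma = 0.5, np.sqrt(1 / 12)             # mean and sd of U(0,1)

sums = rng.random((reps, n)).sum(axis=1)
Y = (sums - n * mu) / (np.sqrt(n) * sigma)   # standardized as in the theorem

# empirical P(Y <= 1) vs Phi(1) ~ 0.8413
print((Y <= 1.0).mean(), stats.norm.cdf(1.0))
```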

Lecture 13

Estimation Methods

  1. Moment estimate

    • Fundamental basis: $\{X_i\}$ iid with $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$

      $\bar{X} = \dfrac{1}{n}\sum_{i=1}^n X_i \approx N\Big(\mu, \dfrac{\sigma^2}{n}\Big)$, so $\bar{X} \xrightarrow{n \to \infty} \mu$

    • The distribution parameter $\theta$ is related to $\mu$

    • Estimation:

      $E(X) = \mu = g(\theta) \Rightarrow \theta = h(\mu) \Rightarrow \hat{\theta} = h(\bar{X})$
  2. The Method of Maximum Likelihood

  • Suppose the population $X \sim f(x, \theta)$

    $P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta) \triangleq L(\theta)$

  • $L(\theta)$ is called the likelihood function

  • The MLE is chosen as:

    $L(\hat{\theta}) = \max_{\theta} L(\theta)$
  • Solution of the MLE for the uniform distribution (a numerical sketch follows this list)

    1. find the likelihood function for $X \sim U(a,b)$

      $L(a,b) = \prod_{i=1}^n f(x_i) = \Big(\dfrac{1}{b-a}\Big)^n$

    2. find the MLE: $\dfrac{\partial L(a,b)}{\partial a} > 0,\ \dfrac{\partial L(a,b)}{\partial b} < 0$

      $\forall i,\ a \le X_i \le b \Rightarrow a \le \min\{X_i\},\ b \ge \max\{X_i\}$

    3. The likelihood function is strictly increasing in a but strictly decreasing in b, so the MLEs are:

      $\hat{a} = \min\{X_i\},\quad \hat{b} = \max\{X_i\}$
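
A minimal numerical sketch of that uniform MLE (assuming NumPy; the "true" parameters 2 and 7 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
a_true, b_true = 2.0, 7.0
x = rng.uniform(a_true, b_true, size=1000)

a_hat, b_hat = x.min(), x.max()    # MLE: sample minimum and maximum
print(a_hat, b_hat)                # slightly inside (2, 7)
```

Since always $\min\{X_i\} \ge a$ and $\max\{X_i\} \le b$, the estimates sit slightly inside the true interval; this bias shrinks as n grows (cf. unbiasedness in Lecture 14).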

Lecture 14

Unbiasedness

  • Def: if $E(\hat{\theta}) = \theta$, $\hat{\theta}$ is called unbiased
  • Def: $b(\hat{\theta}) = E(\hat{\theta}) - \theta$ is called the bias
  • Def: if $b(\hat{\theta}) \ne 0$ but $\lim_{n \to +\infty} b(\hat{\theta}) = 0$, $\hat{\theta}$ is called asymptotically unbiased

Efficiency

  • Def: if both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$

Mean Squared Error (MSE)

  • Def: the mean squared error is:

    $M(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]$

  • The MSE decomposes as (see the derivation below):

    $M(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})$
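
A short derivation of that decomposition, inserting and subtracting $E(\hat{\theta})$; the cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$:

```latex
M(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]
  = E\Big[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\Big]
  = \underbrace{E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]}_{\mathrm{Var}(\hat{\theta})}
    + 2\, b(\hat{\theta})\, \underbrace{E\big[\hat{\theta} - E(\hat{\theta})\big]}_{=\,0}
    + \underbrace{\big(E(\hat{\theta}) - \theta\big)^2}_{b^2(\hat{\theta})}
  = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})
```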

Lecture 15

Chi-Squared Distribution

$X_i \sim N(0,1)$ independent $\Rightarrow X = \sum_{i=1}^n X_i^2 \sim \chi^2(n)$

  • Derivation of the density:

    $\chi^2(n) = \Gamma\Big(\dfrac{n}{2},\, 2\Big)$

    $f(x; n) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$

  • Expectations: $X \sim \chi^2(n) \Rightarrow E(X) = n,\ \mathrm{Var}(X) = 2n$

  • Chi-squared distributions are additive:

    $X \sim \chi^2(n),\ Y \sim \chi^2(m),\ X, Y \text{ indep} \Rightarrow X + Y \sim \chi^2(n+m)$

t-Distribution

$X \sim N(0,1),\ Y \sim \chi^2(n)$, X and Y independent $\Rightarrow T = \dfrac{X}{\sqrt{Y/n}} \sim t(n)$

  • Density:

    $f(t) = \dfrac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}}\Big(1 + \dfrac{t^2}{n}\Big)^{-(n+1)/2},\quad -\infty < t < +\infty$

  • f(t) is an even function

  • The limit is the standard normal: $\lim_{n \to \infty} f(t) = \varphi(t)$

F-Distribution

$X \sim \chi^2(n_1),\ Y \sim \chi^2(n_2)$, X and Y independent $\Rightarrow F = \dfrac{X/n_1}{Y/n_2} \sim F(n_1, n_2)$

  • Property: $F \sim F(n_1, n_2) \Rightarrow 1/F \sim F(n_2, n_1)$
  • The limit case is the normal distribution

Sampling Distribution Theorems

  • Suppose the population is normal: $X \sim N(\mu, \sigma^2)$

  • Th1:

    $\bar{X} \sim N\Big(\mu, \dfrac{\sigma^2}{n}\Big)\quad \text{or}\quad \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
  • Th2: $\bar{X}$ and $S^2$ are independent, and

    $\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$
  • Th3 (checked by simulation below):

    $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$
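
A quick Monte Carlo check of Th3 (assuming NumPy and SciPy; μ, σ, n, and the threshold 1.5 are arbitrary): the studentized mean should match the $t(n-1)$ tail probability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 10.0, 2.0, 8, 10**5

x = rng.normal(mu, sigma, size=(reps, n))
T = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# empirical tail frequency vs the t(n-1) tail probability
print((T > 1.5).mean(), stats.t.sf(1.5, df=n - 1))
```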

Lecture 16

CI under Normal Distribution

  • find μ (a worked numerical example follows this list)
    • $X \sim N(\mu, \sigma^2)$, and $\sigma^2$ is given
      1. start from $\bar{X} - \mu$
      2. construct $Z = \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
      3. find $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha$
      4. solve: $-z_{\alpha/2} < Z < z_{\alpha/2} \Rightarrow \bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
    • $X \sim N(\mu, \sigma^2)$, and $\sigma^2$ is unknown
      1. start from $\bar{X} - \mu$
      2. construct $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$
      3. find $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1-\alpha$
      4. solve: $-t_{\alpha/2} < T < t_{\alpha/2} \Rightarrow \bar{X} - t_{\alpha/2}\dfrac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\dfrac{S}{\sqrt{n}}$
  • find σ
    • $X \sim N(\mu, \sigma^2)$, and $\mu$ is given
      1. construct $W = \dfrac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$
      2. solve $P(\chi^2_{1-\alpha/2} < W < \chi^2_{\alpha/2}) = 1-\alpha$
    • $X \sim N(\mu, \sigma^2)$, and $\mu$ is unknown
      1. construct $W = \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$
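
A worked numerical sketch of both intervals (assuming NumPy and SciPy; the simulated sample and α = 0.05 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(10.0, 2.0, size=25)    # pretend mu and sigma^2 are unknown
n, alpha = len(x), 0.05

# t-based CI for mu (sigma^2 unknown)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_crit * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)

# chi-squared-based CI for sigma^2 (mu unknown), from W = (n-1)S^2 / sigma^2
w = (n - 1) * x.var(ddof=1)
print(w / stats.chi2.ppf(1 - alpha / 2, df=n - 1),   # lower bound
      w / stats.chi2.ppf(alpha / 2, df=n - 1))       # upper bound
```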

Sampling Distribution under Two Populations

  • Suppose $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$

  • X, Y independent; $n_1$, $n_2$ samples drawn from X, Y

  • Th1: variances known

    $\dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0,1)$
  • Th2: variances unknown but equal (see the pooled-variance sketch below)

    $\dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t(n_1+n_2-2)$

    $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
  • Th3: sampling theorem for variances

    $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\, n_2-1)$
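
A minimal pooled-variance sketch for Th2 (assuming NumPy and SciPy; the two samples are simulated with equal variance). The hand-computed statistic matches SciPy's pooled two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(5.0, 1.5, size=12)    # sample from X
y = rng.normal(4.0, 1.5, size=15)    # sample from Y, same variance
n1, n2 = len(x), len(y)

sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))  # mu1 = mu2 case

print(t_stat, stats.ttest_ind(x, y, equal_var=True).statistic)
```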

Sample variance

$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$. Why divide by $n-1$? Given $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$, and $\bar{X} = \dfrac{1}{n}\sum X_i \sim N\Big(\mu, \dfrac{\sigma^2}{n}\Big)$:

  • From $\mathrm{Var}(X) = E(X^2) - E^2(X)$: $E(X_i^2) = \mu^2 + \sigma^2$

  • From $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - E^2(\bar{X})$: $E(\bar{X}^2) = \mu^2 + \dfrac{\sigma^2}{n}$

  • Expand the sum of squares (using $\sum X_i = n\bar{X}$):

    $E\big(\sum (X_i - \bar{X})^2\big) = E\big(\sum X_i^2 + n\bar{X}^2 - 2\bar{X}\sum X_i\big) = E\big(\sum X_i^2 - n\bar{X}^2\big) = n E(X^2) - n E(\bar{X}^2)$

    $= n(\mu^2 + \sigma^2) - n\Big(\mu^2 + \dfrac{\sigma^2}{n}\Big) = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$

  • Therefore $E\Big(\dfrac{\sum (X_i - \bar{X})^2}{n-1}\Big) = \sigma^2$, i.e. $E(S^2) = \sigma^2$, so $S^2$ is unbiased (see the check below).
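
A quick check that the $n-1$ denominator makes $S^2$ unbiased (assuming NumPy; the population and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
sigma2 = 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(10**5, 10))  # many samples of n = 10

s2 = samples.var(axis=1, ddof=1)   # divide by n-1
print(s2.mean(), sigma2)           # mean of S^2 is close to sigma^2

biased = samples.var(axis=1)       # divide by n instead
print(biased.mean())               # close to (n-1)/n * sigma^2 = 3.6
```
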
One day we will climb the highest mountain, and survey the smallest point.