Review of Probability, Random Variables, and Distributions

Lecture 8

Definition of Variance

  • Def: The variance of X, $\mathrm{Var}(X)$, is:
    • If X is discrete

      $\sigma^2 = \mathrm{Var}(X) = E\big[(X-\mu_X)^2\big] = \sum_x (x-\mu_X)^2 f(x)$
    • If X is continuous

      $\sigma^2 = \mathrm{Var}(X) = E\big[(X-\mu_X)^2\big] = \int_{-\infty}^{+\infty} (x-\mu_X)^2 f(x)\,dx$
    • $\sigma$ is called the standard deviation

    • Shortcut formula: $\sigma^2 = E\big[(X-\mu_X)^2\big] = E(X^2 - 2\mu_X X + \mu_X^2) = E(X^2) - 2\mu_X E(X) + \mu_X^2 = E(X^2) - E^2(X)$

  • Properties (see the simulation sketch after this list):
    1. $\forall X$: $\mathrm{Var}(X) \ge 0$; $\mathrm{Var}(C) = 0$ for a constant $C$

       $\mathrm{Var}(X) = 0 \iff \exists C,\ P(X = C) = 1$

    2. $\mathrm{Var}(CX) = C^2\,\mathrm{Var}(X)$

    3. If X and Y are independent, then

       $E(XY) = E(X)E(Y)$

       $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

    4. If $X_1, X_2, \dots, X_n$ are mutually independent, $\mathrm{Var}\Big(\sum_{i=1}^n C_i X_i + b\Big) = \sum_{i=1}^n C_i^2\,\mathrm{Var}(X_i)$
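
A minimal Monte Carlo sketch of properties 2 and 3 (assuming NumPy is available; the distributions, constants, and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed for reproducibility
X = rng.exponential(scale=2.0, size=10**6)       # any distribution works here
Y = rng.normal(loc=1.0, scale=3.0, size=10**6)   # generated independently of X
C = 5.0

# Property 2: Var(CX) = C^2 Var(X)
print(np.var(C * X), C**2 * np.var(X))

# Property 3: for independent X and Y, Var(X + Y) = Var(X) + Var(Y)
print(np.var(X + Y), np.var(X) + np.var(Y))
```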

Covariance

  • Def: the covariance of X and Y is

    $\sigma_{XY} = \mathrm{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]$
  • Shortcut formula: $\sigma_{XY} = \mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$

  • If X and Y are independent, $\mathrm{Cov}(X,Y) = 0$

  • The converse need not hold! (see the counterexample sketch below)

    $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \pm 2\,\mathrm{Cov}(X,Y)$
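
A minimal sketch of a standard counterexample for the converse (assuming NumPy): for $X \sim N(0,1)$ and $Y = X^2$, $\mathrm{Cov}(X,Y) = E(X^3) - E(X)E(X^2) = 0$, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(10**6)   # symmetric about 0, so E(X^3) = 0
Y = X**2                         # fully determined by X, hence dependent on X

# sample covariance is near 0 even though X and Y are not independent
print(np.cov(X, Y)[0, 1])
```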

Correlation coefficient

  • Def: The correlation coefficient of X and Y is

    $\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X\,\sigma_Y}$
  • If X and Y are independent, $\rho_{XY} = 0$

  • Properties:

    1. $\forall X, Y:\ |\rho_{XY}| \le 1$

      pf: use $\mathrm{Var}(Y - tX) \ge 0$

    2. $|\rho_{XY}| = 1 \iff \exists\, a \ne 0,\ b$ s.t. $P(Y = aX + b) = 1$

    3. If $|\rho_{XY}| = 1$, X and Y are called completely linearly correlated

    4. If $\rho_{XY} = 0$, X and Y are called uncorrelated, meaning there is no "linear correlation" between X and Y.

    5. independent $\Rightarrow$ uncorrelated

  • $|\rho_{XY}|$ denotes the strength of the linear correlation between X and Y

  • $\rho_{XY} > 0$ means there is a positive linear correlation between X and Y:

    if X becomes larger, then Y tends to become larger

Lecture 9

Bernoulli Distribution

  • 0-1 distribution: $X \sim B(1,p)$

    $F(x) = \begin{cases} 0, & x < 0 \\ 1-p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$
  • $E(X) = p$, $\mathrm{Var}(X) = p - p^2 = pq$

  • Indicator: $\forall A \subset S$, $I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A \end{cases}$

  • It can be used everywhere, since $P(A) = E(I_A)$ for any event A

Binomial Distribution

  • Def: the number of successes X in n Bernoulli trials: $X \sim B(n,p)$
  • If $n = 1$, it reduces to the Bernoulli distribution
  • pmf: $f(x) = P(X = x) = b(x; n, p) = C_n^x\, p^x q^{n-x},\quad x = 0, 1, \dots, n$
  • Binomial theorem: $\sum_{x=0}^n b(x; n, p) = \sum_{x=0}^n C_n^x\, p^x q^{n-x} = (p+q)^n = 1$
  • $E(X) = np,\ \mathrm{Var}(X) = npq$ (verified numerically in the sketch below)
    • hint: $X_i = \begin{cases} 1, & \text{the } i\text{th trial succeeds} \\ 0, & \text{the } i\text{th trial fails} \end{cases}$
    • the $X_i$ are mutually independent with $X_i \sim B(1,p)$, and $X = \sum_{i=1}^n X_i$
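
A quick numerical check of $E(X) = np$ and $\mathrm{Var}(X) = npq$ via the hint above, building X as a sum of n independent 0-1 variables (assuming NumPy; n, p, and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 0.3
bernoulli = (rng.random((10**5, n)) < p).astype(int)  # rows of n Bernoulli trials
X = bernoulli.sum(axis=1)                             # X = X_1 + ... + X_n

print(X.mean(), n * p)              # E(X) = np
print(X.var(), n * p * (1 - p))     # Var(X) = npq
```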

Multinomial Distribution

  • Def: multinomial experiment: independent repeated trials, each with the same k possible outcomes
  • Def: multinomial distribution: the numbers $X_1, \dots, X_k$ of each outcome in n trials
  • Joint pmf: $f(x_1, x_2, \dots, x_k; p_1, p_2, \dots, p_k, n) = \dfrac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
  • Each marginal distribution is binomial

Lecture 10

Hypergeometric Distribution

  • Motivation: sampling without replacement
  • Def: X is the number of successes when
    1. n items are selected from N items without replacement;
    2. of the N items, k are successes and N-k are failures.

    $X \sim H(N, n, k)$
  • pmf:

    $f(x; N, n, k) = \dfrac{C_k^x\, C_{N-k}^{n-x}}{C_N^n},\quad \max(0,\ n-(N-k)) \le x \le \min(n, k)$
  • Relationship to binomial (compared numerically in the sketch below)

    • The binomial is the limit case of the hypergeometric as N approaches infinity
    • When N is large enough ($n/N$ is small): $f(x; N, n, k) \approx b(x; n, k/N)$
  • If X is hypergeometric with N, n and k, then

    $E(X) = \dfrac{nk}{N}$

    $\mathrm{Var}(X) = \dfrac{N-n}{N-1}\, n\, \dfrac{k}{N}\Big(1 - \dfrac{k}{N}\Big)$
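
A small check of the binomial approximation (assuming SciPy; the parameters are arbitrary). Note that scipy.stats.hypergeom takes arguments in the order (population size, successes in population, sample size), which maps to this section's (N, k, n):

```python
from scipy import stats

N, k, n = 5000, 1500, 10                 # N large, n/N small
x = range(n + 1)

hyper = stats.hypergeom.pmf(x, N, k, n)  # exact pmf, sampling without replacement
binom = stats.binom.pmf(x, n, k / N)     # approximation b(x; n, k/N)

print(abs(hyper - binom).max())          # tiny -> the approximation is good here
```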

Multivariate Hypergeometric

  • Def: N items are classified into k kinds ($a_1 + a_2 + \dots + a_k = N$); select n at random; $X_i$ is the number of the i-th kind selected

    $f(x_1, x_2, \dots, x_k; a_1, a_2, \dots, a_k, N, n) = \dfrac{C_{a_1}^{x_1} C_{a_2}^{x_2} \cdots C_{a_k}^{x_k}}{C_N^n}$
  • Each marginal is hypergeometric!

Geometric Distribution

  • Def: do Bernoulli trials until the first success; X is the number of trials: $X \sim G(p)$

  • pmf: $g(x; p) = q^{x-1} p,\quad x = 1, 2, 3, \dots$

  • Mean $E(X)$ and variance $\mathrm{Var}(X)$:

    $E(X) = \dfrac{1}{p},\quad \mathrm{Var}(X) = \dfrac{q}{p^2}$

Negative Binomial Distribution

  • Def: do Bernoulli trials until the k-th success; X is the number of trials: $X \sim NB(k,p)$

  • pmf:

    $b(x; k, p) = C_{x-1}^{k-1}\, q^{x-k} p^k,\quad x = k, k+1, k+2, \dots$

  • Mean $E(X)$ and variance $\mathrm{Var}(X)$:

    $E(X) = \dfrac{k}{p},\quad \mathrm{Var}(X) = \dfrac{kq}{p^2}$

Poisson Distribution

  • Def: the number of occurrences in a Poisson process

  • Derivation: Poisson theorem

    $\lim_{n \to \infty} C_n^x \Big(\dfrac{\lambda}{n}\Big)^x \Big(1 - \dfrac{\lambda}{n}\Big)^{n-x} = \dfrac{\lambda^x}{x!}\, e^{-\lambda}$

  • pmf:

    $p(x; \lambda) = \dfrac{\lambda^x}{x!}\, e^{-\lambda},\quad x = 0, 1, 2, \dots$

  • Expectation and variance:

    $X \sim P(\lambda) \Rightarrow E(X) = \lambda,\ \mathrm{Var}(X) = \lambda$

  • Relationship to binomial (compared numerically in the sketch below)

    • The Poisson distribution is the limit case of the binomial as n approaches infinity while np stays fixed
    • If n is large ($n \ge 50$) while p is small ($p \le 0.1$): $X \sim B(n,p) \approx P(np)$
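
A small check of the Poisson approximation under the rule of thumb above (assuming SciPy; n = 100 and p = 0.05 are arbitrary values with n large and p small):

```python
from scipy import stats

n, p = 100, 0.05
x = range(21)

binom = stats.binom.pmf(x, n, p)        # exact binomial pmf
pois = stats.poisson.pmf(x, n * p)      # Poisson approximation, lambda = np

print(abs(binom - pois).max())          # small -> B(n,p) is close to P(np)
```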

Lecture 11

Uniform Distribution

  • Def: X is uniformly distributed on $[a,b]$, written $X \sim U(a,b)$, if its density satisfies:

    $f(x) = \begin{cases} \dfrac{1}{b-a}, & x \in [a,b] \\ 0, & \text{elsewhere} \end{cases}$
  • cdf and probabilities follow by integrating the density

  • Expectations: $E(X) = \dfrac{a+b}{2},\quad \mathrm{Var}(X) = \dfrac{(b-a)^2}{12}$

Exponential Distribution

  • Def: X follows the exponential distribution, $X \sim e(\beta)$, if

    $f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$
  • cdf: $F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-x/\beta}, & x > 0 \end{cases}$

Gamma Distribution

Gamma Function

  • Def: Gamma function

    $\Gamma(\alpha) = \int_0^{+\infty} x^{\alpha-1} e^{-x}\,dx,\quad \alpha > 0$
  • Properties:

    $\Gamma(1) = 1,\quad \Gamma(0.5) = \sqrt{\pi},\quad \Gamma(\alpha+1) = \alpha\,\Gamma(\alpha),\quad \Gamma(n) = (n-1)!$

  • Def: the Gamma density is as follows, written $X \sim \Gamma(\alpha, \beta)$:

    $f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$

  • The exponential is a special case of the Gamma density: $X \sim e(\beta) = \Gamma(1, \beta)$

  • Expectations:

    $E(X) = \alpha\beta,\quad \mathrm{Var}(X) = \alpha\beta^2$

    $X \sim e(\beta) \Rightarrow E(X) = \beta,\ \mathrm{Var}(X) = \beta^2$

Normal Distribution

Standard Normal

  • Def: X is called standard normal if its density is

    $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2},\quad x \in (-\infty, +\infty)$
  • The cdf can be found from tables

    $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$

    $\Phi(0) = 0.5,\quad \Phi(-x) = 1 - \Phi(x)$

  • Expectations: if X is standard normal, written $X \sim N(0,1)$, then

    $E(X) = 0,\quad \mathrm{Var}(X) = 1$

  • Def: X is normal with parameters $\mu, \sigma^2$ if

    $X \sim N(\mu, \sigma^2) \iff \dfrac{X-\mu}{\sigma} \sim N(0,1)$
  • The cdf and density of $N(\mu, \sigma^2)$ are:

    $F(x) = P(X \le x) = P\Big(\dfrac{X-\mu}{\sigma} \le \dfrac{x-\mu}{\sigma}\Big) = \Phi\Big(\dfrac{x-\mu}{\sigma}\Big)$

    $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},\quad x \in (-\infty, +\infty)$

  • Expectations:

    $E(X) = \mu,\quad \mathrm{Var}(X) = \sigma^2$

  • pth quantile (computed numerically in the sketch below)

    • Def: for p in (0,1), the pth quantile $x_p$ of X satisfies $P(X \le x_p) = p$
    • Def: for p in (0,1), the critical value $c_p$ of X satisfies $P(X > c_p) = p$
    • $x_p = c_{1-p}$
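
A minimal sketch of the quantile/critical-value relation for the standard normal (assuming SciPy; p = 0.95 is an arbitrary choice):

```python
from scipy import stats

p = 0.95
x_p = stats.norm.ppf(p)             # pth quantile: P(X <= x_p) = p
c_p = stats.norm.isf(p)             # critical value: P(X > c_p) = p

print(x_p, stats.norm.isf(1 - p))   # x_p equals c_{1-p}
print(c_p, stats.norm.ppf(1 - p))   # equivalently, c_p equals x_{1-p}
```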

Lecture 12

Central Limit Theorem

  • Th (Lindeberg–Lévy): if $\{X_i\}$ is an iid sequence with

    $E(X_k) = \mu,\ \mathrm{Var}(X_k) = \sigma^2$, let $Y_n = \dfrac{\sum_{k=1}^n X_k - n\mu}{\sqrt{n}\,\sigma} = \dfrac{\frac{1}{n}\sum_{k=1}^n X_k - \mu}{\sigma/\sqrt{n}}$
  • Then (illustrated by the simulation below)

    $\lim_{n \to +\infty} P(Y_n \le x) = \Phi(x)$, so approximately $\sum_{k=1}^n X_k \sim N(n\mu,\, n\sigma^2)$ and $\dfrac{1}{n}\sum_{k=1}^n X_k \sim N\Big(\mu,\, \dfrac{\sigma^2}{n}\Big)$
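
A minimal CLT simulation (assuming NumPy and SciPy; the U(0,1) population and the sizes are arbitrary): standardized sums of iid uniforms should have probabilities close to $\Phi$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 10**5
mu, sigma = 0.5, np.sqrt(1 / 12)             # mean and sd of U(0,1)

sums = rng.random((reps, n)).sum(axis=1)
Y = (sums - n * mu) / (np.sqrt(n) * sigma)   # standardized as in the theorem

# empirical P(Y <= 1) vs Phi(1) ~ 0.8413
print((Y <= 1.0).mean(), stats.norm.cdf(1.0))
```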

Lecture 13

Estimation Methods

  1. Moment estimate

    • Fundamental basis: $\{X_i\}$ iid with $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$

      $\bar{X} = \dfrac{1}{n}\sum_{i=1}^n X_i \approx N\Big(\mu, \dfrac{\sigma^2}{n}\Big)$, so $\bar{X} \xrightarrow{n \to \infty} \mu$

    • The distribution parameter $\theta$ is related to $\mu$

    • Estimation:

      $E(X) = \mu = g(\theta) \Rightarrow \theta = h(\mu) \Rightarrow \hat{\theta} = h(\bar{X})$
  2. The Method of Maximum Likelihood

  • Suppose the population $X \sim f(x, \theta)$

    $P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta) \triangleq L(\theta)$

  • $L(\theta)$ is called the likelihood function

  • The MLE is chosen as:

    $L(\hat{\theta}) = \max_{\theta} L(\theta)$
  • Solution of the MLE for the uniform distribution (a numerical sketch follows this list)

    1. find the likelihood function for $X \sim U(a,b)$

      $L(a,b) = \prod_{i=1}^n f(x_i) = \Big(\dfrac{1}{b-a}\Big)^n$

    2. find the MLE: $\dfrac{\partial L(a,b)}{\partial a} > 0,\ \dfrac{\partial L(a,b)}{\partial b} < 0$

      $\forall i,\ a \le X_i \le b \Rightarrow a \le \min\{X_i\},\ b \ge \max\{X_i\}$

    3. The likelihood function is strictly increasing in a but strictly decreasing in b, so the MLEs are:

      $\hat{a} = \min\{X_i\},\quad \hat{b} = \max\{X_i\}$
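
A minimal numerical sketch of that uniform MLE (assuming NumPy; the "true" parameters 2 and 7 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
a_true, b_true = 2.0, 7.0
x = rng.uniform(a_true, b_true, size=1000)

a_hat, b_hat = x.min(), x.max()    # MLE: sample minimum and maximum
print(a_hat, b_hat)                # slightly inside (2, 7)
```

Since always $\min\{X_i\} \ge a$ and $\max\{X_i\} \le b$, the estimates sit slightly inside the true interval; this bias shrinks as n grows (cf. unbiasedness in Lecture 14).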

Lecture 14

Unbiasedness

  • Def: if $E(\hat{\theta}) = \theta$, $\hat{\theta}$ is called unbiased
  • Def: $b(\hat{\theta}) = E(\hat{\theta}) - \theta$ is called the bias
  • Def: if $b(\hat{\theta}) \ne 0$ but $\lim_{n \to +\infty} b(\hat{\theta}) = 0$, $\hat{\theta}$ is called asymptotically unbiased

Efficiency

  • Def: if both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$

Mean Squared Error (MSE)

  • Def: the mean squared error is:

    $M(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]$

  • The MSE decomposes as (see the derivation below):

    $M(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})$
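
A short derivation of that decomposition, inserting and subtracting $E(\hat{\theta})$; the cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$:

```latex
M(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]
  = E\Big[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\Big]
  = \underbrace{E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]}_{\mathrm{Var}(\hat{\theta})}
    + 2\, b(\hat{\theta})\, \underbrace{E\big[\hat{\theta} - E(\hat{\theta})\big]}_{=\,0}
    + \underbrace{\big(E(\hat{\theta}) - \theta\big)^2}_{b^2(\hat{\theta})}
  = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})
```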

Lecture 15

Chi-Squared Distribution

$X_i \sim N(0,1)$ independent $\Rightarrow X = \sum_{i=1}^n X_i^2 \sim \chi^2(n)$

  • Derivation of the density:

    $\chi^2(n) = \Gamma\Big(\dfrac{n}{2},\, 2\Big)$

    $f(x; n) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$

  • Expectations: $X \sim \chi^2(n) \Rightarrow E(X) = n,\ \mathrm{Var}(X) = 2n$

  • Chi-squared distributions are additive:

    $X \sim \chi^2(n),\ Y \sim \chi^2(m),\ X, Y \text{ indep} \Rightarrow X + Y \sim \chi^2(n+m)$

t-Distribution

$X \sim N(0,1),\ Y \sim \chi^2(n)$, X and Y independent $\Rightarrow T = \dfrac{X}{\sqrt{Y/n}} \sim t(n)$

  • Density:

    $f(t) = \dfrac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}}\Big(1 + \dfrac{t^2}{n}\Big)^{-(n+1)/2},\quad -\infty < t < +\infty$

  • f(t) is an even function

  • The limit is the standard normal: $\lim_{n \to \infty} f(t) = \varphi(t)$

F-Distribution

$X \sim \chi^2(n_1),\ Y \sim \chi^2(n_2)$, X and Y independent $\Rightarrow F = \dfrac{X/n_1}{Y/n_2} \sim F(n_1, n_2)$

  • Property: $F \sim F(n_1, n_2) \Rightarrow 1/F \sim F(n_2, n_1)$
  • The limit case is the normal distribution

Sampling Distribution Theorems

  • Suppose the population is normal: $X \sim N(\mu, \sigma^2)$

  • Th1:

    $\bar{X} \sim N\Big(\mu, \dfrac{\sigma^2}{n}\Big)\quad \text{or}\quad \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
  • Th2: $\bar{X}$ and $S^2$ are independent, and

    $\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$
  • Th3 (checked by simulation below):

    $\dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$
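
A quick Monte Carlo check of Th3 (assuming NumPy and SciPy; μ, σ, n, and the threshold 1.5 are arbitrary): the studentized mean should match the $t(n-1)$ tail probability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 10.0, 2.0, 8, 10**5

x = rng.normal(mu, sigma, size=(reps, n))
T = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# empirical tail frequency vs the t(n-1) tail probability
print((T > 1.5).mean(), stats.t.sf(1.5, df=n - 1))
```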

Lecture 16

CI under Normal Distribution

  • find μ (a worked numerical example follows this list)
    • $X \sim N(\mu, \sigma^2)$, and $\sigma^2$ is given
      1. start from $\bar{X} - \mu$
      2. construct $Z = \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
      3. find $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha$
      4. solve: $-z_{\alpha/2} < Z < z_{\alpha/2} \Rightarrow \bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
    • $X \sim N(\mu, \sigma^2)$, and $\sigma^2$ is unknown
      1. start from $\bar{X} - \mu$
      2. construct $T = \dfrac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$
      3. find $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1-\alpha$
      4. solve: $-t_{\alpha/2} < T < t_{\alpha/2} \Rightarrow \bar{X} - t_{\alpha/2}\dfrac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\dfrac{S}{\sqrt{n}}$
  • find σ
    • $X \sim N(\mu, \sigma^2)$, and $\mu$ is given
      1. construct $W = \dfrac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$
      2. solve $P(\chi^2_{1-\alpha/2} < W < \chi^2_{\alpha/2}) = 1-\alpha$
    • $X \sim N(\mu, \sigma^2)$, and $\mu$ is unknown
      1. construct $W = \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$
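
A worked numerical sketch of both intervals (assuming NumPy and SciPy; the simulated sample and α = 0.05 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(10.0, 2.0, size=25)    # pretend mu and sigma^2 are unknown
n, alpha = len(x), 0.05

# t-based CI for mu (sigma^2 unknown)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_crit * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)

# chi-squared-based CI for sigma^2 (mu unknown), from W = (n-1)S^2 / sigma^2
w = (n - 1) * x.var(ddof=1)
print(w / stats.chi2.ppf(1 - alpha / 2, df=n - 1),   # lower bound
      w / stats.chi2.ppf(alpha / 2, df=n - 1))       # upper bound
```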

Sampling Distribution under Two Populations

  • Suppose $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$

  • X, Y independent; $n_1$, $n_2$ samples drawn from X, Y

  • Th1: variances known

    $\dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0,1)$
  • Th2: variances unknown but equal (see the pooled-variance sketch below)

    $\dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t(n_1+n_2-2)$

    $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
  • Th3: sampling theorem for variances

    $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1,\, n_2-1)$
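
A minimal pooled-variance sketch for Th2 (assuming NumPy and SciPy; the two samples are simulated with equal variance). The hand-computed statistic matches SciPy's pooled two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(5.0, 1.5, size=12)    # sample from X
y = rng.normal(4.0, 1.5, size=15)    # sample from Y, same variance
n1, n2 = len(x), len(y)

sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))  # mu1 = mu2 case

print(t_stat, stats.ttest_ind(x, y, equal_var=True).statistic)
```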

Sample variance

$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$. Why divide by $n-1$? Given $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$, and $\bar{X} = \dfrac{1}{n}\sum X_i \sim N\Big(\mu, \dfrac{\sigma^2}{n}\Big)$:

  • From $\mathrm{Var}(X) = E(X^2) - E^2(X)$: $E(X_i^2) = \mu^2 + \sigma^2$

  • From $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - E^2(\bar{X})$: $E(\bar{X}^2) = \mu^2 + \dfrac{\sigma^2}{n}$

  • Expand the sum of squares (using $\sum X_i = n\bar{X}$):

    $E\big(\sum (X_i - \bar{X})^2\big) = E\big(\sum X_i^2 + n\bar{X}^2 - 2\bar{X}\sum X_i\big) = E\big(\sum X_i^2 - n\bar{X}^2\big) = n E(X^2) - n E(\bar{X}^2)$

    $= n(\mu^2 + \sigma^2) - n\Big(\mu^2 + \dfrac{\sigma^2}{n}\Big) = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$

  • Therefore $E\Big(\dfrac{\sum (X_i - \bar{X})^2}{n-1}\Big) = \sigma^2$, i.e. $E(S^2) = \sigma^2$, so $S^2$ is unbiased (see the check below).
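
A quick check that the $n-1$ denominator makes $S^2$ unbiased (assuming NumPy; the population and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
sigma2 = 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(10**5, 10))  # many samples of n = 10

s2 = samples.var(axis=1, ddof=1)   # divide by n-1
print(s2.mean(), sigma2)           # mean of S^2 is close to sigma^2

biased = samples.var(axis=1)       # divide by n instead
print(biased.mean())               # close to (n-1)/n * sigma^2 = 3.6
```
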
One day we will climb the highest mountain, and survey the smallest point.