
Discrete Distributions 1

1. Uniform Distribution

1.1 Definition

A discrete random variable $X$ is said to follow the uniform distribution $\mathcal{D}(a,b)$ if:

$$\forall k\in\{a,\dots,b\},\quad \mathcal{P}(X=k)=\frac{1}{b-a+1}$$

1.2 Significance

The discrete uniform distribution models an experiment whose finitely many outcomes $a,a+1,\dots,b$ are all equally likely, such as the roll of a fair die ($a=1$, $b=6$).

1.3 Moments

1.3.1 Non-central Moments

$$\forall n\in\mathbb{N},\quad \mathbb{E}[X^n]=\frac{1}{b-a+1}\sum_{k=a}^b k^n$$

In particular, the expected value $\mathbb{E}[X]$ is:

$$\boxed{\mathbb{E}[X]=\frac{1}{b-a+1}\sum_{k=a}^b k=\frac{(b-a+1)(a+b)}{2(b-a+1)}=\frac{a+b}{2}}$$

1.3.2 Central Moments

$$\forall n\in\mathbb{N},\quad \mathbb{E}\left[(X-\mathbb{E}[X])^n\right]=\frac{1}{b-a+1}\sum_{k=a}^b\left(k-\frac{a+b}{2}\right)^n$$

To get the exact expression of the variance, we start by calculating the following quantity:

$$
\begin{align*}
\sum_{k=a}^b(k+1)^3 &= \sum_{k=a}^b \left(k^3+3k^2+3k+1\right)\\
\text{and, by telescoping,}\quad \sum_{k=a}^b(k+1)^3 &= \sum_{k=a}^b k^3+(b+1)^3-a^3\\
\implies \sum_{k=a}^b 3k^2 &= (b+1)^3-a^3-\sum_{k=a}^b(3k+1)\\
3\sum_{k=a}^b k^2 &= (b+1)^3-a^3-\frac{3}{2}(a+b)(b-a+1)-(b-a+1)\\
\implies 6\sum_{k=a}^b k^2 &= (b-a+1)\left(2a^2+2(1+b)a+2(b+1)^2-3(a+b)-2\right)\\
&=(b-a+1)(2a^2+2a+2ab+2b^2+4b+2-3a-3b-2)\\
&=(b-a+1)(2a^2-a+2ab+2b^2+b)\\
\implies \sum_{k=a}^b k^2 &= \frac{1}{6}(b-a+1)(2a^2-a+2ab+2b^2+b)
\end{align*}
$$

From this, we can directly compute the variance $\mathbb{V}[X]$ as follows:

$$
\begin{align*}
\mathbb{V}[X]&=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\
&=\frac{1}{b-a+1}\sum_{k=a}^b k^2-\frac{(a+b)^2}{4}\\
&=\frac{2a^2-a+2ab+2b^2+b}{6}-\frac{(a+b)^2}{4}\\
&=\frac{4a^2-2a+4ab+4b^2+2b-3a^2-6ab-3b^2}{12}\\
&=\frac{a^2-2a+b^2+2b-2ab}{12} \\
&=\frac{(b-a)^2+2(b-a)}{12}\\
&=\frac{(b-a)^2+2(b-a)+1-1}{12}\\
&=\frac{(b-a+1)^2-1}{12}
\end{align*}
$$
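
As a quick numerical sanity check of the mean and variance formulas, here is a minimal Python sketch; the bounds `a` and `b` below are arbitrary illustrative values, not part of the derivation:

```python
from fractions import Fraction

# Arbitrary illustrative bounds for D(a, b).
a, b = 3, 17
support = range(a, b + 1)
n = b - a + 1  # number of equally likely values

# E[X] and V[X] computed directly from P(X = k) = 1/n.
mean = Fraction(sum(support), n)
variance = Fraction(sum(k * k for k in support), n) - mean**2

# Closed forms derived above.
assert mean == Fraction(a + b, 2)
assert variance == Fraction((b - a + 1) ** 2 - 1, 12)
print(mean, variance)  # 10 and 56/3 for these bounds
```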

2. Bernoulli Distribution

2.1 Definition

A discrete random variable $X$ is said to follow the Bernoulli distribution $\mathcal{B}(p)$ if:

$$\begin{cases} \mathcal{P}(X=1)=p \\ \mathcal{P}(X=0)=1-p \end{cases}$$

2.2 Significance

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value $1$ with probability $p$ and the value $0$ with probability $q=1-p$.

Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question.

Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability $p$ and failure/no/false/zero with probability $q$.

It can be used to represent a (possibly biased) coin toss where $1$ and $0$ would represent "heads" and "tails", respectively, and $p$ would be the probability of the coin landing on heads (or vice versa, where $1$ would represent tails and $p$ would be the probability of tails).

In particular, an unfair coin would have $p\neq \frac{1}{2}$.

2.3 Moments

2.3.1 Idempotence

A Bernoulli random variable only takes the values $0$ and $1$, so it is idempotent:

$$\forall n\in\mathbb{N}^*,\quad X^n=X$$

2.3.2 Non-central moments

$$
\begin{align*}
\forall n \in\mathbb{N}^*,\quad \mathbb{E}[X^n]&=\mathbb{E}[X]\\
&=\mathcal{P}(X=1)\\
&=p
\end{align*}
$$

2.3.3 Central Moments

$$
\begin{align*}
\forall n \in\mathbb{N}^*,\quad \mathbb{E}\left[(X-\mathbb{E}[X])^n\right]&=\mathbb{E}\left[(X-p)^n\right]\\
&=(1-p)^n\,\mathcal{P}(X=1)+(-p)^n\,\mathcal{P}(X=0)\\
&=(1-p)^n p+(-p)^n(1-p) \\
&=p(1-p)\left((1-p)^{n-1}-(-p)^{n-1}\right)
\end{align*}
$$

In particular, the variance $\mathbb{V}[X]$ is equal to:

$$\boxed{\mathbb{V}[X]=p(1-p)}$$
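
The central-moment formula can be checked against a direct expectation over the two outcomes; a small sketch where the value of $p$ is an arbitrary illustrative choice:

```python
from fractions import Fraction

p = Fraction(3, 10)  # arbitrary illustrative success probability

def central_moment(n: int) -> Fraction:
    # E[(X - p)^n] computed directly from the two outcomes X = 1 and X = 0.
    return (1 - p) ** n * p + (-p) ** n * (1 - p)

# Compare against the closed form p(1-p)((1-p)^(n-1) - (-p)^(n-1)).
for n in range(1, 8):
    assert central_moment(n) == p * (1 - p) * ((1 - p) ** (n - 1) - (-p) ** (n - 1))

print(central_moment(2))  # 21/100, i.e. p(1-p)
```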

2.4 Product of Bernoulli distributions

- Let $n\in\mathbb{N}$

- Let $p_1,\dots,p_n\in[0,1]$

- Let $X_1\sim \mathcal{B}(p_1),\dots,X_n\sim \mathcal{B}(p_n)$

The random variable $P=\prod_{i=1}^nX_i$ follows a Bernoulli distribution $\mathcal{B}(p)$ with:

$$p=\mathcal{P}\left(\bigwedge_{i=1}^n X_i=1\right)$$

If the random variables are independent, then:

$$p=\prod_{i=1}^n p_i$$

2.4.1 Example 1

- Let $X_1\sim \mathcal{B}(0.5)$
- Let $X_2\sim \mathcal{B}(0.7)$
- Let $X_3 \sim \mathcal{B}(0.3)$
- We will assume that $X_1,X_2,X_3$ are independent
- Let $P=X_1X_2X_3$

The probability mass function of $P$ is:

$$\mathcal{P}(P=k)=\begin{cases} 0.105 & k=1 \\ 0.895 & k=0 \end{cases}$$
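
This value can be reproduced by enumerating the joint outcomes; a minimal sketch using the probabilities of Example 1 and the independence assumption stated above:

```python
from itertools import product as joint_outcomes

probs = [0.5, 0.7, 0.3]  # success probabilities of X1, X2, X3

# P(X1*X2*X3 = 1): sum the weights of all joint outcomes where every Xi is 1.
p = 0.0
for outcome in joint_outcomes([0, 1], repeat=3):
    weight = 1.0
    for x, q in zip(outcome, probs):
        weight *= q if x == 1 else 1 - q
    if all(outcome):
        p += weight

print(round(p, 3), round(1 - p, 3))  # 0.105 0.895
```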

2.4.2 Example 2

- Let $X_1\sim \mathcal{B}(0.5)$
- Let $X_2\sim \mathcal{B}(0.7)$
- Let $X_3$ be the random variable defined to be $1$ if $X_2=X_1$ and $0$ otherwise
- We will assume that $X_1$ and $X_2$ are independent (note that $X_3$ depends on both of them)
- Let $P=X_1X_2X_3$

$$\mathcal{P}(X_1 = 1 \wedge X_2=1 \wedge X_3=1) =\mathcal{P}(X_1 = 1 \wedge X_2=1 \wedge X_1=X_2)=\mathcal{P}(X_1 = 1 \wedge X_2=1)=0.35$$

So $P\sim \mathcal{B}(0.35)$

2.5 Binary Function on Bernoulli distributions

- Let $n\in\mathbb{N}$

- Let $p_1,\dots,p_n\in[0,1]$

- Let $X_1\sim \mathcal{B}(p_1),\dots,X_n\sim \mathcal{B}(p_n)$

- Let $F:\{0,1\}^n\rightarrow \{0,1\}$ be a binary function

Then the random variable $Y=F(X_1,\dots,X_n)$ follows a Bernoulli distribution $\mathcal{B}(p)$ with:

$$p=\mathcal{P}\left((X_1,\dots,X_n)\in F^{-1}(1)\right)$$

If the random variables are independent, then:

$$p=\sum_{U\in F^{-1}(1)}\prod_{i=1}^n\mathcal{P}(X_i=U_i)$$

2.5.1 Example 1

- Let $X_1\sim \mathcal{B}(p_1=0.8)$

- Let $X_2 \sim \mathcal{B}(p_2=0.6)$

- Let $X_3 \sim \mathcal{B}(p_3=0.5)$

- We will assume $X_1,X_2,X_3$ are independent

- Let $F$ be the binary function defined by the following table:

| $x_1$ | $x_2$ | $x_3$ | $F(x_1,x_2,x_3)$ |
|:---:|:---:|:---:|:---:|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

- Let $Y=F(X_1,X_2,X_3)$

We have (writing $\bar{p}_i=1-p_i$):

$$
\begin{align*}
\mathcal{P}(Y=1)&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge X_3=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge X_3=1)\\
&= \bar{p}_1\bar{p}_2\bar{p}_3+\bar{p}_1p_2\bar{p}_3+\bar{p}_1p_2p_3+p_1p_2p_3 \\
&= \bar{p}_1\bar{p}_3(\bar{p}_2+p_2)+p_2p_3(p_1+\bar{p}_1)\\
&= \bar{p}_1\bar{p}_3+p_2p_3 \\
&= 0.4\\
\mathcal{P}(Y=0)&=0.6
\end{align*}
$$

So $Y\sim \mathcal{B}(0.4)$
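
The same value is obtained by summing over $F^{-1}(1)$ directly; a short sketch under the independence assumption of this example (the set below is just the truth table restated):

```python
from itertools import product as joint_outcomes

probs = [0.8, 0.6, 0.5]  # p1, p2, p3 from Example 1

# Inputs on which F evaluates to 1, read off the truth table above.
preimage_of_one = {(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1)}

# p = sum over U in F^{-1}(1) of prod_i P(X_i = U_i), valid because the Xi are independent.
p = 0.0
for u in joint_outcomes([0, 1], repeat=3):
    if u in preimage_of_one:
        weight = 1.0
        for x, q in zip(u, probs):
            weight *= q if x == 1 else 1 - q
        p += weight

print(round(p, 3))  # 0.4, so Y ~ B(0.4)
```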

2.5.2 Example 2

- We will define $X_1,X_2$ and $F$ as in the first example
- Let $X_3=X_1\vee X_2$ be the random variable equal to $1$ if $X_1=1$ or $X_2=1$, and $0$ otherwise
- Let $Y=F(X_1,X_2,X_3)$

$$
\begin{align*}
\mathcal{P}(Y=1)&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge X_3=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge X_3=1)\\
&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge (X_1\vee X_2)=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge (X_1\vee X_2)=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge (X_1\vee X_2)=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge (X_1\vee X_2)=1)\\
&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge \text{Always}) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge \text{Impossible}) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge \text{Always})+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge \text{Always})\\
&=\mathcal{P}(X_1 =0 \wedge X_2=0)+\mathcal{P}(X_1 =0 \wedge X_2=1)+\mathcal{P}(X_1 =1 \wedge X_2=1)\\
&=\bar{p}_1\bar{p}_2+\bar{p}_1p_2+p_1p_2\\
&=\bar{p}_1+p_1p_2\\
&=0.68
\end{align*}
$$
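
Because $X_3$ is a deterministic function of $X_1$ and $X_2$, the independent-case formula no longer applies and we have to reason on the joint outcomes as above. A small Monte Carlo sketch (the seed and sample size are arbitrary) agrees with the exact value $0.68$:

```python
import random

random.seed(0)
p1, p2 = 0.8, 0.6   # from Example 1
trials = 200_000    # arbitrary sample size

preimage_of_one = {(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1)}  # F^{-1}(1)

hits = 0
for _ in range(trials):
    x1 = 1 if random.random() < p1 else 0
    x2 = 1 if random.random() < p2 else 0
    x3 = x1 | x2    # X3 = X1 OR X2: not independent of X1, X2
    hits += (x1, x2, x3) in preimage_of_one

print(hits / trials)  # ≈ 0.68
```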

2.6 Conditioning a Bernoulli distribution

- Let $p\in[0,1]$
- Let $X$ follow a Bernoulli distribution, and let $\mathcal{A}$ be an event with $\mathcal{P}(\mathcal{A})>0$

The random variable $Y$ defined by $\mathcal{P}(Y=k)=\mathcal{P}(X=k\mid \mathcal{A})$ follows the Bernoulli distribution $\mathcal{B}(p)$ with:

$$p=\mathcal{P}(X=1\mid \mathcal{A})$$

3. Binomial Distribution

3.1 Definition

A random variable $X$ is said to follow the binomial distribution $\mathcal{B}(n,p)$ with parameters $n\in\mathbb{N}$ and $p\in[0,1]$ if:

$$\exists X_1,\dots,X_n \sim \mathcal{B}(p) \text{ i.i.d.} \quad / \quad X=\sum_{i=1}^nX_i$$

3.2 Significance

In probability theory and statistics, the binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability $p$) or failure (with probability $q=1-p$).

3.3 Probability mass function

Let $S_k$ be the set of subsets of $I=\{1,\dots,n\}$ of size $k$.

The number of such sets is:

$$\lvert S_k \rvert = {n \choose k}$$

With that, the probability mass function is:

$$
\begin{align*}
\forall k\in\{0,\dots,n\},\quad \mathcal{P}(X=k)&= \sum_{A\in S_k}\mathcal{P}\left(\bigwedge_{s\in A} X_s=1 \ \text{and} \ \bigwedge_{s\in I \setminus A} X_s=0 \right) \\
&= \sum_{A\in S_k}\prod_{s\in A}\mathcal{P}(X_s=1) \times \prod_{s\in I\setminus A}\mathcal{P}(X_s=0) \quad \text{thanks to independence} \\
&=\sum_{A\in S_k}\prod_{s\in A}p \times \prod_{s\in I\setminus A}(1-p) \\
&= \sum_{A\in S_k}p^{\lvert A\rvert} \times (1-p)^{n-\lvert A \rvert}\\
&= \sum_{A\in S_k}p^{k} \times (1-p)^{n-k} \\
&= \lvert S_k \rvert \, p^{k} \times (1-p)^{n-k} \\
&= {n \choose k}p^k(1-p)^{n-k}
\end{align*}
$$
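
A short sketch (the values of `n` and `p` are arbitrary illustrative choices) that checks the derived mass function against a brute-force enumeration of the $2^n$ outcomes of $(X_1,\dots,X_n)$:

```python
from itertools import product as joint_outcomes
from math import comb, isclose

n, p = 5, 0.3  # arbitrary illustrative parameters

def pmf(k: int) -> float:
    # P(X = k) = C(n, k) p^k (1-p)^(n-k), as derived above.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Brute force: accumulate the weight of every joint outcome by its number of successes.
brute = [0.0] * (n + 1)
for outcome in joint_outcomes([0, 1], repeat=n):
    weight = 1.0
    for x in outcome:
        weight *= p if x == 1 else 1 - p
    brute[sum(outcome)] += weight

assert isclose(sum(pmf(k) for k in range(n + 1)), 1.0)
assert all(isclose(pmf(k), brute[k]) for k in range(n + 1))
print([round(pmf(k), 4) for k in range(n + 1)])
```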

3.4 Moments

3.4.1 Raw Moments

The expected value can be calculated directly from the definition:

$$\boxed{\mathbb{E}[X]=\sum_{k=1}^n\mathbb{E}[X_k]=np}$$

For higher order moments:

$$\forall m\in\mathbb{N}^*,\quad \mathbb{E}[X^m]= \sum_{k=1}^n{n \choose k}k^mp^k(1-p)^{n-k}$$

3.4.2 Central Moments

The variance can be calculated directly from the definition, using the independence of the $X_k$:

$$\boxed{\mathbb{V}[X]=\sum_{k=1}^n\mathbb{V}[X_k]=np(1-p)}$$

For higher order central moments:

$$\forall m\in\mathbb{N}^*,\quad \mathbb{E}\left[\left(X-\mathbb{E}[X]\right)^m \right]= \sum_{k=0}^n{n \choose k}(k-np)^mp^k(1-p)^{n-k}$$
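
These sums are easy to check numerically; a minimal sketch (again with arbitrary `n` and `p`) confirming that the first raw moment is $np$ and the second central moment is $np(1-p)$:

```python
from math import comb, isclose

n, p = 6, 0.4  # arbitrary illustrative parameters
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))                           # E[X]
second_central = sum((k - n * p) ** 2 * pmf[k] for k in range(n + 1))  # E[(X - np)^2]

assert isclose(mean, n * p)
assert isclose(second_central, n * p * (1 - p))
print(mean, second_central)  # 2.4 and 1.44 for these parameters
```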

4. Geometric Distribution

4.1 Definition

A random variable $X$ is said to follow the geometric distribution $\mathcal{G}(p)$ if:

$$\exists X_1,X_2,\dots \sim \mathcal{B}(p) \text{ i.i.d.} \quad / \quad X=\arg\min_{n\in\mathbb{N}^*} \{X_n=1\}$$

4.2 Significance

In probability theory and statistics, the geometric distribution is the probability distribution of the number $X$ of Bernoulli trials needed to get one success.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$.

4.3 Probability mass function

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathcal{P}(X=n) &= \mathcal{P}\left(\arg\min_{k\in\mathbb{N}^*}\{X_k=1\}=n\right) \\
&=\mathcal{P}\left(\bigwedge_{k=1}^{n-1} X_k =0 \ \text{and} \ X_n=1\right) \\
&=\mathcal{P}(X_n=1)\times \prod_{k=1}^{n-1}\mathcal{P}(X_k=0) \\
&=p(1-p)^{n-1}
\end{align*}
$$
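
A simulation sketch of this mass function (the value of `p`, the seed, and the sample size are arbitrary choices):

```python
import random

random.seed(0)
p, trials = 0.25, 100_000  # arbitrary illustrative parameters

def first_success_index(p: float) -> int:
    # Index of the first Bernoulli(p) trial equal to 1.
    n = 1
    while random.random() >= p:
        n += 1
    return n

samples = [first_success_index(p) for _ in range(trials)]
for n in range(1, 6):
    empirical = samples.count(n) / trials
    exact = p * (1 - p) ** (n - 1)   # p(1-p)^(n-1), as derived above
    print(n, round(empirical, 4), round(exact, 4))
```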

4.4 Moments

4.4.1 Prelude

Let $\varphi_n$ be defined as:

$$
\begin{align*}
\varphi_{n}\colon\ &(-1,1)\rightarrow \mathbb{R}\\
&x\mapsto \sum_{m\in\mathbb{N}}m^nx^m
\end{align*}
$$

This will serve as a helper function for calculating $\mathbb{E}[X^n]$.

In fact, $\varphi_n$ is differentiable and:

$$\varphi'_n(x)=\sum_{m\in\mathbb{N}^*}m^{n+1}x^{m-1}=\frac{1}{x}\varphi_{n+1}(x)$$

Which implies:

$$\forall n\in\mathbb{N},\quad \varphi_{n+1}(x)=x\varphi'_n(x)$$

And we have the following:

$$\varphi_0(x)=\sum_{m\in\mathbb{N}}x^m=\frac{1}{1-x}$$

4.4.2 Raw Moments

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{E}[X^n]&=\sum_{m\in\mathbb{N}^*}m^n p(1-p)^{m-1} \\
&= \frac{p}{1-p}\varphi_n(1-p)
\end{align*}
$$

With that, we can calculate the expected value $\mathbb{E}[X]$ as:

$$
\begin{align*}
\forall x\in(-1,1),\quad \varphi_1(x)&=x\varphi_0'(x)\\
&=\frac{x}{(1-x)^2}\\
\mathbb{E}[X]&=\frac{p}{1-p}\varphi_1(1-p) \\
&=\frac{p}{1-p}\cdot \frac{1-p}{p^2}\\
&=\frac{1}{p}
\end{align*}
$$

4.4.3 Variance

The variance $\mathbb{V}[X]$ can be calculated as:

$$
\begin{align*}
\forall x\in(-1,1),\quad \varphi_2(x)&=x\varphi_1'(x)\\
&=x\cdot \left(\frac{x}{(1-x)^2}\right)'\\
&=x\cdot \frac{(1-x)^2+2(1-x)x}{(1-x)^4}\\
&=x\cdot \frac{1-x+2x}{(1-x)^3}\\
&=\frac{x(x+1)}{(1-x)^3}\\
\mathbb{V}[X]&=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\
&=\frac{p}{1-p}\cdot \varphi_2(1-p)-\frac{1}{p^2}\\
&=\frac{p}{1-p}\cdot \frac{(1-p)(2-p)}{p^3}-\frac{1}{p^2}\\
&=\frac{2-p-1}{p^2}\\
&=\frac{1-p}{p^2}
\end{align*}
$$
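
The helper-function approach translates directly into a numerical check; a sketch that truncates the series at an arbitrary cutoff (so the values are approximations):

```python
p = 0.3          # arbitrary illustrative success probability
x = 1 - p
cutoff = 10_000  # truncation point for the infinite series (approximation)

def phi(n: int) -> float:
    # Truncated phi_n(x) = sum over m of m^n x^m.
    return sum(m**n * x**m for m in range(1, cutoff))

mean = p / (1 - p) * phi(1)
variance = p / (1 - p) * phi(2) - mean**2

print(mean, 1 / p)               # both ≈ 1/p
print(variance, (1 - p) / p**2)  # both ≈ (1-p)/p^2
```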

5. Negative Binomial Distribution

5.1 Definition

A random variable $X$ is said to follow the negative binomial distribution $\mathcal{NB}(r,p)$ with parameters $r\in\mathbb{N}^*$ and $p\in[0,1]$ if:

$$\exists X_1,X_2,\dots \sim \mathcal{B}(p) \text{ i.i.d.} \quad / \quad X=\arg\min_{n\in\mathbb{N}^*} \left\{\sum_{k=1}^nX_k=r\right\}$$

5.2 Significance

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of trials in a sequence of independent and identically distributed Bernoulli trials needed for a specified (non-random) number of successes $r$ to occur.

For example, we can define rolling a $6$ on a die as a success, and rolling any other number as a failure, and ask how many rolls are needed to get the third success $(r=3)$. In such a case, the probability distribution of the number of needed trials is a negative binomial distribution.

5.3 Probability mass function

Let $S_{n,k}$ be the set of subsets of $I_n=\{1,\dots,n\}$ of size $k$.

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathcal{P}(X=n) &= \mathcal{P}\left(\arg\min_{m\in\mathbb{N}^*} \left\{\sum_{k=1}^mX_k=r\right\}=n\right) \\
&= \sum_{A\in S_{n-1,r-1}} \mathcal{P}\left(\bigwedge_{s\in A}X_s=1 \ \text{and} \ X_n=1 \ \text{and} \bigwedge_{s\in I_{n-1}\setminus A}X_s=0\right) \\
&= \sum_{A\in S_{n-1,r-1}} \mathcal{P}(X_n=1)\prod_{s\in A}\mathcal{P}(X_s=1) \cdot \prod_{s\in I_{n-1}\setminus A} \mathcal{P}(X_s=0) \\
&= \sum_{A\in S_{n-1,r-1}} p^{\lvert A \rvert +1} (1-p)^{n-1-\lvert A \rvert} \\
&= \sum_{A\in S_{n-1,r-1}} p^{r} (1-p)^{n-r} \\
&= \lvert S_{n-1,r-1}\rvert \cdot p^{r} (1-p)^{n-r} \\
&= {n-1 \choose r-1}p^r(1-p)^{n-r}
\end{align*}
$$
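
A simulation sketch of this mass function (with arbitrary `r`, `p`, seed, and sample size):

```python
import random
from math import comb

random.seed(0)
r, p, trials = 3, 0.4, 100_000  # arbitrary illustrative parameters

def trials_until_r_successes(r: int, p: float) -> int:
    # Number of Bernoulli(p) trials needed to reach the r-th success.
    n = successes = 0
    while successes < r:
        n += 1
        if random.random() < p:
            successes += 1
    return n

samples = [trials_until_r_successes(r, p) for _ in range(trials)]
for n in range(r, r + 5):
    empirical = samples.count(n) / trials
    exact = comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)
    print(n, round(empirical, 4), round(exact, 4))
```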

5.4 Moments

5.4.1 Raw Moments

- Let $p\in[0,1]$
- For $r\in\mathbb{N}^*$, let $X_r\sim \mathcal{NB}(r,p)$
$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{E}[X_r^n]&=\sum_{m\in\mathbb{N}}{m-1 \choose r-1}m^np^r(1-p)^{m-r}\\
&=\sum_{m\in\mathbb{N}}\frac{(m-1)!}{(m-r)!(r-1)!}m^np^r(1-p)^{m-r}\\
&=\sum_{m\in\mathbb{N}}\frac{m!}{(m-r)!\,r!}\,r\,m^{n-1}p^r(1-p)^{m-r}\\
&=\sum_{m\in\mathbb{N}}{m \choose r}rm^{n-1}p^r(1-p)^{m-r}\\
&=\frac{r}{p}\sum_{m\in\mathbb{N}^*}{m-1\choose r}(m-1)^{n-1}p^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{m\in\mathbb{N}^*}{m-1\choose r}\sum_{s=0}^{n-1}{n-1\choose s}(-1)^{n-1-s}m^sp^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{s=0}^{n-1}(-1)^{n-1-s}{n-1\choose s}\sum_{m\in\mathbb{N}^*}{m-1\choose r}m^sp^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{s=0}^{n-1}(-1)^{n-1-s}{n-1\choose s}\mathbb{E}[X^s_{r+1}]
\end{align*}
$$

In particular, the expected value is:

$$\boxed{\mathbb{E}[X_r]=\frac{r}{p}\mathbb{E}[X^0_{r+1}]=\frac{r}{p}}$$

5.4.2 Central Moments

We will start with the variance:

$$
\begin{align*}
\mathbb{E}[X_r^2]&=\frac{r}{p}\left(-\mathbb{E}[X_{r+1}^0]+\mathbb{E}[X_{r+1}]\right)\\
&=\frac{r}{p}\cdot \left(\frac{r+1}{p}-1\right)\\
&=\frac{r(r+1-p)}{p^2}\\
\implies \mathbb{V}[X_r]&=\mathbb{E}[X_r^2]-\mathbb{E}[X_r]^2\\
&=\frac{r(r+1-p)}{p^2}-\frac{r^2}{p^2}\\
&=r\,\frac{1-p}{p^2}
\end{align*}
$$
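
A numerical check of both closed forms against the mass function, truncating the infinite sums at an arbitrary cutoff:

```python
from math import comb, isclose

r, p = 4, 0.35   # arbitrary illustrative parameters
cutoff = 2_000   # truncation point for the infinite sums (approximation)

pmf = {n: comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r) for n in range(r, cutoff)}

mean = sum(n * q for n, q in pmf.items())
variance = sum((n - mean) ** 2 * q for n, q in pmf.items())

assert isclose(mean, r / p, rel_tol=1e-6)                     # E[X_r] = r/p
assert isclose(variance, r * (1 - p) / p**2, rel_tol=1e-6)    # V[X_r] = r(1-p)/p^2
print(mean, variance)
```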

5.5 Relation to the Geometric Distribution

The geometric distribution is a special case of the negative binomial distribution.

In fact:

$$\boxed{\forall p\in [0,1],\quad \mathcal{G}(p)=\mathcal{NB}(1,p)}$$
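
Indeed, setting $r=1$ in the negative binomial mass function recovers the geometric one:

$$\forall n\in\mathbb{N}^*,\quad {n-1 \choose 1-1}p^1(1-p)^{n-1}={n-1 \choose 0}p(1-p)^{n-1}=p(1-p)^{n-1}$$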