
Probability Formalisation

1. Importance

This section shows how probability is formally defined.

It is not required at all, but it gives insight into when a probability question makes sense.

It also shows how to construct some random variables.

Be aware that this is an advanced topic ⚠️⚠️⚠️.

1.1 Examples that have a meaning

  • Choose $X$ uniformly from $[0,1]$

  • Choose $X$ uniformly from $\{0,1,2,3\}$

  • Let $X\sim \mathcal{U}(0,1)$, what is the probability that $X\in \mathbb{Q}$?

  • Let $\bar{\mathbb{Q}}=\{\alpha\in\mathbb{R}/\quad \exists P\in\mathbb{Q}[x]/\quad P(\alpha)=0\}$ be the set of all real numbers that are a root of some polynomial with rational coefficients. The statement below is a valid statement:

    Let $X\sim \mathcal{N}(0,1)$, what is the probability that $X\in \bar{\mathbb{Q}}$?

1.2 Examples that do not have a meaning

  • Choose $X$ uniformly from $\mathbb{N}$

  • Choose $X$ uniformly from $\mathbb{R}$

  • Let $\mathcal{V}\subseteq \mathbb{R}$ be a set such that $\forall x\in\mathbb{R},\ \exists!y\in\mathcal{V}/\quad x-y \in\mathbb{Q}\cap[0,1]$. Such a set is called a Vitali set. The statement below does not have a meaning:

    Let $X\sim \mathcal{U}(0,1)$, what is the probability that $X\in\mathcal{V}$?

1.3 Road Map

To formally define a probability, we need:

  • A universe $\Omega$

  • A set $\mathcal{F}$ of events.

  • Some probability function $\mu$ that assigns a probability $p\in[0,1]$ to every event $A\in\mathcal{F}$.

Now we will start by defining a suitable set of events $\mathcal{F}$.

Then we will define some consistent probability functions $\mu$.

Finally, we will formally define a random variable.

2. Sigma Algebra

2.1 Definition

Let $S$ be some set, and $\mathscr{P}(S)$ be its power set.

A subset $\Sigma \subseteq \mathscr{P}(S)$ is called a $\sigma$-algebra if:

  1. $S\in \Sigma$

  2. $\Sigma$ is closed under set complementation:

    $$\forall A\in \Sigma,\quad \bar{A}= S\setminus A \in \Sigma$$

  3. $\Sigma$ is closed under countable union:

    $$\forall (A_n)_{n\in\mathbb{N}}\in\Sigma^{\mathbb{N}},\quad \bigcup_{n\in\mathbb{N}}A_n \in \Sigma$$

2.2 Example

For any set $S$, $\mathscr{P}(S)$ and $\{\emptyset,S\}$ are both $\sigma$-algebras.
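
For a finite set $S$, closure under countable unions reduces to closure under finite unions, so the axioms can be checked mechanically. Below is a minimal Python sketch (the helper names `is_sigma_algebra` and `power_set` are ours, not from any library) that accepts both examples above and rejects a collection that is not a $\sigma$-algebra.

```python
from itertools import chain, combinations

def is_sigma_algebra(S, Sigma):
    """Check the sigma-algebra axioms for a finite set S.

    For finite S, closure under countable unions reduces to
    closure under pairwise (hence finite) unions.
    """
    S = frozenset(S)
    Sigma = {frozenset(A) for A in Sigma}
    if S not in Sigma:                                   # axiom 1: S is in Sigma
        return False
    if any(S - A not in Sigma for A in Sigma):           # axiom 2: complements
        return False
    return all(A | B in Sigma for A in Sigma for B in Sigma)  # axiom 3 (finite case)

def power_set(S):
    S = list(S)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

S = {0, 1, 2, 3}
print(is_sigma_algebra(S, power_set(S)))       # True  -> P(S)
print(is_sigma_algebra(S, [set(), S]))         # True  -> {∅, S}
print(is_sigma_algebra(S, [set(), {0}, S]))    # False -> the complement {1,2,3} is missing
```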

2.3 Importance

In probability, a $\sigma$-algebra specifies which events make sense. Those events can be combined using set union or set intersection.

3. Measurable Space

3.1 Definition

Let $S$ be some set and $\mathcal{F}$ be a $\sigma$-algebra on $S$.

The couple $(S,\mathcal{F})$ is called a measurable space.

3.2 Significance

A measurable space is, in some sense, a space on which it is possible to define a function giving the "size" of any set $A\in\mathcal{F}$.

This function will have some constraints that will be formalised next.

3.3 Examples

For any set $S$, $(S,\mathscr{P}(S))$ is a measurable space.

3.4 Measurable Set

Given a measurable space $(S,\mathcal{F})$, a set $U\in\mathscr{P}(S)$ is said to be measurable if $U\in \mathcal{F}$.

Any set $U\notin \mathcal{F}$ is called a non-measurable set.

4. Measure Space

4.1 Definition

A measure space $(S,\mathcal{F},\mu)$ is a measurable space $(S,\mathcal{F})$ together with a function $\mu:\mathcal{F}\rightarrow \bar{\mathbb{R}}$ (taking extended real values), called a measure, satisfying the following conditions:

  1. The measure of the empty set is null: $\mu(\emptyset)=0$

  2. The measure function is non-negative: $\forall A\in \mathcal{F}, \quad \mu(A)\ge 0$

  3. The measure function is countably additive with respect to disjoint sets:

    $$\forall (A_n)\in \mathcal{F}^\mathbb{N}\ \text{pairwise disjoint},\quad \mu\left(\bigcup_{n\in\mathbb{N}} A_n\right)=\sum_{n\in\mathbb{N}}\mu(A_n)$$

4.2 Significance

A measure space is a set $S$ together with a function $\mu$ giving the size of measurable subsets $A\subseteq S$.

4.3 Examples

4.3.1 Finite Set

  • Let $S$ be a finite set of size $n\in\mathbb{N}$
  • Let $\mathcal{M}=(S,\mathscr{P}(S))$ be a measurable space

We define $\mu$ as follows:

$$\forall A\subseteq S, \quad \mu(A)=\lvert A \rvert$$

We can verify that $(\mathcal{M},\mu)$ is a measure space.

Also, with $p=\frac{\mu}{\lvert S \rvert}$, $(\mathcal{M},p)$ is also a measure space.
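
As a quick sanity check, here is a minimal Python sketch (the universe and the helper names `mu` and `p` are our own illustration) computing the counting measure and its normalized version on a small finite set, and verifying additivity on two disjoint sets.

```python
S = {1, 2, 3, 4, 5, 6}          # a finite universe, e.g. the faces of a die

def mu(A):
    """Counting measure: the size of A."""
    return len(A)

def p(A):
    """Normalized measure mu(A) / |S|, which takes values in [0, 1]."""
    return len(A) / len(S)

A, B = {1, 2}, {5}                       # two disjoint measurable sets
assert mu(A | B) == mu(A) + mu(B)        # additivity on disjoint sets
print(p({2, 4, 6}))                      # 0.5, the "size" of the even faces
print(p(S))                              # 1.0, so p measures the whole space as 1
```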

4.3.2 Natural numbers

We will set $\mathcal{F}=\mathscr{P}(\mathbb{N})$, so that $\mathcal{M}=(\mathbb{N},\mathcal{F})$ is a measurable space.

We define $\mu$ as follows:

$$\forall A\subseteq \mathbb{N}\ \text{finite},\quad \mu(A)=\lvert A \rvert,\qquad \mu(A)=+\infty\ \text{otherwise}$$

We can verify that $(\mathcal{M},\mu)$ is a measure space.

As an example, $\mu(\{1,2,8\})=3$.

We can also define another measure $\lambda$ as:

$$\forall A\subseteq \mathbb{N}, \quad \lambda(A)=\sum_{k\in A}2^{-k}$$

As an example, $\lambda(\{0,2\})=1+\frac{1}{4}=1.25$

Also, $\lambda(\mathbb{N})=\sum_{n\in\mathbb{N}}2^{-n}=2$
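
Both measures can be evaluated directly on finite sets, and $\lambda(\mathbb{N})$ can be approximated by truncating the sum. A minimal Python sketch (the name `lam` is ours):

```python
def lam(A):
    """lambda(A) = sum over k in A of 2^(-k), for a finite set A of naturals."""
    return sum(2.0 ** -k for k in A)

print(lam({0, 2}))          # 1.25, as computed above
print(lam(range(50)))       # ~2.0, approximating lambda(N) by truncation
print(lam({1, 2, 8}))       # 0.75390625, to contrast with mu({1, 2, 8}) = 3
```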

4.3.3 Real Line

$\mathbb{R}$ can be augmented to a measure space $(\mathbb{R},\mathcal{B},\mu)$ with $\mu$ defined as:

$$\forall a\le b,\quad \mu(\mathopen]a,b\mathclose[)=b-a$$

In fact:

  • $\mu$ gives the length of a set $A\in\mathcal{B}$
  • $\mathcal{B}$ is called the Borel $\sigma$-algebra, and its construction is too advanced for this section.
  • $\mathcal{B}\neq \mathscr{P}(\mathbb{R})$, as it happens that the Vitali set $\mathcal{V}$ defined at 1.2 is not a Borel set.

4.4 Measurable Function

  • Let $(S,\mathcal{F},\mu)$ be a measure space
  • Let $(E,\mathcal{E})$ be a measurable space

A function $f:S\rightarrow E$ is said to be measurable if the pre-image of any measurable set is measurable:

$$\forall U\in \mathcal{E},\quad f^{-1}(U)\in \mathcal{F}$$

5. Probability Space

5.1 Definition

A probability space is a measure space $(\Omega,\mathcal{F},\mu)$ with the additional constraint that $\mu(\Omega)=1$.

5.2 Terminology

For a probability space $(\Omega,\mathcal{F},\mu)$:

  • $\Omega$ is called the universe (or sample space)
  • The elements $A\in\mathcal{F}$ are called events
  • $\mu$ is called the probability measure

5.3 Examples

5.3.1 Finite Sets

The measure space $(\mathcal{M},p)$ from the example 4.3.1 is a probability space.

5.3.2 Natural numbers

The measure $\lambda$ from 4.3.2 is not a probability measure, but it induces a probability measure $\phi$ defined by:

$$\forall A\subseteq \mathbb{N},\quad \phi(A)=\frac{\lambda(A)}{\lambda(\mathbb{N})}=\frac{1}{2}\sum_{k\in A}2^{-k}$$

The measure $\mu$ from 4.3.2 cannot induce a probability measure the same way, as $\mu(\mathbb{N})=+\infty$.
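
Continuing the sketch from 4.3.2, $\phi$ is just $\lambda$ rescaled by $\lambda(\mathbb{N})=2$. A minimal Python illustration (helper names are ours):

```python
def lam(A):
    """lambda(A) = sum over k in A of 2^(-k), for a finite set A of naturals."""
    return sum(2.0 ** -k for k in A)

def phi(A):
    """phi(A) = lambda(A) / lambda(N), with lambda(N) = 2."""
    return lam(A) / 2.0

print(phi({0}))           # 0.5
print(phi({0, 1}))        # 0.75
print(phi(range(50)))     # ~1.0 : phi(N) = 1, so phi is a probability measure
```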

5.3.3 Real numbers

Let $(\mathbb{R},\mathcal{B},\mu)$ be the measure space defined in 4.3.3.

We define another measure $\lambda$ as follows:

$$\forall a\le b,\quad \lambda(\mathopen]a,b\mathclose[)=\mu([0,1]\cap\mathopen]a,b\mathclose[)$$

With that, $(\mathbb{R},\mathcal{B},\lambda)$ is a probability space.
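
For instance, since single points have length zero under $\mu$:

$$\lambda(\mathopen]-1,\tfrac{1}{2}\mathclose[)=\mu([0,1]\cap\mathopen]-1,\tfrac{1}{2}\mathclose[)=\mu(\mathopen[0,\tfrac{1}{2}\mathclose[)=\tfrac{1}{2},\qquad \lambda(\mathbb{R})=\mu([0,1]\cap\mathbb{R})=\mu([0,1])=1$$

The second computation is what makes $\lambda$ a probability measure.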

5.3.4 Dirac Measure

Let $(\mathbb{R},\mathscr{P}(\mathbb{R}),\delta)$, with $\delta$ defined as:

$$\forall A\subseteq \mathbb{R},\quad \delta(A)=\begin{cases} 1 &\text{if } 0\in A\\ 0 &\text{otherwise} \end{cases}$$
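
Since $0\in\mathbb{R}$, we have $\delta(\mathbb{R})=1$, so $(\mathbb{R},\mathscr{P}(\mathbb{R}),\delta)$ is a probability space. For instance, $\delta([-1,1])=1$ and $\delta(\{1,2\})=0$.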

6. Random Variable

6.1 Definition

  • Let $(\Omega,\mathcal{F},\mu)$ be a probability space
  • Let $(E,\mathcal{E})$ be a measurable space

A function $X:\Omega\rightarrow E$ is called a random variable if it is measurable.

In other words, a random variable is a measurable function whose domain is the universe of a probability space.

6.2 Probability of an event

Let $U\in\mathcal{E}$ be an event.

We define the probability that $X\in U$, denoted by $\mathcal{P}(X\in U)$, as follows:

$$\mathcal{P}(X\in U)=\mu\left(\{\omega \in \Omega /\quad X(\omega)\in U\}\right)=\mu\left(X^{-1}(U)\right)$$
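
On a finite probability space this definition can be evaluated directly. Below is a minimal Python sketch (the fair-die universe and the helper names `mu`, `X`, `prob` are our own illustration): $X$ maps a die face to its parity, and $\mathcal{P}(X\in U)$ is computed as $\mu(X^{-1}(U))$.

```python
from fractions import Fraction

# A small finite probability space (Omega, P(Omega), mu):
# a fair six-sided die, mu({w}) = 1/6 for every outcome w.
Omega = {1, 2, 3, 4, 5, 6}
weights = {w: Fraction(1, 6) for w in Omega}

def mu(A):
    """Probability measure on the finite universe Omega."""
    return sum(weights[w] for w in A)

def X(w):
    """A random variable X : Omega -> E = {0, 1}, the parity of the face."""
    return w % 2

def prob(U):
    """P(X in U) = mu(X^{-1}(U)) = mu({w in Omega : X(w) in U})."""
    preimage = {w for w in Omega if X(w) in U}
    return mu(preimage)

print(prob({1}))       # 1/2 : probability that the face is odd
print(prob({0, 1}))    # 1   : X always lands in E
```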

6.3 Distribution

The probability function defined at 6.2 constitutes a measure $\mathcal{D}$ on the measurable space $(E,\mathcal{E})$.

This measure is defined as: $\forall U\in \mathcal{E},\quad\mathcal{D}(U)=\mu(X^{-1}(U))$, and it is called the distribution of $X$.

If $X$ has a distribution $\mathcal{D}$, we say that $X$ follows a $\mathcal{D}$ distribution, and we write:

$$X\sim \mathcal{D}$$

6.4 Classification

We will only consider two types of random variables:

6.4.1 Discrete Random Variable

A random variable is said to be discrete if $X(\Omega)$ is countable.

6.4.2 Continuous Random Variable

A random variable $X$ is said to be continuous if:

$$\forall x\in E,\quad \mathcal{P}(X=x)=0$$

6.5 Examples

These examples show the formal construction of some random variables.

6.5.1 Discrete Uniform Variable

  • Let $a,b\in\mathbb{N}^*$ such that $a\le b$
  • Let $S=\{a,\dots,b\}$
  • As $S$ is finite, we can define the probability space $(S,\mathscr{P}(S),p)$ as in 5.3.1
  • Let $X:S\rightarrow S$ be defined by $X(\omega)=\omega$

We have:

$$\forall s\in S,\quad \mathcal{P}(X=s)=p(X^{-1}(\{s\}))=p(\{s\})=\frac{1}{\lvert S \rvert}=\frac{1}{b-a+1}$$

Note that as $X(S)=S$ is a countable set, this random variable is a discrete random variable.
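
A minimal Python sketch of this construction (the helper names `p`, `X`, `pmf` and the values of `a`, `b` are ours): the identity map on $S$ together with the normalized counting measure gives the uniform probabilities $\frac{1}{b-a+1}$.

```python
from fractions import Fraction

# Discrete uniform variable on S = {a, ..., b}, built as in 6.5.1:
# the probability space is (S, P(S), p) with p the normalized counting
# measure, and X is the identity map on S.
a, b = 3, 7
S = range(a, b + 1)

def p(A):
    """Normalized counting measure on S."""
    return Fraction(len(A), b - a + 1)

def X(w):
    return w                          # X(omega) = omega

def pmf(s):
    """P(X = s) = p(X^{-1}({s}))."""
    preimage = [w for w in S if X(w) == s]
    return p(preimage)

print(pmf(5))                         # 1/5 = 1/(b - a + 1)
print(sum(pmf(s) for s in S))         # 1
```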

6.5.2 Bernoulli Random Variable

  • Let $(\mathbb{R},\mathcal{B},\lambda)$ be as defined in 5.3.3

  • Let $E=\{0,1\}$ and $\mathcal{E}=\mathscr{P}(E)$, so that $(E,\mathcal{E})$ is a measurable space

  • Let $p\in[0,1]$.

  • Let $X:\mathbb{R}\rightarrow E$ be defined as:

    $$X(\omega)=\begin{cases} 1 & \text{if } \omega <p \\ 0 & \text{otherwise} \end{cases}$$

We have:

$$\begin{align*} \mathcal{P}(X=1)&=\lambda(X^{-1}(\{1\})) \\ &=\lambda(\mathopen]-\infty,p\mathclose[)\\ &=\mu([0,1]\cap \mathopen]-\infty,p\mathclose[) \\ &=\mu(\mathopen[0,p\mathclose[)\\ &= p \\ \mathcal{P}(X=0)&=\mathcal{P}(X\ne 1) \\ &= 1-\mathcal{P}(X=1)\\ &= 1-p \end{align*}$$

With that, we can verify that $X\sim \mathcal{B}(p)$.

Note that as $X(\mathbb{R})=\{0,1\}$ is a countable set, this random variable is a discrete random variable.
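
This construction translates directly into a simulation. A minimal Python sketch, assuming `random.random()` plays the role of drawing $\omega$ from $[0,1]$ under $\lambda$: the empirical frequency of $X=1$ should be close to $p$.

```python
import random

# Bernoulli variable built as in 6.5.2: draw omega from [0, 1] and set
# X(omega) = 1 if omega < p, else 0.
p = 0.3

def X(omega):
    return 1 if omega < p else 0

samples = [X(random.random()) for _ in range(100_000)]
print(sum(samples) / len(samples))    # empirical P(X = 1), close to 0.3
```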

6.5.3 Continuous Uniform Variable

  • Let $(\mathbb{R},\mathcal{B},\lambda)$ be as defined in 5.3.3
  • Let $X:\mathbb{R}\rightarrow \mathbb{R}$ be defined as $X(\omega)=\omega$

We have:

$$\begin{align*} \forall a,b\in[0,1],\ a\le b,\quad \mathcal{P}(X\in [a,b])&=\lambda(X^{-1}([a,b]))\\ &=\lambda([a,b])\\ &=\mu([0,1]\cap [a,b]) \\ &=\mu([a,b])\\ &=b-a \end{align*}$$

This result essentially says that $X\sim \mathcal{U}(0,1)$, and we can verify that $X$ is a continuous random variable.

Now we will calculate $\mathcal{P}(X\in \mathbb{Q})$.

As $X$ is a continuous random variable, we have: $\forall x \in\mathbb{R},\quad \mathcal{P}(X=x)=0$.

Furthermore, as $\mathbb{Q}$ is countably infinite, there exists a bijective function $\Phi:\mathbb{N}\rightarrow \mathbb{Q}$. With that:

$$\begin{align*} \mathcal{P}(X\in\mathbb{Q})&=\lambda(\mathbb{Q})\\ &=\lambda\left(\bigcup_{n\in\mathbb{N}}\{\Phi(n)\}\right)\\ &=\sum_{n\in\mathbb{N}}\lambda(\{\Phi(n)\})\\ &=\sum_{n\in\mathbb{N}}0\\ &=0 \end{align*}$$

7. Discrete Random Variable

7.1 Definition

  • Let $(\Omega,\mathcal{F},\mu)$ be a probability space
  • Let $(E,\mathcal{E})$ be a measurable space

A random variable $X:\Omega \rightarrow E$ is said to be discrete if $X(\Omega)$ is countable.

7.2 Probability Mass Function

7.2.1 Definition

The probability mass function $M_X$ is defined as:

$$M_X(\omega)=\mathcal{P}(X=\omega)$$

Now we will recover the probability of an event from the mass function.

7.2.2 Probability of an event

Let $A\in \mathcal{E}$. We have the following:

$$\begin{align*} \mathcal{P}(X\in A)&=\mu(X^{-1}(A))\\ &=\mu\left(X^{-1}(A)\cap \Omega\right)\\ &=\mu\left(X^{-1}(A)\cap X^{-1}(X(\Omega))\right)\\ &=\mu\left(X^{-1}\left(A\cap X(\Omega)\right)\right)\\ &=\mathcal{P}(X\in A\cap X(\Omega)) \end{align*}$$

As $A\cap X(\Omega)\subseteq X(\Omega)$ is countable, there exists a bijective function $\Phi: \mathcal{I}\rightarrow A \cap X(\Omega)$ with $\mathcal{I}\subseteq \mathbb{N}$.

From that we can calculate the probability of $A$:

$$\begin{align*} \mathcal{P}(X\in A)&=\mathcal{P}(X\in A\cap X(\Omega))\\ &=\mu\left(X^{-1}(A\cap X(\Omega))\right)\\ &=\mu\left(X^{-1}\left(\bigcup_{n\in\mathcal{I}}\{\Phi(n)\}\right)\right)\\ &=\mu\left(\bigcup_{n\in\mathcal{I}}X^{-1}\left(\{\Phi(n)\}\right)\right)\\ &=\sum_{n\in\mathcal{I}} \mu(X^{-1}(\{\Phi(n)\}))\\ &=\sum_{\omega \in A\cap X(\Omega)}\mu(X^{-1}(\{\omega\})) \quad \text{reindexing the sum via } \Phi\\ &=\sum_{\omega\in A\cap X(\Omega)}\mathcal{P}(X = \omega) \\ &=\sum_{\omega\in A\cap X(\Omega)}\mathcal{P}(X = \omega) + \mathcal{P}\left(X\in A \cap\overline{X(\Omega)}\right)\quad \text{as the last term is zero} \\ &=\sum_{\omega\in A\cap X(\Omega)}\mathcal{P}(X = \omega) + \sum_{\omega \in A\cap \overline{X(\Omega)}}\mathcal{P}(X=\omega) \quad \text{as every term is zero}\\ &=\sum_{\omega\in A}\mathcal{P}(X=\omega) \quad \text{this makes sense as only countably many terms are non-zero} \end{align*}$$

By that, for every event $A\in\mathcal{E}$ we have:

$$\mathcal{P}(X\in A)=\sum_{\omega \in A}\mathcal{P}(X=\omega)=\sum_{\omega \in A}M_X(\omega)$$
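
A small numerical check of this identity, reusing the fair-die space from the sketch in 6.2 but with a different random variable $X(\omega)=\omega \bmod 3$ (helper names are ours):

```python
from fractions import Fraction

# Check that P(X in A) equals the sum of M_X(x) over x in A.
Omega = {1, 2, 3, 4, 5, 6}
weights = {w: Fraction(1, 6) for w in Omega}

def X(w):
    return w % 3                       # X(Omega) = {0, 1, 2}

def M_X(x):
    """Probability mass function: M_X(x) = P(X = x)."""
    return sum(weights[w] for w in Omega if X(w) == x)

A = {0, 2}                             # an event in E
lhs = sum(weights[w] for w in Omega if X(w) in A)   # P(X in A) computed directly
rhs = sum(M_X(x) for x in A)                        # sum of the pmf over A
print(lhs, rhs)                        # 2/3 2/3
```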

7.3 Examples

  • The Discrete Uniform Variable shown in 6.5.1
  • The Bernoulli Random Variable shown in 6.5.2
  • Even when we expand the codomain of the Bernoulli Random Variable to $E=\mathbb{R},\ \mathcal{E}=\mathcal{B}$, it remains a discrete random variable as $X(\Omega)=\{0,1\}$

8. Real Random Variable

8.1 Definition

  • Let $(\Omega,\mathcal{F},\mu)$ be a probability space
  • Let $E=\mathbb{R}$ and $\mathcal{E}=\mathcal{B}$, so that $(\mathbb{R},\mathcal{B})$ is a measurable space

A real random variable is a random variable $X:\Omega\rightarrow \mathbb{R}$.

Furthermore, if it is continuous, it is said to be a continuous real random variable.

8.2 Cumulative Distribution Function

For a random variable $X$, its cumulative distribution function $F_X$ is defined by:

$$\forall x\in\mathbb{R},\quad F_X(x)=\mathcal{P}(X\le x)=\mathcal{P}(X\in\mathopen]-\infty,x\mathclose])$$

Example

Let $X\sim \mathcal{U}(0,1)$

The cumulative distribution function of $X$ is:

$$F_X(x)=\begin{cases} 0 &\text{if } x <0 \\ x &\text{if } x\in[0,1]\\ 1 &\text{otherwise} \end{cases}$$
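
A minimal Python sketch comparing this $F_X$ with an empirical CDF built from simulated uniform samples (helper names are ours):

```python
import random

def F_X(x):
    """CDF of X ~ U(0, 1), as given above."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x
    return 1.0

samples = [random.random() for _ in range(100_000)]

def empirical_cdf(x):
    """Fraction of simulated samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

for x in (-0.5, 0.25, 0.7, 2.0):
    print(x, F_X(x), round(empirical_cdf(x), 3))   # the two values agree closely
```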

8.3 Probability Density Function

8.3.1 Definition

Where it can be defined, the probability density function $f_X$ is the derivative of $F_X$:

$$f_X=F_X'$$

8.3.2 Probability of an event

If $F_X$ is differentiable almost everywhere, then for every event $A\in\mathcal{B}$:

$$\mathcal{P}(X\in A)=\int_{A}f_X(x)\ \mathrm{d}x$$

In particular, for every interval $[a,b]$:

$$\mathcal{P}(X\in[a,b])=\int_{a}^{b}f_X(x)\ \mathrm{d}x$$
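
A minimal numerical check for $X\sim\mathcal{U}(0,1)$, approximating the integral of $f_X$ with a midpoint Riemann sum (helper names are ours):

```python
def f_X(x):
    """Density of X ~ U(0, 1)."""
    return 1.0 if 0 <= x <= 1 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f_X, 0.2, 0.5))    # ~0.3 = F_X(0.5) - F_X(0.2)
print(integrate(f_X, -1.0, 2.0))   # ~1.0, the total probability
```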

8.3.3 Example

Let $X\sim \mathcal{U}(0,1)$

The probability density function $f_X$ can be defined as:

$$f_X(x)=\begin{cases} 0 &\text{if } x <0 \\ 1 &\text{if } x\in[0,1]\\ 0 &\text{otherwise} \end{cases}$$