# Transforming random variables

In certain cases, it is useful to transform a random variable $\Theta$ to another random variable $\Psi$ via a transformation $\Psi=f(\Theta)$. For instance, this situation arises typically if one wants to sample from the probability density function $p_{\scriptscriptstyle{\Theta}}(\theta)$ of a positively-valued random variable $\Theta$. Markov chain Monte Carlo (MCMC) algorithms are conventionally designed to draw correlated samples from a desired target (density) $p_{\scriptscriptstyle{\Psi}}(\psi)$ of a random variable $\Psi$ taking values over the real line. So, if the target of interest $p_{\scriptscriptstyle{\Theta}}$ has support over the positive real line and the MCMC algorithm samples from a density $p_{\scriptscriptstyle{\Psi}}$ with support over the real line, then a random variable transformation, such as $\Psi=\log{(\Theta)}$, can help resolve the matter. In particular, the transformation allows to sample from $p_{\scriptscriptstyle{\Psi}}$ via the MCMC method of choice and then the inverse transformation $\Theta=\exp{(\Psi)}$ converts the simulated Markov chain to a set of sample points from $p_{\scriptscriptstyle{\Theta}}$. Obviously, it is needed to find the form of the target density $p_{\scriptscriptstyle{\Psi}}$ on which MCMC will be applied.

Although such random variable transformations are common practice, one may need to look up the formula for the transformation to pass from the original density $p_{\scriptscriptstyle{\Theta}}$ to the transformed density $p_{\scriptscriptstyle{\Psi}}$. The main source of confusion is whether one needs the Jacobian associated with the transformation $f$ or with the inverse transformation $f^{-1}$.

There is a way to retrieve the formula intuitively via a geometric argument, rather than trying to uncover it mnemonically. The main argument is that of area preservation in the case of univariate random variables. It suffices to realize that for a small displacement, the area below the curves of the two densities is the same, which means that

$p_{\scriptscriptstyle{\Psi}}(\psi)d\psi=p_{\scriptscriptstyle{\Theta}}(\theta)d\theta$.

This realization suffices to recover the remaining steps. It follows that

$p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(\theta)\frac{d\theta}{d\psi}$.

Notice that

$\Theta \overset{f}{\underset{f^{-1}}\rightleftarrows} \Psi$,

which gives the transformed density

$p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(f^{-1}(\psi))\left|\frac{df^{-1}(\psi)}{d\psi}\right|$.

The Jacobian in the univariate case is the derivative $\frac{df^{-1}(\psi)}{d\psi}$, associated with the inverse transformation $f^{-1}$. The absolute value of the derivative ensures that the density $p_{\scriptscriptstyle{\Psi}}$ is non-negative. Having understood the univariate case, the multivariate scenario follows straightforwardly as

$p_{\scriptscriptstyle{\boldsymbol\Psi}}(\boldsymbol\psi)=p_{\scriptscriptstyle{\boldsymbol\Theta}}(f^{-1}(\boldsymbol\psi))\left|\frac{\partial f^{-1}_{i}(\boldsymbol\psi)}{\partial_{j}\boldsymbol\psi}\right|$,

where $\left|\frac{\partial f^{-1}_{i}(\boldsymbol\psi)}{\partial_{j}\boldsymbol\psi}\right|$ denotes the determinant of the Jacobian of $f^{-1}$.

To follow through with the example

$\Theta \overset{\log}{\underset{\exp}\rightleftarrows} \Psi$,

notice that $f=\log$, $f^{-1}=\exp$. So, the derivative $\frac{df^{-1}(\psi)}{d\psi}$ becomes

$\displaystyle\frac{df^{-1}(\psi)}{d\psi}=\frac{d \exp (\psi)}{d\psi}=\exp{(\psi)}$,

whence

$p_{\scriptscriptstyle{\Psi}}(\psi)=p_{\scriptscriptstyle{\Theta}}(\exp{(\psi)}) \exp{(\psi)}$.

The target log-density for MCMC is thus

$\log{(p_{\scriptscriptstyle{\Psi}}(\psi))}=\log{(p_{\scriptscriptstyle{\Theta}}(\exp{(\psi)}))} + \psi$.

# The birthday problem and tail recursion

The birthday problem or probability paradox concerns the probability that a group of $n$ people all have different birthdays, see for instance p. 15 of the book “Statistical modelling and computation” by Dirk Kroese and Joshua Chan. The purpose of this blog post is to exemplify how tail recursion can be used to avoid mutability in the context of a simple statistical computation, thus switching from the imperative to the functional programming paradigm.

Short recap on the solution to the birthday problem

As a succinct recap of the solution to the birthday problem, let $A_k$ denote the event that the $k$ first people have different birthdays for $k=1,2,\dots, n$. Notice that $A_{k}\subseteq A_{k-1}$, whence $A_n=\underset{k=1}{\overset{n}{\bigcap}}A_k$. The probability that all $n$ people have different birthdays is then $\mbox{P}(A_n)=\mbox{P}(\underset{k=1}{\overset{n}{\bigcap}}A_k)$. By application of the product rule of probability,

$\mbox{P}(A_n)=\mbox{P}(A_1)\displaystyle\prod_{k=2}^{n}\mbox{P}(A_k|A_{k-1})$.

Given that the first $k-1$ people have different birthdays, the first $k$ people have different birthdays if and only if the birthday of the $k$-th person is chosen from the remaining $365-(k-1)$ birthdays. So,

$\mbox{P}(A_k|A_{k-1})=\displaystyle\frac{365-k+1}{365}$.

Obviously, $\mbox{P}(A_1)=1$, which leads to

$\mbox{P}(A_n)=\displaystyle\frac{1}{365^{n-1}} \prod_{k=2}^{n}(365-k+1)$.

Computing the probability of different birthdays via an iterative function

To compute the probability $\mbox{P}(A_n)$ that all $n$ people have distinct birthdays as a function of $n$, consider the following iterative function in R:

UniqueBirthdayProb <- function(n) {
p <- 1
for (k in 1:n) {
p <- p * (365 - k + 1) / 365
}
return(p)
}


The function call

UniqueBirthdayProb(23)


gives a probability of $0.4927028$, which means that the probability of $n=23$ people not exhibiting a duplicate birthday is less than $50\%$.

The following snippet computes and plots $\mbox{P}(A_n)$ for $n=1,2,\dots,70$:

nseq <- 1:70
pseq <- sapply(nseq, UniqueBirthdayProb)

plot(
nseq, pseq,
type="o", pch="*", ylim=c(0, 1),
xlab="n", ylab=expression('P(A'[n]*')'),
main="Probability of distinct birthdays"
)


The resulting plot is shown below:

Computing the probability of different birthdays via a recursive function

The following R function computes $\mbox{P}(A_n)$ recursively:

RecUniqueBirthdayProb <- function(n) {
if (n == 1) {
return(1)
} else {
return(RecUniqueBirthdayProb(n - 1) * (365 - n + 1) / 365)
}
}


Mutability has been avoided in that the state of the local variable p in the body of UniqueBirthdayProb(n) is not altered. The function RecUniqueBirthdayProb(n) calls itself recursively, and therefore avoids the need for a local variable. A downside of the recursive function calls in RecUniqueBirthdayProb(n) is that the stack frame is filled for increasing values of the input argument n.

Computing the probability of different birthdays via a tail-recursive function

Tail recursion reduces the number of recursive calls. This is how to define a tail-recursive function in R to compute $\mbox{P}(A_n)$:

TailRecUniqueBirthdayProb <- function(n) {
loop <- function(acc, n) {
if (n == 1) {
return(acc)
} else {
return(loop(acc * (365 - n + 1) / 365, n - 1))
}
}
loop(1, n)
}


A closure loop(acc, n) is defined inside the tail-recursive function TailRecUniqueBirthdayProb(n). The number of recursive calls is reduced via the accumulator acc, which is the first input argument to loop(acc, n).

A comparison of stack frames between the three functions

To compare the iterative, recursive and tail-recursive functions in terms of their stack demands, the trace() R function is called upon each of the three functions, yielding the following output:

> trace(UniqueBirthdayProb)
> UniqueBirthdayProb(10)
trace: UniqueBirthdayProb(10)
[1] 0.8830518
> untrace(UniqueBirthdayProb)
>
> trace(RecUniqueBirthdayProb)
> RecUniqueBirthdayProb(10)
trace: RecUniqueBirthdayProb(10)
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
trace: RecUniqueBirthdayProb
[1] 0.8830518
> untrace(RecUniqueBirthdayProb)
>
> trace(TailRecUniqueBirthdayProb)
> TailRecUniqueBirthdayProb(10)
trace: TailRecUniqueBirthdayProb(10)
[1] 0.8830518
> untrace(TailRecUniqueBirthdayProb)


It is clear from the output that RecUniqueBirthdayProb(n) requires more recursive function calls than TailRecUniqueBirthdayProb(n). Strictly speaking, and despite the reduction in the number of recursive function calls, R does not support tail elimination. The focus of this post is on the concept of tail recursion and how it can be applied in a simple statistical problem, not on the specifics of tail elimination in R. The choice of programming language in this post has been made on pedagogical and not on performance grounds.

# Proof of the Baker-Campbell-Hausdorff formula

The Baker-Campbell-Hausdorff (BCH) formula appears in stochastic analysis and in quantum mechanics. In the context of stochastic analysis, the BCH formula provides a method to calculate the log-signature of the concatenation of two rough paths. In the context of quantum mechanics, the BCH formula allows to compute products of general operators in Hilbert spaces.

The log-signature of the concatenation of two paths in $\mathbb{R}^d$ expresses as a sum of Lie brackets of formal power series in the tensor algebra of $\mathbb{R}^d$. To learn how the BCH formula is used for computing log-signatures, the reader is referred to section 2.2.4, p. 37, of the book “Differential equations driven by rough paths” by Terry Lyons, Michael Caruan and Thierry Lévy, section 4, p. 4-5, of the notes “Calculation of iterated-integral signatures and log signatures” by Jeremy Reizenstein, and example 9, p. 16, of the notes “A primer on the signature method in machine learning” by Ilya Chevyrev and Andrey Kormilitzin.

This blog post provides a proof of the BCH formula from a quantum perspective, as stated in lemma 34, p. 56, of the lecture notes “Quantum mechanics” by Martin Plenio.

Let $\mathcal{H}$ be a finite Hilbert space over $\mathbb{C}$. Consider two operators $A:\mathcal{H}\rightarrow\mathcal{H}$ and $B:\mathcal{H}\rightarrow\mathcal{H}$. For the operators $A,B$, define the so-called commutator $[A,B]=AB-BA$. The BCH formula in this setting takes the form

$e^{B}Ae^{-B}=A+[B,A]+\frac{1}{2}[B,[B,A]]+\dots$

As a side note, the commutator $[A,B]$ corresponds to a Lie bracket in the rough path case.

To prove the BCH formula, introduce the auxiliary function $f(x)=e^{xB}Ae^{-xB}$, where $x\in\mathbb{R}$. Take the Taylor series expansion of $f(x)$ around $0$, i.e. consider $f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^{n}$. Setting $x=1$ leads to

$e^{B}Ae^{-B}=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}=A+f^{(1)}(0)+\frac{f^{(2)}(0)}{2}+\dots$

By application of definition 27, p. 51, of Plenio’s notes, it holds that $e^{xB}=\sum_{n=0}^{\infty}\frac{x^n B^n}{n!}$. Differentiating $e^{xB}$ gives $\frac{de^{xB}}{dx}=\sum_{n=1}^{\infty}\frac{x^{n-1} B^n}{(n-1)!}=B\sum_{k=0}^{\infty}\frac{x^k B^k}{k!}=Be^{xB}$. Similarly, $\frac{de^{-xB}}{dx}=-Be^{-xB}$. The chain rule yields

$\frac{df(x)}{dx}=\frac{de^{xB}}{dx}Ae^{-xB}+e^{xB}A\frac{de^{-xB}}{dx}$.

So, $\frac{df(x)}{dx}=Be^{xB}Ae^{-xB}-e^{xB}ABe^{-xB}$. For $x=0$, it is obvious that $f^{(1)}(0)=BA-AB=[B,A]$.

Working in a similar manner, the second-order derivative of $f$ is found to be

$\frac{d^{(2)}f(x)}{dx^2}=B^2e^{xB}Ae^{-xB}-2Be^{xB}ABe^{-xB}+e^{xB}AB^2e^{-xB}$.

For $x=0$, it can be seen that $f^{(2)}(0)=B^2A-2BAB+AB^2=[B,[B,A]]$.

The rest of the proof follows inductively.

# Clarification on the countable additivity of Lebesgue measure

As emphasized in remark 1.2.4, p. 19, of Terence Tao’s book “An introduction to measure theory”, finite additivity doesn’t hold for Lebesgue outer measure $m^{*}(\cdot)$ in general, and therefore it doesn’t hold for Lebesgue measure $m(\cdot)$ either. So, the Lebesgue outer measure $m^{*}(E\cup F)$ of the union of two disjoint sets $E, F$ in the Euclidean metric space $(\mathbb{R}^d, |\cdot|)$ does not necessarily satisfy $m^{*}(E\cup F)=m^{*}(E)+m^{*}(F)$.

If $E,F$ are both Lebesgue measurable, then it holds that $m^{*}(E\cup F)=m^{*}(E)+m^{*}(F)$. Moreover, $m^{*}(E)=m(E)$$m^{*}(F)=m(F)$, which means that the union $E\cup F$ is also Lebesgue measurable and ultimately finite additivity follows as $m(E\cup F)=m(E)+m(F)$.

The main point is that if $E\cup F$ is Lebesgue measurable and $E\cap F=\emptyset$, then finite additivity doesn’t follow. Instead, the disjoint set assumption $E\cap F=\emptyset$ needs to be replaced by the positive distance assumption $\mbox{dist}(E, F)=\inf\{|x-y|: x\in E, y\in F\}>0$ to ensure finite additivity for the outer measure, as explained and proved in lemma 1.2.5, p. 19, of Tao’s book.

Considering this limitation in the applicability of finite additivity, a reader may feel alarmed when reading lemma 1.9.c, p. 21, of David Williams’ book “Probability with martingales”, which states that if $m(S)<\infty$ in a measure space $(S, \Sigma, m)$, then $m(F\cup E) = m(F)+m(E)-m(F\cap E)$ for $F, E\in\Sigma$. If $F\cap E=\emptyset$, then $m(F\cup E) = m(F)+m(E)$, so finite additivity holds.

Recall that a measure $m:\Sigma\rightarrow[0,\infty]$ on $(S,\Sigma)$ has the whole σ-algebra $\Sigma$ on $S$ as its domain. This implies that for any $F, E\in\Sigma$ the measures $m(F),m(E)$ exist by assumption, therefore there is no conflict between the aforementioned statements on finite additivity of measure found in Tao’s and Williams’ book.

It becomes clear that operating on a measure space $(\mathbb{R}^d,\Sigma, m)$ seems to avoid the trouble of having to prove the existence of Lebesgue measure for every set of interest, simply because by definition $m(F)$ exists for every $F\in\Sigma$. However, there is no free lunch. A follow-up question arises, as one then needs to show that $F$ belongs to the defined σ-algebra $\Sigma$, which is not always trivial.

For instance, the experiment of tossing a coin infinitely often is presented in p. 24 of Williams’ book by introducing an associated sample space $\Omega$ and subsequently a probability triple $(\Omega,\mathcal{F},P)$. An event $F$ of possible interest is that the ratio of heads in $n$ tosses tends to $1/2$ as $n\rightarrow\infty$. Even for such a seemingly simple experiment, proving that this event $F$, seen as a set, belongs to the σ-algebra $\mathcal{F}$ on $\Omega$ is already not so trivial to prove.

On a final note, lemma 1.2.15, p. 30, in Tao’s book, proves countable additivity for disjoint Lebesgue measurable sets, which subsumes finite additivity. The proof is easy to follow; the claim is proved first for compact, then for bounded and then for unbounded sets. To conclude the present post, a clarification is made in the proof of the bounded case. In particular, it will be explained why for a bounded Lebesgue measurable set $E_n$ there exists a compact set $K_n$ such that $m(E_n) \le m(K_n)+\epsilon/2^n$.

Since $E_n$ is Lebesgue measurable, it follows from exercise 1.2.7.iv in Tao’s book that for any $\epsilon > 0$ there exists a closed set $K_n$ such that $K_n\subseteq E_n$ and $m^{*}(E_n\setminus K_n) \le \epsilon/2^n$. By applying the countable subadditivity of outer measure (see exercise 1.2.3.iii in Tao’s book), $E_n=K_n\cup (E_n\setminus K_n)$ leads to $m^{*}(E_n)\le m^{*}(K_n)+m^{*}(E_n\setminus K_n) \le m^{*}(K_n)+\epsilon/2^n$. Furthermore, $K_n$ is Lebesgue measurable since it is closed (see lemma 1.2.13.ii in Tao’s book). Thus, Lebesgue outer measure can be replaced by Lebesgue measure to give $m(E_n) \le m(K_n)+\epsilon/2^n$. Finally, $K_n$ is bounded, as a subset of the bounded set $E_n$, and closed, so $K_n$ is compact, which completes the proof.

# Proof of Baire’s category theorem

The book “Probability with martingales” by David Williams provides a proof of Baire’s category theorem in p. 203-204. This blog post reproduces the proof in greater detail to clarify some of the statements that are left to the book reader to confirm.

Baire’s category theorem is stated as follows:

If a complete metric space $S$ can be written as a union $S=\underset{n\in\mathbb{N}}{\cup}F_n$ of a countable sequence of closed sets $F_n,n\in\mathbb{N}$, then at least one of these closed sets contains an open ball.

The theorem was proved for $\mathbb{R}^d$ by René-Louis Baire in his PhD thesis “Sur les fonctions de variable réelles“.

Assume a complete metric space $(S,\rho)$ with some metric $\rho:S\times S\rightarrow \mathbb{R}$, which need not be necessarily the Euclidean metric space $(\mathbb{R}^d, |\cdot|)$.

Start by assuming that none of $F_n,n\in\mathbb{N}$, contains an open ball. Moreover, assume that none of the complements $F_n^c,n\in\mathbb{N}$, is empty. If there exists an empty complement $F_n^c=\emptyset$, then $F_n=S$, which is a trivial case of finite union that will be treated at the end of the proof separately.

Since $F_1$ is closed, $F_1^c$ is open, i.e. $F_1^c=(F_1^c)^{\mathrm{o}}$. Due to $F_1^c$ being non-empty, there exists $x_1\in F_1^c=(F_1^c)^{\mathrm{o}}$. Hence, there exists $\epsilon_1 >0$ so that the open ball $B(x_1,\epsilon_1)$ with center $x_1$ and radius $\epsilon_1$ satisfies $B(x_1,\epsilon_1)\subseteq F_1^c$.

By assumption, $F_2$ contains no open ball, so $B(x_1, \epsilon_1/2)\not\subseteq F_2$, which means that $B(x_1, \epsilon_1/2)\cap F_2^c \neq\emptyset$. Furthermore, $B(x_1, \epsilon_1/2)\cap F_2^c$ is an open set as the intersection of the open sets $B(x_1, \epsilon_1/2)$ and $F_2^c$. Thus, there exist $x_2\in B(x_1, \epsilon_1/2)\cap F_2^c$ and $\epsilon_2>0$ with $\epsilon_2<\min \{\epsilon_1/2,1/2\}$ such that $B(x_2,\epsilon_2)\subseteq B(x_1,\epsilon_1/2)\cap F_2^c$. Note that $B(x_2,\epsilon_2)\subseteq B(x_1,\epsilon_1/2)$.

Inductively, choose $\epsilon_n<\min \{\epsilon_{n-1}/2,1/n\},n\in\mathbb{N}$, to introduce the n-th ball $B(x_n,\epsilon_n)\subseteq B(x_{n-1},\epsilon_{n-1}/2)\cap F_n^c$. This way, a countable sequence of nested balls is defined with $B(x_n,\epsilon_n)\subseteq B(x_{n-1},\epsilon_{n-1}/2)$. It will be shown that the centers of these balls make up a Cauchy sequence $(x_n)$. According to the Archimedean property, for any $\epsilon > 0$, there exists $n_0\in\mathbb{N}$ such that $0< 1/n_o < \epsilon$. Let $m,n\in\mathbb{N}$ be natural numbers with $m>n_o$ and $n>n_o$. Using the triangle inequality, $\rho (x_m, x_n) \le \rho (x_m, x_{n_o})+\rho (x_{n_o}, x_n)$. Obviously, $x_n\in B(x_n,\epsilon_n)$. Since $n>n_o$, it follows that $B(x_n,\epsilon_n)\subseteq B(x_{n_o}, \epsilon_{n_o}/2)$, so $x_n\in B(x_{n_o}, \epsilon_{n_o}/2)$. Apparently, $x_n\in B(x_{n_o}, \epsilon_{n_o}/2)$ implies $\rho(x_n,x_{n_o})<\epsilon_{n_o}/2$. As $\epsilon_{n_o}<1/n_o$ and $1/n_o < \epsilon$, it follows that $\rho(x_n,x_{n_o})<\epsilon/2$. Similarly, $\rho(x_m,x_{n_o})<\epsilon/2$, which leads to $\rho (x_m, x_n) < \epsilon$, so the ball centers constitute the Cauchy sequence $(x_n)$. The metric space $(S, \rho)$ has been assumed to be complete, so the Cauchy sequence $(x_n)$ is convergent, therefore its limit $x:=\lim{x_n}$ exists in $S$.

The next step is to show that the limit $x$ of the ball centers $(x_n)$ belongs to the intersection of these balls, i.e. $x\in\underset{n\in\mathbb{N}}{\cap}B(x_n,\epsilon_n)$. Let $n\in \mathbb{N}$ and $m\in\mathbb{N}$ with $m>n$. By applying the triangle inequality, $\rho (x, x_n) \le \rho (x, x_m)+\rho (x_m, x_n)$. Notice that $m>n$ yields $x_m\in B(x_m,\epsilon_m)\subseteq B(x_n, \epsilon_n/2)$, hence $\rho (x_m, x_n) <\epsilon_n/2$, which leads to $\rho (x, x_n) \le \rho (x, x_m)+\epsilon_n/2$. Observe that $(x_m)$ is a subsequence of $(x_n)$, which means that $\underset{m\rightarrow\infty}{\lim}\rho (x, x_m)=0$. So, taking the limit as $m\rightarrow \infty$ produces  $\rho (x, x_n) \le \epsilon_n/2$, whence $x\in B(x_n,\epsilon_n)$ for any $n \in \mathbb{N}$, thus $x\in\underset{n\in\mathbb{N}}{\cap}B(x_n,\epsilon_n)$.

It is obvious from $B(x_n,\epsilon_n)\subseteq B(x_{n-1},\epsilon_{n-1}/2)\cap F_n^c$ that $B(x_n,\epsilon_n)\subseteq F_n^c$. Consequently, $x\in\underset{n\in\mathbb{N}}{\cap}B(x_n,\epsilon_n)\subseteq\underset{n\in\mathbb{N}}{\cap}F_n^c=\left(\underset{n\in\mathbb{N}}{\cup}F_n\right)^c=S^c$. The contradiction $x\in S^c=\emptyset$ has been reached, which completes the proof.

The case of finite union $S=\overset{k}{\underset{n=1}{\cup}}F_n$ is simpler, as it is not needed to construct an infinite sequence of open balls and then take the limit of their centers. Instead, it suffices to observe that the center $x_k$ of the smallest ball $B(x_k,\epsilon_k)$ satisfies $x_k\in\overset{k}{\underset{n=1}{\cap}}B(x_n,\epsilon_n)\subseteq\overset{k}{\underset{n=1}{\cap}} F_n^c$, leading to the analogous contradiction $x_k\in S^c=\emptyset$.

On a final note, it is emphasized that the existence of the countable sequence of ball centers $(x_n)$ is based on the axiom of choice.

# Clarification on the proof that an open set is a countable union of almost disjoint boxes

To establish that the Lebesgue outer measure of any open set in the Euclidean metric space of $\mathbb{R}^d$ is equal to the volume of any partitioning of that set into almost disjoint boxes, lemma 1.2.11 in p. 24 of the book “An introduction to measure theory” by Terence Tao first states that any open set $E\subseteq \mathbb{R}^d$ can be expressed as a countable union of almost disjoint boxes (and in fact as a countable union of almost disjoint closed cubes).

Note that this lemma is a generalization of the fact that every open subset of $\mathbb{R}$ can be expressed as a countable union of disjoint open intervals (see theorem 1.3, p. 6, in the book “Real analysis: measure theory, integration and Hilbert spaces” by Elias M. Stein and Rami Shakarchi and also lemma 2 of this blog post of mine).

The purpose of the present blog post is to elaborate on a detail in the proof of lemma 1.2.11 of Tao’s book. In particular, it will be explained why  every closed dyadic cube $Q$ is contained in exactly one maximal cube $Q^{*}$, without reproducing the rest of the proof.

Introduce the notation $Q_{n,\mathbf{i}}=\overset{d}{\underset{k=1}{\times}}\left[\frac{i_k}{2^n},\frac{i_k+1}{2^n}\right]$ to indicate the dependence of closed dyadic cubes on $n\in\mathbb{N}\cup \left\{0\right\}$ and $\mathbf{i}=(i_1,i_2,\cdots, i_d)\in\mathbb{Z}^{d}$. For every $\mathbf{i}=(i_1,i_2,\cdots, i_d)\in\mathbb{Z}^{d}$ for which there exists a closed dyadic cube $Q_{n,\mathbf{i}}$ contained in $E$, choose the biggest closed dyadic cube $Q_{n,\mathbf{i}}$ in $E$, i.e. choose $n_0=\underset{n}{\min}\left\{n\in\mathbb{N}\cup \left\{0\right\}:Q_{n,\mathbf{i}}\in E\right\}$. It is now obvious that by having capped the cubes by a sidelength of 1, there exists a maximum cube $Q_{n_0,\mathbf{i}}$ among $\left\{Q_{n,\mathbf{i}}\in E:n\in\mathbb{N}\cup \left\{0\right\}\right\}$, which is a maximal cube among those contained in $E$.

What is left to show is that every closed dyadic cube $Q$ can’t be contained in two or more maximal cubes. Assume that $Q$ is contained in two distinct maximal cubes $Q^*$ and $Q^{**}$, i.e. $Q\subseteq Q^*$ and $Q\subseteq Q^{**}$ with $Q^*\neq Q^{**}$. The dyadic nesting property then leads to the contradiction $Q^*\subseteq Q^{**}$ or $Q^{**}\subseteq Q^*$ for the maximal cubes $Q^*$ and $Q^{**}$, since $Q\subseteq Q^*$ and $Q\subseteq Q^{**}$ excludes the possibility of $Q^*$ and $Q^{**}$ being almost disjoint.

# Clarification on the proof of Carathéodory’s lemma

Carathéodory’s lemma, as it appears in p. 197 (appendix A1) of the book “Probability with martingales” by David Williams, is stated as follows:

Let $\lambda$ be an outer measure on the measurable space $(S,\mathcal{G})$. Then the λ-sets in $\mathcal{G}$ form a σ-algebra $\mathcal{L}$ on which $\lambda$ is countably additive, so that $(S,\mathcal{L},\lambda)$ is a measure space.

Three aspects of the proof of Carathéodory’s lemma provided in Williams’ book are clarified in this blog post.

Definition of λ-system

The concept of λ-system, which is used implicitly but it is not defined in Williams’ book, is introduced in this post.

A collection $\mathcal{C}$ of subsets of a set $S$ is called a λ-system on $S$ if

1. $S\in\mathcal{C}$,
2. $L\in\mathcal{C}\Rightarrow L^{c}\in\mathcal{C}$ (it is closed under complements),
3. $(\forall n\in\mathbb{N})L_n\in\mathcal{C}$ with $L_i\cap L_j=\emptyset$ for $i\neq j$ it holds that $\underset{n\in\mathbb{N}}{\cup}L_n\in\mathcal{C}$ (it is closed under countable disjoint unions).

Not that the only difference between a λ-system and a σ-algebra is that the former is closed under countable disjoint unions while the latter is closed under countable unions. Moreover, the first condition on the definition of a λ-system could be alternatively set to $\emptyset\in\mathcal{C}$ instead of $S\in\mathcal{C}$ due to closure under complementarity, i.e. due to the second condition of the definition.

Lemma

If a collection of subsets of a set $S$ is a λ-system and a π-system on $S$, it is also a σ-algebra on $S$.

This lemma is used without being proved in Williams’ book for proving Carathéodory’s lemma. In what follows, the lemma will be proved before proceeding with the proof of Carathéodory’s lemma.

Although not relevant to subsequent developments, it is mentioned that a σ-algebra on a set $S$ is also a λ-system on $S$ as it can be trivially seen from the involved definitions.

Proof of the lemma

Let $\mathcal{C}$ be a collection of subsets $S$ that is both a λ-system and a π-system on $S$. To show that $\mathcal{C}$ is a σ-algebra on $S$, it suffices that it is closed under countable unions.

Let $B_n\in\mathcal{C},n\in\mathbb{N}$. The main idea is to express the collection $\{B_n: n\in\mathbb{N}\}$ as a collection $\{L_n: n\in\mathbb{N}\}$ of pairwise disjoint sets ($L_i\cap L_j=\emptyset$ for $i\neq j$) so that $\underset{n\in\mathbb{N}}{\cup}B_n=\underset{n\in\mathbb{N}}{\cup}L_n$. Along these lines, define $L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)$.

Obviously, $\underset{n\in\mathbb{N}}{\cup}L_n\subseteq \underset{n\in\mathbb{N}}{\cup}B_n$. To prove the converse set inequality, let $x\in\underset{n\in\mathbb{N}}{\cup}B_n$ and assume that $(\forall n\in\mathbb{N}) x\not\in B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)$. In this case, for each $n\in\mathbb{N}$, either $x\not\in B_n$ or $x\in B_n\cap\left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)$. There is at least one $n_{*}\in\mathbb{N}$ such that $x\in B_{n_{*}}\cap\left(\overset{n_{*}-1}{\underset{k=1}{\cup}} B_k\right)$, otherwise $(\forall n\in\mathbb{N}) x\not\in B_n$ leads to the contradiction $x\not\in\underset{n\in\mathbb{N}}{\cup}B_n$. Let $n_o\in\mathbb{N}$ be the minimum natural for which $x\in B_{n_{o}}\cap\left(\overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\right)$. In turn, $x\in \overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\Rightarrow (\exists i. Due to $n_o$ being the smallest natural for which $x\in B_{n_{o}}\cap\left(\overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\right)$, it is deduced that $x\not\in B_i\cap\left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right)$, hence $x\not\in\left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right)$. Thus, $(\exists i\in\mathbb{N}) x\in B_i\setminus \left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right)$, which is a contradiction. Thereby, $\underset{n\in\mathbb{N}}{\cup}B_n\subseteq \underset{n\in\mathbb{N}}{\cup}L_n$, and this establishes the equality $\underset{n\in\mathbb{N}}{\cup}L_n= \underset{n\in\mathbb{N}}{\cup}B_n$.

Assume that there are $i,j\in\mathbb{N}$ with $i\neq j$ and $L_i \cap L_j \neq \emptyset$. Let $x\in L_i \cap L_j$. Without loss of generality assume that $i. Then $x\in B_i$ with $i, while $x\in L_j=B_j\setminus \left(\overset{j-1}{\underset{k=1}{\cup}} B_k\right)$, which means that $(\forall k\in\mathbb{N})$ with $k it holds that $x\not\in B_k$, so a contradiction has been reached. Thereby, the sets $L_n, n\in\mathbb{N}$, are pairwise disjoint.

It has thus been shown that the collection $L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)$ consists of pairwise disjoints sets that satisfy $\underset{n\in\mathbb{N}}{\cup}L_n=\underset{n\in\mathbb{N}}{\cup}B_n$.

Notice that $L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)=B_n \overset{n-1}{\underset{k=1}{\cap}}B_k^c$. Since $\mathcal{C}$ is a λ-system, $B_k^c\in\mathcal{C}$ for the various $k$. Moreover, $\mathcal{C}$ is a π-system, hence the finite intersection $L_n=B_n \overset{n-1}{\underset{k=1}{\cap}}B_k^c$ is also in $\mathcal{C}$. Since the collection $L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)$ is a disjoint union of elements $L_n\in\mathcal{C}$ and $\mathcal{C}$ is a λ-system, it follows that the union $\underset{n\in\mathbb{N}}{\cup}L_n= \underset{n\in\mathbb{N}}{\cup}B_n$ is also in $\mathcal{C}$.

Since the countable (but not necessarily disjoint) union $\underset{n\in\mathbb{N}}{\cup}B_n$ of any collection $\{B_n: n\in\mathbb{N}\}$ of sets $B_n\in\mathcal{C}$ is also in $\mathcal{C}$, it follows that $\mathcal{C}$ is a σ-algebra.

First clarification

The above lemma explains why the proof of Carathéodory’s lemma in Williams’ book states that it suffices to show that for a countable collection $\{L_n: n\in\mathbb{N}\}$ of disjoint sets $L_n\in\mathcal{L}$ it holds that $\underset{n\in\mathbb{N}}{\cup}L_n\in\mathcal{L}$. The conclusion then extends to any such countable union of sets, disjoint or not.

Second clarification

It is mentioned in p. 197 of Williams’ book that from

$\lambda (G)\ge\overset{n}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G)$

follows

$\lambda (G)\ge\overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G)$.

To see why this is the case, recall that the outer measure $\lambda:\mathcal{G}\rightarrow [ 0, \infty ]$ takes in values in $\lambda (G)\in[0, \infty]$ for any $G\in\mathcal{G}$.

Distinguish two cases. If $\lambda (G)\in [0,\infty )$ (i.e. if $\lambda (G)$ is finite), then the sequence $a_n=\overset{n}{\underset{k=1}{\sum}}\lambda(L_k\cap G),n\in\mathbb{N}$, is a bounded increasing sequence, therefore it converges, which means that the limit $\overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)=\underset{n\rightarrow\infty}{\lim}a_n < \infty$ exists, so taking limits leads from the former to the latter inequality in the book.

If $\lambda (G)=\infty$, then $\infty \ge\overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G)$ holds trivially.

Third clarification

To show that $\lambda (L)=\underset{k\in\mathbb{N}}{\sum}\lambda (L_k)$ for $L=\underset{k\in\mathbb{N}}{\cup}L_k$, notice first that

$\lambda (L)\le\underset{k\in\mathbb{N}}{\sum}\lambda (L_k)$

follows from the countable subadditivity of the outer measure $\lambda$.

Moreover, setting $G=L$ in equation (d) of p. 197 gives

$\lambda (L)\ge\underset{k\in\mathbb{N}}{\sum}\lambda (L_k\cap L)+\lambda (L^c\cap L)=\underset{k\in\mathbb{N}}{\sum}\lambda (L_k)$,

which concludes the argument.

# Proof: a representation of the Borel σ-algebra on reals

This post elaborates on the proof of a proposition in page 17 of the book “Probability with martingales” by David Williams. The relevant proposition is first stated; the Borel σ-algebra $\mathcal{B}:=\mathcal{B}(\mathbb{R})$ on the set $\mathbb{R}$ of real numbers is given by $\mathcal{B}=\sigma (\pi (\mathbb{R}))$, where $\sigma(\pi(\mathbb{R}))$ denotes the sigma algebra on $\mathbb{R}$ generated by $\pi (\mathbb{R})=\{(-\infty, x]:x\in\mathbb{R}\}$.

It is reminded that the Borel σ-algebra $\mathcal{B}(S)$ on a set $S$ is the σ-algebra $\sigma(\mathcal{O})$ generated by the family $\mathcal{O}$ of open subsets of $S$, i.e. $\mathcal{B}(S):=\sigma(\mathcal{O})$.

Two lemmas are provided before proceeding with the proof.

Lemma 1

Let $S$ be a set, and let $\mathcal{C}_1,\mathcal{C}_2$ be two collections of subsets of $S$. If $\mathcal{C}_1\subseteq\sigma(\mathcal{C}_2)$, then $\sigma(\mathcal{C}_1)\subseteq\sigma(\mathcal{C}_2)$.

Proof of lemma 1

$\sigma(\mathcal{C}_2)$ is a σ-algebra on $S$ that contains $\mathcal{C}_1$ and $\sigma(\mathcal{C}_1)$ is the intersection of all σ-algebras on $S$ that contain $\mathcal{C}_1$, so $\sigma(\mathcal{C}_1)\subseteq\sigma(\mathcal{C}_2)$.

Lemma 2

Every open subset $G$ of $\mathbb{R}$ in the Euclidean metric space $(\mathbb{R},\|\cdot\|)$ can be represented uniquely as a countable union of disjoint open intervals.

Proof of lemma 2

This lemma appears also as theorem 1.3, p. 6, in the book “Real analysis: measure theory, integration and Hilbert spaces” by Elias M. Stein and Rami Shakarchi, where a partial proof is provided as uniqueness is not established. In what follows, the proof from Stein’s and Shakarchi’s book is reproduced in more detail, plus uniqueness is proved.

Let $G$ be an open subset of $\mathcal{R}$ and $x\in G$. Introduce the sets $\mathcal{A}_x:=\{a\in\mathbb{R}: a and $\mathcal{B}_x:=\{b\in\mathbb{R}: x. Moreover, let $a_x:=\mbox{inf}\mathcal{A}_x$ and $b_x:=\mbox{sup}\mathcal{B}_x$, allowing for $a_x,b_x$ to be infinite. Consider also the interval $I_x:=(a_x,b_x)$.

It will be shown that $I_x\subseteq G$. Let $y\in I_x$. Assume that $(\forall a\in\mathcal{A}_x)y\le a$, in which case $y$ is a lower bound of $\mathcal{A}_x$, so $y\le a_x$, thus contradicting the assumption $y\in I_x=(a_x, b_x)$. Hence, there exists $a\in\mathcal{A}_x$ such that $a. By a symmetric argument, there exists $b\in\mathcal{B}_x$ such that $y. So, $y\in (a,b)\subseteq (a_x,b_x)$. If $y=x$, then obviously $y\in G$. If $y\neq x$, then $y\in (a,x)\cup (x, b)$. Note that $a\in\mathcal{A}_x$ means that $(a,x)\subseteq G$. So, if $y\in (a,x)$, then $y\in G$. Similarly, if $y\in (x,b)$, then $y\in G$. Thus, $y\in I_x\Rightarrow y\in G$, which means that $I_x\subseteq G$.

$(\forall x\in G) I_x\subseteq G\Rightarrow\underset{x\in G}{\bigcup}I_x\subseteq G$. Furthermore, let $y\in G$. As $G=G^{o}$, there exists $r>0$ such that $(x-r,x+r)\subseteq G$. It follows from $y-r and $(y-r,y)\subseteq (y-r,y+r)\subseteq G$ that $y\in\mathcal{A}_y$. In a similar way, $y\in\mathcal{B}_y$. So, $a_y\le y\le b_y\Rightarrow y\in I_y\subseteq\underset{x\in G}{\bigcup}I_x$. Hence, $G=\underset{x\in G}{\bigcup}I_x$.

Assume that $I_x\cap I_y\neq\emptyset$, in which case $I_x\cup I_y$ is also an open interval. It is straightforward to confirm that if $I$ is an open interval containing $x$ and being contained in $G$, then $I\subseteq I_x$. So, $I_x\cup I_y\subseteq I_x$ and $I_x\cup I_y\subseteq I_y$, which yields $I_x=I_y$. It has been shown that $I_x\cap I_y\neq\emptyset\Rightarrow I_x=I_y$. By contraposition, $I_x\neq I_y\Rightarrow I_x\cap I_y=\emptyset$, which means that the sets of the collection $\{I_x:x\in G\}$ are pairwise disjoint.

For every set $I_x$ in $\{I_x :x\in G\}$, pick a rational $q_x\in I_x$. The set $S:=\{q_x: q_x\in I_x\}$ of selected rationals is obviously a subset of the set $\mathbb{Q}$ of rationals. $S\subseteq\mathbb{Q}$ implies the cardinality inequality $|S|\le |Q|$, so $S$ is a countable set. Since  $|\{I_x :x\in G\}|=|S|$, it is deduced that the collection $\{I_x :x\in G\}$ is also countable.

It has been proved so far that $G$ can be expressed as a countable union of disjoint open intervals. Although not pertinent to the subsequent proof, it will be shown that such representation is unique for the sake of completeness of exposition.

Let $\{I_k : k\in K\}$ and $\{J_{\ell} : \ell\in L\}$ be two collections of disjoint open intervals such that $G=\underset{k\in K}{\bigcup}I_k=\underset{\ell\in L}{\bigcup}J_{\ell}$.

For any $k\in K$, it holds that $I_k=\underset{\ell\in L}\bigcup(I_k\cap J_{\ell})$. Moreover, the sets of the collection $\{I_k\cap J_{\ell}:\ell\in L\}$ are pairwise disjoint. Since $I_k$ and $J_{_\ell}$ are open, $(I_k\cap J_{\ell})^{o}=I_k^{o}\cap J_{\ell}^{o}=I_k\cap J_{\ell}$, which means that $I_k\cap J_{\ell}$ is also open.

In the collection $\{I_k\cap J_{\ell}:\ell\in L\}$, at least one set is non-empty, otherwise trivially $G=\emptyset$.

In the Euclidean metric space $(\mathbb{R},\|\cdot\|)$, every interval is a connected set. Since $I_k$ is an interval, it is connected. Furthermore, a subset $S$ of a metric space $E$ is connected, if and only if there is no open cover $\{A,B\}$ of $S$ such that $S\cap A\neq\emptyset$, $S\cap B\neq\emptyset$ and $S\cap A\cap B=\emptyset$. Hence, by also considering that $I_k$ is connected, it is deduced that at most one set in $\{I_k\cap J_{\ell}:\ell\in L\}$ is non-empty.

So, exactly one set in $\{I_k\cap J_{\ell}:\ell\in L\}$ is non-empty, say $I_k\cap J_{\ell}$. Consequently, $I_k=I_k\cap J_{\ell}$, which in turns gives $I_k\subseteq J_{\ell}$.

In an analogous manner, starting from $J_{\ell}=\underset{k\in K}\bigcup(J_{\ell}\cap I_k)$, it is concluded that $J_{\ell}\subseteq I_k$. So, $I_k=J_{\ell}$, therefore the collections $\{I_k : k\in K\}$ and $\{J_{\ell} : \ell\in L\}$ coincide.

Proof of $\boldsymbol{\mathcal{B}(\mathbb{R})=\sigma(\pi(\mathbb{R}))}$

Let $\mathcal{O}$ be the family of open subsets in the Euclidean metric space $(\mathbb{R},\|\cdot\|)$. It will be shown that the Borel σ-algebra $\mathcal{B}(\mathbb{R})=\sigma(\mathcal{O})$ on $\mathbb{R}$ is equal to the σ-algebra $\sigma(\pi(\mathbb{R}))$ on $\mathbb{R}$ generated by $\pi (\mathbb{R})=\{(-\infty, x]:x\in\mathbb{R}\}$.

Firstly, it will be proved that $\pi (\mathbb{R})\subseteq\sigma(\mathcal{O})$. Consider a set $(-\infty, x]\in\pi (\mathbb{R})$. Note that $(-\infty, x]=\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)$. Indeed, for $y\in\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)$, it holds that $(\forall n\in\mathbb{N})y. Now assume that $x, in which case $y-x>0$, so by the Archimedean property, there exists $n_o \in\mathbb{N}$ such that $0<1/n_o , whence $x+1/n_o, which is a contradiction. Thus, $y\le x$, i.e. $y\in(-\infty,x]$. It has been established that $\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)\subseteq (-\infty,x]$. Obviously  $(-\infty,x]\subseteq\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)$ holds too.

Since $(-\infty,x+1/n)\in\mathcal{O}$ and $\sigma(\mathcal{O})$ is a σ-algebra on $\mathbb{R}$, the countable intersection $(-\infty, x]=\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)$ belongs to $\sigma(\mathcal{O})$.

With the help of lemma 1, $\pi (\mathbb{R})\subseteq\sigma(\mathcal{O})$ yields $\sigma(\pi (\mathbb{R}))\subseteq\sigma(\mathcal{O})=\mathcal{B}(\mathbb{R})$.

To prove the converse set inequality $\sigma(\mathcal{O}) \subseteq\sigma(\pi (\mathbb{R}))$, start by proving that $\mathcal{O}\subseteq\sigma(\pi (\mathbb{R}))$. Along these lines, consider an open set $G\in\mathcal{O}$. It is known from lemma 2 that $X$ can be written as a countable union of open intervals.

Let $(a, b)$ be one of these open intervals in the countable union. Recall from the proof of lemma 2 that $a, b$ can be infinite. Consequently, four cases are distinguished:

1. $a\in\mathbb{R}, b\in\mathbb{R}$. Set $(a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n]$, where $\epsilon=(b-a)/2$. This case will be treated in detail.
2. $a=-\infty, b\in\mathbb{R}$. Set $(-\infty, b)=\underset{n\in\mathbb{N}}{\bigcup}(-\infty,b-\epsilon/n]$, where $\epsilon=(b-a)/2$.
3. $a\in\mathbb{R}, b=\infty$. Set $(a,\infty)=\underset{n\in\mathbb{N}}{\bigcup}(a,n]$.
4. $a=\infty, b=\infty$. Set $(-\infty, \infty)=(-\infty,c]\cup(c,\infty)$ for any real $c$, which leads back to case 3.

Case 1 is now treated in detail. The equality $(a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n]$ is proved by observing that $y\in (a,b)\Leftrightarrow (\exists n_o \in\mathbb{N}) a. Leaving aside the remaining trivial deductions among these equivalences, it will be clarified that $y\in (a,b)\Rightarrow (\exists n_o \in\mathbb{N}) a. So, $y means $0 < (b-y)/\epsilon$. By the Archimedean property, there exists $n_o\in\mathbb{N}$ such that $0 < 1/n_o < (b-y)/\epsilon$, whence $y.

A set of the form $(a, u], a, can be written as $(a,u]=(-\infty, u]\cap(-\infty, a]^{c}$. So, the sets $(a,b-\epsilon/n], n\in\mathbb{N}$, appearing in $(a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n]$ can be expressed as $(a,b-\epsilon/n]=(-\infty,b-\epsilon/n]\cap(-\infty, a]^{c}$. Thereby, $(-\infty,b-\epsilon/n]\in\pi(\mathbb{R})$ and $(-\infty,a]\in\pi(\mathbb{R})$ along with the fact that $\sigma (\pi(\mathbb{R}))$ is a σ-algebra lead to $(a,b-\epsilon/n]\in\sigma (\pi(\mathbb{R}))$. So, $(a,b)\in\sigma (\pi(\mathbb{R}))$, which implies $\mathcal{O}\subseteq\sigma(\pi (\mathbb{R}))$. Finally, $\sigma(\mathcal{O}) \subseteq\sigma(\pi (\mathbb{R}))$ by application of lemma 1, so $\mathcal{B}(\mathbb{R})=\sigma(\mathcal{O}) =\sigma(\pi (\mathbb{R}))$.

# Clarification on the σ-algebra generated by a set

This post elaborates on the concept of σ-algebra generated by a set, as presented in p. 17 of the book “Probability with martingales” by David Williams.

Let $\mathcal{C}$ be a class of subsets of a set $S$. It is stated in the book that the σ-algebra $\sigma(\mathcal{C})$ generated by $\mathcal{C}$ is the smallest σ-algebra on $S$ such that contains $\mathcal{C}$. Moreover, it is mentioned that $\sigma(\mathcal{C})$ is the intersection of all σ-algebras on $S$ that contain $\mathcal{C}$.

Three clarifications will be made. Let $\mathcal{G}:=\{\sigma\mbox{-algebra }\Sigma_i\mbox{ on }S\mbox{ with }\mathcal{C}\subseteq \Sigma_i:i\in I\}$ be the set of all σ-algebras on $S$ that contain $\mathcal{C}$.

Firstly, it will be shown that the intersection $\underset{i}{\bigcap}\Sigma_i$, which is the intersection of all σ-algebras on $S$ that contain $\mathcal{C}$, is itself a σ-algebra on $S$ that contains $\mathcal{C}$. Since $S\in\Sigma_i$ for all $i\in I$, it follows that $S\in\underset{i}{\bigcap}\Sigma_i$. Moreover, consider a set $F\in\underset{i}{\bigcap}\Sigma_i$. So $F\in\Sigma_i$ for all $i$, which means $F^{c}\in\Sigma_i$ for all $i$, so $F^{c}\in\underset{i}{\bigcap}\Sigma_i$. Let $\{F_j:j\in J\}$ be a countable collection of sets $F_j\in\underset{i}{\bigcap}\Sigma_i$. It follows that $F_j\in \Sigma_i$ for all $i\in I$ and all $j\in J$, so $\underset{j}{\bigcup}F_j\in\Sigma_i$ for all $i$, which gives $\underset{j}{\bigcup}F_j\in\underset{i}{\bigcap}\Sigma_i$. It has thus been confirmed that $\underset{i}{\bigcap}\Sigma_i$ is a σ-algebra on $S$. As $\mathcal{C}\subseteq\Sigma_i$ for all $i$, it becomes obvious that $\mathcal{C}\subseteq\underset{i}{\bigcap}\Sigma_i$, so the σ-algebra $\underset{i}{\bigcap}\Sigma_i$ on $S$ contains $\mathcal{C}$.

Consider now those σ-algebras $\Sigma_k\in\mathcal{G}$ with $\mathcal{C}\subseteq\Sigma_k$ that satisfy $\Sigma_k\subseteq\Sigma_i$ for all $\Sigma_i\in\mathcal{G}$ with $\mathcal{C}\subseteq\Sigma_i$. This is what is meant by “smallest σ-algebra $\Sigma$ on $S$ such that $\mathcal{C}\subseteq\Sigma$” in William’s book. Assume that there are two such distinct σ-algebras $\Sigma_k$ and $\Sigma_{\ell}$, i.e. $\Sigma_k\neq\Sigma_{\ell}$. Since $\Sigma_{\ell}\in\mathcal{G}$, it is then deduced that $\Sigma_k\subseteq\Sigma_{\ell}$. Similarly, $\Sigma_{\ell}\subseteq\Sigma_{k}$, so $\Sigma_k=\Sigma_{\ell}$, which is a contradiction. Hence, there is a single σ-algebra in $\mathcal{G}$ that contains $\mathcal{C}$ and is a subset of any other σ-algebra in $\mathcal{G}$ that also contains $\mathcal{C}$.

It has been shown that $\underset{i}{\bigcap}\Sigma_i\in\mathcal{G}$. Moreover, $\underset{i}{\bigcap}\Sigma_i$ satisfies $\underset{i}{\bigcap}\Sigma_i\subseteq\Sigma$ for any $\Sigma\in\mathcal{G}$. Uniqueness has also been established, so $\underset{i}{\bigcap}\Sigma_i$ is the only element in $\mathcal{G}$ that is a subset of all $\Sigma\in\mathcal{G}$. Put in words, $\underset{i}{\bigcap}\Sigma_i$ is the only σ-algebra on $S$ that contains $\mathcal{C}$ and is contained in any other σ-algebra $\Sigma$ on $S$ that satisfies $\mathcal{C}\subseteq\Sigma$.

$\underset{i}{\bigcap}\Sigma_i$ is called the σ-algebra generated by $\mathcal{C}$ and is denoted by $\sigma(\mathcal{C})$.

# Clarification on algebras and σ-algebras of a set

David Williams makes a terminological remark on algebras in p. 16 of his book “Probability with martingales”; he mentions that an algebra $\Sigma$ on a set $S$ (defined ordinarily as a collection of subsets of $S$ that contains $S$ and that is closed under set complementation and closed under finite set unions) is “a true algebra in the algebraists’ sense”.

This remark emphasises two aspects of the concept of algebra $\Sigma$ on a set $S$. Firstly, it means that $\Sigma$ is an algebra over a field $K$. The field $K$ contains exactly two elements, so it can be defined to be the subset $K=\{0,1\}$ of integers. According to the definition of algebra over a field, $\Sigma$ is a vector space equipped with a bilinear product. So any algebra (and consequently any σ-algebra) is a type of vector space of sets (subsets of $S$).

Secondly, consider the defining operations of an algebra $\Sigma$ on a set $S$. The vector space addition is defined to be the symmetric difference $A\Delta B:= (A\cup B)\setminus (A\cap B)$. The bilinear product, which turns the vector space to an algebra over $\{0,1\}$, is the set intersection $A\cap B$. It is straightforward to check that the product $A\cap B$ satisfies right and left distributivity and compatibility with scalars. The two operations (addition and bilinear product) defined from the product space $\Sigma\times\Sigma$ to the vector space $\Sigma$ are both symmetric.

In summary, any algebra (and any σ-algebra) is a vector space of sets equipped with two symmetric set operations.