Category Archives: Books

Clarification on the proof of Carathéodory’s lemma

Carathéodory’s lemma, as it appears in p. 197 (appendix A1) of the book “Probability with martingales” by David Williams, is stated as follows:

Let \lambda be an outer measure on the measurable space (S,\mathcal{G}). Then the λ-sets in \mathcal{G} form a σ-algebra \mathcal{L} on which \lambda is countably additive, so that (S,\mathcal{L},\lambda) is a measure space.

Three aspects of the proof of Carathéodory’s lemma provided in Williams’ book are clarified in this blog post.

Definition of λ-system

The concept of λ-system, which is used implicitly but it is not defined in Williams’ book, is introduced in this post.

A collection \mathcal{C} of subsets of a set S is called a λ-system on S if

  1. S\in\mathcal{C},
  2. L\in\mathcal{C}\Rightarrow L^{c}\in\mathcal{C} (it is closed under complements),
  3. (\forall n\in\mathbb{N})L_n\in\mathcal{C} with L_i\cap L_j=\emptyset for i\neq j it holds that \underset{n\in\mathbb{N}}{\cup}L_n\in\mathcal{C} (it is closed under countable disjoint unions).

Not that the only difference between a λ-system and a σ-algebra is that the former is closed under countable disjoint unions while the latter is closed under countable unions. Moreover, the first condition on the definition of a λ-system could be alternatively set to \emptyset\in\mathcal{C} instead of S\in\mathcal{C} due to closure under complementarity, i.e. due to the second condition of the definition.


If a collection of subsets of a set S is a λ-system and a π-system on S, it is also a σ-algebra on S.

This lemma is used without being proved in Williams’ book for proving Carathéodory’s lemma. In what follows, the lemma will be proved before proceeding with the proof of Carathéodory’s lemma.

Although not relevant to subsequent developments, it is mentioned that a σ-algebra on a set S is also a λ-system on S as it can be trivially seen from the involved definitions.

Proof of the lemma

Let \mathcal{C} be a collection of subsets S that is both a λ-system and a π-system on S. To show that \mathcal{C} is a σ-algebra on S, it suffices that it is closed under countable unions.

Let B_n\in\mathcal{C},n\in\mathbb{N}. The main idea is to express the collection \{B_n: n\in\mathbb{N}\} as a collection \{L_n: n\in\mathbb{N}\} of pairwise disjoint sets (L_i\cap L_j=\emptyset for i\neq j) so that \underset{n\in\mathbb{N}}{\cup}B_n=\underset{n\in\mathbb{N}}{\cup}L_n. Along these lines, define L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right).

Obviously, \underset{n\in\mathbb{N}}{\cup}L_n\subseteq \underset{n\in\mathbb{N}}{\cup}B_n. To prove the converse set inequality, let x\in\underset{n\in\mathbb{N}}{\cup}B_n and assume that (\forall n\in\mathbb{N}) x\not\in B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right). In this case, for each n\in\mathbb{N}, either x\not\in B_n or x\in B_n\cap\left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right). There is at least one n_{*}\in\mathbb{N} such that x\in B_{n_{*}}\cap\left(\overset{n_{*}-1}{\underset{k=1}{\cup}} B_k\right), otherwise (\forall n\in\mathbb{N}) x\not\in B_n leads to the contradiction x\not\in\underset{n\in\mathbb{N}}{\cup}B_n. Let n_o\in\mathbb{N} be the minimum natural for which x\in B_{n_{o}}\cap\left(\overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\right). In turn, x\in \overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\Rightarrow (\exists i<n_o)x\in B_i. Due to n_o being the smallest natural for which x\in B_{n_{o}}\cap\left(\overset{n_{o}-1}{\underset{k=1}{\cup}} B_k\right), it is deduced that x\not\in B_i\cap\left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right), hence x\not\in\left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right). Thus, (\exists i\in\mathbb{N}) x\in B_i\setminus \left(\overset{i-1}{\underset{k=1}{\cup}} B_k\right), which is a contradiction. Thereby, \underset{n\in\mathbb{N}}{\cup}B_n\subseteq \underset{n\in\mathbb{N}}{\cup}L_n, and this establishes the equality \underset{n\in\mathbb{N}}{\cup}L_n= \underset{n\in\mathbb{N}}{\cup}B_n.

Assume that there are i,j\in\mathbb{N} with i\neq j and L_i \cap L_j \neq \emptyset. Let x\in L_i \cap L_j. Without loss of generality assume that i<j. Then x\in B_i with i<j, while x\in L_j=B_j\setminus \left(\overset{j-1}{\underset{k=1}{\cup}} B_k\right), which means that (\forall k\in\mathbb{N}) with k<j it holds that x\not\in B_k, so a contradiction has been reached. Thereby, the sets L_n, n\in\mathbb{N}, are pairwise disjoint.

It has thus been shown that the collection L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right) consists of pairwise disjoints sets that satisfy \underset{n\in\mathbb{N}}{\cup}L_n=\underset{n\in\mathbb{N}}{\cup}B_n.

Notice that L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right)=B_n \overset{n-1}{\underset{k=1}{\cap}}B_k^c. Since \mathcal{C} is a λ-system, B_k^c\in\mathcal{C} for the various k. Moreover, \mathcal{C} is a π-system, hence the finite intersection L_n=B_n \overset{n-1}{\underset{k=1}{\cap}}B_k^c is also in \mathcal{C}. Since the collection L_n:=B_n\setminus \left(\overset{n-1}{\underset{k=1}{\cup}} B_k\right) is a disjoint union of elements L_n\in\mathcal{C} and \mathcal{C} is a λ-system, it follows that the union \underset{n\in\mathbb{N}}{\cup}L_n= \underset{n\in\mathbb{N}}{\cup}B_n is also in \mathcal{C}.

Since the countable (but not necessarily disjoint) union \underset{n\in\mathbb{N}}{\cup}B_n of any collection \{B_n: n\in\mathbb{N}\} of sets B_n\in\mathcal{C} is also in \mathcal{C}, it follows that \mathcal{C} is a σ-algebra.

First clarification

The above lemma explains why the proof of Carathéodory’s lemma in Williams’ book states that it suffices to show that for a countable collection \{L_n: n\in\mathbb{N}\} of disjoint sets L_n\in\mathcal{L} it holds that \underset{n\in\mathbb{N}}{\cup}L_n\in\mathcal{L}. The conclusion then extends to any such countable union of sets, disjoint or not.

Second clarification

It is mentioned in p. 197 of Williams’ book that from

\lambda (G)\ge\overset{n}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G)


\lambda (G)\ge\overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G).

To see why this is the case, recall that the outer measure \lambda:\mathcal{G}\rightarrow [ 0, \infty ] takes in values in \lambda (G)\in[0, \infty] for any G\in\mathcal{G}.

Distinguish two cases. If \lambda (G)\in [0,\infty ) (i.e. if \lambda (G) is finite), then the sequence a_n=\overset{n}{\underset{k=1}{\sum}}\lambda(L_k\cap G),n\in\mathbb{N}, is a bounded increasing sequence, therefore it converges, which means that the limit \overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)=\underset{n\rightarrow\infty}{\lim}a_n < \infty exists, so taking limits leads from the former to the latter inequality in the book.

If \lambda (G)=\infty, then \infty \ge\overset{\infty}{\underset{k=1}{\sum}}\lambda(L_k\cap G)+\lambda(L^c\cap G) holds trivially.

Third clarification

To show that \lambda (L)=\underset{k\in\mathbb{N}}{\sum}\lambda (L_k) for L=\underset{k\in\mathbb{N}}{\cup}L_k, notice first that

\lambda (L)\le\underset{k\in\mathbb{N}}{\sum}\lambda (L_k)

follows from the countable subadditivity of the outer measure \lambda.

Moreover, setting G=L in equation (d) of p. 197 gives

\lambda (L)\ge\underset{k\in\mathbb{N}}{\sum}\lambda (L_k\cap L)+\lambda (L^c\cap L)=\underset{k\in\mathbb{N}}{\sum}\lambda (L_k),

which concludes the argument.

Proof: a representation of the Borel σ-algebra on reals

This post elaborates on the proof of a proposition in page 17 of the book “Probability with martingales” by David Williams. The relevant proposition is first stated; the Borel σ-algebra \mathcal{B}:=\mathcal{B}(\mathbb{R}) on the set \mathbb{R} of real numbers is given by \mathcal{B}=\sigma (\pi (\mathbb{R})), where \sigma(\pi(\mathbb{R})) denotes the sigma algebra on \mathbb{R} generated by \pi (\mathbb{R})=\{(-\infty, x]:x\in\mathbb{R}\}.

It is reminded that the Borel σ-algebra \mathcal{B}(S) on a set S is the σ-algebra \sigma(\mathcal{O}) generated by the family \mathcal{O} of open subsets of S, i.e. \mathcal{B}(S):=\sigma(\mathcal{O}).

Two lemmas are provided before proceeding with the proof.

Lemma 1

Let S be a set, and let \mathcal{C}_1,\mathcal{C}_2 be two collections of subsets of S. If \mathcal{C}_1\subseteq\sigma(\mathcal{C}_2), then \sigma(\mathcal{C}_1)\subseteq\sigma(\mathcal{C}_2).

Proof of lemma 1

\sigma(\mathcal{C}_2) is a σ-algebra on S that contains \mathcal{C}_1 and \sigma(\mathcal{C}_1) is the intersection of all σ-algebras on S that contain \mathcal{C}_1, so \sigma(\mathcal{C}_1)\subseteq\sigma(\mathcal{C}_2).

Lemma 2

Every open subset G of \mathbb{R} in the Euclidean metric space (\mathbb{R},\|\cdot\|) can be represented uniquely as a countable union of disjoint open intervals.

Proof of lemma 2

This lemma appears also as theorem 1.3, p. 6, in the book “Real analysis: measure theory, integration and Hilbert spaces” by Elias M. Stein and Rami Shakarchi, where a partial proof is provided as uniqueness is not established. In what follows, the proof from Stein’s and Shakarchi’s book is reproduced in more detail, plus uniqueness is proved.

Let G be an open subset of \mathcal{R} and x\in G. Introduce the sets \mathcal{A}_x:=\{a\in\mathbb{R}: a<x, (a,x)\subseteq G\} and \mathcal{B}_x:=\{b\in\mathbb{R}: x<b, (x,b)\subseteq G\}. Moreover, let a_x:=\mbox{inf}\mathcal{A}_x and b_x:=\mbox{sup}\mathcal{B}_x, allowing for a_x,b_x to be infinite. Consider also the interval I_x:=(a_x,b_x).

It will be shown that I_x\subseteq G. Let y\in I_x. Assume that (\forall a\in\mathcal{A}_x)y\le a, in which case y is a lower bound of \mathcal{A}_x, so y\le a_x, thus contradicting the assumption y\in I_x=(a_x, b_x). Hence, there exists a\in\mathcal{A}_x such that a<y. By a symmetric argument, there exists b\in\mathcal{B}_x such that y<b. So, y\in (a,b)\subseteq (a_x,b_x). If y=x, then obviously y\in G. If y\neq x, then y\in (a,x)\cup (x, b). Note that a\in\mathcal{A}_x means that (a,x)\subseteq G. So, if y\in (a,x), then y\in G. Similarly, if y\in (x,b), then y\in G. Thus, y\in I_x\Rightarrow y\in G, which means that I_x\subseteq G.

(\forall x\in G) I_x\subseteq G\Rightarrow\underset{x\in G}{\bigcup}I_x\subseteq G. Furthermore, let y\in G. As G=G^{o}, there exists r>0 such that (x-r,x+r)\subseteq G. It follows from y-r<y and (y-r,y)\subseteq (y-r,y+r)\subseteq G that y\in\mathcal{A}_y. In a similar way, y\in\mathcal{B}_y. So, a_y\le y\le b_y\Rightarrow y\in I_y\subseteq\underset{x\in G}{\bigcup}I_x. Hence, G=\underset{x\in G}{\bigcup}I_x.

Assume that I_x\cap I_y\neq\emptyset, in which case I_x\cup I_y is also an open interval. It is straightforward to confirm that if I is an open interval containing x and being contained in G, then I\subseteq I_x. So, I_x\cup I_y\subseteq I_x and I_x\cup I_y\subseteq I_y, which yields I_x=I_y. It has been shown that I_x\cap I_y\neq\emptyset\Rightarrow I_x=I_y. By contraposition, I_x\neq I_y\Rightarrow I_x\cap I_y=\emptyset, which means that the sets of the collection \{I_x:x\in G\} are pairwise disjoint.

For every set I_x in \{I_x :x\in G\}, pick a rational q_x\in I_x. The set S:=\{q_x: q_x\in I_x\} of selected rationals is obviously a subset of the set \mathbb{Q} of rationals. S\subseteq\mathbb{Q} implies the cardinality inequality |S|\le |Q|, so S is a countable set. Since  |\{I_x :x\in G\}|=|S|, it is deduced that the collection \{I_x :x\in G\} is also countable.

It has been proved so far that G can be expressed as a countable union of disjoint open intervals. Although not pertinent to the subsequent proof, it will be shown that such representation is unique for the sake of completeness of exposition.

Let \{I_k : k\in K\} and \{J_{\ell} : \ell\in L\} be two collections of disjoint open intervals such that G=\underset{k\in K}{\bigcup}I_k=\underset{\ell\in L}{\bigcup}J_{\ell}.

For any k\in K, it holds that I_k=\underset{\ell\in L}\bigcup(I_k\cap J_{\ell}). Moreover, the sets of the collection \{I_k\cap J_{\ell}:\ell\in L\} are pairwise disjoint. Since I_k and J_{_\ell} are open, (I_k\cap J_{\ell})^{o}=I_k^{o}\cap J_{\ell}^{o}=I_k\cap J_{\ell}, which means that I_k\cap J_{\ell} is also open.

In the collection \{I_k\cap J_{\ell}:\ell\in L\}, at least one set is non-empty, otherwise trivially G=\emptyset.

In the Euclidean metric space (\mathbb{R},\|\cdot\|), every interval is a connected set. Since I_k is an interval, it is connected. Furthermore, a subset S of a metric space E is connected, if and only if there is no open cover \{A,B\} of S such that S\cap A\neq\emptyset, S\cap B\neq\emptyset and S\cap A\cap B=\emptyset. Hence, by also considering that I_k is connected, it is deduced that at most one set in \{I_k\cap J_{\ell}:\ell\in L\} is non-empty.

So, exactly one set in \{I_k\cap J_{\ell}:\ell\in L\} is non-empty, say I_k\cap J_{\ell}. Consequently, I_k=I_k\cap J_{\ell}, which in turns gives I_k\subseteq J_{\ell}.

In an analogous manner, starting from J_{\ell}=\underset{k\in K}\bigcup(J_{\ell}\cap I_k), it is concluded that J_{\ell}\subseteq I_k. So, I_k=J_{\ell}, therefore the collections \{I_k : k\in K\} and \{J_{\ell} : \ell\in L\} coincide.

Proof of \boldsymbol{\mathcal{B}(\mathbb{R})=\sigma(\pi(\mathbb{R}))}

Let \mathcal{O} be the family of open subsets in the Euclidean metric space (\mathbb{R},\|\cdot\|). It will be shown that the Borel σ-algebra \mathcal{B}(\mathbb{R})=\sigma(\mathcal{O}) on \mathbb{R} is equal to the σ-algebra \sigma(\pi(\mathbb{R})) on \mathbb{R} generated by \pi (\mathbb{R})=\{(-\infty, x]:x\in\mathbb{R}\}.

Firstly, it will be proved that \pi (\mathbb{R})\subseteq\sigma(\mathcal{O}). Consider a set (-\infty, x]\in\pi (\mathbb{R}). Note that (-\infty, x]=\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n). Indeed, for y\in\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n), it holds that (\forall n\in\mathbb{N})y<x+1/n. Now assume that x<y, in which case y-x>0, so by the Archimedean property, there exists n_o \in\mathbb{N} such that 0<1/n_o <y-x, whence x+1/n_o<y, which is a contradiction. Thus, y\le x, i.e. y\in(-\infty,x]. It has been established that \underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n)\subseteq (-\infty,x]. Obviously  (-\infty,x]\subseteq\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n) holds too.

Since (-\infty,x+1/n)\in\mathcal{O} and \sigma(\mathcal{O}) is a σ-algebra on \mathbb{R}, the countable intersection (-\infty, x]=\underset{n\in\mathbb{N}}{\bigcap}(-\infty,x+1/n) belongs to \sigma(\mathcal{O}).

With the help of lemma 1, \pi (\mathbb{R})\subseteq\sigma(\mathcal{O}) yields \sigma(\pi (\mathbb{R}))\subseteq\sigma(\mathcal{O})=\mathcal{B}(\mathbb{R}).

To prove the converse set inequality \sigma(\mathcal{O}) \subseteq\sigma(\pi (\mathbb{R})), start by proving that \mathcal{O}\subseteq\sigma(\pi (\mathbb{R})). Along these lines, consider an open set G\in\mathcal{O}. It is known from lemma 2 that X can be written as a countable union of open intervals.

Let (a, b) be one of these open intervals in the countable union. Recall from the proof of lemma 2 that a, b can be infinite. Consequently, four cases are distinguished:

  1. a\in\mathbb{R}, b\in\mathbb{R}. Set (a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n], where \epsilon=(b-a)/2. This case will be treated in detail.
  2. a=-\infty, b\in\mathbb{R}. Set (-\infty, b)=\underset{n\in\mathbb{N}}{\bigcup}(-\infty,b-\epsilon/n], where \epsilon=(b-a)/2.
  3. a\in\mathbb{R}, b=\infty. Set (a,\infty)=\underset{n\in\mathbb{N}}{\bigcup}(a,n].
  4. a=\infty, b=\infty. Set (-\infty, \infty)=(-\infty,c]\cup(c,\infty) for any real c, which leads back to case 3.

Case 1 is now treated in detail. The equality (a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n] is proved by observing that y\in (a,b)\Leftrightarrow (\exists n_o \in\mathbb{N}) a<y\le b-\epsilon/n_o\Leftrightarrow y\in\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n]. Leaving aside the remaining trivial deductions among these equivalences, it will be clarified that y\in (a,b)\Rightarrow (\exists n_o \in\mathbb{N}) a<y\le b-\epsilon/n_o. So, y<b means 0 < (b-y)/\epsilon. By the Archimedean property, there exists n_o\in\mathbb{N} such that 0 < 1/n_o < (b-y)/\epsilon, whence y<b-\epsilon/n_o.

A set of the form (a, u], a<u, u\in\mathbb{R}, can be written as (a,u]=(-\infty, u]\cap(-\infty, a]^{c}. So, the sets (a,b-\epsilon/n], n\in\mathbb{N}, appearing in (a, b)=\underset{n\in\mathbb{N}}{\bigcup}(a,b-\epsilon/n] can be expressed as (a,b-\epsilon/n]=(-\infty,b-\epsilon/n]\cap(-\infty, a]^{c}. Thereby, (-\infty,b-\epsilon/n]\in\pi(\mathbb{R}) and (-\infty,a]\in\pi(\mathbb{R}) along with the fact that \sigma (\pi(\mathbb{R})) is a σ-algebra lead to (a,b-\epsilon/n]\in\sigma (\pi(\mathbb{R})). So, (a,b)\in\sigma (\pi(\mathbb{R})), which implies \mathcal{O}\subseteq\sigma(\pi (\mathbb{R})). Finally, \sigma(\mathcal{O}) \subseteq\sigma(\pi (\mathbb{R})) by application of lemma 1, so \mathcal{B}(\mathbb{R})=\sigma(\mathcal{O}) =\sigma(\pi (\mathbb{R})).

Clarification on the σ-algebra generated by a set

This post elaborates on the concept of σ-algebra generated by a set, as presented in p. 17 of the book “Probability with martingales” by David Williams.

Let \mathcal{C} be a class of subsets of a set S. It is stated in the book that the σ-algebra \sigma(\mathcal{C}) generated by \mathcal{C} is the smallest σ-algebra on S such that contains \mathcal{C}. Moreover, it is mentioned that \sigma(\mathcal{C}) is the intersection of all σ-algebras on S that contain \mathcal{C}.

Three clarifications will be made. Let \mathcal{G}:=\{\sigma\mbox{-algebra }\Sigma_i\mbox{ on }S\mbox{ with }\mathcal{C}\subseteq \Sigma_i:i\in I\} be the set of all σ-algebras on S that contain \mathcal{C}.

Firstly, it will be shown that the intersection \underset{i}{\bigcap}\Sigma_i, which is the intersection of all σ-algebras on S that contain \mathcal{C}, is itself a σ-algebra on S that contains \mathcal{C}. Since S\in\Sigma_i for all i\in I, it follows that S\in\underset{i}{\bigcap}\Sigma_i. Moreover, consider a set F\in\underset{i}{\bigcap}\Sigma_i. So F\in\Sigma_i for all i, which means F^{c}\in\Sigma_i for all i, so F^{c}\in\underset{i}{\bigcap}\Sigma_i. Let \{F_j:j\in J\} be a countable collection of sets F_j\in\underset{i}{\bigcap}\Sigma_i. It follows that F_j\in \Sigma_i for all i\in I and all j\in J, so \underset{j}{\bigcup}F_j\in\Sigma_i for all i, which gives \underset{j}{\bigcup}F_j\in\underset{i}{\bigcap}\Sigma_i. It has thus been confirmed that \underset{i}{\bigcap}\Sigma_i is a σ-algebra on S. As \mathcal{C}\subseteq\Sigma_i for all i, it becomes obvious that \mathcal{C}\subseteq\underset{i}{\bigcap}\Sigma_i, so the σ-algebra \underset{i}{\bigcap}\Sigma_i on S contains \mathcal{C}.

Consider now those σ-algebras \Sigma_k\in\mathcal{G} with \mathcal{C}\subseteq\Sigma_k that satisfy \Sigma_k\subseteq\Sigma_i for all \Sigma_i\in\mathcal{G} with \mathcal{C}\subseteq\Sigma_i. This is what is meant by “smallest σ-algebra \Sigma on S such that \mathcal{C}\subseteq\Sigma” in William’s book. Assume that there are two such distinct σ-algebras \Sigma_k and \Sigma_{\ell}, i.e. \Sigma_k\neq\Sigma_{\ell}. Since \Sigma_{\ell}\in\mathcal{G}, it is then deduced that \Sigma_k\subseteq\Sigma_{\ell}. Similarly, \Sigma_{\ell}\subseteq\Sigma_{k}, so \Sigma_k=\Sigma_{\ell}, which is a contradiction. Hence, there is a single σ-algebra in \mathcal{G} that contains \mathcal{C} and is a subset of any other σ-algebra in \mathcal{G} that also contains \mathcal{C}.

It has been shown that \underset{i}{\bigcap}\Sigma_i\in\mathcal{G}. Moreover, \underset{i}{\bigcap}\Sigma_i satisfies \underset{i}{\bigcap}\Sigma_i\subseteq\Sigma for any \Sigma\in\mathcal{G}. Uniqueness has also been established, so \underset{i}{\bigcap}\Sigma_i is the only element in \mathcal{G} that is a subset of all \Sigma\in\mathcal{G}. Put in words, \underset{i}{\bigcap}\Sigma_i is the only σ-algebra on S that contains \mathcal{C} and is contained in any other σ-algebra \Sigma on S that satisfies \mathcal{C}\subseteq\Sigma.

\underset{i}{\bigcap}\Sigma_i is called the σ-algebra generated by \mathcal{C} and is denoted by \sigma(\mathcal{C}).

Clarification on algebras and σ-algebras of a set

David Williams makes a terminological remark on algebras in p. 16 of his book “Probability with martingales”; he mentions that an algebra \Sigma on a set S (defined ordinarily as a collection of subsets of S that contains S and that is closed under set complementation and closed under finite set unions) is “a true algebra in the algebraists’ sense”.

This remark emphasises two aspects of the concept of algebra \Sigma on a set S. Firstly, it means that \Sigma is an algebra over a field K. The field K contains exactly two elements, so it can be defined to be the subset K=\{0,1\} of integers. According to the definition of algebra over a field, \Sigma is a vector space equipped with a bilinear product. So any algebra (and consequently any σ-algebra) is a type of vector space of sets (subsets of S).

Secondly, consider the defining operations of an algebra \Sigma on a set S. The vector space addition is defined to be the symmetric difference A\Delta B:= (A\cup B)\setminus (A\cap B). The bilinear product, which turns the vector space to an algebra over \{0,1\}, is the set intersection A\cap B. It is straightforward to check that the product A\cap B satisfies right and left distributivity and compatibility with scalars. The two operations (addition and bilinear product) defined from the product space \Sigma\times\Sigma to the vector space \Sigma are both symmetric.

In summary, any algebra (and any σ-algebra) is a vector space of sets equipped with two symmetric set operations.

Proof: a and n are relatively prime iff ax≡b(modn) has a solution modulo n

The lemma in p. 231 of the book “A book of abstract algebra” by Charles C. Pinter states that if a and n are relatively prime, then the equation ax\equiv b(\mbox{mod} n) has a solution modulo n. It is also stated below the lemma that if a and n are not relatively prime, then the equation ax\equiv b(\mbox{mod} n) has no solution modulo n. By contraposition, the latter statement means that if the equation ax\equiv b(\mbox{mod} n) has a solution modulo n, then a and n are relatively prime.

More generally, less emphasis is given on the converse of the lemma in the literature. The purpose of this post is to clarify that the equivalence holds. To start with, a and n are relatively prime if and only if their greatest common divisor satisfies \mbox{gcd}(a, n)=1. Along these lines, the lemma can then be stated in the following way; \mbox{gcd}(a, n)=1 if and only if the equation ax\equiv b(\mbox{mod} n) has a solution modulo n.

Proofs of the fact that \mbox{gcd}(a, n)=1 means that the equation ax\equiv b(\mbox{mod} n) has a solution modulo n are available in Pinter’s book and elsewhere in the literature, so interest is in proving the converse. It will be shown that if the equation ax\equiv b(\mbox{mod} n) has a solution modulo n, then \mbox{gcd}(a, n)=1.

If the congruence ax\equiv b(\mbox{mod} n) has a solution, then it has a solution modulo m, where m = n/\mbox{gcd}(a,n) (see for example theorem 2, p. 231, in Pinter’s book). Obviously, m\le n.

It will be shown that m = n. Towards this end, assume that m < n. So, it has been assumed that the congruence ax\equiv b(\mbox{mod} n) has a solution modulo n. This means that it has a solution, therefore it has a solution modulo m.

Let x\equiv c(\mbox{mod}m) be the solution modulo m and x\equiv r(\mbox{mod}n) the solution modulo n. Since x = c is a solution of the congruence ax\equiv b(\mbox{mod} n), it follows that c\equiv r(\mbox{mod}n). The congruences x\equiv r(\mbox{mod}n) and c\equiv r(\mbox{mod}n) lead to x\equiv c(\mbox{mod}n).

So, x\equiv c(\mbox{mod}n)\Leftrightarrow x\equiv c(\mbox{mod}m). Notice now that x\equiv c(\mbox{mod}n) implies x-c\equiv 0(\mbox{mod}n), hence the coset equality \overline{x-c}=\overline{0}_n holds in \mathbb{Z}_n, where \overline{0}_n=\{\dots,-2n,-n,0,n,2n,\dots\}. Similarly, x\equiv c(\mbox{mod}m) yields \overline{x-c}=\overline{0}_m in \mathbb{Z}_m, where \overline{0}_m=\{\dots,-2m,-m,0,m,2m,\dots\}. It has thus been concluded that \overline{0}_n=\overline{0}_m, which is a contradiction since for example m\in\overline{0}_m, while m\not\in\overline{0}_n for m<n. Thereby, m=n.

m=n and m = n/\mbox{gcd}(a,n) give \mbox{gcd}(a, n)=1, which completes the proof.


Proof: characterisation of Jordan measurability

The following proof is a solution to exercise 1.1.5 of the book “An introduction to measure theory” by Terence Tao.

Some notation will be established before proceeding with the exercise. Let \mathcal{E}(\mathbb{R}^d) be the set of elementary sets on \mathbb{R}^d. Denote by \mathcal{L}(E):=\left\{m(A):A\in\mathcal{E}(\mathbb{R}^d),A\subseteq E\right\} and by \mathcal{U}(E):=\left\{m(B):B\in\mathcal{E}(\mathbb{R}^d),E\subseteq B\right\} the sets of elementary measures of all elementary subsets and supersets of a bounded set E\subseteq\mathbb{R}^d, respectively. Let m_{*}(E)=\sup{\mathcal{L}(E)} and m^{*}(E)=\inf{\mathcal{U}(E)} be the Jordan inner and Jordan outer measures of E, respectively.

Exercise 1.1.5 requires to prove the equivalence of the following three statements, as a way of characterising Jordan measurability:

  1. E is Jordan measurable, which means that m_{*}(E)=m^{*}(E).
  2. For every \epsilon>0 there exist elementary sets A\subseteq E\subseteq B such that m(B\setminus A)\le\epsilon.
  3. For every \epsilon>0 there exists an elementary set A such that m^{*}(A\triangle E)\le\epsilon.

It suffices to prove that [1]\Rightarrow [2]\Rightarrow [3]\Rightarrow [1]. To provide further practice and familiarity with Jordan inner and outer measures, it will be additionally shown how to prove [1]\Rightarrow [3]\Rightarrow [2]\Rightarrow [1].

\boxed{[1]\Rightarrow [2]}

A reductio ad absurdum argument will be used. Assume that there exists \epsilon_0>0 such that for all elementary sets A,B\in\mathcal{E}(\mathbb{R}^d) with A\subseteq E\subseteq B the inequality m(B\setminus A)>\epsilon_0 holds.

Considering the set equality B=A\cup (B\setminus A), it follows from A\subseteq B that m(B\setminus A)=m(B)-m(A). Hence, m(B\setminus A)=m(B)-m(A)>\epsilon_0.

So, m(A)+\epsilon_0\le m^{*}(E), since m(A)+\epsilon is a lower bound of \mathcal{U}(E) and m^{*}(E)=\inf{\mathcal{U}(E)}. In turn, m^{*}(E)-\epsilon_0 is an upper bound of \mathcal{L}(E) and m_{*}(E)=\sup{\mathcal{L}(E)}, therefore m_{*}(E)\le m^{*}(E)-\epsilon_0.

Thus, m_{*}(E)<m^{*}(E), which contradicts the assumption m_{*}(E)=m^{*}(E).

\boxed{[2]\Rightarrow [3]}

Assume that there exists \epsilon_0>0 such that \forall C\in\mathcal{E}(\mathbb{R}^d) holds m^{*}(C\triangle E)>\epsilon_0.

According to the assumed statement [2], for \epsilon=\epsilon_0, \exists A, B\in\mathcal{E}(\mathbb{R}^d) with A\subseteq E\subseteq B such that m(B\setminus A)\le\epsilon_0.

Pick C=B, so m^{*}(B\triangle E)=m^{*}(C\triangle E)>\epsilon_0. It follows from E\subseteq B and B\triangle E=(B\setminus E)\cup(E\setminus B) that B\triangle E=B\setminus E, so m^{*}(B\setminus E)=m^{*}(B\triangle E)>\epsilon_0.

B\setminus E\subseteq B\setminus A, since A\subseteq E. By also taking into account that B\setminus A is elementary and the inequality m(B\setminus A)\le\epsilon_0, the conclusion is m^{*}(B\setminus E)\le m(B\setminus A)\le\epsilon_0.

A contradiction has been reached, as it has been deduced m^{*}(B\setminus E)>\epsilon_0 and m^{*}(B\setminus E)\le\epsilon_0 on the basis of the negation of statement [3].

\boxed{[3]\Rightarrow [1]}

Before proceeding with the proof of [3]\Rightarrow [1], four lemmas will be proved.

Lemma 1

The Jordan inner measure m_{*}(E) of any bounded set E\subseteq\mathbb{R}^d is less than or equal to its Jordan outer measure m^{*}(E), i.e. m_{*}(E)\le m^{*}(E).

Proof of lemma 1

For any elementary sets A,B with A\subseteq E\subseteq B, the set relation A\subseteq B yields m(A)\le m(B). This means that any m(A)\in\mathcal{L}(E) is a lower bound of \mathcal{U}(E), therefore m(A)\le m^{*}(E). Since the last inequality holds for any m(A)\in\mathcal{L}(E), it follows that m^{*}(E) is an upper bound of \mathcal{L}(E), thus m_{*}(E)\le m^{*}(E).

Lemma 2

The elementary and Jordan measures of any elementary set X coincide, that is m_{*}(X)=m^{*}(X)=m(X).

Proof of lemma 2

It suffices to notice that X\subseteq X, whence m^{*}(X)\le m(X) and m(X)\le m_{*}(X), thereby m^{*}(X)\le m_{*}(X). It is also known from lemma 1 that m_{*}(X)\le m^{*}(X), so m_{*}(X)=m^{*}(X). Finally, m^{*}(X)\le m(X), m(X)\le m_{*}(X) and m_{*}(X)=m^{*}(X) yield m(X)=m_{*}(X)=m^{*}(X).

Lemma 3

Let E\subseteq\mathbb{R}^d be a bounded set and B\in\mathcal{E}(\mathbb{R}^d) with E\subseteq B. It then holds that m(B)-m_{*}(E)\le m^{*}(B\setminus E).

Proof of lemma 3

Recall that for a set X\subseteq\mathbb{R} the following equivalences hold:

  • \ell=\inf{X}\Leftrightarrow(\forall\epsilon>0)(\exists x\in X)x<\ell+\epsilon.
  • u=\sup{X}\Leftrightarrow(\forall\epsilon>0)(\exists x\in X)u-\epsilon < x.

According to the former equivalence, for any \epsilon>0 there exists an elementary set B\setminus E\subseteq C such that

m(C)<m^{*}(B\setminus E)+\epsilon /2\ \ \ \ (1).

By using the latter equivalence, there exists an elementary set A\subseteq B such that

m_{*}(B)-\epsilon /2<m(A)\ \ \ \ (2).

It follows from lemma 2 and inequality (2) that m(B)-m_{*}(E)<m(A)-m_{*}(E)+\epsilon /2. B\setminus E\subseteq C and A\subseteq B lead to A\setminus C\subseteq E, so m(A\setminus C)\le m_{*}(E), which in turn gives

m(B)-m_{*}(E)<m(A)-m(A\setminus C)+\epsilon /2\ \ \ \ (3).

Inequality (3) connects the statement of lemma 3 to an analogous simpler statement for elementary sets; in particular, it will be shown that

m(A)-m(C)\le m(A\setminus C)\ \ \ \ (4).

Indeed, (A\setminus C)\cup (A\cap C)=A, so m(A)-m(A\cap C)=m(A\setminus C). Moreover, A\cap C\subseteq C, which means m(A\cap C)\le m(C), therefore m(A)-m(C)\le m(A)-m(A\cap C). Inequality (4) has thus been reached.

(3) and (4) yield m(B)-m_{*}(E)<m(C)+\epsilon/2, which is combined with (1) to give

m(B)-m_{*}(E)<m^{*}(B\setminus E)+\epsilon\ \ \ \ (5).

For two reals x, y, if (\forall\epsilon>0) x\le y+\epsilon, then x\le y. This can be shown by assuming x>y, whence 0<x-y\le\epsilon\Rightarrow x=y, contradiction. As inequality (5) holds for any \epsilon>0, the conclusion m(B)-m_{*}(E)\le m^{*}(B\setminus E) follows.

Lemma 4

The Jordan outer measure of the union of two bounded sets X\subseteq\mathbb{R}^d and Y\subseteq\mathbb{R}^d is less than or equal to the sum of the Jordan outer measures of the two sets, i.e. m^{*}(X\cup Y)\le m^{*}(X)+m^{*}(Y).

Proof of lemma 4

Let V\subseteq\mathbb{R}^d,W\subseteq\mathbb{R}^d be elementary sets with X\subseteq V,Y\subseteq W.

Start by noticing the set equality V\cup W=V\cup(W\setminus V). So, m(V\cup W)=m(V)+m(W\setminus V). Moreover, W\setminus V\subseteq W implies that m(W\setminus V)\le m(W). Thus,

m(V\cup W)\le m(V)+m(W)\ \ \ \ (6).

Inequality (6) holds for elementary sets, thereby it is a special case of lemma 4.

m^{*}(X\cup Y)\le m(V\cup W), since X\cup Y \subseteq V\cup W. By taking into account inequality (6), m^{*}(X\cup Y)\le m(V)+m(W). So, m^{*}(X\cup Y)-m(V) is a lower bound of \mathcal{U}(Y) and m^{*}(X\cup Y)-m(V)\le m^{*}(Y). Furthermore, m^{*}(X\cup Y)-m^{*}(Y) is a lower bound of \mathcal{U}(X), so m^{*}(X\cup Y)-m^{*}(Y)\le m^{*}(X), quod erat demonstrandum.

Proof of \boldsymbol{[3]\Rightarrow [1]} using the lemmas

The main idea of the proof of \boldsymbol{[3]\Rightarrow [1]} is to show that for any \epsilon>0 the inequality m^{*}(E)-m_{*}(E)\le \epsilon holds.

It is known from statement [3] that for any (\epsilon>0)(\exists X\in\mathcal{E}(\mathbb{R}^d)) such that

m^{*}(X\triangle E)\le\epsilon/3\ \ \ \ (7).

Using the relevant property of infimum, for any \epsilon>0 there exists an elementary superset E\setminus X\subseteq Y of E\setminus X such that

m(Y)<m^{*}(E\setminus X)+\epsilon/3\ \ \ \ (8).

Introduce the set B:=X\cup Y. Obviously B is an elementary set and E\subseteq B, so m^{*}(E)\le m(B). It thus follows from lemma 3 that

m^{*}(E)-m_{*}(E)\le m^{*}(B\setminus E)\ \ \ \ (9).

B\setminus E=(X\cup Y)\setminus E=(X\setminus E)\cup (Y\setminus E), so by lemma 4

m^{*}(B\setminus E)\le m^{*}(X\setminus E)+m^{*}(Y\setminus E)\ \ \ \ (10).

X\setminus E\subseteq X\triangle E\Rightarrow \mathcal{U}(X\triangle E)\subseteq\mathcal{U}(X\setminus E), so m^{*}(X\setminus E)=\inf{\mathcal{U}(X\setminus E)}\le \inf{\mathcal{U}(X\triangle E)}=m^{*}(X\triangle E). Consequently, inequality (10) gives

m^{*}(B\setminus E)\le m^{*}(X\triangle E)+m^{*}(Y\setminus E)\ \ \ \ (11).

In a similar way, E\setminus X\subseteq X\triangle E implies

m^{*}(E\setminus X)\le m^{*}(X\triangle E)\ \ \ \ (12).

Finally, Y\setminus E\subseteq Y means that

m^{*}(Y\setminus E)\le m(Y)\ \ \ \ (13).

All the components of the proof have been established. More concretely, inequalities (9), (11), (7), (13), (8) and (12) produce m^{*}(E)-m_{*}(E)\le \epsilon for any \epsilon>0, so m_{*}(E)=m^{*}(E).

\boxed{[1]\Rightarrow [3]}

Assume that (\exists\epsilon_0>0)(\forall X\in\mathcal{E}(\mathbb{R}^d))m^{*}(X\triangle E)>\epsilon_0.

As m^{*}(E)=\inf{\mathcal{U}(E)}, it is deduced that there exists an elementary set E\subseteq B such that

m(B)<m^{*}(E)+\epsilon_0/2\ \ \ \ (14).

Since m_{*}(E)=\sup{\mathcal{L}(E)}, there exists an elementary set A\subseteq E such that

m_{*}(E)-\epsilon_0/2<m(A)\ \ \ \ (15).

A\subseteq E\subseteq B\Rightarrow E\setminus A\subseteq B\setminus A, so m^{*}(E\setminus A)\le m(B\setminus A). Moreover, A\subseteq B means m(B\setminus A)=m(B)-m(A), hence

m_{*}(E\setminus A)\le m(B)-m(A)\ \ \ \ (16).

Combining (14), (15) and (16) gives m_{*}(E\setminus A)<m^{*}(E)-m_{*}(E)+\epsilon_0. By the assumed statement [1], m_{*}(E)=m^{*}(E), thus m_{*}(E\setminus A)<\epsilon_0. This is a contradiction, as the assumed negation of statement [3] gives m_{*}(A\triangle E)=m_{*}(E\setminus A)>\epsilon_0 for A\subseteq E.

\boxed{[3]\Rightarrow [2]}

The proof of [3]\Rightarrow [2] is similar in spirit to the proof of [3]\Rightarrow [1]. It will be shown that for any \epsilon >0 there exist A,B\in\mathcal{E}(\mathbb{R}^d) with A\subseteq E\subseteq B such that m(B\setminus A)\le \epsilon.

It is known from statement [3] that for any (\epsilon>0)(\exists X\in\mathcal{E}(\mathbb{R}^d)) such that

m^{*}(X\triangle E)\le\epsilon/4\ \ \ \ (17).

According to the relevant property of infimum, for any \epsilon>0 there exists an elementary set Y with E\setminus X\subseteq Y such that

m(Y)<m^{*}(E\setminus X)+\epsilon/4\ \ \ \ (18).

Consider the set B:=X\cup Y, which is an elementary set and E\subseteq B, so m^{*}(E)\le m(B). By application of lemma 3,

m(B)-m_{*}(E)\le m^{*}(B\setminus E)\ \ \ \ (19).

It is noted that B\setminus E=(X\cup Y)\setminus E=(X\setminus E)\cup (Y\setminus E), so by lemma 4

m^{*}(B\setminus E)\le m^{*}(X\setminus E)+m^{*}(Y\setminus E)\ \ \ \ (20).

X\setminus E\subseteq X\triangle E, so m^{*}(X\setminus E)\le m^{*}(X\triangle E). Thus, inequality (20) leads to

m^{*}(B\setminus E)\le m^{*}(X\triangle E)+m^{*}(Y\setminus E)\ \ \ \ (21).

Similarly, E\setminus X\subseteq X\triangle E implies

m^{*}(E\setminus X)\le m^{*}(X\triangle E)\ \ \ \ (22).

Moreover, Y\setminus E\subseteq Y gives

m^{*}(Y\setminus E)\le m(Y)\ \ \ \ (23).

Finally, there exists an elementary set A with A\subseteq E such that

m_{*}(E)-\epsilon /4<m(A) \ \ \ \ (24).

Inequalities (19), (21), (17), (23), (18) and (22) produce

m(B)-m_{*}(E)\le 3\epsilon /4 \ \ \ \ (25).

Combining inequalities (24) and (25) confirms that for any \epsilon > 0 there exist elementary sets A,B with A\subseteq E\subseteq B such that m(B)-m(A)\le \epsilon, which completes the proof of [3]\Rightarrow [2].

\boxed{[2]\Rightarrow [1]}

It is known from lemma 1 that m_{*}(E)\le m^{*}(E). Assume the negation of statement [1], that is assume m_{*}(E)\neq m^{*}(E). So, m_{*}(E)< m^{*}(E).

Set \epsilon_0=(m^{*}(E)-m_{*}(E))/2>0. From the assumed statement [2], it is known that there exist A,B\in\mathcal{E}(\mathbb{R}^d) with A\subseteq E\subseteq B such that

m(B\setminus A)=m(B)-m(A)\le  \epsilon_0=(m^{*}(E)-m_{*}(E))/2 \ \ \ \ (26).

Note that A\subseteq E\Rightarrow m(A)\le m_{*}(E) and E\subseteq B\Rightarrow m^{*}(E)\le m(B), which leads to

m^{*}(E)-m_{*}(E)\le m(B)-m(A) \ \ \ \ (27).

It follows from equation (26) that 2(m(B)-m(A))\le  m^{*}(E)-m_{*}(E), which is combined with equation (27) to give 2(m(B)-m(A))\le m(B)-m(A), and finally m(B)-m(A)\le 0. Moreover, equation (27) yields 0 < 2\epsilon_0=m^{*}(E)-m_{*}(E)\le m(B)-m(A), i.e. m(B)-m(A)>0. Thus, a contradiction has been reached.

Proof: uniqueness of elementary measure

The following proof is a solution to exercise 1.1.3 of the book “An introduction to measure theory” by Terence Tao.

A box B\in\mathbb{R}^d, d\in\mathbb{N}, is a Cartesian product B:={\sf X}_{i=1}^d I_i, where each interval I_i is I_i=(a, b) or I_i=(a, b] or I_i=[a, b) or I_i=[a, b] for a,b\in\mathbb{R} with a\le b. An elementary set E=\cup_{i=1}^n B_i\subseteq\mathbb{R}^d is a finite union of disjoint boxes B_i\in\mathbb{R}^d. Let \mathcal{E}(\mathbb{R}^d) denote the collection of elementary sets in \mathbb{R}^d. The measure m:\mathcal{E}(\mathbb{R}^d)\rightarrow R^{+}\cup\left\{0\right\} is defined as m(E)=\displaystyle\lim_{N\rightarrow\infty}\frac{1}{N^d}\#\left(E\cap\frac{1}{N}\mathbb{Z}^d\right), where \#(\cdot) denotes set cardinality and \displaystyle\frac{1}{N}\mathbb{Z}^d:=\left\{\frac{\mathbf{z}}{N}:\mathbf{z}\in\mathbb{Z}^d\right\}.

Let m^{'}:\mathcal{E}(\mathbb{R}^d)\rightarrow R^{+}\cup\left\{0\right\} be a function satisfying the non-negativity (m^{'}(E)\ge 0 for any elementary set E), finite additivity (m^{'}(\displaystyle\cup_{i=1}^n E_i)\le \sum_{i=1}^{n}m^{'}(E_i) for disjoint elementary sets E_i) and translation invariance (m^{'}(E+\mathbf{x})=m^{'}(E) for any elementary set E and any \mathbf{x} \in \mathbb{R}^d) properties.

It will be shown that there exists a positive constant c\in\mathbb{R}^+ such that m^{'}=cm, i.e. the functions m^{'} and m are equal up to a positive normalization constant c.

Observe that \left[0,1\right)=\displaystyle\cup_{i=0}^{n-1}\left[\frac{i}{n},\frac{i+1}{n}\right). Due to translation invariance, m^{'}\left(\displaystyle\left[\frac{i}{n},\frac{i+1}{n}\right)\right)=m^{'}\left(\displaystyle\left[\frac{i}{n},\frac{i+1}{n}\right)-\frac{i}{n}\right)=m^{'}\left(\left[0,\frac{1}{n}\right)\right). Using finite additivity, it follows that m^{'}\left(\left[0,1\right)^d\right)=n^dm^{'}\left(\left[0,\frac{1}{n}\right)^d\right). So, m^{'}\left(\left[0,\frac{1}{n}\right)^d\right)=\displaystyle\frac{c}{n^d} for c:=m^{'}\left(\left[0,1\right)^d\right). Since m\left(\left[0,\frac{1}{n}\right)^d\right)=\displaystyle\frac{1}{n^d}, it follows that m^{'}\left(\left[0,\frac{1}{n}\right)^d\right)=c m\left(\left[0,\frac{1}{n}\right)^d\right).

This result generalizes to intervals \left[0,q\right) for any rational q=\displaystyle\frac{s}{n} with s\in\mathbb{N} andn\in\mathbb{N}. Since \left[0,q\right)=\displaystyle\cup_{i=0}^{s-1}\left[\frac{i}{n},\frac{i+1}{n}\right), finite additivity and translation invariance lead to m^{'}\left(\left[0,q\right)^d\right)=c m\left(\left[0,q\right)^d\right)=cq^d.

It will be shown that the result holds also for intervals \left[0,p\right) for any irrational p\in\mathbb{P}.

The set of rationals is dense, which means that (\forall \epsilon>0)(\forall x\in\mathbb{R})(\exists q\in\mathbb{Q})|x-q|<\epsilon. For some irrational x=p and for each n\in\mathbb{N}, set \epsilon=\displaystyle\frac{1}{n}, so (\forall n\in\mathbb{N})(\exists q_n\in\mathbb{Q})|p-q_n|<\displaystyle\frac{1}{n}. Pick some n_0\in\mathbb{N} with n_0>\displaystyle\frac{2}{p}. For all n\in\mathbb{N} with n>n_0, it holds that \displaystyle \frac{2}{n}<\frac{2}{n_0}<p, whence 0<\displaystyle\frac{1}{n}<p-\frac{1}{n}<q_n. So, (\exists n_0\in\mathbb{N})(\forall n\in\mathbb{N}) with n>n_0, it is true that 0<q_n-\displaystyle\frac{1}{n}<p<q_n+\frac{1}{n} and consequently \left[0,q_n-\displaystyle\frac{1}{n}\right)^d\subseteq\left[0,p\right)^d\subseteq\left[0,q_n+\displaystyle\frac{1}{n}\right)^d.

For any two elementary sets E\subseteq F, it can be shown that m^{'}(E)\le m^{'}(F) via the equality F=E\cup(F\setminus E), non-negativity and finite additivity. Hence, m^{'}\left(\left[0,q_n-\displaystyle\frac{1}{n}\right)^d\right)\le m^{'}\left(\left[0,p\right)^d\right)\le m^{'}\left(\left[0,q_n+\displaystyle\frac{1}{n}\right)^d\right).

Since q_n\pm\displaystyle\frac{1}{n} are rationals, it is deduced that m^{'}\left(\left[0,q_n\pm\displaystyle\frac{1}{n}\right)^d\right)=c\left(q_n\pm\displaystyle\frac{1}{n}\right)^d. Thus, c\left(q_n-\displaystyle\frac{1}{n}\right)^d\le m^{'}\left(\left[0,p\right)^d\right)\le c\left(q_n+\displaystyle\frac{1}{n}\right)^d.

0<q_n-\displaystyle\frac{1}{n}<p<q_n+\frac{1}{n} gives 0<p-\displaystyle\frac{1}{n}<q_n<p+\frac{1}{n}, hence the sandwich theorem yields \displaystyle\lim_{n\rightarrow\infty}q_n=p.

Combining c\left(q_n-\displaystyle\frac{1}{n}\right)^d\le m^{'}\left(\left[0,p\right)^d\right)\le c\left(q_n+\displaystyle\frac{1}{n}\right)^d and \displaystyle\lim_{n\rightarrow\infty}q_n=p gives cp^d\le m^{'}\left(\left[0,p\right)^d\right)\le cp^d, so m^{'}\left(\left[0,p\right)^d\right)=cp^d for any irrational p.

This effectively completes the proof. There remains to show that m^{'}=cm is true for Cartesian products of unequal intervals \left[0,x_i\right) in each coordinate i=1,2,\dots,d, for any x_i\in\mathbb{R}, and for any subinterval of the real line. These are all trivial given the existing foundations.

Proof: the symmetry group of a complete graph is vertex transitive

The following self-contained post relates to definition 1.17, p. 11, of the book “Groups, graphs and trees: an introduction to the geometry of infinite groups” by John H. Meier.

Let K_n be a complete graph and \mbox{Sym}(K_n) its symmetry group. It will be shown that K_n is vertex transitive, i.e. that for any two vertices v_i\in K_n and v_j\in K_n there exists a symmetry a\in\mbox{Sym}(K_n) such that a(v_i)=v_j.

The proof is constructive. Given any two vertices v_i\in K_n and v_j\in K_n, it suffices to provide a symmetry a_{ij}:V(\Gamma)\bigcup E(\Gamma)\rightarrow V(\Gamma)\bigcup E(\Gamma)\in\mbox{Sym}(K_n) such that a_{ij}(v_i)=v_j. Although the suggested symmetry is different for every v_i and v_j, the simpler notation a is preferred in place of a_{ij}.

Given v_i\in K_n and v_j\in K_n, a suitable symmetry can be constructed by swapping v_i with v_j, by redirecting all edges with one end in v_i to edges with one end in v_j and vice versa, and by leaving all other vertices and edges intact.

To formalize the construction, given v_i\in K_n and v_j\in K_n, the symmetry a is set to satisfy the following seven properties:

  • a(v_i)=v_j,
  • a(v_j)=v_i,
  • a(v_k)=v_k for all vertices v_k other than v_i,~v_j,
  • a(e_{ij})=e_{ij}, where e_{ij} denotes the edge connecting v_i with v_j,
  • a(e_{ik})=e_{jk} for all k other than i,~j, where e_{ik} denotes the edge connecting v_i with v_k and e_{jk} the edge connecting v_j with v_k,
  • a(e_{jk})=e_{ik} for all k other than i,~j, and
  • a(e_{kl})=e_{kl} for all k,~l other than i,~j, where e_{kl} denotes the edge connecting v_k with v_l.

Note that all edges exist in the above construction since K_n is complete, so the symmetry is well-defined.

It is easy to check that the proposed symmetry maps vertices to vertices, edges to edges, and preserves the connectivity of \Gamma according to the definition of symmetry provided in this blog post. For instance, \mbox{ends}(e_{ik})=\{v_i,v_k\}, a(e_{ik})=e_{jk}, a(v_i)=v_j and a(v_k)=v_k yield \mbox{ends}(a(e_{ik}))=\mbox{ends}(e_{jk})=\{v_j,v_k\}=\{a(v_i),a(v_k)\}, which is an edge in K_n.

Clarifying the concept of symmetry group of a graph

The following self-contained post clarifies the definition of symmetry group of a graph (definition 1.14, p. 10-11, and example 1.15, p. 11, in the book “Groups, graphs and trees: an introduction to the geometry of infinite groups” by John H. Meier).

The symmetry group of a graph, also referred to as the automorphism group of a graph, is informally the set of all possible rearrangements of the graph that preserve its edge-vertex connectivity.

A practical way of understanding what rearrangements are permitted by the symmetry group of a graph is to assign labels to the vertices and edges of the graph. Consider moving around the vertices and edges of the graph in such way so that each graph resulting from the reconfiguration connects the same labelled vertices via the same labelled edges.

For example, the figure below displays a graph \Gamma with labelled vertices \{v_1, v_2, v_3, v_4\} and edges \{e_1, e_2, e_3, e_4, e_5, e_6\}:


Six symmetries come from permuting the edges \{e_4, e_5, e_6\} connecting the vertices \{v_1, v_4\} and one symmetry comes from reflecting through a horizontal axis. The symmetry arising from a reflection through the horizontal axis is shown below:


On the other hand, the graph below does not belong to the symmetry group of the initial graph \Gamma because of connecting v_2 with v_4 through e_2, which introduces a connection not present in \Gamma:


Let’s count the graphs in the symmetry group of \Gamma. Six graphs symmetric to \Gamma arise from permuting the edges \{e_4, e_5, e_6\}, and this set of six graphs is isomorphic to the symmetric group \mbox{Sym}_3. The original graph \Gamma along with its reflection through the horizontal axis gives a set of two graphs isomorphic to \mbox{Sym}_2. So the symmetry group of \Gamma contains 6\times 2 =12 graphs, and is isomorphic to the direct product \mbox{Sym}_3\bigoplus\mbox{Sym}_2.

The main aim of this blog post is to provide a mathematically rigorous definition of the symmetry group of a graph starting by the concept of graph symmetry. A symmetry of a graph \Gamma=(V,E) with vertex set V and edge set E is a permutation f of the vertex set V such that the pair of vertices \{v_1,v_2\} form an edge in \Gamma if and only if the pair \{f(v_1), f(v_2)\} also form an edge in \Gamma.

Unfortunately, there is an error in this first definition. In the previous example, if the edges e_4 and e_5 are permuted, a new distinct graph from the symmetry group of \Gamma arises, which is not accounted by the definition due to permuting the edges but not the vertices of \Gamma.

A more mathematically accurate definition is provided in John Meier’s book “Groups, graphs and trees: an introduction to the geometry of infinite groups” (definition 1.14, p. 10-11). According to Meier’s definition, a symmetry of a graph \Gamma with vertices V(\Gamma) and edges E(\Gamma) is a bijection f taking vertices to vertices and edges to edges such that if \mbox{ends}(e)=\{v_1,v_2\} in \Gamma, then \mbox{ends}(f(e))=\{f(v_1),f(v_2)\} in \Gamma, where \mbox{ends}(e)=\{v_1,v_2\} denotes the set of two vertices v_1 and v_2 connected by an edge e.

Meier’s definition corrects the issue of the previous definition, but does not tell what is the domain and range of f. In fact, trying to define the domain and range of f does not seem to be obvious due to the dubious notation f(e) and f(v_1).

One possible way round this problem is to define graph symmetry to be a set of two separate bijections, one on the vertices and one on the edges of the graph. The third definition would take the following form; a symmetry of a graph \Gamma with vertices V(\Gamma) and edges E(\Gamma) is a set \{f_1,f_2\} of bijections f_1:V(\Gamma)\rightarrow V(\Gamma) and f_2:E(\Gamma)\rightarrow E(\Gamma) such that if \mbox{ends}(e)=\{v_1,v_2\} in \Gamma, then \mbox{ends}(f_2(e))=\{f_1(v_1),f_1(v_2)\} in \Gamma.

Note that the converse holds in the above definition, i.e. \mbox{ends}(f_2(e))=\{f_1(v_1),f_1(v_2)\} in \Gamma\Rightarrow\mbox{ends}(e)=\{v_1,v_2\} in \Gamma. To prove this, assume that \mbox{ends}(e)=\{q,r\}\ne \{v_1,v_2\}. From the definition, \mbox{ends}(e)=\{q,r\}\Rightarrow\mbox{ends}(f_2(e))=\{f_1(q),f_1(r)\}. Given that \{q,r\}\ne \{v_1,v_2\}, assume without loss of generality that q\ne v_1 and q\ne v_2. Since f_1 is a bijection, it follows that f_1(q)\ne f_1(v_1) and f_1(q)\ne f_1(v_2), which is a contradiction since the edge f_2(e) connects two vertices according to \{f_1(v_1),f_1(v_2)\}=\mbox{ends}(f_2(e))=\{f_1(q),f_1(r)\}.

In light of the validity of the inverse, one can define graph symmetry using the equivalence \mbox{ends}(e)=\{v_1,v_2\} in \Gamma\iff \mbox{ends}(f_2(e))=\{f_1(v_1),f_1(v_2)\} in \Gamma.

However, this third definition raises one more question. Having two separate bijections, one for vertices and one for edges, does not allow to define graph symmetry as a single function. The solution is to define a symmetry f of a graph \Gamma by f(x)= \begin{cases} f_1(x) \mbox{ if } x\in V(\Gamma)\\ f_2(x)\mbox{ if } x\in E(\Gamma)\end{cases}.

So the third definition can now be corrected. A symmetry of a graph \Gamma with vertices V(\Gamma) and edges E(\Gamma) is a bijection f:V(\Gamma)\bigcup E(\Gamma)\rightarrow V(\Gamma)\bigcup E(\Gamma) defined byf(x)= \begin{cases} f_1(x) \mbox{ if } x\in V(\Gamma)\\ f_2(x)\mbox{ if } x\in E(\Gamma)\end{cases}, where f_1:V(\Gamma)\rightarrow V(\Gamma) and f_2:E(\Gamma)\rightarrow E(\Gamma) are two bijections such that if \mbox{ends}(e)=\{v_1,v_2\} in \Gamma, then \mbox{ends}(f_2(e))=\{f_1(v_1),f_1(v_2)\} in \Gamma.

This corrected third definition coincides with Meier’s definition if the latter is clarified formally as follows; a symmetry of a graph \Gamma with vertices V(\Gamma) and edges E(\Gamma) is a bijection f:V(\Gamma)\bigcup E(\Gamma)\rightarrow V(\Gamma)\bigcup E(\Gamma) with the following three properties:

  • v\in V(\Gamma)\Rightarrow f(v)\in V(\Gamma),
  • e\in E(\Gamma)\Rightarrow f(e)\in E(\Gamma),
  • \mbox{ends}(e)=\{v_1,v_2\} in \Gamma\Rightarrow\mbox{ends}(f(e))=\{f(v_1),f(v_2)\} in \Gamma.

Having defined graph symmetry, the symmetry group of a graph becomes simply the collection of all graph symmetries. Formally, the symmetry group \mbox{Sym}(\Gamma) of a graph \Gamma is the set of symmetries of \Gamma equipped with the operation of composition of graph symmetries.

Note that the composition f\circ g of two graph symmetriesf:V(\Gamma)\bigcup E(\Gamma)\rightarrow V(\Gamma)\bigcup E(\Gamma) andg:V(\Gamma)\bigcup E(\Gamma)\rightarrow V(\Gamma)\bigcup E(\Gamma) is essentially a composition of two vertex or edge permutations.

Understanding the concept of group action via an example

The following self-contained post clarifies the definition of group action (definition 1.2, p. 3, in the book “Groups, graphs and trees: an introduction to the geometry of infinite groups” by John H. Meier) by adapting an example from chapter 7, p. 72-74, of the book “A book of abstract algebra” by Charles C. Pinter.

A rigorous definition of group action and of its group homomorphism representation was provided in a previous post. This post presents the group action on the symmetries of the square to exemplify the concept.

Consider a square with numbered vertices, as shown in the figure below. The set of vertices of the square is X=\left\{1,2,3,4\right\}.


A symmetry of the square can be informally thought of as a way of moving the square so that it coincides with its former position. Every such move is fully described by its effect on the vertices, in the sense that every new vertex position coincides with a distinct former vertex position.

There are exactly 8 symmetric moves of the square, each of which can be described by a permutation of the square’s vertices. Here is the list of the 8 symmetries of the square:

  • The identity R_0=\left(\begin{matrix}1 & 2 & 3 & 4\\  1 & 2 & 3 & 4 \end{matrix}\right), which does not move the square.
  • Clockwise rotation of the square about its centre P by an angle of 90^{\circ}R_1=\left(\begin{matrix}1 & 2 & 3 & 4\\  2 & 3 & 4 & 1\end{matrix}\right).
  • Clockwise rotation of the square about its centre P by an angle of 180^{\circ}R_2=\left(\begin{matrix}1 & 2 & 3 & 4\\  3 & 4 & 1 & 2\end{matrix}\right).
  • Clockwise rotation of the square about its centre P by an angle of 270^{\circ}R_3=\left(\begin{matrix}1 & 2 & 3 & 4\\  4 & 1 & 2 & 3\end{matrix}\right).
  • Flip of the square about its diagonal A (see figure below): R_4=\left(\begin{matrix}1 & 2 & 3 & 4\\  1 & 4 & 3 & 2\end{matrix}\right).
  • Flip of the square about its diagonal CR_5=\left(\begin{matrix}1 & 2 & 3 & 4\\  3 & 2 & 1 & 4\end{matrix}\right).
  • Flip of the square about its vertical axis BR_6=\left(\begin{matrix}1 & 2 & 3 & 4\\  2 & 1 & 4 & 3\end{matrix}\right).
  • Flip of the square about its horizontal axis DR_7=\left(\begin{matrix}1 & 2 & 3 & 4\\  4 & 3 & 2 & 1\end{matrix}\right).


The set of all 8 symmetries of the square is denoted by D_4=\left\{R_0,R_1,R_2,R_3,R_4,R_5,R_6,R_7\right\}. Define the operation \circ : D_4\times D_4\rightarrow D_4 to be the function composition R_i\circ R_j(x)=R_i(R_j(x)) for x\in X\left\{1,2,3,4\right\}. For example,

R_1\circ R_6 is the result of first flipping the square about its vertical axis B and then rotating it clockwise by 90^{\circ}:

R_1\circ R_6 =\left(\begin{matrix}1 & 2 & 3 & 4\\  2 & 3 & 4 & 1\end{matrix}\right)\circ\left(\begin{matrix}1 & 2 & 3 & 4\\  2 & 1 & 4 & 3\end{matrix}\right)=\left(\begin{matrix}1 & 2 & 3 & 4\\  3 & 2 & 1 & 4\end{matrix}\right)=R_5.

R_1\circ R_6 =R_5 means that first flipping the square about its vertical axis B and then rotating it clockwise by 90^{\circ} is the same as flipping the square about its diagonal C.

The set of symmetric permutations D_4 of the square along with the operation of permutation composition induces the so-called dihedral group (D_4, \circ) of the symmetries of the square. |D_4| denotes the order of group (D_4, \circ), which is the number of elements of D_4. Obviously, |D_4|=8.

The symmetric group (\mbox{Sym}(X), \circ) of X=\left\{1,2,3,4\right\} is the set of all the permutations of X, i.e. the set of all the bijective functions from X to X. Since \mbox{Sym}(X) has 4! elements, the order of \mbox{Sym}(X) is |\mbox{Sym}(X)|=24.

Note that D_4\subseteq \mbox{Sym}(X) and that (D_4, \circ) is a subgroup of (\mbox{Sym}(X), \circ).

Any function \phi:D_4\times X\rightarrow X is a group action as long as it satisfies

  • \phi(R_0, x)=x for all x \in X=\left\{1,2,3,4\right\} (identity property) and
  • \phi(R_i\circ R_j, x)=\phi(R_i, \phi(R_j, x)) for all i, j=0,1,2,3,4,5,6,7 and for all x \in X=\left\{1,2,3,4\right\} (compatibility property).

One way of picking a specific group action \phi relies on defining the type of associated group homomorphism h:D_4\rightarrow\mbox{Sym}(X) in a way that respects the identity and compatibility properties of \phi:D_4\times X\rightarrow X.

The simplest possible example would be to choose the group homomorphism h:D_4\rightarrow\mbox{Sym}(X) to be the identity function h(R_i)=R_i,~i=0,1,2,3,4,5,6,7, in which case the group action \phi takes the form \phi(R_i, x)=R_i(x).

It is easy to check that the the group action \phi(R_i, x)=R_i(x), which arises by setting the group homomorphism h to be the identity function, satisfies the properties of identity and compatibility:

  • \phi(R_0, x)=R_0(x)=x,
  • \phi(R_i\circ R_j, x)=R_i\circ R_j(x)=R_i(R_j(x))=R_i(\phi(R_j,x))=\phi (R_i,\phi (R_j, x)).

It is also easy to see for instance that the group action \phi(R_i, x)=R_i(x) maps (R_1, 2)\in D_4\times X to 3\in X, since \phi(R_1, 2)=R_1(2)=3.

The group action \phi(R_i, x)=R_i(x) is interpreted as the function that maps every symmetric move (permutation) R_i of the square and every square vertex x to the square vertex R_i(x).

The group homomorphism h(R_i)=R_i is the identity, so it is an injective function. As elaborated in this previous post, since h with h(R_i)=R_i is injective, the group action \phi with \phi(R_i, x)=R_i(x) is faithful.