I’ve been thinking about how to teach measure theory starting from building a probability measure on the sequence of infinite tosses of a fair coin instead of the standard treatment which starts with Lebesgue measure. This post is an outline of how to translate Vitali’s proof of the existence of nonmeasurable sets for Lebesgue measures to this setup.

## The Problem

Our sample space \(\Omega\) consists of infinite strings from the alphabet \(\{H,T\}\). We want to prove that there is no probability measure \(\mu\) defined over all subsets of \(\Omega\) such that the different tosses are independent events with the probability of heads and tails on each toss equal to \(1/2\).

## The strategy

Our strategy is to closely follow Vitali’s classic proof by contradiction in the case of Lebesgue measure.

Suppose we have a probability measure defined over all subsets of \(\Omega\).

We find a countable series of transformations \(\tau_i\) which are measure preserving, i.e. \(\mu(\tau_i(A))=\mu(A)\) for all \(i\) and all \(A \subset \Omega\).

We then construct a set \(V\) with the property that the sets \(\tau_i(V)\) form a partition of \(\Omega\).

Now we have a contradiction. Since each \(\tau_i(V)\) has the same measure as \(V\), if \(V\) has a positive measure, \(\Omega\) would end up having infinite measure. If \(V\) has measure zero, \(\Omega\) would end up having measure zero. Either possibility contradicts the claim that \(\mu\) is a probability measure.

While Vitali’s original proof used the translation independence of the Lebesgue measure, our proof uses the symmetry arising from the assumption that the coin tosses are fair and independent.

## The transformations

Consider a finite set of indices \(D\). Then for any infinite sequence of outcomes \(\omega\) we define \(\tau_D(\omega)\) to be the sequence of outcomes obtained by taking \(\omega\) and flipping (i.e. changing heads to tails and vice versa) the outcome at each of the indices contained in \(D\).

So, for example, \(\tau_{\{5,6\}}(\omega)\) would flip the outcomes at the 5-th and 6-th positions in \(\omega\) and leave the rest of the sequence unchanged.

We first note two elementary properties of these transformations. First, because flipping the same indices twice restores them to their original state, each \(\tau_D\) is its own inverse. Second, for two sets \(D_1\) and \(D_2\) the composition \(\tau_{D_1} \circ \tau_{D_2}\) flips twice (and hence restores) the indices belonging to both \(D_1\) and \(D_2\) and flips once the indices belonging to exactly one of them. Thus it equals \(\tau_{D_1 \Delta D_2}\) (\(D_1 \Delta D_2\) being the symmetric difference of \(D_1\) and \(D_2\)) and hence belongs to the class of transformations we are considering.

Next, we will show that it follows from independence and fairness that the transformations \(\tau_D\) are measure preserving. The logic is simple. Independence lets us delink what happens at a given finite set of indices from what happens at the rest of the indices. Fairness implies that any sequence of outcomes of a given finite length \(l\) has the probability \(2^{-l}\). So a sequence like \(HTH\) and its flipped version \(THT\) both have the probability \(1/8\). So a flip over a finite index set should not change the probability of any event. Now we put in the grunt work to show this formally.

Let \(\phi\) the operation of flipping every outcome in a sequence, \(\omega_D\) to be the outcomes in \(\omega\) at indices in \(D\) in \(\omega\) and \(\omega_{-D}\) to be the the outcomes in \(\omega\) at indices in the complement of \(D\). Let \(S(l)\) the set of all finite sequences of \(\{H,T\}\) of length equal to \(l\).

Then for any event \(A\) and any finite set of indices \(D\) of size \(l\), we have by countable additivity, \[\mu(A)=\sum_{s \in S(l)}\mu(A \cap \{\omega_D=s\})\] [Here’s why we have to make \(D\) finite. If \(D\) were infinite the set of sequences of length equal to the length of \(D\) would be uncountable and we would not be able to use countable additivity.]

Using independence, we have \[=\sum_{s \in S(l)}\mu(\omega_D=s)\cdot\mu(\omega_{-D} \in (A \cap \omega_D=s)_{-D})\]

Using fairness to calculate \(\mu(\omega_D=s)\) \[=\sum_{s \in S(l)}2^{-l}\mu(\omega_{-D} \in (A \cap \{\omega_D=s\})_{-D}) \]

Using the definition of \(\phi\) (flip all outcomes) and \(\tau_D\) (flip outcomes in \(D\)) we have \[=\sum_{s \in S(l)} 2^{-l}\mu(\omega_{-D} \in (\tau_D(A) \cap (\omega_D=\phi(s))_{-D}) \]

Now \(S(l)\) contains *all* finite sequence of length \(l\). So as \(s\) ranges over \(S(l)\), \(\phi(s)\) also ranges over \(S(l)\). Therefore, \[=\sum_{s \in S(l)} 2^{-l}\mu(\omega_{-D} \in (\tau_D(A) \cap
(\omega_d=s)_{-D}) \]

Once again using fairness to calculate \(\mu(\omega_D=s)\) \[=\sum_{s \in S(l)} \mu(\omega_D=s)\cdot\mu(\omega_{-D} \in (\tau_D(A) \cap (\omega_d=s)_{-D}) \]

But by independence, \[=\mu(\tau_D(A))\] and our proof that \(\tau_D\) is measure-preserving is done.

## The partition

Now we construct a set \(V\) whose images under the different \(\tau_D\) will be a partition of \(\Omega\). To do so we first define an equivalence relation on elements of \(\Omega\). We say that \(\omega_1 \sim \omega_2\) if there is a \(\tau_D\) with \(\omega_1 = \tau_D \omega_2\). We check that this is really an equivalence relation.

- \(\omega \sim \omega\) [Since \(\tau_\emptyset\) is the identity.]
- \(\omega_1 \sim \omega_2 \Rightarrow \omega_2 \sim \omega_1\). [If \(\omega_1 = \tau_D \omega_2\) then because \(\tau_D\) is its own inverse it follows that \(\tau_D \omega_1=\omega_2\).]
- \(\omega_1 \sim \omega_2\) and \(\omega_2 \sim \omega_3\) implies \(\omega_1 \sim \omega_3\). [If \(\omega_1 = \tau_{D_1} \omega_2\) and \(\omega_3 = \tau_{D_2} \omega_2\) then \(\omega_3= \tau_{D_2} \circ \tau_{D_1} (\omega_1) = \tau_{D_1 \Delta D_2}(\omega_1)\).]

Let \(V\) be a set formed by taking one element each from each of the equivalence classes formed by the equivalence relation \(\sim\). Since this selection of an element from each class is arbitrary we have to rely on the Axiom of Choice to ensure that the set \(V\) exists.

Now we claim that \(\tau_D(V)\) as \(D\) ranges over all finite sets of indices is a partition of \(\Omega\).

First we see that every \(\omega \in \Omega\) belongs to some \(\tau_D(V)\). This is because \(\omega\) belongs to one of the equivalence classes of \(\sim\). If \(\omega'\) is the representative of this class that has been included in \(V\) then by the definition of \(\sim\) there is some \(D\) such that \(\omega=\tau_D(\omega')\) and hence \(\omega \in \tau_D(V)\).

Next we see that if \(D_1 \ne D_2\) then \(\tau_{D_1}(V) \cap \tau_{D_2}(V) = \emptyset\). Suppose on the contrary that \[\tau_{D_1} \omega_1 = \tau_{D_2} \omega_2\] for some \(\omega_1,\omega_2 \in V\). Using the properties of \(\tau\) we have \[\tau_{D_1 \Delta D_2}(\omega_1) = \omega_2.\] By the definition of \(\sim\) this means \[\omega_1 \sim \omega_2\] But since \(V\) contains exactly one element from each equivalence class this means that \(\omega_1=\omega_2=\omega\) (say), so that \[\tau_{D_1 \Delta D_2}(\omega) = \omega\] But since no sequence can remain unchanged if some outcomes are flipped it must be the case that \(D_1 \Delta D_2=\emptyset\) which implies \(D_1=D_2\) and contradicts our assumption that \(D_1 \ne D_2\).

Thus we have established that the \(\tau_D(V)\) form a partition of \(\Omega\).

## The contradiction

We have from the previous section \[\Omega = \bigcup_D \tau_D(V)\] where the union is disjoint and \(D\) ranges over all finite sets of indices. The collection of finite sets of indices is countable (check). Hence we are justified in using countable additivity to get \[\mu(\Omega) = \sum_D \mu(\tau_D(V))\]

which because the \(\tau_D\)’s are measure-preserving gives us \[\mu(\Omega) =\sum_D \mu(V)\]

Now there are two possibilities. \(\mu(V)=0\) in which case we get \(\mu(\Omega)=0\), or that \(\mu(V)>0\) in which case we get \(\mu(\Omega)=\infty\). In either case we have a contradition with the requirement from a probability measure that \(\mu(\Omega)=1\).

## References

Other proofs of this result can be found in Blackwell and Diaconis (1996) and in Chapter 5 of Oxtoby’s Measure and Category.