Nonmeasurable sets of infinite independent fair coin tosses

Jun. 20, 2020

I’ve been thinking about how to teach measure theory starting from building a probability measure on the sequence of infinite tosses of a fair coin instead of the standard treatment which starts with Lebesgue measure. This post is an outline of how to translate Vitali’s proof of the existence of nonmeasurable sets for Lebesgue measures to this setup.

The Problem

Our sample space Ω\Omega consists of infinite strings from the alphabet {H,T}\{H,T\}. We want to prove that there is no probability measure μ\mu defined over all subsets of Ω\Omega such that the different tosses are independent events with the probability of heads and tails on each toss equal to 1/21/2.

The strategy

Our strategy is to closely follow Vitali’s classic proof by contradiction in the case of Lebesgue measure.

Suppose we have a probability measure defined over all subsets of Ω\Omega.

We find a countable series of transformations τi\tau_i which are measure preserving, i.e. μ(τi(A))=μ(A)\mu(\tau_i(A))=\mu(A) for all ii and all AΩA \subset \Omega.

We then construct a set VV with the property that the sets τi(V)\tau_i(V) form a partition of Ω\Omega.

Now we have a contradiction. Since each τi(V)\tau_i(V) has the same measure as VV, if VV has a positive measure, Ω\Omega would end up having infinite measure. If VV has measure zero, Ω\Omega would end up having measure zero. Either possibility contradicts the claim that μ\mu is a probability measure.

While Vitali’s original proof used the translation independence of the Lebesgue measure, our proof uses the symmetry arising from the assumption that the coin tosses are fair and independent.

The transformations

Consider a finite set of indices DD. Then for any infinite sequence of outcomes ω\omega we define τD(ω)\tau_D(\omega) to be the sequence of outcomes obtained by taking ω\omega and flipping (i.e. changing heads to tails and vice versa) the outcome at each of the indices contained in DD.

So, for example, τ{5,6}(ω)\tau_{\{5,6\}}(\omega) would flip the outcomes at the 5-th and 6-th positions in ω\omega and leave the rest of the sequence unchanged.

We first note two elementary properties of these transformations. First, because flipping the same indices twice restores them to their original state, each τD\tau_D is its own inverse. Second, for two sets D1D_1 and D2D_2 the composition τD1τD2\tau_{D_1} \circ \tau_{D_2} flips twice (and hence restores) the indices belonging to both D1D_1 and D2D_2 and flips once the indices belonging to exactly one of them. Thus it equals τD1ΔD2\tau_{D_1 \Delta D_2} (D1ΔD2D_1 \Delta D_2 being the symmetric difference of D1D_1 and D2D_2) and hence belongs to the class of transformations we are considering.

Next, we will show that it follows from independence and fairness that the transformations τD\tau_D are measure preserving. The logic is simple. Independence lets us delink what happens at a given finite set of indices from what happens at the rest of the indices. Fairness implies that any sequence of outcomes of a given finite length ll has the probability 2l2^{-l}. So a sequence like HTHHTH and its flipped version THTTHT both have the probability 1/81/8. So a flip over a finite index set should not change the probability of any event. Now we put in the grunt work to show this formally.

Let ϕ\phi the operation of flipping every outcome in a sequence, ωD\omega_D to be the outcomes in ω\omega at indices in DD in ω\omega and ωD\omega_{-D} to be the the outcomes in ω\omega at indices in the complement of DD. Let S(l)S(l) the set of all finite sequences of {H,T}\{H,T\} of length equal to ll.

Then for any event AA and any finite set of indices DD of size ll, we have by countable additivity, μ(A)=sS(l)μ(A{ωD=s})\mu(A)=\sum_{s \in S(l)}\mu(A \cap \{\omega_D=s\}) [Here’s why we have to make DD finite. If DD were infinite the set of sequences of length equal to the length of DD would be uncountable and we would not be able to use countable additivity.]

Using independence, we have =sS(l)μ(ωD=s)μ(ωD(AωD=s)D)=\sum_{s \in S(l)}\mu(\omega_D=s)\cdot\mu(\omega_{-D} \in (A \cap \omega_D=s)_{-D})

Using fairness to calculate μ(ωD=s)\mu(\omega_D=s) =sS(l)2lμ(ωD(A{ωD=s})D)=\sum_{s \in S(l)}2^{-l}\mu(\omega_{-D} \in (A \cap \{\omega_D=s\})_{-D})

Using the definition of ϕ\phi (flip all outcomes) and τD\tau_D (flip outcomes in DD) we have =sS(l)2lμ(ωD(τD(A)(ωD=ϕ(s))D)=\sum_{s \in S(l)} 2^{-l}\mu(\omega_{-D} \in (\tau_D(A) \cap (\omega_D=\phi(s))_{-D})

Now S(l)S(l) contains all finite sequence of length ll. So as ss ranges over S(l)S(l), ϕ(s)\phi(s) also ranges over S(l)S(l). Therefore, =sS(l)2lμ(ωD(τD(A)(ωd=s)D)=\sum_{s \in S(l)} 2^{-l}\mu(\omega_{-D} \in (\tau_D(A) \cap (\omega_d=s)_{-D})

Once again using fairness to calculate μ(ωD=s)\mu(\omega_D=s) =sS(l)μ(ωD=s)μ(ωD(τD(A)(ωd=s)D)=\sum_{s \in S(l)} \mu(\omega_D=s)\cdot\mu(\omega_{-D} \in (\tau_D(A) \cap (\omega_d=s)_{-D})

But by independence, =μ(τD(A))=\mu(\tau_D(A)) and our proof that τD\tau_D is measure-preserving is done.

The partition

Now we construct a set VV whose images under the different τD\tau_D will be a partition of Ω\Omega. To do so we first define an equivalence relation on elements of Ω\Omega. We say that ω1ω2\omega_1 \sim \omega_2 if there is a τD\tau_D with ω1=τDω2\omega_1 = \tau_D \omega_2. We check that this is really an equivalence relation.

  1. ωω\omega \sim \omega [Since τ\tau_\emptyset is the identity.]
  2. ω1ω2ω2ω1\omega_1 \sim \omega_2 \Rightarrow \omega_2 \sim \omega_1. [If ω1=τDω2\omega_1 = \tau_D \omega_2 then because τD\tau_D is its own inverse it follows that τDω1=ω2\tau_D \omega_1=\omega_2.]
  3. ω1ω2\omega_1 \sim \omega_2 and ω2ω3\omega_2 \sim \omega_3 implies ω1ω3\omega_1 \sim \omega_3. [If ω1=τD1ω2\omega_1 = \tau_{D_1} \omega_2 and ω3=τD2ω2\omega_3 = \tau_{D_2} \omega_2 then ω3=τD2τD1(ω1)=τD1ΔD2(ω1)\omega_3= \tau_{D_2} \circ \tau_{D_1} (\omega_1) = \tau_{D_1 \Delta D_2}(\omega_1).]

Let VV be a set formed by taking one element each from each of the equivalence classes formed by the equivalence relation \sim. Since this selection of an element from each class is arbitrary we have to rely on the Axiom of Choice to ensure that the set VV exists.

Now we claim that τD(V)\tau_D(V) as DD ranges over all finite sets of indices is a partition of Ω\Omega.

First we see that every ωΩ\omega \in \Omega belongs to some τD(V)\tau_D(V). This is because ω\omega belongs to one of the equivalence classes of \sim. If ω\omega' is the representative of this class that has been included in VV then by the definition of \sim there is some DD such that ω=τD(ω)\omega=\tau_D(\omega') and hence ωτD(V)\omega \in \tau_D(V).

Next we see that if D1D2D_1 \ne D_2 then τD1(V)τD2(V)=\tau_{D_1}(V) \cap \tau_{D_2}(V) = \emptyset. Suppose on the contrary that τD1ω1=τD2ω2\tau_{D_1} \omega_1 = \tau_{D_2} \omega_2 for some ω1,ω2V\omega_1,\omega_2 \in V. Using the properties of τ\tau we have τD1ΔD2(ω1)=ω2.\tau_{D_1 \Delta D_2}(\omega_1) = \omega_2. By the definition of \sim this means ω1ω2\omega_1 \sim \omega_2 But since VV contains exactly one element from each equivalence class this means that ω1=ω2=ω\omega_1=\omega_2=\omega (say), so that τD1ΔD2(ω)=ω\tau_{D_1 \Delta D_2}(\omega) = \omega But since no sequence can remain unchanged if some outcomes are flipped it must be the case that D1ΔD2=D_1 \Delta D_2=\emptyset which implies D1=D2D_1=D_2 and contradicts our assumption that D1D2D_1 \ne D_2.

Thus we have established that the τD(V)\tau_D(V) form a partition of Ω\Omega.

The contradiction

We have from the previous section Ω=DτD(V)\Omega = \bigcup_D \tau_D(V) where the union is disjoint and DD ranges over all finite sets of indices. The collection of finite sets of indices is countable (check). Hence we are justified in using countable additivity to get μ(Ω)=Dμ(τD(V))\mu(\Omega) = \sum_D \mu(\tau_D(V))

which because the τD\tau_D’s are measure-preserving gives us μ(Ω)=Dμ(V)\mu(\Omega) =\sum_D \mu(V)

Now there are two possibilities. μ(V)=0\mu(V)=0 in which case we get μ(Ω)=0\mu(\Omega)=0, or that μ(V)>0\mu(V)>0 in which case we get μ(Ω)=\mu(\Omega)=\infty. In either case we have a contradition with the requirement from a probability measure that μ(Ω)=1\mu(\Omega)=1.


Other proofs of this result can be found in Blackwell and Diaconis (1996) and in Chapter 5 of Oxtoby’s Measure and Category.