Thursday, November 9, 2017

A simple "construction" of non-measurable sets from coin-toss sequences

Here’s a simple “construction” of a non-measurable set out of coin-toss sequences, i.e., of an event that doesn’t have a well-defined probability, going back to Blackwell and Diaconis, but simplified by me not to use ultrafilters. I’m grateful to John Norton for drawing my attention to this.

Let Ω be the set of all countably infinite coin-toss sequences. If a and b are two such sequences, say that a ∼ b if and only if a and b differ only in finitely many places. Clearly ∼ is an equivalence relation (it is reflexive, symmetric and transitive).

For any infinite coin-toss sequence a, let ra be the reversed sequence: the one that is heads wherever a is tails and vice-versa. For any set A of sequences, let rA be the set of the corresponding sequences. Observe that we never have a ∼ ra, and that U is an equivalence class under ∼ (i.e., a maximal set all of whose members are ∼-equivalent) if and only if rU is an equivalence class. Also, if U is an equivalence class, then rU ≠ U.

Let C be the set of all unordered pairs {U, rU} where U is an equivalence class under ∼. (Note that every equivalence class lies in exactly one such unordered pair.) By the Axiom of Choice (for collections of two-membered sets), choose one member of each pair in C. Call the chosen member “selected”. Then let N be the union of all the selected sets.

Here are two cool properties of N:

  1. Every coin-toss sequence is in exactly one of N and rN.

  2. If a and b are coin-toss sequences that differ in only finitely many places, then a is in N if and only if b is in N.

We can now prove that N is not measurable. Suppose N is measurable. Then by symmetry P(rN)=P(N). By (1) and additivity, 1 = P(N)+P(rN), so P(N)=1/2. But by (2), N is a tail set, i.e., an event independent of any finite subset of the tosses. The Kolmogorov Zero-One Law says that every (measurable) tail set has probability 0 or 1. But that contradicts the fact that P(N)=1/2, so N cannot be measurable.

An interesting property of N is that intuitively we would think that P(N)=1/2, given that for every sequence a, exactly one of a and ra is in N. But if we do say that P(N)=1/2, then no finite number of observations of coin tosses provides any Bayesian information on whether the whole infinite sequence is in N, because no finite subsequence has any bearing on whether the whole sequence is in N by (2). Thus, if we were to assign the intuitive probability 1/2 to P(N), then no matter what finite number of observations we made of coin tosses, our posterior probability that the sequence is in N would still have to be 1/2—we would not be getting any Bayesian convergence. This is another way to see that N is non-measurable—if it were measurable, it would violate Bayesian convergence theorems.

And this is another way of highlighting how non-measurability vitiates Bayesian reasoning (see also this).

We can now use Bayesian convergence to sketch a proof that N is saturated non-measurable, i.e., that if A ⊆ N is measurable, then P(A)=0 and if A ⊇ N is measurable, then P(A)=1. For suppose A ⊆ N is measurable. Suppose that we are sequentially observing coin tosses and forming posteriors for A. These posteriors cannot ever exceed 1/2. Here is why. For a coin toss sequence a, let rna be the sequence obtained by keeping the first n tosses fixed and reversing the rest of the tosses. For any any finite sequence o1, ..., on of observations, and any infinite sequence a of coin-tosses compatible with these observations, at most one of a and rna is a member of N (this follows from (1) and the fact that ra ∈ N if and only if rna ∈ N by (2)). By symmetry P(A ∣ o1...on)=P(rnA ∣ rn(o1...on)) (where rnA is the result of applying rn to every member of A). But rn(o1...on) is the same as o1...on, so P(A ∣ o1...on)=P(rnA ∣ o1...on). But A and rnA are disjoint, so P(A ∣ o1...on)+P(rnA ∣ o1...on)≤1 by additivity, and hence P(A ∣ o1...on)≤1/2. Thus, the posteriors for A are always at most 1/2. By Bayesian convergence, however, almost surely the posteriors will converge to 1 or 0, respectively, depending on whether the sequence being observed is actually in A. They cannot converge to 1, so the probability that the sequence is in A must be equal to 0. Thus, P(A)=0. The claim that if A ⊇ N is measurable then P(A)=1 is proved by noting that then N − A ⊇ rN (as rN is the complement of N), and so by the above argument with rN in place of N, we have P(N − A)=0 and thus P(A)=1.

No comments: