Zorn's lemma

Zorn's lemma other maximal principles

Definitions. In a partial order (P,<)(P, <), we say a subset CPC \subseteq P is a chain if CC is totally ordered by << (its elements are pairwise-comparable). An element uu is an upper bound of a set AA if aua \le u for all aAa \in A. An element mm is maximal if there is no yy such that m<ym < y. A chain CC is maximal if there is no chain CC' such that CCC \subset C'.

Zorn’s lemma. If PP \ne \emptyset is partially-ordered and every chain has an upper bound, then it has a maximal element.

Hausdorff maximal principle. Every partially-ordered set has a maximal chain.

Both of these are equivalent as will be seen in the proposition below. Hausdorff proved his version in 1914, while Kuratowski proved what’s now known as Zorn’s lemma in 1922, while Zorn himself proved it independently in 1935. For this reason, some places may call it the “Kuratowski-Zorn lemma”.

Definitions. A family F\mathcal{F} of sets has finite character when: AFA \in \mathcal{F} if and only if FFF \in \mathcal{F} for every finite subset FF of AA. A family of sets is always a partially-ordered set under inclusion (ABA \le B iff ABA \subseteq B). Given a family F\mathcal{F} partially-ordered by inclusion, it is “closed under unions of chains” if, for every chain CFC \subseteq \mathcal{F}, XCX=CF\bigcup_{X \in C} X = \bigcup C \in \mathcal{F}.

Zorn’s lemma for inclusions. If F\mathcal{F} is closed under unions of chains, then it has a maximal element.

Tukey’s lemma. If F\mathcal{F} has finite character, it has a maximal element (with respect to inclusion).

Proposition. The following statements are equivalent.

  1. Hausdorff maximal principle

  2. Zorn’s lemma

  3. Zorn’s lemma for inclusions

  4. Tukey’s lemma

  5. Axiom of choice

Proof. (1) ⇒ (2) Since PP is nonempty, let MM be any maximal chain. Being a chain, it has an upper bound uu. Since M{u}M \cup \{u\} is a chain, then uMu \in M. There can’t be a y>uy > u since then M{y}MM \cup \{y\} \supset M would be a bigger chain. So uu is maximal.

(2) ⇒ (3) If the partial-order is inclusion, then for any given chain CC, we have that XCX \subseteq \bigcup C for all XCX \in C, which implies C\bigcup C is an upper bound of CC, so every chain has an upper bound.

(3) ⇒ (4) If F\mathcal{F} has finite character, we just have to show it’s closed under unions of chains. If CFC \subseteq \mathcal{F} is a chain, let {x1,,xn}C\{x_1, \dots, x_n\} \subseteq \bigcup C. For each 1in1 \le i \le n, let XiCX_i \in C such that xiXix_i \in X_i. Since {X1,,Xn}\{X_1, \dots, X_n\} is a finite subset of a chain, one of them is a maximum element, so that {x1,,xn}Xk\{x_1, \dots, x_n\} \subseteq X_k for some 1kn1 \le k \le n. But then {x1,,xn}F\{x_1, \dots, x_n\} \in \mathcal{F} since XkFX_k \in \mathcal{F}. This proves every finite subset of C\bigcup C is in F\mathcal{F} and so CF\bigcup C \in \mathcal{F}.

(4) ⇒ (5) Let XX be any set and define F\mathcal{F} as the family of functions ff which are choice functions on their domain and dom(f)X\dom(f) \subseteq X. A function is, set-theoretically, a set of ordered pairs (x,f(x))(x, f(x)). So any finite subset of fFf \in \mathcal{F} must be a choice function and its domain a subset of XX. Conversely, if fFf \notin \mathcal{F}, either there is either some xdom(f)Xx \in \dom(f) \setminus X or ff is not a choice function, which means f(x)xf(x) \notin x for some xx. In both cases, {(x,f(x))}f\{(x, f(x))\} \subseteq f is a finite subset not in F\mathcal{F}. This proves F\mathcal{F} has a finite character, and it has a maximal function ff. If xx \ne \emptyset with xdom(f)x \notin \dom(f), we could choose yxy \in x and extend ff to f{(x,y)}f \cup \{(x, y)\} which would contradict its maximality, so dom(f)=X{}\dom(f) = X \setminus \{\emptyset\}, and therefore, it is a choice function on XX.

(5) ⇒ (1) Let C\mathcal{C} be the set of chains of the partially-ordered set PP. Let ff be a choice function on P(P)\mathcal{P}(P). For each chain CCC \in \mathcal{C}, define C={xPC : C{x}C}C^* = \{ x \in P \setminus C \ : \ C \cup \{x\} \in \mathcal{C} \}, and

C~={C{f(C)} CC C=\tilde{C} = \begin{cases} C \cup \{f(C^*)\} & \ C^* \ne \emptyset \\ C & \ C^* = \emptyset \end{cases}

It follows that a CC is a maximal chain iff C~C\tilde{C} \subseteq C. A set TC\mathcal{T} \subseteq \mathcal{C} is a tower when

  1. T\emptyset \in \mathcal{T}

  2. If TTT \subseteq \mathcal{T} is a chain, then TT\bigcup T \in \mathcal{T}.

  3. If CTC \in \mathcal{T}, then C~T\tilde{C} \in \mathcal{T}.

Let T0T_0 be the intersection of all towers, which is well-defined, since F\mathcal{F} itself is a tower. It follows that T0T_0 must also be a tower. If we prove T0T_0 is a chain, then C=T0T0C = \bigcup T_0 \in T_0 by property 2, and C~T0\tilde{C} \in T_0 by property 3. That implies C~T0=C\tilde{C} \subseteq \bigcup T_0 = C, so we’ll be done.

To do so, let Γ\Gamma be the set of elements which are comparable with all elements of T0T_0. If we show Γ\Gamma is a tower, by definition of T0T_0, we get Γ=T0\Gamma = T_0, and that would imply T0T_0 is a chain. Clearly Γ\Gamma satisfies 1, and if every element of a chain TT is comparable with all of T0T_0, for any given yT0y \in T_0, either yxTy \subseteq x \subseteq \bigcup T for some xTx \in T, or xyx \subseteq y for all xTx \in T, and then Ty\bigcup T \subseteq y, so Γ\Gamma satisfies 2.

To prove Γ\Gamma satisfies 3, let CΓC \in \Gamma be fixed and define ΓC={AT0 : AC or C~A}\Gamma_C = \{ A \in T_0 \ : \ A \subseteq C \text{ or } \tilde{C} \subseteq A \}. By definition, C~\tilde{C} is comparable with every element of ΓC\Gamma_C. Once again, if we prove ΓC\Gamma_C is a tower, then ΓC=T0\Gamma_C = T_0, meaning C~Γ\tilde{C} \in \Gamma, proving Γ\Gamma satisfies 3 and is a tower. Clearly ΓC\Gamma_C satisfies 1, and like above, if TΓCT \subseteq \Gamma_C is a chain, either C~AT\tilde{C} \subseteq A \subseteq \bigcup T for some ATA \in T, or ACA \subseteq C for all ATA \in T, in which TC\bigcup T \subseteq C, and so ΓC\Gamma_C satisfies 2. Finally, suppose AΓCA \in \Gamma_C. If C~A\tilde{C} \subseteq A, then C~A~\tilde{C} \subseteq \tilde{A}. If ACA \subseteq C, since CΓC \in \Gamma and A~T0\tilde{A} \in T_0, it must be that either CA~C \subseteq \tilde{A} or A~C\tilde{A} \subseteq C. If the latter does not happen, it must be that ACA \subset C and ACA~A \subseteq C \subset \tilde{A}. But now we use the fact that A~A\tilde{A} \setminus A is either empty or at most a singleton, which must force A=CA = C, forcing C~A~\tilde{C} \subseteq \tilde{A}. So in both cases A~ΓC\tilde{A} \in \Gamma_C and so it satisfies 3. \square