I’m supervising an undergraduate student, Blake, this summer who is doing a project on the interaction between category theory and set theory, or more accurately, on showing the Algebraic Set Theory (AST) axioms follow from the NBG axioms. There are two things that I feel I much better understand about set theory in working with Blake.

The first is that the Axiom of Class Comprehension, which states that for any formula quantifying over only sets and not proper classes, there is a class . Alternative approaches, for instance the finite axiomatisation of NBG, give a list of seven specific cases that are enough to prove each instance of Class Comprehension using induction on the length of . *Unrestricted* (set) Comprehension is an axiom one might try to use in set theory almost identical to Class Comprehension but concludes that a *set* exists satisfying the condition. This axiom is famously inconsistent, leading to Russell’s Paradox. So what makes Class Comprehension so different? Morse–Kelley set theory has an analogous axiom schema where quantifiers over classes are permitted, but this is legitimately a stronger system. What I realised this week is that Class Comprehension is actually a secret analogue of *Bounded Separation*, but for the specific class of all sets. The axiom of Bounded Separation appeared in Zermelo’s original set theory, and says that for any set and formula where all quantifiers in are of the form and , there is a set . This axiom is more natural from the point of view of topos-theoretic foundations, as Bounded Separation corresponds to having a Heyting category. Whereas in ZFC the sets, the only entities that exist, are in some sense floating free in the universe, in NBG they are all elements of another entity in the theory, namely the class . While working in ZFC people still refer to , but it is not something the theory talks about. So Class Comprehension is secretly Bounded Separation for the class , which is cute.

The other thing I feel I understand better is ‘Scott’s trick‘ (named after Dana Scott), which is a way of constructing a quotient class by an equivalence relation on a class, in the absence of, eg Global Choice. The method requires the (or at least ‘a’) cumulative hierarchy of sets, so that , where each is a set. Given a class with an equivalence relation , consider the problem of how to construct . If everything were sets, then one can build the quotient as a subset of the power set (try it!). In any case, elements of the quotient set are the equivalence classes. But proper classes cannot be elements of classes, and if every -equivalence class is a proper class, there’s no way to make the equivalence classes themselves into elements. In the presence of a strong enough choice principle, then one could in fact choose one element from each equivalence class, as the elements of a class—a transversal to the equivalence relation. Scott’s trick does something that at first glance looks kinda weird: form a new class each whose elements is a set in a fixed equivalence class, and which have *minimal rank* in the given cumulative hierarchy. If that seems like a mouthful to you, then indeed it is. But the idea is actually not so terrible: instead of using some global choice principle to take *one* element from each equivalence class , one uses the stratification of to pick a *subset* of . In fact any non-empty intersection would define a set that can stand in for the appropriate element in the quotient, but the intersection with the smallest is canonical. The real content of Scott’s trick is that it gives a *canonical* subclass such that in the restricted equivalence relation, each equivalence class is only a set, and intersects every equivalence class.

Now, I was thinking about Scott’s trick in the context of the axioms of AST, one of which is that the category of classes has coequalisers of kernel pairs. Naturally, since kernel pairs give equivalence relations, and coequalisers are quotients, one might naturally think to use Scott’s trick to show such coequalisers do indeed exist. If one is in NBG, then since Regularity is an axiom, then the standard cumulative hierarchy exists, so Scott’s trick is indeed available—but the point of AST is that it is a more flexible structural framework for classes, and shouldn’t rule out Anti-Foundation or other non-well-founded systems. So it was simultaneously a relief and somewhat amusing that Scott’s trick is not needed to construct the necessary coequalisers, though not a deep fact in itself. The key insight is that given a class function , the kernel pair give an equivalence relation on , and the quotient of this is exactly the image of ! So in fact only (at most) Class Comprehension is needed for this step, and not Regularity, which is nice.