Type-theoretic considerations in functional language software development

In my current day job, I’m involved in the development of a research-based software tool, where the language being used is Haskell. While I can’t talk about the specifics of either the tool or the mathematics as yet, there was something interesting that we had to go through last month in the development process.

As far as I am aware, (core) Haskell uses the type theory System FC, which is not dependently typed (I’m not a type theorist, and certainly not a PL theorist! The details are not critical for what follows in any case, so be gentle). So when we implement data structures that might be more naturally encoded in a dependently-typed system, we instead write validity checks which verify whether the various constraints are satisfied. In practice, the functions that produce and manipulate these data types are intended to only ever produce objects that satisfy the constraints, but this isn’t enforced by the type system. Earlier in the year, in a previous version of the code, we had a working construction that manipulated some data to produce new data of the same type. We then re-implemented the prototype software based on richer theoretical foundations, where the types carried more information, and the validity checks then had more work to do.
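To make the pattern concrete, here is a minimal Haskell sketch (with made-up names, nothing to do with our actual code) of how such an invariant gets policed: the constraint that a dependent type system would track in the type is instead guarded by a validity function and a smart constructor.

module Sorted (Sorted, getSorted, isValid, mkSorted) where

-- Invariant (not enforced by the type system): the wrapped list is weakly increasing.
newtype Sorted a = Sorted { getSorted :: [a] }

-- The validity check, run at the boundaries rather than guaranteed by a type index.
isValid :: Ord a => [a] -> Bool
isValid xs = and (zipWith (<=) xs (drop 1 xs))

-- Smart constructor: the only exported way to build a 'Sorted' value.
mkSorted :: Ord a => [a] -> Maybe (Sorted a)
mkSorted xs
  | isValid xs = Just (Sorted xs)
  | otherwise  = Nothing

Anything downstream that manipulates a Sorted value is supposed to preserve the invariant, but nothing in the types forces it to.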

Once this new version was mature enough, we started testing it with Hedgehog, and ran into weird corner cases in which our function failed to return a valid object according to the validity checks. There was indeed a bug in the code, because the mathematical definition of the operation was a naive adjustment from the previous iteration. There ensued some time of panic in which I had to find the correct definition of the data type, as well as of the construction underlying the function.
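For readers who haven’t used it, the shape of such a Hedgehog property test is roughly as in the sketch below; buildThing and isValid are purely illustrative stand-ins for our real operation and validity check, not anything from our codebase.

{-# LANGUAGE OverloadedStrings #-}
import           Hedgehog
import qualified Hedgehog.Gen   as Gen
import qualified Hedgehog.Range as Range

-- Illustrative stand-ins for the real data type, operation and validity check.
newtype Thing = Thing [Int]

buildThing :: [Int] -> Thing          -- the operation under test
buildThing = Thing . scanl (+) 0      -- (placeholder implementation)

isValid :: Thing -> Bool              -- the post-hoc validity check
isValid (Thing xs) = and (zipWith (<=) xs (drop 1 xs))

-- The property: whatever input Hedgehog generates, the output passes the check.
prop_buildThingIsValid :: Property
prop_buildThingIsValid = property $ do
  xs <- forAll (Gen.list (Range.linear 0 50) (Gen.int (Range.linear 0 100)))
  assert (isValid (buildThing xs))

main :: IO ()
main = do
  _ <- checkParallel (Group "validity" [("buildThing stays valid", prop_buildThingIsValid)])
  pure ()

Hedgehog shrinks any failing input towards a minimal counterexample, which is how this sort of corner case tends to surface.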

To provide a strong level of confidence, we used Isabelle to formalise candidate mathematical definitions underlying our type (or rather its validity check), and the construction. I found it slightly ironic that, in trying to verify the correctness of the mathematics behind code in one non-dependently-typed language, we turned to another one. But the automation in Isabelle allowed me, a complete novice in formal verification, to break my proofs down into baby steps and supply lots of fairly obvious lemmas, so that Sledgehammer could fill in the proofs without my having to fight the formalism. Were I someone with lots of time on my hands, I would learn Lean as well.

The Isabelle implementation of the prerequisite mathematical ideas was thankfully not horrible (and mostly handled by my colleague, who has more Isabelle experience than me, i.e. a nonzero amount!), and then we could thrash out the design space of plausible definitions. This might seem a little odd, but since what we are developing uses research we are doing as we go, it’s not always clear what the right choices are until we implement things and see what happens. Thankfully, we found a robust definition that also makes conceptual sense (and, of course, in hindsight, it’s exactly the right thing), and moreover one which makes all the operations we want work, once they are suitably defined.

Another thing which I’ve come to appreciate is that, in testing our definition, we had a complete formalisation of what amounts to something like a terminal object in some category. And proving this really is an object of the mathematical sort we are thinking of took hundreds of lines of Isabelle. (This reminds me of Buzzard et al.’s formalisation of a perfectoid space, even if not on the same scale, where the only example they formalise is the empty space!) What this additionally gave us is that the axioms encoded in the validity checks on our data structure are well-typed in the general case. That is, we ask, for example, that some operation is associative, but this is an operation on indexed data, and so the equality in the associativity law has the potential to be equating objects of different types, were we in a fully-fledged dependently-typed system. But since we aren’t, everything has to be packaged up together in a single type and checked post hoc.
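Here is a toy Haskell version of that situation (again with made-up names): the index that a dependent type system would track statically is just an ordinary field, the operation is partial on mismatched indices, and associativity is a property one checks post hoc rather than something the types guarantee.

-- "Indexed" data packaged into a single type: the index is just a field.
data Indexed = Indexed { ix :: Int, payload :: [Int] }
  deriving (Eq, Show)

-- The operation only makes sense when the indices line up, so it is partial.
combine :: Indexed -> Indexed -> Maybe Indexed
combine (Indexed i xs) (Indexed j ys)
  | i == j    = Just (Indexed i (zipWith (+) xs ys))
  | otherwise = Nothing

-- Post-hoc associativity check, for use in property tests; in a dependently-typed
-- setting the index bookkeeping would be the type checker's job.
assocHolds :: Indexed -> Indexed -> Indexed -> Bool
assocHolds a b c =
  (combine a b >>= (`combine` c)) == (combine b c >>= combine a)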

I’m aware there is work on at least simulating dependent types in Haskell, current work on trying to actually implement dependent types in an extension of Haskell, and, for that matter, languages that are based on dependent type theories. But until one of these is performant enough, and has enough library support, to write production code in, we are stuck with what we have (a choice also informed by the expertise of the people involved, to which I add relatively little on the coding side). Being able to work in a dependently-typed language would mean the type system takes care of ensuring the operations are well-typed as mathematically intended; but certainly having to work in this setting makes things interesting, in trying to work around the limitations of the type system and still be confident in the result.
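For completeness, the standard flavour of that “simulation” in present-day GHC looks something like length-indexed vectors via DataKinds and GADTs. This is a textbook sketch, not anything from our code, but it shows how an invariant can be pushed into the type so that no post-hoc validity check is needed.

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- Type-level natural numbers.
data Nat = Z | S Nat

-- Vectors whose length is recorded in the type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Type-level addition, so that appending can state its effect on lengths.
type family Add (m :: Nat) (n :: Nat) :: Nat where
  Add 'Z     n = n
  Add ('S m) n = 'S (Add m n)

-- The type checker enforces the length bookkeeping; no runtime check required.
vappend :: Vec m a -> Vec n a -> Vec (Add m n) a
vappend VNil         ys = ys
vappend (VCons x xs) ys = VCons x (vappend xs ys)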

Two items: a new rare high-rank elliptic curve, and a beautifully-organised orchard of Diophantine equations

I got an email the other day from mathematician Bogdan Grechuk, whose book Polynomial Diophantine Equations: A Systematic Approach (https://doi.org/10.1007/978-3-031-62949-5) was recently released. This is to my mind a rather remarkable book. The blurb starts:

This book proposes a novel approach to the study of Diophantine equations: define an appropriate version of the equation’s size, order all polynomial Diophantine equations by size, and then solve the equations in order. [emphasis added]

The book really does do this systematically. The start of the book solves extremely easy examples, but pays close attention to the process, so that it can identify classes of equations to which each method applies. And as each new method of solution is provided, hence solving a new class of equations, that class is removed from consideration going forward. Grechuk, halfway through the book (>400 pages in!), takes stock and provides an outline of the algorithm built up over the four chapters to that point. It has approximately 21 main cases, of increasing difficulty, spread over several pages, referring constantly back through the book to the various methods and algorithms for different cases. As the equations keep getting harder, the problem of solution shifts focus: from “describe all solutions parametrically”, to “write down a family of polynomials (or rational functions) that give all solutions”, to “find a recurrence relation to describe solutions”, to “determine if there are finitely or infinitely many solutions”, to “determine if there is any one solution”. The methods start out wholly elementary, and by the end the discussion uses the cutting edge of current techniques for attacking number-theoretic problems of this sort, for instance Chabauty methods.

Long-time readers of the blog may recall the post Diophantine fruit, which was inspired by a MathOverflow question of Grechuk on the problem of “the smallest unsolved Diophantine equation”. This blog post led to a chain of papers using the phrase “fruit equations” by authors wholly unconnected with me, with the first solving what was at the time an open problem on Grechuk’s list. So it’s worth consulting the companion paper A systematic approach to Diophantine equations: open problems (https://arxiv.org/abs/2404.08518) if you have any particular hankering to solve an equation (in any of the various senses: from parametrising a family of solutions to just finding one single solution) and have the honour of being the first person ever (modulo the problem of finding some equation equivalent to it buried in the literature) to solve a hard equation. The book ends with a summary that is a line-in-the-sand version of the just-cited arXiv preprint, stating the earliest/smallest/shortest equations for which the various types of solutions are not known. And the last open problem is this:

What is the smallest (in H) equation for which the existence of integer solutions (Problem 7.1) is a problem which is independent from the standard axioms of mathematics (Zermelo–Fraenkel set theory with the axiom of choice)?

Here “H” refers to the function that gives the “size” of a polynomial Diophantine equation, which allows a systematic ordering. Other natural orderings are given in the book, which end up being more-or-less comparable, if not the same, and the above problem is also posed for these orderings. From the MRDP solution to Hilbert’s Tenth Problem we know that at some point equations whose solvability is unprovable in ZFC will turn up. By work of Zhi-Wei Sun (https://arxiv.org/abs/1704.03504) we know that the unsolvability result (that is, that there is no algorithm that can solve in integers all equations in a given class) holds even when restricted to Diophantine equations with no more than 11 variables. But identifying an explicit equation, let alone the smallest one, seems very hard. Moreover, trying to optimise to find a small ZFC-undecidable equation, rather than an algorithmically unsolvable one, is another whole kettle of fish; compare how the value BB(745) of the Busy Beaver function is not possible to calculate in ZFC, through a line of work whose current endpoint is Johannes Riebel’s 2023 Bachelor thesis. (ADDED: I just now found that in fact BB(636) is known to be uncalculable in ZFC, by very recent work of Rohan Ridenour in the past two months.)


Just as Grechuk’s book starts small and tries to find the smallest so-far unsolved Diophantine equations, here is an example of one such equation, but one that is far from small:

y^2 + xy = x^3 - 27006183241630922218434652145297453784768054621836357954737385x + 55258058551342376475736699591118191821521067032535079608372404779149413277716173425636721497

(source: https://web.math.pmf.unizg.hr/~duje/tors/rk29.html). This equation defines an elliptic curve, and it has the largest known number of independent solutions in rational numbers of any such equation. By “largest known” I mean that there is a copy of \mathbb{Z}^{29} in the set of rational solutions (where we are in fact finding solutions in the projective plane, not just in affine coordinates as presented). That is, there are 29 rational solutions (shown at the previous link) that are linearly independent (over the integers) under the abelian group operation on the rational points. It is not proved that there are no further independent solutions (the maximal number of independent solutions is known as the rank of the elliptic curve described by the equation). The announcement of the result, as well as a summary of how much more is known, is given in the email:

The above elliptic curve has been proved to have rank exactly 29 assuming the Generalised Riemann Hypothesis for zeta functions of number fields, which is very far from being known (the usual Riemann Hypothesis is the special case of taking the number field to be the rationals). So the specific two-variable cubic equation above is an example of a polynomial Diophantine equation whose complete solution—and I haven’t even mentioned the possible (finite) torsion subgroup of the elliptic curve—requires the resolution of a conjecture that is wholly out of reach of current mathematics.
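As a tiny aside for anyone who wants to play with the curve: since Haskell’s Integer type is arbitrary-precision, it is a one-liner to check whether a candidate integer point lies on the curve. The names below are mine, and testing the rational generators listed at the link above would need Rational in place of Integer.

-- Coefficients of the curve y^2 + x*y = x^3 + a4*x + a6, as in the equation above.
a4, a6 :: Integer
a4 = -27006183241630922218434652145297453784768054621836357954737385
a6 = 55258058551342376475736699591118191821521067032535079608372404779149413277716173425636721497

-- Does the integer point (x, y) satisfy the curve equation?
onCurve :: Integer -> Integer -> Bool
onCurve x y = y ^ 2 + x * y == x ^ 3 + a4 * x + a6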

(source for image: Mark Hughes, How Elliptic Curve Cryptography Works, 2019)

Transcription and translation of Grothendieck’s 1972 CERN talk

This talk, titled Allons-nous continuer la recherche scientifique? [Will we continue scientific research?], is rather different. It’s not a mathematics research talk, but a sociological one, based on Grothendieck’s marked shift in personal motivations. The talk survived as a scratchy audio recording, and I had no idea that it had been transcribed, let alone subsequently translated. I haven’t read much of it, but this extract is, I think, a good one to ponder:

So, for the past year or two, I’ve been asking myself questions. And not just to myself. I’ve also been asking them to colleagues and, particularly over the last few months, six months perhaps, I’ve been taking every opportunity to meet scientists, whether in public discussions like this one or in private, and raise these questions. In particular: “Why do we do scientific research?”. A question that is virtually the same perhaps, in the long run at least, as the question: “Are we going to continue scientific research?”. The extraordinary thing is to see how incapable my colleagues are of answering this question. In fact, for most of them, the question is simply so strange, so extraordinary, that they refuse even to contemplate it. In any case, they are extremely reluctant to give any kind of answer. When we manage to elicit an answer in public or private discussions, what we generally hear is, in order of frequency of response: “Scientific research? I do it because it gives me pleasure, because it gives me intellectual satisfaction”. Sometimes people say, “I do scientific research because I have to make a living, because I get paid for it.”

As far as the first motivation is concerned, I can say that it was my main motivation during my life as a researcher. Indeed scientific research gave me pleasure and I didn’t ask myself many questions beyond that. In fact if it gave me pleasure, it was largely because social consensus told me that it was a noble, positive activity, one worth undertaking; by the way without any detail on how it was positive, noble, etc. Obviously direct experience told me that with my colleagues we were building something, a certain edifice. There was a sense of progress that gave a certain feeling of achievement… of plenitude, and at the same time a certain fascination with the problems at hand.

But all this, in the end doesn’t answer the question: “What is the social purpose of scientific research?”. Because, if its sole purpose were to give pleasure say to a handful of mathematicians or other scientists, society would doubtless hesitate to invest considerable funds in it—in mathematics they’re not very considerable but in the other sciences they can be. Society would also be reluctant to pay tribute to this type of activity whereas it is rather silent about activities that may require just as much effort, but of a different type, such as playing marbles or similar. We can develop to the extreme certain facilities, certain technical faculties, be they intellectual, manual or other, but why is there this emphasis on scientific research? It’s a question worth asking.

Talking to many of my colleagues over the past year, I’ve come to realize that in fact this satisfaction that scientists are supposed to derive from exercising their cherished profession, is a pleasure… which is not a pleasure for everyone! I was astonished to discover that for most scientists, scientific research is felt to be a constraint, a servitude. Doing scientific research is a matter of life and death for any considered member of the scientific community. Scientific research is a prerequisite for getting a job, once you’ve embarked on this path without really knowing what it’s all about. Once you’ve got your job, it’s an imperative to move up the ladder. Once you’ve moved up the ladder, even assuming you’ve made it to the top, it’s an imperative to be considered in the running. You’re expected to produce. Scientific production—like any other kind of production in the ambient civilization—is considered an imperative in itself. The remarkable thing about all this is that in the end, the content of research becomes a second thought. It’s all about producing a certain number of “papers”. In extreme cases, a scientist’s productivity is measured by the number of pages they publish. Under these conditions, for a large number of scientists—certainly for the overwhelming majority, with the real exception of a few who are fortunate enough to have an exceptional gift or to be in a social position and disposition that enables them to free themselves from these feelings of constraint—for most, scientific research is a real constraint that kills the pleasure one can have in carrying it out.

This is something I discovered with amazement, because we don’t talk about it. Between my students and myself, I thought there was a spontaneous and egalitarian relationship. In fact, it was an illusion in which I was trapped. Without even realizing it there was a real hierarchical relationship. The mathematicians who were my students or who considered themselves less well placed than me and who felt alienated in their work, would never have had the idea of talking to me about it until of my own accord, I left the scientific ghetto in which I was confined and tried to talk to people who weren’t from my milieu. That milieu of esoteric scientists who did high mathematics.

The full text, including a vibrant question and answer session at the end, can be found at https://github.com/Lapin0t/grothendieck-cern.

DOIs for Theory and Applications of Categories!!

Nearly 11 years ago I spoke with the then editor-in-chief of the open access journal Theory and Applications of Categories (TAC) about the option of registering Digital Object Identifiers (DOIs) for articles. This is a modern piece of publishing infrastructure without which a journal looks like it is still living in the 1990s—which was when TAC was started, in the first wave of new, online journals designed to be low-frills and run by academics. The response then was that it wasn’t going to happen. A year later I wrote to the whole editorial board and pointed out that TAC might not be recognised by the Directory of Open Access Journals as being a quality outfit (which, apart from its extremely budget infrastructure, it certainly is), amongst what was at the time a flood of predatory “journals” coming into the playing field. Still no movement. In the meantime there has been a general turnover among the editors, including a younger EiC. This is not to place the blame on the previous holder of that role: DOIs aren’t free, and there’s a bit of work to get them set up, and also, more importantly, sorted out for all the old articles going back 30 years.

It’s been my hope for a long time that TAC can slowly adopt current best practices for open access journals, and today we got the fantastic news that the journal is going to get DOIs! Here’s Chris Grossack in the Category Theory Zulip discussion server:

I’ve been talking to Geoff Crutwell about this, and I have the goahead to start doing some real work to make it happen. In the next few weeks I’ll be setting up a crossref account for TAC (and sister journals) and figuring out how to automatically create DOIs for both the TAC backlog and all new publications going forwards. I don’t know exactly how long it will take, but it should hopefully be fully set up within a month or so!

A month, after over a decade of hoping, is really not long!

Translation project: Dedekind’s second definition of finite set

[tl;dr please help improve the translation if you can!]

Over the local long weekend I played around with using Google Lens to translate a scan of a text from Dedekind’s Nachlass, in the collected works edited by Robert Fricke, Emmy Noether and Øystein Ore and published in 1932. It concerns a different definition given by Dedekind of what it means to be a finite set, without referencing the natural numbers, which turns out to be equivalent to the definition of finiteness that does use the natural numbers (even in the absence of Choice, unlike Dedekind’s much more famous definition). [EDITED the following, as I was too hasty and used the wrong quantifier. Mea culpa!]

There exists a \phi\colon S\to S such that if A\subseteq S satisfies \phi(A) \subseteq A, then A is either \emptyset or S.
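For a finite set one can actually test this condition by brute force, which I found a helpful way to internalise the quantifier structure. Here is a small Haskell sketch (entirely mine, not part of the translation project): a “set” is a list of distinct elements, and we check that a given self-map \phi has no invariant subsets other than the empty set and the whole set.

import Data.List (subsequences)

-- Does phi witness Dedekind's condition on the finite set s (given as a list of
-- distinct elements)? That is: the only subsets A with phi(A) ⊆ A are ∅ and s.
dedekindWitness :: Eq a => [a] -> (a -> a) -> Bool
dedekindWitness s phi = all ok (subsequences s)
  where
    ok a
      | all (`elem` a) (map phi a) = null a || length a == length s
      | otherwise                  = True

-- Example: a cyclic shift on {0,1,2,3} has no proper nonempty invariant subset.
example :: Bool
example = dedekindWitness [0, 1, 2, 3 :: Int] (\x -> (x + 1) `mod` 4)

The definition then says a set is finite exactly when such a \phi exists (for a nonempty finite set, a cyclic permutation does the job).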

This note has barely been cited, but the definition was given by Dedekind in the foreword to the second edition of his famous “What are numbers and what should they be?”, and he hints that the work to establish the equivalence of this new definition and the usual definition of “finite” is nontrivial. But Dedekind had actually done (most of) the work! He just didn’t in the end edit “What are numbers…” in the second or third edition to include this work (and I don’t blame him, this is all very fiddly, low-level stuff).

The one paper I know of that engages with this unpublished note of Dedekind is cited by Noether in the collected works; it is by her co-editor (of the Cantor–Dedekind correspondence) Jean Cavaillès, a logician-philosopher. He apparently filled in a part of the proofs that Dedekind left out, but I haven’t yet read this paper. The zbMath review of Cavaillès’ paper tells us what he proved about this second definition of Dedekind: that a subset of a finite set is finite, that the successor of a finite set is finite, that induction holds for this definition of finite, and that the powerset of a finite set is finite.

And this paper of Cavaillès has likewise almost never been cited. Compare this to Tarski’s definition of what it means to be a finite set (the powerset is classically well-founded with respect to subset inclusion), which appeared a decade earlier in the same journal (Fundamenta Mathematicae), or Kuratowski’s, also in the same journal, before that. Note that zbMath has no citations for Cavaillès’ paper, and MathSciNet somehow doesn’t have the 1932 volumes of Fund. Math. indexed. Sierpinski’s book Cardinal and Ordinal Numbers cites Cavaillès (in what context I do not know), but all Google Scholar can give me is 4 citations total: two post-1990 theses in Spanish, Sierpinski’s book, and somehow a review of another paper in JSL from the 60s that I haven’t yet checked out.

The upshot is that I want to put together a good critical edition of Dedekind’s note in English, in order to make it better known. My efforts so far have just been to format the mathematics and get a first working draft of the machine translation into a TeX file. I have made this a public GitHub repository so that German speakers can give advice on smoothing out and improving the translation, say in GH issues, in pull requests, or by comments here or elsewhere (I will post a version of this announcement around the place).

A next step is to get Dedekind’s definition onto the nLab, and also (once the text is stable) to start thinking about more refined relationships to other definitions of finiteness, particularly in non-classical settings. Since the definition itself singles out the special subsets \emptyset, S\subseteq S, I can imagine that constructive variants might be clunky, but my intuition here is not great.

Ctrl-z, 18 years later — “Yang-Mills theory for bundle gerbes” is now retracted

tl;dr – my very first paper, co-authored with Mathai Varghese when I was a PhD student (and he was advising me), had a critically flawed assumption, which I only discovered in the second half of last year.

It was tremendously exciting for me in 2005 to have my rough ideas turned into a real draft of a real paper by Mathai, to work on the paper together a bit more before sending it off to the arXiv and for publication, and then to see it published in early 2006. That my own idea could be made into a scientific publication was a thrill. At the time my PhD supervision was a bit in flux. Students were required to have at least two supervisors (a primary and a secondary), and my primary supervisor—Michael Murray—had just taken up the role of Head of School, the School in question being newly formed from the merger of three departments. Meetings were often rescheduled or cancelled due to time constraints. My original secondary supervisor could contribute little specific to the topic I was looking at, and it was arranged that I would swap Mathai in as secondary. This started a period of about 18 months, going from memory, of mostly working with him, though ultimately I moved on and found a different project for my thesis. Our joint paper was the first thing we did, in short order, and I was still very fresh and inexperienced. I had moved to the maths department from the physics department, switching disciplines. My thinking was still very approximate and “physicsy” (how times have changed!). From what I can only guess are my notes going into the meeting that kicked this project off, it looks like I was hoping we could reproduce the derivation of the sourceless vacuum Maxwell equations from the appropriate Lagrangian/action, except now from the 3-form curvature of a bundle gerbe, rather than from the 2-form curvature (the Faraday tensor) of a connection on a U(1)-bundle (i.e. the EM vector potential).

Page from my notebook dated 10th August 2005, and showing a first pass at a variational calculus approach to a bundle gerbe Yang–Mills action.

The details of who did what in the paper will be left vague, but suffice it to say that I did at one point “verify” that the gauge group in the paper did indeed satisfy the conditions of having a group action on our space of bundle gerbe curvings. Looking at my calculations, they are not wrong—they just took the assumption behind the definition for granted. This assumption is that there is a certain 2-form on the Banach Lie group PU(\mathcal{H}) with a reasonable-seeming “primitivity” property (which certainly holds at the level of de Rham cohomology, and even at the linearised level, at the identity element). Our paper cites no source for this assumption. Given such a 2-form, everything works fine, and my calculation in 2005 is sound.

However, as is my wont, I like to read over my past papers to keep the ideas from being completely forgotten, particularly papers like this one, which left its original rationale dangling: can we generalise not just electromagnetism to higher-degree forms, but non-abelian Yang–Mills theory? As one can tell from the 2006 paper’s title, this was at the forefront of my mind at the time. The knowledge of non-abelian higher gauge theory was at the time not developed enough for me to attack this problem, though it was the original goal of the PhD project proposal. But in recent years, the kind of nitty-gritty, down-to-earth tech has pretty much arrived, and so while I’m not planning on picking up my original goal, I do still care about supplying physicists with concrete examples to illustrate the theory.

In the process of re-reading our paper, I noticed, for the first time, what a bold claim the primitivity assumption really is:

…recall that the line bundle L associated with the central extension … is primitive in the sense that there are canonical isomorphisms …, and there is a connection \nabla, on the line bundle L, called a primitive connection, which is compatible with these isomorphisms.

Mathai and Roberts (2006), section 2.2

In the treatment by Murray and Stevenson of connections on central extensions (see section 3 at the link), one generically does not get such a primitive connection; rather, there is a 1-form on the square of the quotient group measuring the failure of the compatibility of the connection with the isomorphisms. Back in 2005–06, this was not a paper I had spent time with, but in the past decade it has become a standard reference for me. And so only now does the unwarranted assumption stand out like a sore thumb. The problem is not so much that the connection fails to be compatible with the isomorphisms, but that in all cases I knew of there is a resulting exact 2-form, the exterior derivative of the 1-form mentioned above. To satisfy the axiom for a group action of our gauge group C^\infty(X,PU(\mathcal{H})) on the affine space of bundle gerbe curvings, this exact 2-form needed to vanish everywhere.

I was somewhat perturbed, and, to gloss over the subsequent discussions of the awkward situation, I ended up writing a paper that not only showed that there is no 2-form on PU(\mathcal{H}) as we had assumed, but also classified the more general class of bi-invariant 2-forms on all (reasonable) infinite-dimensional Lie groups in terms of the topological abelianisation of the Lie algebra. In many examples of interest this more general space of 2-forms still turns out to be trivial (i.e. only the identically vanishing 2-form is bi-invariant), so certainly there can be no primitive 2-forms as we assumed in any of these cases. Thankfully, one can wring some mildly interesting examples out of this result, namely that one can classify bi-invariant forms on some infinite-dimensional structured diffeomorphism groups (in the volume-preserving and symplectomorphism cases, for instance), in terms of specific de Rham cohomology spaces of the original compact manifold. Whether this is useful or even interesting isn’t clear to me. But providing a positive result out of such a negative situation helped ease the disappointment.

Whereas at the point of finding this unjustified assumption I was hoping that perhaps there was something particularly special about the group PU(\mathcal{H}) that I didn’t yet know, I was now in the position where I knew the definition using this assumption was unfixable. Since the resulting analysis of the moduli space of solutions uses this group action in multiple instances, there was no way I could be confident in the result. So I had to (with permission from Mathai) contact the journal. I wrote a note (technically, a “corrigendum”), and submitted it, with the comment that if it had to be a retraction after all, I would be ok with that. Given the amount of time I put into correcting the literature in other instances (eg the two-year process starting with this blog post, and ending with this formal erratum, and the flow-on effects from that process), I couldn’t very well leave our own mistake in the literature to cause problems down the line.

Ultimately, the journal decided that the paper really should be retracted, and now you can find the notice at the journal website. Throughout this process last year I was thinking about how I might approach the problem afresh with what I know now, and find a similarly concrete description of the moduli space (as opposed to describing it as a higher stack with some universal property). While I had some fruitful ideas, I hadn’t time then to dedicate to them (I was mostly working late at night sitting up in bed in the dark, sketching notes on my tablet in dark mode!). Had I been able to fix the problems easily, I would have sent the fixes to the journal and had a corrigendum published, rather than retract the paper. Certainly, if people are interested, I can share my ideas as they exist so far. As far as my coauthor goes, he was content to authorise me to take charge throughout this whole affair, and I think he is willing to let the matter slide. However, as a matter of personal and professional pride, it would be nice to eventually rectify this error by producing a theorem analogous to the one we had once claimed.

DeMarr’s theorem, part 1

I just learned of a very cool result which shows that the usual assumptions that uniquely characterise the real numbers—for example as a Dedekind complete ordered field—can be relaxed, and one still arrives at the same conclusion. Without qualification that’s a bit too vague, but suffice it to say that a) one can take the ordering to be just a partial order, instead of a total order (partial in the sense that, for example, the powerset is only partially ordered by inclusion), and b) one only needs to know that decreasing sequences x_1 \geq x_2 \geq \ldots \geq 0 of elements greater than or equal to 0 have a greatest lower bound. This latter is weaker than Dedekind completeness, which is equivalent to the statement that every nonempty set that is bounded below has a greatest lower bound. And, finally, c) the multiplication doesn’t need to be assumed to be commutative: it follows from the other axioms.

This theorem was used in the quantum reconstruction program to prove that the endomorphism ring of the tensor unit object of a dagger category is, under some extra assumptions, in fact either the real numbers or the complex numbers. The point was to avoid a dependence on Solèr’s theorem, which characterises the ring of scalars of what turns out to be an infinite-dimensional separable Hilbert space, so as to be able to one day capture the category of finite-dimensional Hilbert spaces. You can see the talk Bypassing Solèr’s Theorem by Matthew Di Meglio for details on this, but here I just want to talk about the theorem of DeMarr, below the fold.


A fun “little” example of a crossed module

I’ve been looking back at my very first published paper (for “reasons”), and in it there is the group C^\infty(X,\mathrm{PU}), where X is a compact manifold and \mathrm{PU} is the Banach Lie group of projective unitary operators on an infinite-dimensional separable Hilbert space (one needs to take the norm topology here). This very large group turns out to be a Lie group (not something that’s in the paper, but it follows from work of other people on infinite-dimensional Lie groups). It’s somewhat implicit, but what is really being considered is the homomorphism C^\infty(X,\mathrm{U}) \to C^\infty(X,\mathrm{PU}) induced by the projection map \mathrm{U}\to \mathrm{PU}. Let us call this map \pi_*\colon \widehat{\mathcal{K}} \to \mathcal{L}. Now the projection map is both a central extension and a principal U(1)-bundle; in particular it’s smoothly locally trivial. I will denote by \mathcal{K} the subgroup of \mathcal{L} consisting of those maps to \mathrm{PU} that lift to \mathrm{U}. It so happens that \mathcal{K} \subset \mathcal{L} is the connected component of the identity element, since a map lifts iff it is null-homotopic, as \mathrm{U} is contractible, X has the homotopy type of a finite CW complex, and \mathrm{U}\to \mathrm{PU}, being a principal bundle, is a Serre fibration (in fact I think \mathrm{PU} is paracompact, which implies it’s a numerable principal bundle, which implies it’s a Hurewicz fibration).

Something I realised is that \widehat{\mathcal{K}} \to \mathcal{L} is in fact a crossed module of Lie groups (of Fréchet Lie groups, I think). What is missing is the action of \mathcal{L} on \widehat{\mathcal{K}} lifting the adjoint action of \mathcal{L} on \mathcal{K}. To get this, fix g\colon X\to \mathrm{PU} and consider a trivialising cover \{W_\alpha\} on X of the U(1)-bundle g^*\mathrm{U}\to X. On each W_\alpha there is a lift g_\alpha\colon W_\alpha \to \mathrm{U} of g\big|_{W_\alpha}, and we can form, for any f\colon X\to \mathrm{U}, the map \mathrm{Ad}_{g_\alpha}f\colon W_\alpha \to \mathrm{U}. Since on W_{\alpha\beta} = W_\alpha \cap W_\beta the two lifts g_\alpha,\ g_\beta differ by a multiplicative factor c_{\alpha\beta}\colon W_{\alpha\beta}\to U(1), we have \mathrm{Ad}_{g_\alpha}f = \mathrm{Ad}_{g_\beta}f\colon W_{\alpha\beta} \to \mathrm{U}. And thus there is a smooth map which I will denote by \widehat{\mathrm{Ad}}_g f\colon X\to \mathrm{U}. An argument involving common refinements shows that this is indeed an action (of abstract groups) of \mathcal{L} on \widehat{\mathcal{K}}.
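In symbols (just paraphrasing the construction above), the locally defined maps are

\widehat{\mathrm{Ad}}_g f\,\big|_{W_\alpha} := \mathrm{Ad}_{g_\alpha}\big(f\big|_{W_\alpha}\big) = g_\alpha\,\big(f\big|_{W_\alpha}\big)\,g_\alpha^{-1},

and these agree on overlaps precisely because g_\alpha = c_{\alpha\beta}\,g_\beta on W_{\alpha\beta}, with c_{\alpha\beta} valued in the central subgroup U(1).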

One can check that this action satisfies the required conditions to be a crossed module: it covers the adjoint action, by construction, and the composite \widehat{\mathcal{K}} \times \widehat{\mathcal{K}} \to \widehat{\mathcal{K}}\times \mathcal{L} \to \widehat{\mathcal{K}} of the map \mathrm{id}\times \pi_* followed by the action is just the adjoint action of \widehat{\mathcal{K}} on itself.
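For reference, written in the notation above, these two conditions are the standard crossed module axioms:

\pi_*\big(\widehat{\mathrm{Ad}}_g f\big) = g\,(\pi_* f)\,g^{-1} \qquad (g \in \mathcal{L},\ f \in \widehat{\mathcal{K}}),

\widehat{\mathrm{Ad}}_{\pi_*(f)} f' = f\,f'\,f^{-1} \qquad (f, f' \in \widehat{\mathcal{K}}).

The first says \pi_* is equivariant for the action; the second is the Peiffer identity.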

The only missing thing is that the action should be smooth. I claim this is true, and it is an exercise for the reader to check, using the charts supplied by the Lie algebras C^\infty(X,u) and C^\infty(X,pu) (Banach Lie groups are locally exponential, so the exponential map does indeed furnish a chart around the identity for \mathrm{U} and \mathrm{PU}, and a mapping group into a locally exponential Lie group is again locally exponential).

We have the nice property that \mathcal{K} \subset \mathcal{L} is a closed subgroup, and in fact, as noted, it is the connected component of the identity, so that the projection map \mathcal{L} \to \mathrm{coker}(\pi_*) \simeq H^2(X,\mathbb{Z}) is not something pathological (like the quotient map of a non-closed or non-Lie subgroup). However, I don’t know as yet (because I’m juggling too many balls at the moment) if \widehat{\mathcal{K}} \to \mathcal{K} is a locally trivial bundle, which is what one wants for a crossed module of Lie groups. One has that \ker(\pi_*)= C^\infty(X,U(1)), which is a particularly nice Fréchet Lie group (eg it is nuclear), and so the idea is that

1 \to C^\infty(X,U(1)) \to C^\infty(X,\mathrm{U}) \to C^\infty(X,\mathrm{PU})_0\to 1

should be a locally split extension of Fréchet Lie groups. In particular, it should be the case that the extension of Fréchet spaces

1 \to C^\infty(X,i\mathbb{R}) \to C^\infty(X,u) \to C^\infty(X,pu)\to 1

is continuously split. I believe there is a continuous linear section pu\to u of the quotient map, which is enough here. However, I haven’t thought too hard about checking that one indeed gets submersion charts for \pi_*, let alone local trivialisations for the putative bundle.

Now clearly there is nothing particularly special here about this specific choice of central extension of Banach Lie groups. One could have started with any A\to \widehat{G} \to G. Probably requiring Banach here is not necessary, and a central extension of (locally exponential) Fréchet Lie groups might be fine to make the argument go through.

However, what is special is that the crossed module above seems to be very close to a presentation of the stack (of monoidal groupoids) of principal U(1)-bundles on X! That said, the symmetric nature of the tensor product of bundles here is not apparent, since the crossed module is decidedly made up of non-abelian groups! But we do know that the product on the Lie groups is commutative up to homotopy. So it might well be that the differentiable stack that (I claim) is presented by this strictly monoidal Lie groupoid has the same “shape” (in the technical sense of cohesive higher toposes) as the stack of U(1)-bundles. This is something to think about.

No order-10 projective planes via SAT

The proof that there is no finite projective plane of “order 10” (namely, one with 11 points per line) was a gory and large multi-part computation. The last phase, on a CRAY supercomputer (at low priority)

“…started running at IDA in the fall of 1986 and was to await completion in two to three years.”

C.W.H. Lam, The Search for a Finite Projective Plane of Order 10

before ultimately completing in 1988 (not without a final wrinkle, see the paper for more).

But I just learned there was an independent newer proof, this time using a SAT-solver, rather than a patchwork of custom code and by-hand computation!

This is roughly analogous to how the original proof of the Four Colour Theorem was a much more bespoke operation, while the next proof in the 90s was more clever about removing the error-prone by-hand part. But! This is in fact the third independent proof of the non-existence result. There was an intermediate proof by Dominique Roy in 2011, but this was apparently still a matter of custom-written research code. A SAT solver, on the other hand, is a generic tool, and the theory of such solvers is intensely studied. Further, the software outputs a “proof certificate” at the end, rather than a boolean yes/no: this is a data object that can be fed into a piece of software that verifies the proof really does do what it says it does (you can get them here!). However, this is not yet a completely formalised proof (in the sense in which the Four Colour Theorem has by now been proved in the proof assistant Coq): some unformalised theorems from the 1970s are still relied on. As you might imagine, the proof didn’t need two years of supercomputer time! Or rather, it took about 24 months on a standard personal computer, three months faster than Roy, and compared to 30 months of compute time for Lam and collaborators in the 1980s, including their access to a CRAY machine.
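To give a flavour of what “generic tool” means here: a finite combinatorial search of this kind gets flattened into a CNF formula in the standard DIMACS format that every SAT solver accepts. Below is a toy Haskell generator for the (unsatisfiable) pigeonhole principle with 3 pigeons and 2 holes; this is my own illustration and has nothing to do with the actual projective-plane encoding. A solver run on the output reports UNSAT and can emit a certificate (e.g. in the DRAT format) that an independent checker then verifies.

-- Emit a DIMACS CNF file asserting "3 pigeons fit into 2 holes, one per hole".
-- Variable v p h means "pigeon p sits in hole h".
main :: IO ()
main = putStr (dimacs 3 2)

dimacs :: Int -> Int -> String
dimacs pigeons holes = unlines (header : map clause cls)
  where
    v p h    = (p - 1) * holes + h
    -- every pigeon sits somewhere
    atLeast  = [ [v p h | h <- [1 .. holes]] | p <- [1 .. pigeons] ]
    -- no hole contains two pigeons
    atMost   = [ [negate (v p h), negate (v q h)]
               | h <- [1 .. holes], p <- [1 .. pigeons], q <- [p + 1 .. pigeons] ]
    cls      = atLeast ++ atMost
    header   = "p cnf " ++ show (pigeons * holes) ++ " " ++ show (length cls)
    clause c = unwords (map show (c ++ [0]))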

Catching myself

I just recently found myself in an interesting position. I was looking at a recently-published paper very close to my own area of expertise. I had missed seeing the preprint, and I found myself thinking “Isn’t this result a trivial consequence of already known facts?” What complicated matters was that this was from a junior researcher (who for this blog post will remain anonymous).

Those who have been around in category theory for a while may recognise this type of statement. Indeed, there was a major public dust-up in the not-so-distant past between topos theorists over just such an issue. But it is the type of thing that I’m sure has happened multiple times. The problem with category theory is that the formalism and the intuition of category theorists are extremely close. So if one is sufficiently experienced, theorems can sometimes immediately suggest themselves, and it’s a matter of setting up the framework so that the theorem holds, then showing that known results or examples are “just” special cases. Instead of proving the theorem holds for objects X, one can be reduced to proving that objects X satisfy the conditions of a definition.

However, this is often, but not always, the case. Sometimes there are really interesting wrinkles that happen when hypotheses are relaxed. Counterexamples to claims at the boundaries. Or techniques need to be wholly reworked to make them feasibly applicable as the complexity of the constructions grows (proving something is a symmetric monoidal closed bicategory in the weakest possible sense? Have fun with that…). Or perhaps there’s a standing assumption that holds in practice in the applications….or almost all applications…and the “well-known-to-experts theorem” has only ever been proved with that standing assumption.

This is the situation I found myself in.

After looking at the paper, the main definitions, and the statement of the main results, I wondered why this paper didn’t cite the results I was familiar with. The paper is well written. It actually follows a philosophy I agree with, supplying details where authors, trying to save themselves time and effort in diagram typesetting, would brush off the proof as an exercise for the reader. For all the talk of intuition, when matters get complex enough it is worth actually recording the proof properly, so we don’t find ourselves in the situation of Voevodsky and his early work on higher categories (which turned out to have a fatal flaw). But for some reason I was reacting emotionally.

I had to consciously stop myself. This was a junior researcher. I should not be firing off an email claiming the results are trivial applications of known results. This paper was published for a reason. I had another look.

This time, I noticed a claim that a certain property didn’t always hold, when I thought it did. At least, in the situation I was familiar with. Under a certain standing assumption. So now I was wondering if something had been missed, making everything a lot harder than it needed to be. Surely, this property follows reasonably trivially, and then we can all cite the literature and go home.

But again, this was just my intuition talking. I read through the paper to see if a counterexample was supplied. There was not. But….the author can’t just be pulling this fact from nowhere.

So I slowed down more: pull apart the definition. Prove for myself some helper lemmas that would help me prove this property always holds. Do a bunch of reductions. Find a general statement that, if true, leads to the property. Then, armed with the general statement (which was of a class I knew should be known), go hunting online. Surely someone has written about this already.

And then I found it: the general statement I thought should be true was true … if and only if the standing assumption was true.

But, surely, the setup without the standing assumption was, while non-empty, not as interesting/applicable/important as the extremely wide and well-known entire area with the standing assumption? Actually….not so. A bunch of nontrivial and important special constructions turn out to fall outside the standing assumption. Ones that I myself had never really looked at, as they are from a field of mathematics rather distant to my own particular non-category-theoretic speciality.

So there really was a non-trivial use case in applications that people cared about, and, so far as I can see, the result was indeed new in this case. It took me a day of returning to the problem a few times, giving the author the benefit of the doubt, really checking I’d read things properly, and giving their thoughts the respect they deserved.

I didn’t think I’d become that guy in category theory who claims everything is a special case or a trivial application of existing work, or that it would be a trivial exercise for someone of my expertise. And I’m glad I stopped to check myself, especially in this case. Yes, in principle I could have proved these results. But I didn’t. I didn’t stop to think whether there was this little wrinkle that meant the nice property could fail. And so there is now a new piece of mathematics I think is rather nice, that I didn’t know before. And kudos to the author. May they become established as a valuable part of our research community.