An analytic-flavoured proof of an algebraic result

It is often suggested that algebra and analysis, as subdisciplines of mathematics, are in some sense opposing schools of thought—perhaps even ideologies. ‘algebraists like equalities; analysts like inequalities’, I’ve heard people say. An amusing and thought-provoking shibboleth, no doubt. However, I would argue straight away that, to be precise, algebra is more akin to the study of isomorphisms or congruences… but perhaps that is to take the comparison too seriously.

Putting such superficial differences aside for a moment, it seems to me that typical algebraic proofs tend to consist of sequences of manipulations, each of which is aimed at proving that some set of axioms holds in order to deduce ever more powerful and wide-ranging truths about the objects at hand. The results are exact and often demonstrate a one-to-one correspondence between the objects, or perhaps even show us a new way of viewing something unfamiliar in terms of something more well understood.

Analytic proofs, on the other hand, have a more ‘approximate’ flavour. They make use of limiting processes, inequalities and usually aim to establish the existence of objects satisfying some property that is ‘good enough’ for the result to hold. That’s not to say analytic proofs aren’t beautiful—they absolutely are. But to my eyes, they seem almost apologetic in tone. There is a notion of ‘sufficiency’ that is less common in algebraic proofs.

It’s not very often (at least in fairly elementary mathematics) that the two flavours combine. But here is one place where they do.

The Gaussian integers are the complex numbers with integer real and imaginary part. Formally, we denote the set of Gaussian integers $\mathbb{Z}[i]$ and define it by

$$\mathbb{Z}[i] := \{ s + ti : s, t \in \mathbb{Z} \}$$

The square bracket notation is used because in general $R[x]$ denotes the ring of polynomials with coefficients in a ring $R$. So, sort of by analogy, $\mathbb{Z}[i]$ is the ring of polynomials with integer coefficients and ‘indeterminate’ $i$. Each element of the Gaussian integers is therefore of the form

$$\begin{aligned}
\sum_{n=0}^{m} k_n i^{n} &= k_0 i^0 + k_1 i^1 + k_2 i^2 + k_3 i^3 + k_4 i^4 + \dots + k_m i^m \\
&= k_0 (1) + k_1 (i) + k_2 (-1) + k_3 (-i) + k_4 (1) + \dots \\[1em]
&= \underbrace{(k_0 - k_2 + k_4 - \dots)}_{\in \mathbb{Z}} 1 + \underbrace{(k_1 - k_3 + k_5 - \dots)}_{\in \mathbb{Z}} i \\[1em]
&= s + ti, \; s, t \in \mathbb{Z} \end{aligned}$$

Another way of thinking about $R[x]$ is that it is the smallest ring containing all the elements of $R$, and the element $x$. If we add $x$ into a ring, the ring axioms (like closure under the two operations) force us to admit all possible polynomials in $x$ into the ring too.

…so the notation is consistent. But that’s not really relevant.

It can be shown that the set $\mathbb{Z}[i]$, with the standard operations of complex addition and multiplication, obeys certain rules: it’s an abelian group under addition, it’s closed under multiplication, multiplication is associative, and multiplication distributes over addition. That is to say, it forms a ring.

But in fact the Gaussian integers are a little bit more special.

Let $R$ be a commutative ring. An element $r \in R \setminus \{0\}$ is said to be a zero divisor if there is an element $s \in R \setminus \{0\}$ such that $rs = 0$. Intuitively, the zero divisors are the elements that can be reduced to zero by multiplication with another nonzero element.

Of course, in the integers there are no zero divisors. If $a$ and $b$ are both nonzero, $ab$ cannot possibly be zero. Equivalently (using the contrapositive), if we at some point find out that $ab = 0$, we can instantly deduce that either $a = 0$ or $b = 0$. We’ve been using this property of the integers without a second thought ever since we first learnt to solve quadratics by factorisation.

In a general ring, however, we cannot make this assumption. For example, in $\mathbb{Z} / 4 \mathbb{Z}$ (the integers modulo $4$), $2$ is a zero divisor since $2 \neq 0$ but $2 \cdot 2 \equiv 0 \pmod 4$. (If you’re wondering, it’s not too hard to show that $\mathbb{Z} / n \mathbb{Z}$ contains zero divisors if and only if $n$ is composite. Think about why $2$ worked in this case…)
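If you want to experiment, here is a tiny brute-force Python sketch (the function name `zero_divisors` is my own) that lists the zero divisors of $\mathbb{Z}/n\mathbb{Z}$:

```python
def zero_divisors(n):
    """Return the zero divisors of Z/nZ by brute force."""
    return [r for r in range(1, n)
            if any((r * s) % n == 0 for s in range(1, n))]

print(zero_divisors(4))   # [2]
print(zero_divisors(6))   # [2, 3, 4]
print(zero_divisors(7))   # [] -- no zero divisors when n is prime
```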

But the Gaussian integers don’t have this kind of weirdness. Just like the usual integers, they have no zero divisors. We have a special name for rings like this.

A ring $R$ is said to be an integral domain if it has a $1$ (a multiplicative identity), $1_R \neq 0_R$ (it is nontrivial), it is commutative, and it has no zero divisors.

We can be sure that $\mathbb{Z}[i]$ is an integral domain because it is a subring of $\mathbb{C}$, which is itself an integral domain because it is a field.

To see this, take $a, b$ in some field $\mathbb{F}$ and suppose that $a \neq 0$ and $ab = 0$. Since $a$ is nonzero we may multiply by its inverse in $\mathbb{F}$ to get $a^{-1}ab = a^{-1} 0$, which implies that $b = 0$. So whenever $ab = 0$, at least one of $a$ and $b$ must be zero; in other words, a field has no zero divisors.

Something I think is worth noting here is the fact that ‘sub-objects’ automatically inherit some, but not all, properties from their parent. In this particular case, the implication $a,b \neq 0 \implies ab \neq 0$ is true for all $a,b$ in $\mathbb{C}$ and hence must be true for all $a,b$ in $\mathbb{Z}[i] \subseteq \mathbb{C}$. Whether an operation commutes is another example of a property that is preserved in subsets, and in general any statement purely consisting of universal quantifiers will be. Statements that include existential quantifiers, though, like the existence of an identity or inverses for a certain operation, will not be inherited—at least a priori. This is the reason we really have no need to check associativity in verifying that a given subset of a group is a subgroup, for example.

But wait—there’s more! The Gaussian integers have another nice property in common with the usual integers: we can perform Euclidean division.

In a field, of course, we can divide things perfectly. Given any two numbers (the second nonzero), we can write the first as the product of the second with some other element of the field. In the integers, though, we can’t necessarily do this (there is no $x \in \mathbb{Z}$ such that $2x = 5$, for example), but we can at least divide one number into another up to a small error. In my example, $\frac{5}{2}$ could be said to equal $2$ remainder $1$. It’s probably exactly how you learnt to do division before you knew what fractions were.

It may seem obvious that the integers have this property, but for a general ring it is not obvious and in fact not necessarily true! In the integers the fact that we can do this is referred to as the ‘division theorem’.

Anyway, the Gaussian integers do have this property—and that’s what we’ll show now.

Let’s define some more stuff first.

An integral domain $R$ is said to be a Euclidean domain if there is a map $V: R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$, known as the valuation, satisfying:

[1] $V(a) \leq V(ab) \; \; \forall a, b \in R \setminus \{0\}$

[2] Given $a, b \in R$ with $b \neq 0$, there are elements $q, r \in R$ such that $a = qb + r$ and either $r = 0$ or $V(r) < V(b)$.

The valuation can be thought of as a generalisation of the absolute value in the case of the usual integers. It is the function that effectively measures how small our remainder is when we do the Euclidean division.

The first condition says that the valuation should never decrease when we multiply two elements of the ring together; the second is just the statement that Euclidean division is possible (using the valuation in place of the absolute value in the integers).

In the case of the integers, then, we can obviously use the absolute value as the valuation, which will satisfy both conditions. So the integers are a Euclidean domain.

The function $N: \mathbb{Z}[i] \setminus \{0\} \to \mathbb{Z}_{\geq 0}, s + ti \mapsto |s + ti|^2 = s^2 + t^2$ is called the norm on $\mathbb{Z}[i]$.

We will prove the following statement:

The ring $\mathbb{Z}[i]$ is a Euclidean domain with valuation $N$.

First, we will note that $N(a) \geq 1$ for all $a \in \mathbb{Z}[i]\setminus \{0\}$.

This should be obvious because integer lattice points in the complex plane (excluding the origin itself) are at least $1$ unit away from the origin, and the minimum norm is the square of this distance, which is of course $1$. Alternatively, we can just see algebraically that a sum of two squared integers can only be less than $1$ if it is zero, in which case both the real and imaginary parts of $a$ must be zero, so that $a = 0$, which is not allowed. So the norm is always at least $1$.

From this it is also clear that for $a, b \in \mathbb{Z}[i] \setminus \{0\}$ we have $N(ab) = |ab|^2 = |a|^2 |b|^2 \geq |a|^2 = N(a)$ (since $|b|^2 = N(b) \geq 1$), so the first condition is satisfied.

Let $a,b \in \mathbb{Z}[i]$ with $b \neq 0$. Write $a = s + ti$ and $b = u + vi$ with $s,t,u,v \in \mathbb{Z}$. We aim to show that the second condition is satisfied: that we can write $a$ as $b$ multiplied by some other element of $\mathbb{Z}[i]$, plus another element with valuation either zero or strictly less than that of $b$.

Well, we can’t necessarily just divide $a$ by $b$ as usual in the Gaussian integers because they’re not a field. But the Gaussian integers are embedded in the complex numbers, so if we shut our eyes, zoom out, and imagine we’re in that perfect world just for a moment, we can divide $a$ by $b$ exactly.

Form the quotient $\frac{a}{b} = l + mi \in \mathbb{C}$, where $l,m \in \mathbb{R}$.

This isn’t quite what we wanted, but maybe we can approximate this quotient by an element of the Gaussian integers, and then add on a little bit extra to get us all the way there.

Let $L, M \in \mathbb{Z}$ be such that $|l - L| \leq \frac{1}{2}$ and $|m - M| \leq \frac{1}{2}$.

It is obvious that such $L$ and $M$ can always be found, since every real number lies at most $\frac{1}{2}$ away from some integer. To be pedantic, it follows from the Archimedean property of $\mathbb{R}$, which itself can be deduced from the standard properties of $\mathbb{R}$ including, most crucially, completeness.

Then $\frac{a}{b} = L + Mi + (l-L) + (m-M)i$. This is easily seen algebraically, but intuitively it just says that the quotient $\frac{a}{b} \in \mathbb{C}$ is equal to our nearest Gaussian integer $L + Mi$ plus the ‘error term’ $(l-L) + (m-M)i$ which is the difference between the approximation and the exact quotient.

Multiplying by $b$, we have

$$a = (L+Mi)b + ((l-L) + (m-M)i)b$$

Since $a - (L+Mi)b$ is in $\mathbb{Z}[i]$ (because of closure), the term $((l - L) + (m-M)i)b$ must be in $\mathbb{Z}[i]$ too. If it is zero, we can choose $q = L + Mi$ and $r = 0$ and we are done. If this is the case then we didn’t actually need to imagine we were in the complex numbers, because $b$ did divide $a$ exactly.

If it isn’t zero, we want to show that it at least has valuation less than that of $b$.

So assume it is nonzero, so that $(l - L) + (m-M)i \neq 0$. Then we have

$$\begin{aligned}
N\big(((l-L) + (m-M)i)b\big) &= |(l-L) + (m-M)i|^2 \, |b|^2 \\
&\leq \left( \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 \right) |b|^2 \\
&= \frac{1}{2} |b|^2 \\
&= \frac{1}{2} N(b) < N(b).
\end{aligned}$$

If we then choose $q = L + Mi$ and $r = ((l-L) + (m-M)i)b$, the second condition holds since we have shown $N(r) < N(b)$.

Therefore $\mathbb{Z}[i]$ is a Euclidean domain.
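If it helps, here is a short Python sketch of exactly the division step in the proof; the function name `gauss_divmod` and the representation of Gaussian integers as pairs of integers are my own choices, not anything standard.

```python
def gauss_divmod(a, b):
    """Euclidean division in Z[i]: return (q, r) with a = q*b + r and N(r) < N(b).

    a and b are pairs (s, t) representing s + t*i, with b != (0, 0).
    """
    s, t = a
    u, v = b
    nb = u * u + v * v                 # N(b) = |b|^2
    # a/b = a * conj(b) / N(b); integer real and imaginary parts before dividing by N(b):
    num_re = s * u + t * v
    num_im = t * u - s * v
    # Round each coordinate to the nearest integer (within 1/2, exactly as in the proof).
    L = (2 * num_re + nb) // (2 * nb)
    M = (2 * num_im + nb) // (2 * nb)
    q = (L, M)
    # r = a - q*b, computed exactly in integer arithmetic.
    r = (s - (L * u - M * v), t - (L * v + M * u))
    return q, r

N = lambda z: z[0] ** 2 + z[1] ** 2    # the norm s^2 + t^2

a, b = (27, 23), (8, 1)                # 27 + 23i and 8 + i
q, r = gauss_divmod(a, b)
print(q, r, N(r) < N(b))               # (4, 2) (-3, 3) True -- the remainder is smaller in norm
```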

What we have shown seems to be a thoroughly algebraic result—that a particular ring defined in a certain way is actually part of another class of even more special rings. But the proof of this is not typical of a proof in elementary algebra, and maybe that’s because this ring is made up of elements from a more ‘complete’ ring of the kind that are studied in analysis.

From the fact that $\mathbb{Z}[i]$ is a Euclidean domain we can deduce that it is a PID (principal ideal domain), and from this it follows that (among many other things) every element of $\mathbb{Z}[i]$ can be factored uniquely into irreducible elements, much like the fundamental theorem of arithmetic for the integers.
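One concrete payoff of being a Euclidean domain is that the usual Euclidean algorithm for greatest common divisors goes through verbatim in $\mathbb{Z}[i]$, which is the first step on the road to unique factorisation. A minimal sketch of my own, reusing the hypothetical `gauss_divmod` helper from the snippet above:

```python
def gauss_gcd(a, b):
    """A greatest common divisor in Z[i] (determined only up to multiplication by a unit),
    computed by the Euclidean algorithm that the valuation N makes possible."""
    while b != (0, 0):
        _, r = gauss_divmod(a, b)
        a, b = b, r
    return a

# 5 = (2 + i)(2 - i) and 3 + i = (1 + i)(2 - i) share the factor 2 - i;
# the algorithm returns an associate of it.
print(gauss_gcd((5, 0), (3, 1)))   # (-1, -2), i.e. -1 - 2i = (-i)(2 - i)
```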

If we tried to prove this fact directly, I think we’d probably flounder around endlessly, wondering where to even start. But through the graceful, unhurried steps of algebra and abstraction, we can get there, and uncover some more general truths on the way.

The first isomorphism theorem is really about homomorphisms

I often think that there are essentially two flavours of theorem in mathematics. Some are direct statements asserting particular truths about important special cases, while others are simple-looking statements asserting more general truths that, rather than merely giving us ‘predictive power’ over a particular locale of the abstract mathematical landscape, give us a true taste of the underlying nature of some object (or commonly, some relation between objects—although that is in itself an object, of course).

Theorems of the first kind, I think, could include the celebrated and very recent Green-Tao theorem that asserts the existence of arbitrarily long arithmetic progressions in the primes, or, to take a random example, the amusingly-named Hairy Ball theorem which tells us a certain property of tangent vector fields on even-dimensional n-spheres. They tell us that something is true, there is a proof given to convince us, and they more or less provide their own motivation for existence. They’re just good things to know. It’s likely these theorems have deeper implications that I’m not aware of, but I think I can rightly claim they are to an extent self-motivating and fairly restricted in scope.

A prime example of the second kind is what is often referred to as the First Isomorphism Theorem.

(In this article I will talk about the theorem in the context of group theory, but it has analogues throughout algebra that apply to other structures such as rings, vector spaces, modules and other even more general objects)

When I first saw this theorem I felt like I understood it. I mean, it seems simple enough.

Let $\varphi: G \to H$ be a group homomorphism.

Then $\ker(\varphi) \trianglelefteq G$ and $G / \ker(\varphi) \cong \varphi(G)$

I thought, OK, so there’s a homomorphism between two groups and it says that that particular quotient (by the kernel of the homomorphism) is isomorphic to the image of that homomorphism. Also, the kernel (the set of elements that map to the identity) is a normal subgroup of the domain. Wait, so there are two maps going on here—a homomorphism and an isomorphism. And they’re related? Actually, I’m confused now. Who cares that this correspondence exists anyway?

Sometimes it’s stated in a slightly more complicated-looking way where the isomorphism is explicitly defined.

… then there is a group isomorphism $\bar{\varphi}: G / \ker(\varphi) \to \varphi(G)$ such that $\ker(\varphi) g \mapsto \varphi(g)$.

This map $\bar{\varphi}$ is referred to as the induced (or canonical, or something) isomorphism, but is really just extraneous strength that I don’t think is part of the important point of the theorem, so we can safely ignore it for now.

Now, I had two doubts at this point. The first is the question of what this theorem means, and the second is the question of what the point of homomorphisms actually is. Isomorphisms are clearly useful—they tell us when to stop looking. Two things are isomorphic if they’re basically the same thing, and further differences can be considered unimportant facts of the particular representation or implementation details, if you like. But homomorphisms? They seem to preserve something but corrupt some of it at the same time.

When I realised the true importance of this theorem and what it really says—what Eugenia Cheng and others might call its ‘moral message’—I wondered why it was called the first isomorphism theorem. Surely it should rather be called the first homomorphism theorem, or perhaps since there is really only one of these theorems, the fundamental homomorphism theorem. Alas, upon googling briefly I find that I am not as profound as I think I am—it seems to be referred to by some as exactly that.

So what is its moral message?

The fundamental homomorphism theorem (I’ll call it that now) tells us, fundamentally, what homomorphisms do. Surprise, surprise!

We can split all possible homomorphisms into essentially two kinds: embeddings and quotients. Embeddings are injective; quotients aren’t.

Embeddings do just what they say on the tin. They map a particular group into another group without blurring the distinction between elements. Examples include maps like $\theta_1: \mathbb{Z} \to \mathbb{R}$ such that $x \mapsto x$ or $\theta_2: \mathbb{Z}_2 \to \mathbb{Z}_6$ such that $x \mapsto 3x$.

You can think of the first group as being ’embedded’ in the second in the sense that there is an actual lookalike copy of it (an isomorphic subgroup) contained within the second. In fact, the image of a homomorphism is always a subgroup (though not necessarily a normal one) of the codomain. This is sometimes included in the statement of the first isomorphism theorem, although it is really a straightforward corollary. In the case of $\theta_1$, the integers are embedded in the reals in the obvious sense, and in the case of $\theta_2$, the integers under addition modulo $2$ are embedded in the integers modulo $6$.
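As a tiny sanity check (entirely my own, using Python’s modular arithmetic), we can verify by brute force that $\theta_2$ really is a homomorphism and that its image is an isomorphic copy of $\mathbb{Z}_2$ sitting inside $\mathbb{Z}_6$:

```python
theta2 = lambda x: (3 * x) % 6   # the embedding Z_2 -> Z_6 from above

# Homomorphism: theta2(x + y mod 2) == theta2(x) + theta2(y) mod 6 for all x, y.
assert all(theta2((x + y) % 2) == (theta2(x) + theta2(y)) % 6
           for x in range(2) for y in range(2))

print(sorted({theta2(x) for x in range(2)}))   # [0, 3] -- a lookalike copy of Z_2 inside Z_6
```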

Quotients, on the other hand, do blur the distinction between group elements as they are mapped into the new group. This is clearly more complicated than a simple embedding. Is there an intuitive picture we can use to visualise the image of these kind of homomorphisms in the same way we can with embeddings? Yes, and the fundamental homomorphism theorem tells us how. It says that images of quotient maps are quotient groups—in particular, quotients by the kernel. The clue was in the name!

So the theorem is really quite intuitive: it tells us exactly how homomorphisms blur groups as they map them into other groups. If the kernel is trivial, the homomorphism preserves the group entirely and we have an embedding, and if it’s not trivial, it has the effect of mapping it to a subgroup that is isomorphic to the original group ‘quotiented out by’ the kernel. Of course, if we map a group into another group using some arbitrary function, we may end up with some strange subset that isn’t even a group. But the theorem tells us that as long as we have a homomorphism, that is, a function that is compatible with the group operation, the subset we map to will always be a group—and even better, we know exactly what kind of group it will be.

That’s the first question sorted; we now know what the fundamental homomorphism theorem says. How about the second question? What is the point of homomorphisms, anyway?

I think an example will be enlightening here.

Take the determinant map $\det: GL_{n}(\mathbb{R}) \to \mathbb{R}^*$ mapping from the general linear group of degree $n$ (the set of all $n \times n$ real-valued invertible matrices) to the nonzero real numbers. Considering $GL_n(\mathbb{R})$ and $\mathbb{R}^*$ each as groups under multiplication, the determinant is a homomorphism of groups since

$$\det(AB) = \det(A)\det(B) \text{ for } A, B \in M_{n \times n}(\mathbb{R})$$

as is fairly easily proven.
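For instance, here is a quick numerical sanity check with NumPy; the random matrices and the seed are my own choices, and the two printed values agree up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det is a homomorphism: det(AB) = det(A) det(B), up to floating-point error.
print(np.linalg.det(A @ B))
print(np.linalg.det(A) * np.linalg.det(B))
```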

The point of the determinant is that it takes a matrix, which is a rather complicated thing, and associates with it a relatively simpler thing, a real number, so that we can deal with that instead. The determinant of a matrix doesn’t capture all of the information about that matrix, but it certainly tells us something important. For example, matrices with nonzero determinant are precisely the invertible matrices, and are the ones that represent injective linear maps. The determinant, geometrically, can be thought of as the scaling factor of the linear transformation the matrix represents.

The fact that it is a homomorphism allows us to freely move between the parallel worlds of matrices and their associated real numbers without running into contradictions. If, say, we want to find the determinant of a product of two matrices, we can compute the product and then find its determinant, or we can just multiply the two determinants as real numbers. The two worlds have a nice correspondence in this way. If our function wasn’t a homomorphism, the correspondence would not be exact and moving between the two could cause some issues.

Despite this niceness, even with a homomorphism something is clearly lost. We can easily find two matrices which have the same determinant, which means we can’t even seem to distinguish between them purely on the basis of their determinants. But this is the tradeoff for simplifying our lives. What we have found is that the determinant is a non-injective homomorphism (a quotient map!) and the fundamental homomorphism theorem will tell us its structure, no questions asked.

The kernel of this particular determinant map is $\{A \in GL_n(\mathbb{R}) : \det(A) = 1\}$, the set of real matrices with determinant $1$ (the multiplicative identity in $\mathbb{R}$). It’s usually known as the special linear group $SL_n(\mathbb{R})$ and it’s the set of linear transformations that preserve volume (and orientation—meaning that they don’t ‘flip’ space).

By the fundamental theorem and the fact that the determinant map is surjective (onto the nonzero reals), we have

$$GL_n(\mathbb{R}) / SL_n(\mathbb{R}) \cong \det(GL_n(\mathbb{R})) = \mathbb{R} \setminus \{0\}$$

and so we essentially have the real numbers (without zero). What the determinant has done is to identify certain matrices that have some deep property in common. The group element corresponding to the real number $3$, for example, is in reality the infinite set $\{A \in GL_n(\mathbb{R}) : \det(A) = 3\}$ in a way analogous to the way that the single element $1$ of $\mathbb{Z}_2$ can really be thought of as the infinite set $2\mathbb{Z} + 1 = \{\dots,-3,-1,1,3,\dots\}$ of $\mathbb{Z} / 2\mathbb{Z}$, which is no surprise since we are dealing with quotient groups after all. Determinants of matrices behave just like the real numbers with respect to their multiplication.
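To make the coset picture concrete: two matrices with the same determinant always differ, multiplicatively, by an element of $SL_n(\mathbb{R})$. A small sketch with matrices of my own choosing:

```python
import numpy as np

A = np.diag([3.0, 1.0, 1.0])   # det(A) = 3
B = np.diag([1.0, 3.0, 1.0])   # det(B) = 3, a different matrix in the same coset

# A and B lie in the same coset of SL_n(R): A @ inv(B) has determinant 1.
print(np.linalg.det(A @ np.linalg.inv(B)))   # 1.0 (up to floating-point error)
```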

Another example can be seen in the complex numbers.

Consider the set of nonzero complex numbers, $\mathbb{C}^*$. Again, they form a group with respect to multiplication. The modulus, $\varphi: \mathbb{C}^* \to \mathbb{R}^*$ such that $a + bi \mapsto |a+bi| := \sqrt{a^2 + b^2}$, is a homomorphism into the multiplicative group of nonzero reals, since

$$|xy| = |x||y| \text{ for } x, y \in \mathbb{C}$$

as is also easily proven (use the relationship between the modulus and the complex conjugate!).

Since $\ker(\varphi) = \{x \in \mathbb{C}^* : |x| = 1\}$ (the unit circle in the complex plane), the fundamental theorem tells us (with the knowledge that the image of $\varphi$ is exactly the positive reals) that

$$\mathbb{C}^* / \{x \in \mathbb{C}^* : |x| = 1\} \cong \varphi(\mathbb{C}^*) = \mathbb{R}_{>0}$$

So if we take the complex plane and ‘mod out’ by the unit circle, we end up with the positive real axis! The geometric interpretation of this is that we have identified complex numbers (two-dimensional vectors) whose ratios lie in the kernel, that is, whose quotient is a complex number of modulus $1$. The vectors identified with one another differ only in their direction, since multiplying by a modulus-$1$ complex number doesn’t change the magnitude. The plane (minus the origin) is partitioned into an infinite family of circles, one for each possible length of vector (that is, one for each positive real number), with each circle containing points at all possible angles from $0$ to $2\pi$.
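In code, the identification is easy to see: two nonzero complex numbers lie in the same coset exactly when their ratio has modulus $1$, i.e. when they have the same modulus. (The example values below are my own.)

```python
z1 = 3 + 4j     # modulus 5
z2 = 5j         # modulus 5, but pointing in a different direction
z3 = 1 + 1j     # modulus sqrt(2)

# Same coset of the unit circle <=> ratio has modulus 1 <=> equal modulus.
print(abs(z1 / z2))   # approximately 1.0 -> z1 and z2 are identified by the quotient
print(abs(z1 / z3))   # not 1 -> z1 and z3 land in different cosets
```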

The fact that this quotient gives us the reals is perhaps surprising at first glance, but by considering a well-known homomorphism through the lens of the fundamental theorem it becomes obvious.

To sum up, the power of the fundamental theorem of homomorphisms is that it identifies quotient maps (non-injective homomorphisms) with quotient groups (groups of cosets). They are really two sides of the same coin.

In full generality, it says that a structure-preserving map achieves exactly the same thing as a quotient object.

Lastly, another neat feature of the theorem worth mentioning is that it gives us a different vantage point from which to view the notion of a normal subgroup. The set of normal subgroups of a group $G$ is precisely the set of kernels of homomorphisms from $G$ out to some other group. Clearly if we adopt this viewpoint it becomes easier to see why normality is required when forming a quotient group.

Abelianness and equivalent definitions in general

In group theory, we say that a group $(G, \cdot)$ is abelian if (and only if) $a \cdot b = b \cdot a$ for all $a, b \in G$.

Abelian groups have certain properties that are generally true (in the mathematical sense of ‘generally’, meaning always), so if we know that a group we are dealing with is abelian, we can instantly deduce a variety of things about the group and how it behaves.

Once we know a group is abelian, we can make some useful deductions. Does it work the other way around? Are there truths that will allow us to conclude that a group is abelian?

For what statements are both of these true? What statements imply, and are implied by, the fact that a group is abelian? Essentially, we are asking:

What properties are equivalent to abelianness?

From here onwards we’ll use juxtaposition (like $ab$) rather than an explicit symbol (like $a \cdot b$) to indicate the group operation acting on elements $a$ and $b$.

As an example, take $$(xy)^2 = x^2 y^2 \;\forall x, y \in G$$ Clearly, if G is abelian this is true. Why? Because $$(xy)^2 = (xy)(xy) = x(yx)y = x(xy)y = (xx)(yy) = x^2 y^2$$ using the fact that the group operation is associative and $G$ is abelian.

However, if we write out our first condition slightly differently

$$xyxy = xxyy$$

and then apply the ‘cancellation property’ of groups (cancelling the leftmost $x$ and the rightmost $y$), we end up with

$$yx = xy$$

Succinctly, we have proven

$$(xy)^2 = x^2 y^2 \iff xy = yx$$

for arbitrary elements $x,y$ of a group $G$. Since the elements are arbitrary, we have proven that $G$ is abelian if and only if our original condition is true.
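We can watch this equivalence fail in a non-abelian group. The brute-force check below is my own illustration, with permutations of $\{0,1,2\}$ written as tuples; it confirms that in $S_3$ the pairs violating $(xy)^2 = x^2 y^2$ are exactly the non-commuting pairs.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations of {0, 1, 2}, each given as a tuple."""
    return tuple(p[q[i]] for i in range(3))

S3 = list(permutations(range(3)))

def square_law_holds(x, y):
    # Does (xy)^2 = x^2 y^2 ?
    xy = compose(x, y)
    return compose(xy, xy) == compose(compose(x, x), compose(y, y))

violations    = {(x, y) for x in S3 for y in S3 if not square_law_holds(x, y)}
non_commuting = {(x, y) for x in S3 for y in S3 if compose(x, y) != compose(y, x)}
print(violations == non_commuting, len(violations))   # True 18
```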

In fact, we might as well define a group to be abelian if it satisfies this property. It may seem strange, but this doesn’t change anything about the mathematics at all. Which definition of the long (and in some sense infinite) list of equivalent definitions we happen to choose to be ‘the definition’ is an entirely human distinction. We can take whichever one we like, and the others are then just consequences of this definition (theorems in our theory).

How about another example? Define the direct product of groups $(G, \circ_{G})$ and $(H, \circ_{H})$, denoted $G \oplus H$, to be the cartesian product $G \times H = \{ (g,h) : g \in G, h \in H \}$ with the group operation $\circ$ defined such that $(g_1, h_1) \circ (g_2, h_2) = (g_1 \circ_{G} g_2, h_1 \circ_{H} h_2)$.

Clearly, the direct product of some number of cyclic groups is abelian, since the cyclic groups themselves are abelian, and the componentwise multiplication of the direct product reduces abelianness in the product to abelianness in each participating group.

It is a much deeper fact that the converse is true too: every finitely generated abelian group, say $G$, is isomorphic to a direct product of cyclic groups. This is the so-called fundamental theorem of finitely generated abelian groups:

$$G \cong \mathbb{Z}^{r} \oplus \bigoplus_{i=1}^{n} \mathbb{Z}_{k_i}$$

where $r \geq 0$ and $k_1, k_2, \dots, k_n$ are prime powers. (For a finite abelian group, $r = 0$ and only the prime-power cyclic factors appear.)
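The smallest interesting instance of this is $\mathbb{Z}_6 \cong \mathbb{Z}_2 \oplus \mathbb{Z}_3$ (essentially the Chinese remainder theorem). Here is a brute-force verification of my own that the natural map is an isomorphism:

```python
# The map x -> (x mod 2, x mod 3) from Z_6 to Z_2 (+) Z_3.
phi = lambda x: (x % 2, x % 3)

# Bijective: the six images are all distinct.
assert len({phi(x) for x in range(6)}) == 6

# Homomorphism: phi(x + y mod 6) equals componentwise addition of phi(x) and phi(y).
assert all(
    phi((x + y) % 6) == ((phi(x)[0] + phi(y)[0]) % 2, (phi(x)[1] + phi(y)[1]) % 3)
    for x in range(6) for y in range(6)
)
print("Z_6 is isomorphic to Z_2 (+) Z_3")
```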

Again, we might as well define a (finitely generated) abelian group to be one of this form, and then the fact that an abelian group’s elements all commute with each other is just another provable theorem.

I think this subtle and perhaps surprising notion of equivalence is rather interesting. It is conceptually easy to prove that two statements imply each other: just assume one and prove the other, and then repeat the other way around. But to state that the two statements are in fact perfectly equivalent, that there exists no universe in which one of them is true and the other false, seems like a much stronger thing. And yet it isn’t.

What other interesting equivalences (or alternative definitions) are there in mathematics?

Well, for example: the definition of a prime number.

Most people, if asked, will probably state that a number is prime if and only if it has no factors other than itself and $1$. This seems reasonable, and indeed it is the original motivating definition. However, we can also say that a number $n > 1$ is prime if and only if whenever $n$ divides a product $ab$, it divides $a$ or $b$ (or both). The first implication (that this is true if $n$ is prime) is a classical result called Euclid’s lemma, but in fact the converse is true too and so the two statements are equivalent. We might as well define prime numbers this way.
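Since $n$ divides $ab$ exactly when $ab \equiv 0 \pmod n$, the second characterisation can be checked over residues alone, which makes it easy to test by brute force. A little sketch of my own:

```python
def is_prime_euclid(n):
    """n > 1 is prime iff: whenever n divides a*b, it divides a or b.

    It suffices to test residues a, b in {1, ..., n-1}: a factorisation
    n = a*b with 1 < a, b < n is exactly a failing witness."""
    if n < 2:
        return False
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

print([n for n in range(2, 30) if is_prime_euclid(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```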

However, something interesting happens when we move from the integers to something more general. In a general ring, these apparently equivalent definitions separate into two distinct notions called ‘prime’ and ‘irreducible’. It just so happens that in the integers (and in general, in a certain class of rings called GCD domains) they coincide.

Another famous example of an interesting equivalence is of course due to the French mathematician Augustin-Louis Cauchy.

The standard definition of convergence of a real sequence is the following:

A sequence $(a_n): \mathbb{N} \to \mathbb{R}$ is said to converge (to $L \in \mathbb{R}$) if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that for all $n > N$ we have $\lvert a_n - L \rvert < \epsilon$.

In words, it is saying that a sequence converges if and only if we can make the value of the sequence arbitrarily close (as close as we like) to the limit point $L$ by moving sufficiently far along the sequence towards infinity.

There are some sequences that satisfy this definition for some (unique!) value of $L$, and some that don’t for any. Another similar but subtly different property a sequence might have is the following:

A sequence $(a_n): \mathbb{N} \to \mathbb{R}$ is said to be Cauchy if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that for all $n, m > N$ we have $\lvert a_n - a_m \rvert < \epsilon$.

This one is saying, in words, that a sequence is Cauchy if and only if the terms in the sequence can be made arbitrarily close to each other by moving sufficiently far along the sequence.

As you may have suspected, these two properties turn out to imply each other: convergent sequences are Cauchy, and Cauchy sequences converge. The first implication is the easier one to prove, but both are true and therefore the two properties are equivalent. They are both perfectly good characterisations of convergence. This is particularly convenient since they contain an important conceptual difference: the second definition makes no mention of a limit, so we can prove convergence to some limit even when we don’t know what that limit actually is. Having these kind of equivalences is actually quite powerful!
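For a concrete example, the partial sums of $\sum 1/k^2$ can be seen to be Cauchy (far-apart terms differ only by a small tail of the series) without ever mentioning that the limit happens to be $\pi^2/6$. A numerical sketch of my own:

```python
import math

def partial_sum(n):
    """n-th partial sum of the series sum over k of 1/k^2."""
    return sum(1.0 / k ** 2 for k in range(1, n + 1))

# The Cauchy criterion in action: far-apart terms cluster together...
n, m = 10_000, 20_000
print(abs(partial_sum(n) - partial_sum(m)))    # about 5e-5, with no limit in sight

# ...and indeed the sequence converges (to pi^2 / 6, as it happens).
print(abs(partial_sum(m) - math.pi ** 2 / 6))  # also about 5e-5
```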

So, what is the meaning of all of this? The question is perhaps not a mathematical one, and even then I’m not exactly sure of the answer.

My impression is this: definitions are less important than often assumed; it is the underlying properties of mathematical objects that are fundamental. Definitions are just a human way of putting an abstract concept into one-to-one correspondence with a more concrete ‘testable’ truth.

There are some sequences that are Cauchy, and some sequences that converge. In fact, we are talking about the same set of sequences that all have some ‘underlying property’ in common, but as humans we like to anchor this property to things we can more readily pin down in symbols—even if we end up doing so from two quite different angles.