## The first isomorphism theorem is really about homomorphisms

I often think that there are essentially two flavours of theorem in mathematics. Some are direct statements asserting particular truths about important special cases, while others are simple-looking statements asserting more general truths that, rather than merely giving us ‘predictive power’ over a particular locale of the abstract mathematical landscape, give us a true taste of the underlying nature of some object (or commonly, some relation between objects—although that is in itself an object, of course).

Theorems of the first kind, I think, could include the celebrated and relatively recent Green-Tao theorem, which asserts the existence of arbitrarily long arithmetic progressions in the primes, or, to take a random example, the amusingly-named Hairy Ball theorem, which tells us that every continuous tangent vector field on an even-dimensional $n$-sphere must vanish somewhere. They tell us that something is true, there is a proof given to convince us, and they more or less provide their own motivation for existence. They’re just good things to know. It’s likely these theorems have deeper implications that I’m not aware of, but I think I can rightly claim they are to an extent self-motivating and fairly restricted in scope.

A prime example of the second kind is what is often referred to as the First Isomorphism Theorem.

(In this article I will talk about the theorem in the context of group theory, but it has analogues throughout algebra that apply to other structures such as rings, vector spaces, modules and even more general objects.)

When I first saw this theorem I felt like I understood it. I mean, it seems simple enough.

Let $\varphi: G \to H$ be a group homomorphism.

Then $\ker(\varphi) \trianglelefteq G$ and $G / \ker(\varphi) \cong \varphi(G)$

I thought, OK, so there’s a homomorphism between two groups and it says that that particular quotient (by the kernel of the homomorphism) is isomorphic to the image of that homomorphism. Also, the kernel (the set of elements that map to the identity) is a normal subgroup of the domain. Wait, so there are two maps going on here—a homomorphism and an isomorphism. And they’re related? Actually, I’m confused now. Who cares that this correspondence exists anyway?

Sometimes it’s stated in a slightly more complicated-looking way where the isomorphism is explicitly defined.

… then there is a group isomorphism $\bar{\varphi}: G / \ker(\varphi) \to \varphi(G)$ such that $\ker(\varphi) g \mapsto \varphi(g)$.

This map $\bar{\varphi}$ is referred to as the induced (or canonical, or something) isomorphism, but is really just extraneous strength that I don’t think is part of the important point of the theorem, so we can safely ignore it for now.

Now, I had two doubts at this point. The first is the question of what this theorem means, and the second is the question of what the point of homomorphisms actually is. Isomorphisms are clearly useful—they tell us when to stop looking. Two things are isomorphic if they’re basically the same thing, and further differences can be considered unimportant facts of the particular representation or implementation details, if you like. But homomorphisms? They seem to preserve something but corrupt some of it at the same time.

When I realised the true importance of this theorem and what it really says—what Eugenia Cheng and others might call its ‘moral message’—I wondered why it was called the first isomorphism theorem. Surely it should rather be called the first homomorphism theorem, or perhaps since there is really only one of these theorems, the fundamental homomorphism theorem. Alas, upon googling briefly I find that I am not as profound as I think I am—it seems to be referred to by some as exactly that.

So what is its moral message?

The fundamental homomorphism theorem (I’ll call it that from now on) tells us, fundamentally, what homomorphisms do. Surprise, surprise!

We can split all possible homomorphisms into essentially two kinds: embeddings and quotients. Embeddings are injective; quotients aren’t.

Embeddings do just what they say on the tin. They map a particular group into another group without blurring the distinction between elements. Examples include maps like $\theta_1: \mathbb{Z} \to \mathbb{R}$ such that $x \mapsto x$ or $\theta_2: \mathbb{Z}_2 \to \mathbb{Z}_6$ such that $x \mapsto 3x$.

You can think of the first group as being ’embedded’ in the second in the sense that there is an actual lookalike copy of it (an isomorphic subgroup) contained within the second. In fact, the image of a homomorphism is always a subgroup (though not necessarily a normal one) of the codomain. This is sometimes included in the statement of the first isomorphism theorem, although it is really a straightforward corollary. In the case of $\theta_1$, the integers are embedded in the reals in the obvious sense, and in the case of $\theta_2$, the integers under addition modulo $2$ are embedded in the integers modulo $6$.
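As a sanity check, here is a tiny brute-force verification (a Python sketch; the function name is my own) that the map $x \mapsto 3x$ really is an injective homomorphism from $\mathbb{Z}_2$ into $\mathbb{Z}_6$, landing on the order-$2$ subgroup $\{0, 3\}$:

```python
# Brute-force check that theta2 : Z_2 -> Z_6, x |-> 3x, is an
# injective homomorphism (groups under addition mod 2 and mod 6).

def theta2(x):
    return (3 * x) % 6

# Homomorphism: theta2((a + b) mod 2) == (theta2(a) + theta2(b)) mod 6.
for a in range(2):
    for b in range(2):
        assert theta2((a + b) % 2) == (theta2(a) + theta2(b)) % 6

# Injectivity: the two elements of Z_2 hit two distinct elements of Z_6,
# so the image {0, 3} is an isomorphic copy of Z_2 sitting inside Z_6.
image = sorted({theta2(x) for x in range(2)})
assert image == [0, 3]
```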

Quotients, on the other hand, do blur the distinction between group elements as they are mapped into the new group. This is clearly more complicated than a simple embedding. Is there an intuitive picture we can use to visualise the image of this kind of homomorphism in the same way we can with embeddings? Yes, and the fundamental homomorphism theorem tells us how. It says that images of quotient maps are quotient groups—in particular, quotients by the kernel. The clue was in the name!

So the theorem is really quite intuitive: it tells us exactly how homomorphisms blur groups as they map them into other groups. If the kernel is trivial, the homomorphism preserves the group entirely and we have an embedding, and if it’s not trivial, it has the effect of mapping it to a subgroup that is isomorphic to the original group ‘quotiented out by’ the kernel. Of course, if we map a group into another group using some arbitrary function, we may end up with some strange subset that isn’t even a group. But the theorem tells us that as long as we have a homomorphism, that is, a function that is compatible with the group operation, the subset we map to will always be a group—and even better, we know exactly what kind of group it will be.

That’s the first question sorted; we now know what the fundamental homomorphism theorem says. How about the second question? What is the point of homomorphisms, anyway?

I think an example will be enlightening here.

Take the determinant map $\det: GL_{n}(\mathbb{R}) \to \mathbb{R}^*$ mapping from the general linear group of degree $n$ (the set of all $n \times n$ real-valued invertible matrices) to the nonzero real numbers. Considering $GL_n(\mathbb{R})$ and $\mathbb{R}^*$ each as groups under multiplication, the determinant is a homomorphism of groups since

$$\det(AB) = \det(A)\det(B) \text{ for } A, B \in M_{n \times n}(\mathbb{R})$$

as is fairly easily proven.
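For a concrete sanity check of this multiplicative property (a numerical illustration, not a proof), here is a small Python sketch with hand-rolled $2 \times 2$ helpers; the function names are mine:

```python
# Numerically check det(AB) = det(A) det(B) for 2x2 real matrices.

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def matmul2(m, n):
    # Standard matrix product of two 2x2 matrices (lists of rows).
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 1.0]]   # det(A) = 1
B = [[3.0, 0.0], [4.0, 2.0]]   # det(B) = 6

# det(AB) should equal det(A) * det(B) up to floating-point error.
assert abs(det2(matmul2(A, B)) - det2(A) * det2(B)) < 1e-9
```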

The point of the determinant is that it takes a matrix, which is a rather complicated thing, and associates with it a relatively simpler thing, a real number, so that we can deal with that instead. The determinant of a matrix doesn’t capture all of the information about that matrix, but it certainly tells us something important. For example, matrices with nonzero determinant are precisely the invertible matrices, and are the ones that represent injective linear maps. The determinant, geometrically, can be thought of as the scaling factor of the linear transformation the matrix represents.

The fact that it is a homomorphism allows us to freely move between the parallel worlds of matrices and their associated real numbers without running into contradictions. If, say, we want to find the determinant of a product of two matrices, we can compute the product and then find its determinant, or we can just multiply the two determinants as real numbers. The two worlds have a nice correspondence in this way. If our function wasn’t a homomorphism, the correspondence would not be exact and moving between the two could cause some issues.

Despite this niceness, even with a homomorphism something is clearly lost. We can easily find two matrices which have the same determinant, which means we can’t even seem to distinguish between them purely on the basis of their determinants. But this is the tradeoff for simplifying our lives. What we have found is that the determinant is a non-injective homomorphism (a quotient map!) and the fundamental homomorphism theorem will tell us its structure, no questions asked.

The kernel of this particular determinant map is $\{A \in GL_n(\mathbb{R}) : \det(A) = 1\}$, the set of real matrices with determinant $1$ (the multiplicative identity in $\mathbb{R}$). It’s usually known as the special linear group $SL_n(\mathbb{R})$ and it’s the set of linear transformations that preserve volume (and orientation—meaning that they don’t ‘flip’ space).

By the fundamental theorem and the fact that the determinant map is surjective (onto the nonzero reals), we have

$$GL_n(\mathbb{R}) / SL_n(\mathbb{R}) \cong \det(GL_n(\mathbb{R})) = \mathbb{R} \setminus \{0\}$$

and so we essentially have the real numbers (without zero). What the determinant has done is to identify certain matrices that have some deep property in common. The group element corresponding to the real number $3$, for example, is in reality the infinite set $\{A \in GL_n(\mathbb{R}) : \det(A) = 3\}$, in a way analogous to the way that the single element $1$ of $\mathbb{Z} / 2\mathbb{Z}$ can really be thought of as the infinite set $2\mathbb{Z} + 1 = \{\dots,-3,-1,1,3,\dots\}$, which is no surprise since we are dealing with quotient groups after all. Determinants of matrices behave just like the real numbers with respect to their multiplication.

Another example can be seen in the complex numbers.

Consider the set of nonzero complex numbers, $\mathbb{C}^*$. Again, they form a group with respect to multiplication. The modulus, $\varphi: \mathbb{C}^* \to \mathbb{R}^*$ such that $a + bi \mapsto |a+bi| := \sqrt{a^2 + b^2}$, is a homomorphism into the nonzero reals, since

$$|xy| = |x||y| \text{ for } x, y \in \mathbb{C}$$

as is also easily proven (use the relationship between the modulus and the complex conjugate!).
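For the sceptical, a quick numerical spot-check (an illustration, not the proof) that the modulus is multiplicative:

```python
# Spot-check |xy| = |x| |y| on a few sample nonzero complex numbers.
# Python's built-in abs() computes the complex modulus directly.

samples = [1 + 2j, -3 + 0.5j, 0.1 - 4j, 2j]
for x in samples:
    for y in samples:
        assert abs(abs(x * y) - abs(x) * abs(y)) < 1e-9
```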

Since $\ker(\varphi) = \{x \in \mathbb{C}^* : |x| = 1\}$ (the unit circle in the complex plane), the fundamental theorem tells us (with the knowledge that $\varphi$ is surjective onto the positive reals) that

$$\mathbb{C}^* / \{x \in \mathbb{C}^* : |x| = 1\} \cong \varphi(\mathbb{C}^*) = \mathbb{R}_{>0}$$

So if we take the complex plane and ‘mod out’ by the unit circle, we end up with the positive real axis! The geometric interpretation of this is that we have identified complex numbers (two-dimensional vectors) whose ratios lie in the kernel, that is, whose quotient is a unit complex number. The vectors identified with each other differ only in their direction, since multiplying by a unit complex number doesn’t change the magnitude. The plane is partitioned into an infinite family of circular sets of points, one for each possible length of vector (there is one for each positive real number), with each set containing points at all possible angles from $0$ to $2\pi$.

The fact that this quotient gives us the reals is perhaps surprising at first glance, but by considering a well-known homomorphism through the lens of the fundamental theorem it becomes obvious.

To sum up, the power of the fundamental theorem of homomorphisms is that it identifies quotient maps (non-injective homomorphisms) with quotient groups (groups of cosets). They are really two sides of the same coin.

In full generality, it says that applying a structure-preserving map achieves exactly the same thing as forming a quotient object.

Lastly, another neat feature of the theorem worth mentioning is that it gives us a different vantage point from which to view the notion of a normal subgroup. The set of normal subgroups of a group $G$ is precisely the set of kernels of homomorphisms from $G$ out to some other group. Clearly if we adopt this viewpoint it becomes easier to see why normality is required when forming a quotient group.

## Abelianness and equivalent definitions in general

In group theory, we say that a group $(G, \cdot)$ is abelian if (and only if) $a \cdot b = b \cdot a$ for all $a, b \in G$.

Abelian groups have certain properties that are generally (that is, in the mathematical sense, meaning always) true, so if we know that a group we are dealing with is abelian, we can instantly deduce a variety of things about the group and how it behaves.

Once we know a group is abelian, we can make some useful deductions. Does it work the other way around? Are there truths that will allow us to conclude that a group is abelian?

For which statements are both of these true? What statements imply, and are implied by, the fact that a group is abelian? Essentially, we are asking: which properties are equivalent to abelianness?

From here onwards we’ll use juxtaposition (like $ab$) rather than an explicit symbol (like $a \cdot b$) to indicate the group operation acting on elements $a$ and $b$.

As an example, take $$(xy)^2 = x^2 y^2 \;\forall x, y \in G$$ Clearly, if $G$ is abelian this is true. Why? Because $$(xy)^2 = (xy)(xy) = x(yx)y = x(xy)y = (xx)(yy) = x^2 y^2$$ using the fact that the group operation is associative and $G$ is abelian.

However, if we write out our first condition slightly differently

$$xyxy = xxyy$$

and then apply the ‘cancellation property’ of groups (cancelling the leftmost $x$ and the rightmost $y$), we end up with

$$yx = xy$$

Succinctly, we have proven

$$(xy)^2 = x^2 y^2 \iff xy = yx$$

for arbitrary elements $x,y$ of a group $G$. Since the elements are arbitrary, we have proven that $G$ is abelian if and only if our original condition is true.
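Since the claim is finite and checkable, we can even verify it by brute force on a small nonabelian group. The sketch below (Python; names are my own) runs over all pairs in the symmetric group $S_3$ and confirms that $(xy)^2 = x^2 y^2$ holds for a pair exactly when the pair commutes:

```python
# Verify, pair by pair in S3, that (xy)^2 == x^2 y^2  iff  xy == yx.
from itertools import permutations

def compose(p, q):
    # Composition of permutations given as tuples: (p . q)(i) = p[q[i]].
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))  # all 6 permutations of {0, 1, 2}

for x in S3:
    for y in S3:
        xy = compose(x, y)
        square_law = compose(xy, xy) == compose(compose(x, x), compose(y, y))
        commutes = (xy == compose(y, x))
        assert square_law == commutes
```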

In fact, we might as well define a group to be abelian if it satisfies this property. It may seem strange, but this doesn’t change anything about the mathematics at all. Which definition of the long (and in some sense infinite) list of equivalent definitions we happen to choose to be ‘the definition’ is an entirely human distinction. We can take whichever one we like, and the others are then just consequences of this definition (theorems in our theory).

How about another example? Define the direct product of groups $(G, \circ_{G})$ and $(H, \circ_{H})$, denoted $G \oplus H$, to be the cartesian product $G \times H = \{ (g,h) : g \in G, h \in H \}$ with the group operation $\circ$ defined such that $(g_1, h_1) \circ (g_2, h_2) = (g_1 \circ_{G} g_2, h_1 \circ_{H} h_2)$.

Clearly, the direct product of some number of cyclic groups is abelian, since the cyclic groups themselves are abelian, and the componentwise multiplication of the direct product reduces abelianness in the product to abelianness in each participating group.

It is a much deeper fact that a converse is true too: every finitely generated abelian group, say $G$, is isomorphic to a direct product of cyclic groups. This is the so-called fundamental theorem of finitely generated abelian groups:

$$G \cong \mathbb{Z}^r \oplus \bigoplus_{i=1}^{n} \mathbb{Z}_{k_i}$$

where $r \geq 0$ and $k_1, k_2, \dots, k_n$ are prime powers. (For a finite abelian group, $r = 0$ and only the prime-power cyclic factors remain.)
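For a concrete instance of this decomposition, $\mathbb{Z}_{12} \cong \mathbb{Z}_4 \oplus \mathbb{Z}_3$ (with $4 = 2^2$ and $3$ the prime powers involved). The sketch below checks by brute force that $x \mapsto (x \bmod 4,\, x \bmod 3)$ is a bijective homomorphism:

```python
# Check concretely that Z_12 is isomorphic to Z_4 (+) Z_3 via the map
# x |-> (x mod 4, x mod 3)  (the Chinese remainder theorem in disguise).

phi = {x: (x % 4, x % 3) for x in range(12)}

# Bijective: the 12 elements hit 12 distinct pairs.
assert len(set(phi.values())) == 12

# Homomorphism: addition mod 12 matches componentwise addition.
for a in range(12):
    for b in range(12):
        lhs = phi[(a + b) % 12]
        rhs = ((phi[a][0] + phi[b][0]) % 4, (phi[a][1] + phi[b][1]) % 3)
        assert lhs == rhs
```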

Again, we might as well define a (finitely generated) abelian group to be one that has this form, and then the fact that an abelian group’s elements all commute with each other is just another provable theorem.

I think this subtle and perhaps surprising notion of equivalence is rather interesting. It is conceptually easy to prove that two statements imply each other: just assume one and prove the other, and then repeat the other way around. But to state that the two statements are in fact perfectly equivalent—there exists no universe in which one of these statements is true and the other false—seems like a much stronger thing. And yet it isn’t.

What other interesting equivalences (or alternative definitions) are there in mathematics?

Well, for example: the definition of a prime number.

Most people, if asked, will probably state that a number (greater than $1$) is prime if and only if it has no factors other than itself and $1$. This seems reasonable, and indeed it is the original motivating definition. However, we can also say that such a number $n$ is prime if and only if whenever $n$ divides a product $ab$, it divides $a$ or $b$ (or both). One implication (that this property holds if $n$ is prime) is a classical theorem called Euclid’s lemma, but in fact the converse is true too, and so the statements are equivalent. We might as well define prime numbers this way.
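Because the Euclid property can be tested among residues modulo $n$, the equivalence is easy to check by brute force for small numbers. A sketch, with my own function names:

```python
# Compare the two characterisations of primality on small n:
#   (1) no divisors other than 1 and n;
#   (2) n | ab  implies  n | a or n | b.
# Condition (2) is equivalent to: no pair of residues a, b in [1, n)
# has a*b divisible by n (i.e. no zero divisors mod n).

def prime_by_divisors(n):
    return n > 1 and all(n % d for d in range(2, n))

def prime_by_euclid(n):
    return n > 1 and all((a * b) % n for a in range(1, n)
                                     for b in range(1, n))

assert all(prime_by_divisors(n) == prime_by_euclid(n) for n in range(2, 60))
```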

However, something interesting happens when we move from the integers to something more general. In a general ring, these apparently equivalent definitions separate into two distinct notions: the no-nontrivial-factors condition becomes ‘irreducible’, while the Euclid property becomes ‘prime’. It just so happens that in the integers (and in general, in a certain class of rings called GCD domains) the two coincide.

Another famous example of an interesting equivalence is of course due to the French mathematician Augustin-Louis Cauchy.

The standard definition of convergence of a real sequence is the following:

A sequence $(a_n): \mathbb{N} \to \mathbb{R}$ is said to converge (to $L \in \mathbb{R}$) if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that for all $n > N$ we have $\lvert a_n - L \rvert < \epsilon$.

In words, it is saying that a sequence converges if and only if we can make the value of the sequence arbitrarily close (as close as we like) to the limit point $L$ by moving sufficiently far along the sequence towards infinity.

There are some sequences that satisfy this definition for some (unique!) value of $L$, and some that don’t for any. Another similar but subtly different property a sequence might have is the following:

A sequence $(a_n): \mathbb{N} \to \mathbb{R}$ is said to be Cauchy if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that for all $n, m > N$ we have $\lvert a_n - a_m \rvert < \epsilon$.

This one is saying, in words, that a sequence is Cauchy if and only if the terms in the sequence can be made arbitrarily close to each other by moving sufficiently far along the sequence.

As you may have suspected, these two properties turn out to imply each other: convergent sequences are Cauchy, and Cauchy sequences of reals converge (the latter is precisely the completeness property of the real numbers). The first implication is the easier one to prove, but both are true and therefore the two properties are equivalent. They are both perfectly good characterisations of convergence. This is particularly convenient since they contain an important conceptual difference: the second definition makes no mention of a limit, so we can prove convergence to some limit even when we don’t know what that limit actually is. Having these kinds of equivalences is actually quite powerful!
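As a small numerical illustration (not a proof) of why the Cauchy formulation is handy, consider the partial sums of $\sum_{k \geq 1} 1/k!$: the terms visibly bunch together, so we can be confident of convergence without ever naming the limit (which happens to be $e - 1$):

```python
# Partial sums s_n = sum_{k=1}^{n} 1/k!, a sequence we can see is
# Cauchy (late terms are extremely close together) without needing
# to know its limit in advance.
import math

partial = [sum(1 / math.factorial(k) for k in range(1, n + 1))
           for n in range(1, 20)]

# All terms past index 10 lie within 1e-6 of one another.
tail = partial[10:]
assert max(tail) - min(tail) < 1e-6
```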

So, what is the meaning of all of this? The question is perhaps not a mathematical one, and even then I’m not exactly sure of the answer.

My impression is this: definitions are less important than often assumed; it is the underlying properties of mathematical objects that are fundamental. Definitions are just a human way of putting an abstract concept into one-to-one correspondence with a more concrete ‘testable’ truth.

There are some sequences that are Cauchy, and some sequences that converge. In fact, we are talking about the same set of sequences that all have some ‘underlying property’ in common, but as humans we like to anchor this property to things we can more readily pin down in symbols—even if we end up doing so from two quite different angles.