Towards the p-adic World – Part III

In the last post, we studied the algebraic definition of p-adic numbers and Hensel’s lemma. In Part III, we will study Krasner’s lemma and its applications. Our goal is to understand how analysis governs algebra in the p-adic world, which is quite different from the archimedean world.

Before we start, we briefly discuss p-adic norms on finite extensions of \mathbb{Q}_{p}. Let’s start with the simplest nontrivial example: a quadratic extension. Recall that the equation x^{2} = 2 has no solution in \mathbb{Q}_{5}. So we can consider the degree 2 extension \mathbb{Q}_{5}(\sqrt{2}) of \mathbb{Q}_{5} obtained by adjoining \sqrt{2}. Now, we want to extend the 5-adic norm |\cdot |_{5} to \mathbb{Q}_{5}(\sqrt{2}). The only conditions we impose are that the extended norm is still multiplicative and that it is invariant under conjugation. More precisely, in the case of \mathbb{Q}_{5}(\sqrt{2}), we want to have |a + b\sqrt{2}|_5 = |a - b\sqrt{2}|_5 for all a, b\in \mathbb{Q}_5. This implies

\displaystyle |a+b\sqrt{2}|_5 = |(a+b\sqrt{2})(a-b\sqrt{2})|_{5}^{1/2} = |a^{2} -2b^{2}|_{5}^{1/2}

and one can show that this actually defines a non-archimedean norm on \mathbb{Q}_{5}(\sqrt{2}). More generally, for a given irreducible polynomial f(x) = x^{n} + a_{n-1}x^{n-1}  + \cdots + a_{1} x + a_{0} \in \mathbb{Q}_{p}[x] and a root \alpha \in \overline{\mathbb{Q}_{p}} of f, we define the p-adic norm of \alpha as

\displaystyle |\alpha|_{p} := |\mathrm{Norm}(\alpha)|_{p}^{1/n} = |a_{0}|_{p}^{1/n}.

(Here \mathrm{Norm} = \mathrm{Norm}_{\mathbb{Q}_{p}(\alpha)/\mathbb{Q}_{p}} is the determinant of the multiplication map m_{\alpha}:\mathbb{Q}_{p}(\alpha)\to \mathbb{Q}_{p}(\alpha), x\mapsto \alpha x, which is just (-1)^{n}a_{0}.) So the p-adic norm of any nonzero element \alpha \in \overline{\mathbb{Q}_p} has the form p^{r} with r\in \mathbb{Q}.
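As a quick numerical companion to this formula, here is a minimal Python sketch for the quadratic case. It computes the exponent r in |a + b\sqrt{d}|_p = p^{-r} as half the p-adic valuation of the norm a^{2} - d b^{2}. The helper names vp and valuation_quadratic are ours (hypothetical, not from any library), and the sketch assumes that a, b are rational and that d is not a square in \mathbb{Q}_p.

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def valuation_quadratic(a, b, d, p):
    """Exponent r with |a + b*sqrt(d)|_p = p**(-r): half the valuation of the norm."""
    norm = Fraction(a) ** 2 - d * Fraction(b) ** 2
    return Fraction(vp(norm, p), 2)

print(valuation_quadratic(1, 1, 2, 5))  # 1 + sqrt(2) in Q_5(sqrt 2): norm is -1
print(valuation_quadratic(5, 0, 2, 5))  # the element 5 itself
print(valuation_quadratic(0, 1, 2, 2))  # sqrt(2) in Q_2(sqrt 2): a genuinely fractional exponent
```

The last call illustrates why r has to be allowed to be rational: \sqrt{2} has 2-adic norm 2^{-1/2}.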

Our main theorem for this post is the following:

Theorem (informal). For two given irreducible polynomials f(x), g(x) \in \mathbb{Q}_{p}[x], splitting fields of f, g over \mathbb{Q}_{p} coincide when f and g are sufficiently close.

To prove this, we first need to define what “sufficiently close polynomials” means.

Definition 1. For a given polynomial f(x) = a_{n}x^{n} + \cdots + a_{1}x + a_{0} \in \mathbb{Q}_p[x], we define its norm as

\displaystyle |f| := \max_{0\leq i\leq n}\{|a_{i}|\}

Theorem 1. Let f(x) = a_{n}x^{n} + \cdots + a_{1}x + a_{0} be a polynomial in \mathbb{Q}_p[x] with a_{n} \neq 0. If {\alpha \in \overline{\mathbb{Q}_{p}}} is a zero of {f(x)}, we have

\displaystyle |\alpha| \leq \max\left\{ 1, \sum_{i=0}^{n-1} \Big| \frac{a_{i}}{a_{n}} \Big|\right\}.

Proof. Indeed, there’s nothing to prove for {|\alpha| \leq 1}, and if {|\alpha|>1}, we have

\displaystyle |\alpha|^{n} = \Big| \frac{a_{n-1}}{a_{n}} \alpha^{n-1} + \cdots + \frac{a_{0}}{a_{n}} \Big| \leq \max_{0\leq i\leq n-1} \left\{ \Big| \frac{a_{i}}{a_{n}}\alpha^{i}\Big| \right\} \leq |\alpha|^{n-1}\sum_{i=0}^{n-1} \Big| \frac{a_{i}}{a_{n}}\Big|

which proves the inequality.

Now we will show that the zeros of a polynomial over \mathbb{Q}_p are continuous functions of its coefficients.

Theorem 2. Let {f(x)\in \mathbb{Q}_p[x]} be a polynomial of degree {n} with {n} distinct zeros {\alpha_1, \dots, \alpha_n} in \overline{\mathbb{Q}_{p}}. If a polynomial {g(x)} of degree {n} has all coefficients sufficiently close to those of {f(x)}, i.e. if |f-g| is sufficiently small, then it has {n} roots {\beta_1, \dots, \beta_n} which approximate the roots {\alpha_1, \dots, \alpha_n} to sufficiently high precision.

Before proving the theorem, we note that both the hypothesis of distinct zeros and the passage to the algebraic closure matter. For example, over \mathbb{R} the polynomial f(x) = x^{2} has the double root 0, but the zeros of f_{\epsilon}(x) = x^{2} +\epsilon aren’t even in \mathbb{R} for \epsilon > 0.

Proof. Let {f(x) = a_{n}x^{n} + \cdots + a_{0} = a_{n}(x-\alpha_{1})(x-\alpha_{2})\cdots (x-\alpha_{n})} and {\epsilon >0}. Choose {\delta >0} so that

\displaystyle \delta \leq \min \left\{ \frac{|a_{n}|}{2}, \frac{|a_{n}|\epsilon^{n}}{\sum_{i=0}^{n}M^{i}}, \frac{|a_{n}|\epsilon^{n}}{2\sum_{i=0}^{n} N^{i}}\right\},

where

\displaystyle M = \sum_{i=0}^{n-1}\left( 1 + 2\frac{|a_{i}|}{|a_{n}|} \right), \qquad N = \max\left\{ 1, \sum_{i=0}^{n-1} \Big| \frac{a_{i}}{a_{n}} \Big|\right\}.

Suppose that {g(x) = b_{n}x^{n} + \cdots + b_{0} \in \mathbb{Q}_p[x]} satisfies {|f-g|<\delta}, and let {\beta} be any root of {g}. Then {|a_{n}| \leq |a_{n} - b_{n}| + |b_{n}| < \delta + |b_{n}| \leq \frac{|a_{n}|}{2} + |b_{n}|} gives {|b_{n}| \geq \frac{|a_{n}|}{2}}, hence {\frac{|b_{i}|}{|b_{n}|} \leq 2\cdot \frac{|a_{i}-b_{i}| + |a_i|}{|a_n|} \leq \frac{2\delta}{|a_{n}|} + 2\frac{|a_{i}|}{|a_{n}|} \leq 1 + 2\frac{|a_{i}|}{|a_n|}}. So Theorem 1, applied to {g}, gives {|\beta| \leq M}. Thus

\displaystyle |f(\beta)| = |f(\beta) - g(\beta)| \leq \sum_{i=0}^{n} |a_i - b_i| |\beta|^{i}< \delta\sum_{i=0}^{n} M^{i} \leq |a_n| \epsilon^{n}.

Therefore {|a_{n}| \prod_{i=1}^{n} |\beta - \alpha_i| < |a_n|\epsilon^{n}\Leftrightarrow \prod_{i=1}^{n} |\beta - \alpha_i| < \epsilon^{n}}, hence one of the factors {|\beta - \alpha_i|} must be smaller than {\epsilon}. This shows that {\beta} is within {\epsilon} of a root of {f}. Conversely, we can prove that for any zero {\alpha} of {f(x)}, there exists a zero of {g(x)} very close to {\alpha} by the same argument (using {|\alpha| \leq N} and {\frac{|a_n|}{2} \leq |b_n|}). \Box

To prove our main theorem, we need one more crucial ingredient: Krasner’s lemma.

Lemma 1 (Krasner). Let \alpha \in \overline{\mathbb{Q}_{p}} and let {\alpha = \alpha_1, \alpha_2, \dots, \alpha_n} be its conjugates over \mathbb{Q}_{p}. If {\beta\in \overline{\mathbb{Q}_{p}}} and

\displaystyle |\alpha - \beta| < |\alpha - \alpha_i|, \qquad i = 2, \dots, n,

(which means that \beta is closer to \alpha than any of the other conjugates of \alpha is) then {\mathbb{Q}_{p}(\alpha)\subseteq \mathbb{Q}_{p}(\beta)}.

Proof. Assume that {\alpha\not\in \mathbb{Q}_{p}(\beta)}. Then {\mathbb{Q}_{p}(\alpha, \beta)/\mathbb{Q}_{p}(\beta)} is a field extension of degree {>1}. So we have a field embedding {\sigma:\mathbb{Q}_{p}(\alpha, \beta)\hookrightarrow \overline{\mathbb{Q}_{p}}} such that {\sigma|_{\mathbb{Q}_{p}(\beta)} = \mathrm{id}}, but {\sigma(\alpha) \neq \alpha}. Then {\sigma(\alpha) = \alpha_i} for some {2\leq i\leq n}, and since the p-adic norm is invariant under the conjugation

\displaystyle |\beta - \alpha| = |\sigma(\beta -\alpha)| = |\beta - \alpha_i|.

However, this implies

\displaystyle |\alpha - \alpha_i| \leq \max\{ |\alpha- \beta|, |\beta - \alpha_i|\} = |\alpha - \beta|,

which contradicts our assumption {|\alpha - \beta| < |\alpha - \alpha_i|}. \Box

From the above theorems and Krasner’s lemma, we can finally prove our main theorem which illustrates how analysis can govern algebra in {p}-adic fields.

Theorem 3. Let {f(x)\in \mathbb{Q}_p[x]} be an irreducible polynomial of degree {n} and let {\alpha\in \overline{\mathbb{Q}_p}} be a zero of {f(x)}. Then there exists {\delta >0} such that for all {g(x)\in \mathbb{Q}_{p}[x]} of degree {n} with {|f-g| <\delta}, there exists a zero {\beta\in \overline{\mathbb{Q}_p}} of {g(x)} such that {\mathbb{Q}_p(\alpha) = \mathbb{Q}_p(\beta)}. In particular, {g(x)} is also irreducible.

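Proof: Since {f} is irreducible over a field of characteristic 0, it has {n} distinct zeros {\alpha = \alpha_{1}, \dots, \alpha_{n}}. By Theorem 2, there exists {\delta >0} such that any polynomial {g(x) \in \mathbb{Q}_p[x]} of degree {n} with {|f - g| < \delta} satisfies

\displaystyle |\alpha_{i} - \beta_{i}| < \min\{ |\alpha_{i} - \beta_{j}|, |\alpha_{j} - \beta_{i}|\}

for all {1\leq i\neq j\leq n}, where {\beta_{1}, \dots, \beta_{n} \in \overline{\mathbb{Q}_p}} are the zeros of {g(x)}, suitably ordered. By the ultrametric inequality, this forces {|\alpha_{i} - \beta_{i}| < |\alpha_{i} - \alpha_{j}|} and {|\alpha_{i} - \beta_{i}| < |\beta_{i} - \beta_{j}|} for all {j \neq i}. Then Krasner’s lemma, applied in both directions, gives {\mathbb{Q}_p(\alpha_i) = \mathbb{Q}_p(\beta_i)}. In particular, {\beta_i} has degree {n} over {\mathbb{Q}_p}, so {g(x)} is irreducible. \Box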

Using these theorems, we can prove the following:

Theorem 4. For any {p}, the algebraic closure {\overline{\mathbb{Q}_{p}}} of {\mathbb{Q}_{p}} is not complete under the {p}-adic metric extended to {\overline{\mathbb{Q}_{p}}}. However, its completion {\mathbb{C}_{p} :=\widehat{\overline{\mathbb{Q}_{p}}}} is algebraically closed.

Proof: For the first statement, consider the sequence

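\displaystyle a_{n} = \sum_{k=1}^{n} p^{k + \frac{1}{k}} \in \overline{\mathbb{Q}_{p}}

(where we fix a k-th root p^{1/k} of p for each k). It is clear that {a_{n} \in \overline{\mathbb{Q}_{p}}}, and the sequence {\{a_{n}\}_{n\geq 1}} is a Cauchy sequence since {|a_{n} - a_{n-1}| = |p^{n+1/n}| = p^{-(n+1/n)}} converges to 0. However, the limit does not exist in {\overline{\mathbb{Q}_{p}}}. The idea is that the terms involve roots of p of unbounded degree, while any element of {\overline{\mathbb{Q}_p}} lies in a finite extension of {\mathbb{Q}_p}. Indeed, suppose the limit {a} lies in an extension {K} of degree {m} over {\mathbb{Q}_p}, and pick a prime {\ell \neq p} with {\ell > m}. Let {F} be the compositum of the Galois closures of {K} and of {\mathbb{Q}_p(p^{1/k})} for {k < \ell}. These Galois closures have degrees dividing {m!} and {k\varphi(k)}, so {[F:\mathbb{Q}_p]} is not divisible by {\ell}. On the other hand, p^{1/\ell} has degree {\ell} over {\mathbb{Q}_p} because x^{\ell} - p is Eisenstein, so p^{1/\ell}\notin F. Hence there is {\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}_p}/F)} with {\sigma(p^{1/\ell}) = \zeta p^{1/\ell}} for some {\ell}-th root of unity {\zeta \neq 1}, and {|\zeta - 1| = 1} since {\ell \neq p}. Now {\sigma} preserves the norm and fixes {a}, but for every {N \geq \ell} the difference {\sigma(a_{N}) - a_{N} = \sum_{k=\ell}^{N} (\sigma(p^{k+1/k}) - p^{k+1/k})} has norm exactly {p^{-(\ell + 1/\ell)}}, because the term with {k = \ell} dominates. Letting {N\to\infty} gives {|\sigma(a) - a| = p^{-(\ell+1/\ell)} \neq 0}, a contradiction. Thus {\lim_{n\rightarrow \infty} a_n} does not exist in {\overline{\mathbb{Q}_{p}}}.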

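To prove the second statement, let {\alpha} be a zero of some monic irreducible polynomial {f(x) = x^{n} + a_{n-1}x^{n-1} + \cdots + a_{0} \in \mathbb{C}_p[x]}. We want to show that {\alpha\in \mathbb{C}_p}. Since {\overline{\mathbb{Q}_{p}}} is dense in {\mathbb{C}_p}, we can find a monic {g(x) \in \overline{\mathbb{Q}_{p}}[x]} of degree {n} with {|f-g|} as small as we like. The proofs of Theorems 1 to 3 only use the ultrametric inequality and the extension of the norm to algebraic extensions, so they also work over the complete field {\mathbb{C}_p} in place of {\mathbb{Q}_p}. Hence, for {|f-g|} sufficiently small, we get {\mathbb{C}_{p}(\alpha) = \mathbb{C}_{p}(\beta)} for some zero {\beta} of {g(x)}. Since {\overline{\mathbb{Q}_p}} is algebraically closed, {\beta \in \overline{\mathbb{Q}_p} \subseteq \mathbb{C}_p}, and this proves {\alpha \in \mathbb{C}_{p}}. \Box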

In fact, the above theorem is true for any field which is complete with respect to some non-trivial non-Archimedean absolute value {|\cdot |}. See [2] for the proof.

In the next post, we will look more closely at finite (and infinite?) extensions of \mathbb{Q}_{p} and their Galois groups.


[1] Brian Conrad, Completion of Algebraic Closure, Online note.

[2] Brian Conrad, Higher ramification groups, Online note.

[3] Jürgen Neukirch, Algebraic number theory. Vol. 322. Springer Science & Business Media, 2013.

Towards the p-adic World – Part II

In the last post, we studied a definition of p-adic numbers and their analytic properties. In Part II, we will define \mathbb{Q}_p in an algebraic way by defining the ring of p-adic integers first. After that, we introduce Hensel’s lemma, which gives a way to compute roots of polynomials over \mathbb{Q}_p by lifting solutions over \mathbb{Z}/p^{n}\mathbb{Z}. We’ll also do some handy examples which are simple but important for extensions of p-adic fields (or more generally, non-Archimedean local fields).

First, let us define \mathbb{Q}_{p} algebraically. To do this, we will first define \mathbb{Z}_{p}, the ring of p-adic integers, and then define \mathbb{Q}_{p} as the field of fractions of \mathbb{Z}_{p}.

What is \mathbb{Z}_{p}? It is the set of p-adic numbers whose expansions have zero fractional part, i.e. p-adic numbers of the form x = \sum_{n\geq N} a_{n}p^{n} with N \geq 0. Another way to describe the set of p-adic integers \mathbb{Z}_p is as an (inverse) limit, namely

\displaystyle \mathbb{Z}_p = \lim_{\substack{\longleftarrow \\ n}} \mathbb{Z} / p^{n}\mathbb{Z}

where the maps \mathbb{Z}/p^{n+1}\mathbb{Z} \to \mathbb{Z}/p^{n}\mathbb{Z} are given by the standard reduction map. You may realize that there is no difference between the new definition and the previous definition, except that the second \mathbb{Z}_p does not yet have a norm. From this, we get a ring structure on \mathbb{Z}_p, and it can be shown that \mathbb{Z}_p is an integral domain. So we can finally define \mathbb{Q}_p as \mathbb{Q}_p := \mathrm{Frac}(\mathbb{Z}_p).

It is also possible to recover \mathbb{Z}_p from the analytic definition of \mathbb{Q}_p. In fact, a p-adic number \alpha = \sum_{n\geq N} a_{n}p^{n} has zero fractional parts if and only if N \geq 0, and from |\alpha|_{p} =  p^{-N}, this is also equivalent to |\alpha|_{p} \leq 1. So we have

\displaystyle \mathbb{Z}_{p} = \{ x \in \mathbb{Q}_{p}\,:\, |x|_{p}\leq 1\}.

We can describe the group of units of \mathbb{Z}_{p} in a similar way. It is not hard to show that, if \alpha = a_{0} +a_{1}p + a_{2}p^{2}+\cdots satisfies a_{0} \not \equiv 0\,(\mathrm{mod}\,\,p), then \alpha is a unit in \mathbb{Z}_p; this can be done by solving a bunch of equations that arise from \alpha\beta = 1 with \beta = b_{0} + b_{1}p + b_{2}p^{2} + \cdots. From this, we get

\displaystyle \mathbb{Z}_{p}^{\times} = \{x \in \mathbb{Q}_{p}\,:\, |x|_{p} = 1\}.

Now, let’s get back to our main question. We want to solve polynomial equations over \mathbb{Q}_p, and it is reasonable to start with the simplest non-trivial ones: quadratic equations.

Does there exist a solution to x^2 = 2 in \mathbb{Q}_{p}? Clearly, the answer is yes for p= \infty, i.e. \mathbb{Q}_{\infty} = \mathbb{R}, but this is not our main concern. How about other values of p? For p=2, we can check that there is no solution in \mathbb{Q}_2 by considering 2-adic norms. Indeed, if \alpha \in \mathbb{Q}_2 satisfies \alpha^{2} = 2, then |\alpha|_{2}^{2} = |2|_{2} = 2^{-1} implies |\alpha|_{2} = 2^{-1/2}, which is impossible since the norm of every nonzero element of \mathbb{Q}_2 is of the form 2^{m} for some m \in \mathbb{Z}.

Let us now consider the odd primes. For p=3, if \alpha^{2} = 2 for some \alpha \in \mathbb{Q}_3, then |\alpha|_{3}^{2} = |2|_{3} = 1 so |\alpha|_{3} = 1, i.e. \alpha \in \mathbb{Z}_3^{\times}. Then \alpha has a 3-adic expansion \alpha = a_{0} + a_{1}\cdot 3 + a_{2}\cdot 3^{2} + \cdots. However, if we reduce the equation \alpha^{2} = (a_{0} + a_{1} \cdot 3 + \cdots)^{2} = 2 modulo 3, then we should have a_{0}^{2} \equiv 2 \,(\mathrm{mod}\,\,3), which is impossible since all squares of integers are congruent to 0 or 1 modulo 3. Hence there is no \sqrt{2} in \mathbb{Q}_{3}, and the same logic also applies to \mathbb{Q}_{5} or general \mathbb{Q}_{p} with \left(\frac{2}{p}\right) = -1 \Leftrightarrow p \equiv \pm 3\,(\mathrm{mod}\,\, 8).

How about \mathbb{Q}_7? Since 3^{2} \equiv 2\,(\mathrm{mod}\,\,7), we do not encounter the same problem as in \mathbb{Q}_3. Then we can move on to mod 7^{2}; if there exists \alpha = a_{0} + a_{1}\cdot 7 + a_{2} \cdot 7^{2} + \cdots \in \mathbb{Z}_{7} with \alpha^{2} = 2, then we must have (a_{0} + a_{1}\cdot 7)^{2} \equiv 2 \,(\mathrm{mod}\,\,7^{2}). Again, we have a solution (3 + 1\cdot 7)^{2} = 10^{2} \equiv 2\,(\mathrm{mod}\,\,7^{2}). If we continue this argument, we should be able to solve the equation x^{2} \equiv 2\,(\mathrm{mod}\,\,7^{n}) for all n\geq 1, if \sqrt{2} exists in \mathbb{Z}_{7}. Now, here’s the main point: to find a solution of x^{2}= 2 in \mathbb{Z}_{7}, we’ll solve the equations x^{2} \equiv 2 \,(\mathrm{mod}\,\,7^{n}) inductively. More precisely, we’ll lift a solution \alpha_{n} \in \mathbb{Z}/7^{n}\mathbb{Z} of the equation \alpha_{n}^2 \equiv 2 \,(\mathrm{mod}\,\,7^{n}) to a solution of the same equation modulo the next power 7^{n+1}. The lifted solution \alpha_{n+1} will satisfy \alpha_{n+1}^{2} \equiv 2\,(\mathrm{mod} \,\,7^{n+1}) and \alpha_{n+1} \equiv \alpha_{n} \,(\mathrm{mod}\,\,7^{n}), so that the sequence (\alpha_1, \alpha_2, \alpha_3, \dots) defines an element of \mathbb{Z}_7 via its algebraic definition as an (inverse) limit.

Assume that we have a solution \alpha_n \in \mathbb{Z}/7^{n}\mathbb{Z} with \alpha_{n}^{2} \equiv 2\,(\mathrm{mod}\,\,7^{n}). Then we set \alpha_{n+1} := \alpha_{n} + t\cdot 7^{n} and we hope that there exists t \in \{0, 1, \dots, 6\} with \alpha_{n+1}^{2} \equiv 2 \,(\mathrm{mod}\,\, 7^{n+1}) to lift a solution over \mathbb{Z}/7^{n}\mathbb{Z} as a solution over \mathbb{Z}/7^{n+1}\mathbb{Z}. We have

\displaystyle \alpha_{n+1}^{2} = \alpha_{n}^{2} + 2\alpha_{n}t\cdot 7^{n} + t^{2}\cdot 7^{2n} \equiv \alpha_{n}^{2} + 2\alpha_{n}t\cdot 7^{n} \,(\mathrm{mod}\,\,7^{n+1}).

If we write \alpha_{n}^{2} = 2 + s\cdot 7^{n} for some s\in \mathbb{Z}, then

\alpha_{n+1}^{2} \equiv 2 \,(\mathrm{mod}\,\,7^{n+1})\Leftrightarrow (2\alpha_{n}t + s)\cdot 7^{n}\equiv 0 \,(\mathrm{mod}\,\, 7^{n+1}) \Leftrightarrow 2\alpha_{n}t+s \equiv 0\,(\mathrm{mod}\,\, 7),

and such t exists uniquely because 2\alpha_{n} is invertible modulo 7; for the lift of 3, where \alpha_{n}\equiv 3\,(\mathrm{mod}\,\,7), it is t\equiv s \,(\mathrm{mod}\,\,7). This shows the existence of \sqrt{2} in \mathbb{Z}_{7}\subset \mathbb{Q}_{7}, and also gives us an algorithm to compute its 7-adic expansion.
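The lifting procedure can be sketched in a few lines of Python. Rather than relying on a closed formula for t, this sketch simply tries every t \in \{0, 1, \dots, 6\} at each step and checks that exactly one of them works. The function name lift_sqrt and its parameters are our own choices for illustration.

```python
def lift_sqrt(c, p, root_mod_p, steps):
    """Lift a root of x^2 = c (mod p) to a root modulo p**(steps + 1),
    one power of p at a time."""
    alpha, modulus = root_mod_p % p, p
    assert (alpha * alpha - c) % p == 0, "starting value must be a root mod p"
    for _ in range(steps):
        next_modulus = modulus * p
        # try alpha + t * p^n for t = 0, ..., p-1 and keep the ones that work mod p^(n+1)
        candidates = [alpha + t * modulus for t in range(p)
                      if ((alpha + t * modulus) ** 2 - c) % next_modulus == 0]
        assert len(candidates) == 1, "the lift should exist and be unique"
        alpha, modulus = candidates[0], next_modulus
    return alpha, modulus

alpha, modulus = lift_sqrt(2, 7, 3, steps=5)   # a square root of 2 modulo 7**6

# read off the 7-adic digits a_0, a_1, a_2, ... of the approximation
digits, x = [], alpha
while x:
    digits.append(x % 7)
    x //= 7
print(alpha, digits)
```

Starting from the other root 4 modulo 7 gives the expansion of -\sqrt{2} instead.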

Here 7 is not special at all, and we can show that \sqrt{2} \in \mathbb{Q}_{p} for any prime p with \left(\frac{2}{p}\right) = 1\Leftrightarrow p\equiv \pm 1\,(\mathrm{mod}\,\,8). The above algorithmic approach can be generalized to arbitrary polynomials as follows.

Theorem 1. (Hensel’s lemma) Let f(x) \in \mathbb{Z}_p[x] and a \in \mathbb{Z}\subset \mathbb{Z}_p. If f(a)\equiv 0\,(\mathrm{mod}\,\,p) and f'(a)\not\equiv 0\,(\mathrm{mod}\,\,p), then there exists a unique \alpha \in \mathbb{Z}_p such that f(\alpha) = 0 and \alpha \equiv a\,(\mathrm{mod}\,\,p).

Proof. The proof is essentially the same as the algorithm we described for \sqrt{2} \in \mathbb{Q}_{7}, by considering the Taylor expansion of f(x) at x = a. Invertibility of f'(\alpha_{n}) = 2\alpha_{n} in \mathbb{Z}/7\mathbb{Z} there corresponds to f'(a)\not\equiv 0\,(\mathrm{mod}\,\,p) in this theorem. See [1] for the details of the proof.

Using Hensel’s lemma, we can find all roots of unity of \mathbb{Q}_p. Consider f(x) = x^{p} - x. Since f(k) = k^{p} - k \equiv 0\,(\mathrm{mod}\,\, p) and f'(k) = pk^{p-1} - 1 \equiv -1 \,(\mathrm{mod}\,\,p), Hensel’s lemma shows that there exists \omega_{k} \in \mathbb{Z}_{p} with \omega_{k}^{p} = \omega_{k} and \omega_{k}\equiv k\,(\mathrm{mod}\,\,p) for all 0\leq k \leq p-1. Uniqueness in Hensel’s lemma gives \omega_{0} = 0 and \omega_{1} = 1, and since \omega_{k} \neq 0 for 0<k<p we get \omega_{k}^{p-1} = 1 for 0<k<p. In fact, for odd p these are all the roots of unity of \mathbb{Q}_p.
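To make this concrete, here is a hedged Python sketch that approximates the \omega_{k} modulo a power of p. It uses Newton's iteration x \mapsto x - f(x)/f'(x) modulo increasing powers of p, which is one standard way to carry out the lifting in Hensel's lemma. The function hensel_lift and its argument conventions are hypothetical names chosen for this illustration, and the modular inverse via pow(..., -1, m) assumes Python 3.8 or later.

```python
def hensel_lift(coeffs, a, p, k):
    """Lift a simple root a of f modulo p to a root of f modulo p**k.
    coeffs lists the integer coefficients of f from the constant term upward."""
    def f(x, m):
        return sum(c * pow(x, i, m) for i, c in enumerate(coeffs)) % m

    def df(x, m):
        return sum(i * c * pow(x, i - 1, m) for i, c in enumerate(coeffs) if i > 0) % m

    assert f(a, p) == 0 and df(a, p) != 0, "need f(a) = 0 and f'(a) != 0 modulo p"
    x, m = a % p, p
    while m < p ** k:
        m = min(m * m, p ** k)                      # each Newton step doubles the precision
        x = (x - f(x, m) * pow(df(x, m), -1, m)) % m
    return x

p = 5
coeffs = [0, -1] + [0] * (p - 2) + [1]              # f(x) = x^p - x
for k in range(p):
    omega = hensel_lift(coeffs, k, p, 4)            # omega_k modulo 5^4
    print(k, omega, pow(omega, p, p ** 4) == omega)
```

For 0 < k < p each printed approximation satisfies \omega_{k}^{p-1} \equiv 1 modulo 5^{4}, matching the discussion above.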

Theorem 2. The roots of unity in \mathbb{Q}_p are the (p-1)-th roots of unity for odd p and \pm 1 for p = 2.

Proof. We only need to concentrate on \mathbb{Z}_p^{\times}, since a root of unity has p-adic norm 1. First, we will show that all roots of unity of order prime to p are zeros of x^{p-1} -1. If \zeta_1, \zeta_2 are roots of unity with orders m_1, m_2 prime to p, then both are zeros of g(x) = x^{m}-1 with m = m_{1}m_{2}. Since |g'(\zeta_i)|_{p} = |m\zeta_{i}^{m-1}|_{p} = 1, the uniqueness in Hensel’s lemma implies that \zeta_{1} = \zeta_{2} if and only if \zeta_{1} \equiv \zeta_{2}\,(\mathrm{mod}\,\,p). Since we have a (p-1)-th root of unity in each nonzero coset of p\mathbb{Z}_{p} (the \omega_{k}’s above), the only roots of unity of order prime to p are the (p-1)-th roots of unity.

To show that these are all the roots of unity, it is enough to show that there’s no p-th root of unity in \mathbb{Q}_p except 1. First, assume that p is odd. If \zeta^{p} = 1 for some \zeta \neq 1 in \mathbb{Z}_p, then \zeta \equiv \zeta^{p} = 1 \,(\mathrm{mod}\,\,p) implies \zeta = 1 + py for some y\in\mathbb{Z}_p. By the binomial theorem, we have

\begin{aligned} 0 &= 1 + \zeta + \cdots + \zeta^{p-1} \\ &= \sum_{k=0}^{p-1} (1+py)^{k} \\ &\equiv \sum_{k=0}^{p-1} (1+kpy) \,(\mathrm{mod}\,\,p^{2}) \\ &= p + \frac{p^{2}(p-1)}{2}y \equiv p\,(\mathrm{mod} \,\,p^{2})\end{aligned}

which is impossible. For p =2, one can show that the only 4-th roots of unity in \mathbb{Z}_2 are \pm 1 by looking at the equation \zeta ^{2} = -1 modulo 4.

Hensel’s lemma describes a way to compute zeros of a given polynomial by lifting a simple root of the same polynomial modulo p. There’s a stronger version of Hensel’s lemma which can be applied to multiple roots.

Theorem 3 (Hensel’s lemma, Stronger version). Let f(x) \in \mathbb{Z}_{p}[x] and a \in \mathbb{Z}\subset \mathbb{Z}_{p} satisfying

\displaystyle |f(a)|_{p} < |f'(a)|_{p}^{2}.

Then there exists \alpha \in \mathbb{Z}_{p} such that f(\alpha)=0 and |\alpha - a|_p < |f'(a)|_p. Moreover, we have

(1) |\alpha - a|_{p} = \frac{|f(a)|_p}{|f'(a)|_p} < |f'(a)|_p

(2) |f'(\alpha)|_p = |f'(a)|_p.

Proof. Newton’s method. See [1] for details.

For example, we can’t apply Theorem 1 to f(x) = x^{3} - 10 over \mathbb{Q}_3 since f(x) \equiv (x - 1)^{3}\,(\mathrm{mod}\,\,3). However, a = 4 satisfies the assumption of Theorem 3, so we can show that there exists \alpha \in \mathbb{Z}_3 with \alpha^{3} = 10 and \alpha \equiv 4\,(\mathrm{mod} \,\,9). (Note that a = 1 does not work either, so we need to go further, i.e. to a higher power of 3, to get a=4 that satisfies the assumption.)
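The claims in this example are easy to check numerically. The following Python sketch compares the 3-adic valuations of f(a) and f'(a)^{2} for a = 1 and a = 4, and then runs a few steps of Newton's method in exact rational arithmetic starting from a = 4. The helper vp is our own, and running Newton's iteration over \mathbb{Q} is only a convenient stand-in for working in \mathbb{Z}_3.

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the valuation of 0 is +infinity")
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def f(x):
    return x ** 3 - 10

def df(x):
    return 3 * x ** 2

# |f(a)|_3 < |f'(a)|_3^2 means v_3(f(a)) > 2 * v_3(f'(a))
for a in (1, 4):
    print(a, vp(f(a), 3), 2 * vp(df(a), 3))

# Newton's method from a = 4; v_3(f(x)) should grow while v_3(x - 4) stays fixed
x = Fraction(4)
for step in range(4):
    x = x - f(x) / df(x)
    print(step, vp(f(x), 3), vp(x - 4, 3))
```

The valuation of x - 4 staying equal to 2 reflects statement (1) of the theorem: |\alpha - a|_3 = |f(a)|_3/|f'(a)|_3 = 3^{-2}.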

In Part III, we are going to understand the statement “Analysis governs algebra in p-adic worlds” via Krasner’s lemma.


[1] Keith Conrad, “Hensel’s Lemma”.

Towards the p-adic World – Part I

In this series of posts, we are going to introduce some interesting properties of p-adic numbers, some of which are quite well-known and a few less so. Let’s start with a definition of p-adic numbers. For today, we give an analytic definition. We will give an algebraic definition in the next post.

The most standard way to express a natural number is to use a decimal expansion, such as 123 = 1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0} or 1000000007 = 1 \times 10^{9} + 7 \times 10^{0}. Also, we can express them using a binary expansion, such as 59 = 2^{5} + 2^{4} + 2^{3} + 2^{1} + 2^{0} =  111011_{(2)}.

Now, consider the following infinite binary expansion:

\cdots 1111_{(2)} = \cdots + 2^{3} + 2^{2} + 2^{1} + 2^{0}

This is a kind of binary expansion that doesn’t seem to converge at all. If we try to apply the formula for a geometric series, we get

2^{0} + 2^{1} + 2^{2} + 2^{3} + \cdots = \frac{1}{1-2} = -1 \qquad (1)

which doesn’t make sense. Right? Our aim is to make the above equation hold in some sense, which leads to a new absolute value on rational numbers \mathbb{Q}. For a given nonzero rational number r \in \mathbb{Q}, we can write it uniquely as

r = 2^{k} \cdot \frac{b}{a}

where a, b are coprime odd integers. Then, we define its 2-adic norm |r|_{2} as

|r|_{2} := 2^{-k}

and set |0|_{2}= 0. Clearly, this is different from the usual norm on \mathbb{Q}. This norm satisfies the following properties:

  1. ultrametric inequality: |r+s|_{2} \leq \max\{|r|_{2}, |s|_{2}\} for all r, s\in \mathbb{Q};
  2. multiplicativity: |rs|_{2} = |r|_{2}|s|_{2}.

The inequality that the 2-adic norm |\cdot |_{2} satisfies is much stronger than the usual triangle inequality. Also, such a norm lets us make sense of the weird series (1) above. More generally, for any prime p, we can do the same thing. We can express a nonzero rational number uniquely as

r = p^{k} \cdot \frac{b}{a}

where a, b are coprime integers that are each coprime to p. Define the p-adic norm |r|_{p} of r as

|r|_{p}:= p^{-k}.

A p-adic norm behaves completely differently from the usual norm on \mathbb{Q}. For example, the usual norm of p^{n} (which we denote |p^{n}|_{\infty}) grows exponentially as n gets larger while its p-adic norm |p^{n}|_{p} = p^{-n} decreases exponentially. How about q-adic norms for a different prime q \neq p?
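Here is a small Python sketch of the definition, which can also be used to experiment with the question just asked. The function name padic_norm is our own; it returns the norm as an exact Fraction and treats 0 separately.

```python
from fractions import Fraction

def padic_norm(r, p):
    """p-adic norm |r|_p of a rational number r, returned as an exact Fraction."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    n, d, k = r.numerator, r.denominator, 0
    while n % p == 0:        # powers of p in the numerator raise k
        n //= p
        k += 1
    while d % p == 0:        # powers of p in the denominator lower k
        d //= p
        k -= 1
    return Fraction(p) ** (-k)

print(padic_norm(Fraction(24, 5), 2))   # 24/5 = 2^3 * (3/5)
print(padic_norm(Fraction(24, 5), 5))   # 24/5 = 5^(-1) * 24
print([padic_norm(7 ** n, 7) for n in range(4)])
print([padic_norm(7 ** n, 3) for n in range(4)])
```

Comparing the last two lines shows how differently powers of 7 look through the 7-adic and the 3-adic norm.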

To obtain the real numbers \mathbb{R} from \mathbb{Q}, we complete \mathbb{Q} with respect to the usual norm |\cdot |_{\infty}. Similarly, we can complete \mathbb{Q} with respect to a p-adic norm. We call the resulting numbers p-adic numbers and denote the set by \mathbb{Q}_{p}, which actually forms a field. Equivalently, p-adic numbers can be thought of as Laurent series in the variable p, i.e.

\mathbb{Q}_{p} = \left\{ \sum_{n = N}^{\infty} a_{n}p^{n}\,:\, a_{n} \in \{0, 1, \dots, p-1\}, N\in\mathbb{Z}\right\}

where addition and multiplication are done in the usual way, with carrying. (Here’s a quick question: why do we not care about, let’s say, 10-adic numbers?)

Why do we need to consider such ‘weird’ fields? We are not going to explain this in detail, but they turn out to be very important in number theory. There’s an important maxim in number theory: Think Globally and Act Locally. This is the philosophy behind the local-global principle, which tackles global problems by solving local problems first (Hensel’s lemma, which we are going to explain in the following post, is a key tool for this) and then ‘gluing’ the local solutions. More precisely, here’s one important theorem for Diophantine equations.

Theorem 1 (Hasse-Minkowski). A quadratic form f(x_{1}, \dots, x_{n}) = \sum_{i, j} a_{ij}x_{i}x_{j} \in \mathbb{Q}[x_{1}, \dots, x_{n}] has a nontrivial zero in \mathbb{Q}^{n} if and only if it has a nontrivial zero in \mathbb{R}^{n} and in \mathbb{Q}_{p}^{n} for all primes p.

Proof. See [1].

This is closely related to the adelic formulation, and I hope you will remember that p-adic numbers are extremely important in number theory.

Once you’ve studied undergraduate calculus, you may know that determining convergence of a given infinite series is hard in general. There are several convergence tests (term test, comparison test, ratio test, root test, …), each of which gives only a necessary condition or only a sufficient condition. However, in the non-archimedean world, life is much easier and better – the term test is all you need.

Theorem 2. Let \{a_{n}\} be a sequence in \mathbb{Q}_{p}, or more generally, a complete non-archimedean field K. Then \sum_{n\geq 1} a_{n} converges if and only if \lim_{n\to\infty} |a_{n}| = 0.

Proof. One direction is easy and left as an exercise. It remains to show the other direction. Let s_{n} = a_{1} + \cdots + a_{n} be a partial sum. Then, for n \geq m, |s_{n} - s_{m}| = |a_{m+1} + \cdots + a_{n}| \leq \max\{|a_{m+1}|, \dots, |a_{n}|\}, and the right-hand side converges to 0 as n, m \to \infty. Hence \{s_{n}\} is a Cauchy sequence and so it converges.

For example, \sum_{n\geq 0} p^{n} = 1 + p + p^{2} + \cdots converges in \mathbb{Q}_{p} since |p^{n}|_{p} = p^{-n} \to 0 as n\to \infty, and one can show that the sum equals \frac{1}{1-p}. (Try it!) Also, the above theorem implies that the series \sum_{n\geq 1} n! converges in \mathbb{Q}_{p} for any p, since |n!|_{p} converges to 0 as n\to \infty. However, we don’t know whether the limit is rational or not. (Note that \mathbb{Q} is naturally a subfield of \mathbb{Q}_{p}.)
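The geometric series example can be watched converging with a short Python sketch. It measures the p-adic distance between the partial sums 1 + p + \cdots + p^{N} and \frac{1}{1-p}, and also prints |n!|_{p} for a few values of n. The helper names padic_norm and geometric_error are ours, introduced only for this illustration.

```python
from fractions import Fraction
from math import factorial

def padic_norm(r, p):
    """p-adic norm |r|_p of a rational number r, returned as an exact Fraction."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    n, d, k = r.numerator, r.denominator, 0
    while n % p == 0:
        n //= p
        k += 1
    while d % p == 0:
        d //= p
        k -= 1
    return Fraction(p) ** (-k)

def geometric_error(p, N):
    """p-adic distance between 1 + p + ... + p^N and 1/(1 - p)."""
    partial = sum(Fraction(p) ** n for n in range(N + 1))
    return padic_norm(partial - Fraction(1, 1 - p), p)

for N in range(6):
    print(N, geometric_error(5, N))

# the terms n! of the series sum n! become 3-adically small
print([padic_norm(factorial(n), 3) for n in (3, 6, 9, 27)])
```

With p = 2 the same computation makes sense of the series \cdots 1111_{(2)} = -1 from the beginning of the post: the partial sums 2^{N+1} - 1 approach -1 in the 2-adic norm.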

Open problem: It is not known whether \sum_{n\geq 1} n! is p-adic irrational or not.

Here’s an easier version: find the value of the infinite sum \sum_{n\geq 1} n\cdot n!.

How about the geometry of \mathbb{Q}_{p}? How much is \mathbb{Q}_{p} different from \mathbb{R} geometrically? More generally, how otherworldly could a non-archimedean world be? Here’s a simple but interesting fact about triangles.

Lemma 1. Let r, s\in \mathbb{Q} with |r|_{p} \neq |s|_{p}. Then |r + s|_{p} = \max\{|r|_{p}, |s|_{p}\}.

Proof. Without loss of generality, assume |r|_{p} > |s|_{p}. Then

|r|_{p} = |(r+s) - s|_{p} \leq \max\{|r+s|_{p}, |s|_{p}\} = |r+s|_{p}

for, otherwise, |r|_{p} \leq |s|_{p}. Also, we have

|r+s|_{p} \leq \max\{|r|_{p}, |s|_{p}\} \leq |r|_{p}

so we get |r+s|_{p} = |r|_{p} = \max\{|r|_{p}, |s|_{p}\}.

Proposition 1. Any triangle in \mathbb{Q}_{p} is isosceles.

Proof. Consider three points x, y, z \in \mathbb{Q}_{p}. If x, y, z form an equilateral triangle, then there’s nothing to prove. Otherwise, we may assume that |x - y|_{p} \neq |y - z|_{p}. Then, by the above lemma with x - z = (x-y) + (y-z), we have |x - z|_{p} = \max\{|x - y|_{p}, |y - z|_{p}\}, which proves the claim.
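As a sanity check of the proposition, the following Python sketch samples random rational triples and verifies that the two longest sides of each triangle have the same 7-adic length. Testing only rational points and only p = 7 is an assumption made for convenience, and the names padic_norm and side_lengths are hypothetical helpers.

```python
from fractions import Fraction
import random

def padic_norm(r, p):
    """p-adic norm |r|_p of a rational number r, returned as an exact Fraction."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    n, d, k = r.numerator, r.denominator, 0
    while n % p == 0:
        n //= p
        k += 1
    while d % p == 0:
        d //= p
        k -= 1
    return Fraction(p) ** (-k)

def side_lengths(x, y, z, p):
    """The three p-adic side lengths of the triangle x, y, z, sorted increasingly."""
    return sorted([padic_norm(x - y, p), padic_norm(y - z, p), padic_norm(z - x, p)])

random.seed(0)
for _ in range(1000):
    x, y, z = (Fraction(random.randint(-999, 999), random.randint(1, 999)) for _ in range(3))
    shortest, middle, longest = side_lengths(x, y, z, 7)
    assert middle == longest      # the two longest sides always agree

print(side_lengths(Fraction(0), Fraction(1), Fraction(7), 7))
```

So a p-adic triangle is not just isosceles: the side that differs, if any, is always the shortest one.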

Here’s another interesting geometric property of \mathbb{Q}_{p}.

Proposition 2. For any open ball D(a, r) = \{ x\in \mathbb{Q}_p \,:\, |x - a|_p < r\} in \mathbb{Q}_{p}, any point in the disc is a center. In other words, D(a, r) = D(a', r) for any a' \in D(a, r).

Proof. Let a' \in D(a, r), so that |a - a'|_p < r. For any y \in D(a, r), we have

|y - a'|_p \leq \max\{|y -a|_p, |a - a'|_p\} < r

so D(a, r)\subseteq D(a', r). Similarly, we have D(a', r) \subseteq D(a, r), so two balls are identical.

Corollary 1. Any two open balls in an ultrametric space are either disjoint or identical.

One may wonder if there’s any other interesting norm (absolute value) on \mathbb{Q} that is different from the usual norm and p-adic norms. Unfortunately(?), these are the only absolute values on \mathbb{Q}.

Theorem 3 (Ostrowski). Every nontrivial absolute value on \mathbb{Q} is equivalent to either the usual real absolute value or a p-adic absolute value.

Proof. See [1].

Note that we call two absolute values |\cdot |_{1}, |\cdot |_{2} equivalent if |\cdot |_{1} = |\cdot|_{2}^{c} for some real number c >0.

In the next post, we are going to study the algebraic definition of p-adic numbers and a way to compute zeros of polynomials over \mathbb{Q}_p via Hensel’s lemma.


[1] J. Neukirch, Algebraic Number Theory, Vol. 322. Springer Science & Business Media, 2013.

A Lemma for Proving the Quadratic Reciprocity Law

We discuss below one particular lemma that connects permutations with modular arithmetic and then leads to a proof of quadratic reciprocity law. There are 246 and counting “different” proofs of the law.

Definitions and Notations: Let S_n denote the group of permutations of \{1,2,\ldots,n\}. For a permutation \pi\in S_n, an inversion is a pair (i,j)\in\{1,2,\ldots,n\}^2 such that i>j and \pi(i)<\pi(j). Let n(\pi) denote the number of such inversions in \pi. The sign of a permutation is defined as follows:

\mathrm{sgn}(\pi) := (-1)^{n(\pi)}

A technique for computing \mathrm{sgn}(\pi):
If a pair (i,j) is an inversion, then \frac{\pi(i)-\pi(j)}{i-j} is negative. Since \pi just permutes 1,2,\ldots,n we get

\left|\prod_{1\leq j<i\leq n}(\pi(i)-\pi(j))\right|=\left|\prod_{1\leq j<i\leq n}(i-j)\right|

and therefore

\prod_{1\leq j<i\leq n}\frac{\pi(i)-\pi(j)}{i-j}=\pm1

Now each inversion in \pi contributes a (-1) factor in the product \prod_{1\leq j<i\leq n}\frac{\pi(i)-\pi(j)}{i-j}. Hence we get

\prod_{1\leq j<i\leq n}\frac{\pi(i)-\pi(j)}{i-j}=(-1)^{n(\pi)}=\mathrm{sgn}(\pi)
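The identity can be checked by brute force for small n. The Python sketch below computes the sign once by counting inversions and once via the product formula, using exact fractions. A permutation is represented as a tuple whose entry at position i is the image of i + 1; this representation and the function names are our own choices.

```python
from fractions import Fraction
from itertools import permutations

def inversions(perm):
    """Number of pairs of positions j < i with perm[i] < perm[j]."""
    n = len(perm)
    return sum(1 for j in range(n) for i in range(j + 1, n) if perm[i] < perm[j])

def sign_by_product(perm):
    """Product over j < i of (perm[i] - perm[j]) / (i - j), computed exactly."""
    n = len(perm)
    prod = Fraction(1)
    for j in range(n):
        for i in range(j + 1, n):
            prod *= Fraction(perm[i] - perm[j], i - j)
    return prod

for perm in permutations(range(1, 5)):
    assert sign_by_product(perm) == (-1) ** inversions(perm)

print(inversions((2, 1, 3)), sign_by_product((2, 1, 3)))
```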

Zolotarev’s Lemma:
Let p be an odd prime and a\in\mathbb{Z} be such that \gcd(a,p)=1. Then the map \pi_a:\mathbb{Z}/p\mathbb{Z}\longrightarrow\mathbb{Z}/p\mathbb{Z} which sends x\in\mathbb{Z}/p\mathbb{Z} to ax\pmod{p} is a permutation of the numbers \{0,1,2,\ldots,p-1\}, and

\mathrm{sgn}(\pi_a)=\left(\frac{a}{p}\right)

where \left(\frac{\cdot}{p}\right) is the Legendre symbol.

Proof: According to the computation technique stated above and Fermat’s little theorem, which states that x^p\equiv x\pmod{p}\; \text{ for all }\;x\in\mathbb{Z}, we have

\mathrm{sgn}(\pi_a) =\prod_{0\leq j<i\leq p-1}\frac{\pi_a(i)-\pi_a(j)}{i-j}\equiv\prod_{0\leq j<i\leq p-1}\frac{ai-aj}{i-j}=a^{\frac{p(p-1)}{2}}\equiv a^{\frac{p-1}{2}}\pmod{p}.

By Euler’s criterion we have

\left(\frac{a}{p}\right)\equiv a^{\frac{p-1}{2}}\pmod{p}

Therefore we get \mathrm{sgn}(\pi_a)\equiv\left(a/p\right)\pmod{p}. Since both sides are \pm 1, the difference \mathrm{sgn}(\pi_a)-\left(a/p\right) can only be -2, 0, or 2, and the odd prime p\mid\mathrm{sgn}(\pi_a)-\left(a/p\right), so we have

\mathrm{sgn}(\pi_a)=\left(\frac{a}{p}\right)

Hence, the lemma follows.
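Zolotarev’s lemma is also easy to test numerically. In the Python sketch below, the sign of \pi_a is computed by counting inversions of the list (a\cdot 0, a\cdot 1, \dots, a\cdot(p-1)) reduced modulo p, and the Legendre symbol is computed with Euler’s criterion. Both function names are hypothetical, and the check only covers a handful of small primes.

```python
def sign_of_multiplication(a, p):
    """Sign of the permutation x -> a*x mod p of {0, 1, ..., p-1}, via inversion count."""
    perm = [(a * x) % p for x in range(p)]
    inv = sum(1 for j in range(p) for i in range(j + 1, p) if perm[i] < perm[j])
    return -1 if inv % 2 else 1

def legendre(a, p):
    """Legendre symbol (a/p) for gcd(a, p) = 1, computed with Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

for p in (3, 5, 7, 11, 13):
    for a in range(1, p):
        assert sign_of_multiplication(a, p) == legendre(a, p)

print([(a, sign_of_multiplication(a, 7)) for a in range(1, 7)])
```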


[1] Burton, David M. Elementary Number Theory (2010, McGraw-Hill Education)

Modular Forms I

I – Motivation and overview

The subject of modular forms is vast and has applications to number theory, algebraic geometry, combinatorics, and many other areas of mathematics and physics.

Modular forms are traditionally viewed as holomorphic functions defined on the upper half plane with remarkable transformation properties. You may wonder how these holomorphic functions have anything to do with number theory. One interesting connection is that certain cubic equations called elliptic curves arise from modular forms.

An elliptic curve can be viewed as a plane algebraic curve E over a field K defined by an equation of the form E: y^2 = x^3 + ax + b. Usually K is chosen to be an algebraically closed field, such as the algebraic closure of \mathbb{Q} or \mathbb{C}. If we let K = \mathbb{Q}, then the work required for the proof of Fermat’s Last Theorem by Wiles showed that a certain class of elliptic curves over \mathbb{Q}, called semi-stable elliptic curves, arise from modular forms. This program was carried on to prove the full modularity theorem by Breuil, Conrad, Diamond, and Taylor who showed that all elliptic curves over \mathbb{Q} are modular so they arise from modular forms.

The picture below is the elliptic curve y^2 = x^3 - x + 1 plotted in Sage.

There are a few lenses through which one can view the phrase “arise from modular forms.” One of the most concrete ways is to reinterpret the equation y^2 = x^3 + ax + b as coming from a differential equation satisfied by the so-called Weierstrass \wp-function, \wp(z). Using the meromorphic function \wp(z), one can show [\wp'(z)]^2 = 4[\wp(z)]^3 - g_2\wp(z)- g_3, and the coefficients g_2 and g_3 can be viewed as values of Eisenstein series, which are important examples of modular forms. Therefore, the coefficients are values of modular forms.

A second way of viewing this statement is via L-functions. An L-function is a complex-valued function called a Dirichlet series of the form

L(s) = \sum \limits_{n = 1}^{\infty} \frac{a(n)}{n^s}.

Number theorists have observed that most L-functions have many interesting attributes, such as an Euler product, a functional equation, and an analytic continuation of the domain of definition to the whole complex plane. The most famous example of an L-function is the Riemann zeta function \zeta(s) = \sum_{n = 1}^{\infty} \frac{1}{n^s}. The phrase “elliptic curves over \mathbb{Q} arise from modular forms or are modular” means that the L-function coming from an elliptic curve E and the L-function of a corresponding modular form f are the same, that is, L(s,E) = L(s, f).

A third way to view this statement is through Galois Representations. These are representations \rho: Gal(\Bar{\mathbb{Q}}/\mathbb{Q}) \rightarrow GL_{n}(K) where K is usually a number field. In general, a Galois representation is a representation of any Galois group. Similar to L-functions, the Galois Representations arising from modular forms are isomorphic as modules to those from elliptic curves, so that if \ell is a prime satisfying some technical conditions, then \rho_{f, \ell} \sim \rho_{E, \ell}.

II – The Modular Group and its Transformations

Before we jump into the details of modular forms, let us mention the famous quote of Barry Mazur.

Modular forms are functions on the complex plane that are inordinately symmetric. They satisfy so many internal symmetries that their mere existence seem like accidents. But they do exist. – Barry Mazur

Let us now explore these surprisingly symmetric functions. Modular forms rely on the transformations of the familiar group

SL_{2}(\mathbb{Z}) = \Bigg \{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \Bigg | \, a, b, c, d \in \mathbb{Z}, \ ad-bc = 1 \Bigg \},

called the Modular Group. These matrices in SL_{2}(\mathbb{Z}) can be viewed as automorphisms of the Riemann sphere \Hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\} via the fractional linear transformation on a complex variable \tau \in \Hat{\mathbb{C}} defined as

\begin{bmatrix} a & b \\ c & d \end{bmatrix} (\tau) = \frac{a\tau \, + \,b}{c\tau \,+ \, d}

If c \neq 0, then as \tau \to -\frac{d}{c} the denominator approaches 0 and the fraction grows without bound, so we regard -\frac{d}{c} as mapping to \infty. Furthermore, taking the limit \tau \to \infty of \frac{a\tau + b}{c\tau + d} shows that \infty maps to \frac{a}{c} in the traditional sense of a limit. Also, if c = 0 then \infty \mapsto \infty. Now that we have this transformation, we can plug in different matrices in SL_{2}(\mathbb{Z}) to reveal its symmetries. Let \gamma denote a matrix in SL_{2}(\mathbb{Z}).

You can check that if \gamma = I or -I, then we obtain \tau \mapsto \tau, as expected. In fact, the two matrices \gamma and -\gamma always give the same transformation of \tau. The important symmetries of this transformation are revealed when we focus on the generators of the modular group. The standard generators of SL_{2}(\mathbb{Z}) are the matrices

\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \hspace{5mm} and \hspace{5mm} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}

Now let’s plug these matrices into the transformation. We obtain

\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} (\tau) = \tau + 1


\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}(\tau) = -\frac{1}{\tau}

giving rise to a translation action and an inversion. These symmetries are very interesting and turn out to be a central feature of modular forms. Above we considered a complex variable \tau \in \Hat{\mathbb{C}}. However, modular forms have the upper half plane \mathcal{H} as their domain, where \mathcal{H} = \{\tau \in \mathbb{C} \, | \, Im(\tau) > 0 \}.
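The action of the two generators can be checked numerically. The following is a minimal sketch, not part of the original post: the helper names `mobius` and `negate` and the sample point `tau` are illustrative choices. It applies the fractional linear transformation to a point of the upper half plane and confirms that \gamma and -\gamma act in the same way.

```python
def mobius(gamma, tau):
    """Apply gamma = ((a, b), (c, d)) to tau via (a*tau + b) / (c*tau + d)."""
    (a, b), (c, d) = gamma
    return (a * tau + b) / (c * tau + d)


def negate(gamma):
    """Return the matrix -gamma."""
    return tuple(tuple(-entry for entry in row) for row in gamma)


T = ((1, 1), (0, 1))     # translation generator
S = ((0, -1), (1, 0))    # inversion generator
tau = complex(0.3, 1.7)  # arbitrary sample point with Im(tau) > 0

print(mobius(T, tau), tau + 1)     # both give tau + 1
print(mobius(S, tau), -1 / tau)    # both give -1/tau
print(mobius(negate(S), tau))      # -S acts exactly like S
print(mobius(S, tau).imag > 0)     # the image stays in the upper half plane
```

The last line reflects the general identity Im(\gamma(\tau)) = Im(\tau)/|c\tau + d|^2, which is why the upper half plane is preserved by this action.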

III – Weak Modularity and Modular Forms

We will first define a weakly modular function. Before doing this let’s review some complex analysis. A function f defined on an open set U \subseteq \mathbb{C} is called holomorphic if for every point z \in U there is an open neighborhood U_{z} of z on which f is complex differentiable. Another important type of function is a meromorphic function, a notion slightly weaker than that of a holomorphic function. A function g is called meromorphic on U if it is holomorphic on U except at a set of isolated points, each of which is a pole of g.

Now let’s define a weakly modular function.

Definition: Let k \in \mathbb{Z} and let f: \mathcal{H} \rightarrow \mathbb{C} be a meromorphic function. Then we say f is a weakly modular function of weight k if

f(\gamma(\tau)) = (c\tau + d)^{k} f(\tau)

for all \gamma = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in SL_{2}(\mathbb{Z}) and all \tau \in \mathcal{H}.

The term (c\tau + d) is called the factor of automorphy; it is nonzero for every \tau \in \mathcal{H}, since c and d are real and not both zero. This factor essentially measures how far the function f(\tau) is from SL_{2}(\mathbb{Z}) invariance. This depends on the weight k, so if k is large then f(\tau) varies from being SL_{2}(\mathbb{Z}) invariant by a larger factor. Given this definition we can now generalize our earlier findings that \tau \mapsto \tau + 1 and \tau \mapsto -\frac{1}{\tau} under the two generators of SL_{2}(\mathbb{Z})

\gamma_{1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} and \gamma_{2} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.

I claim that if f(\tau) is a weakly modular function of weight k then we have the following identities:

(I) f(\tau + 1) = f(\tau)

(II) f(\frac{-1}{\tau}) = \tau^{k}f(\tau)

These identities follow from plugging in the two generators \gamma_{1} and \gamma_{2} of SL_{2}(\mathbb{Z}) into the weakly modular definition. First, we have f(\gamma_{1}(\tau)) = f(\tau + 1) = ((0 * \tau) + 1)^{k} f(\tau) = f(\tau). The second identity is given by f(\gamma_{2}(\tau)) = f(\frac{-1}{\tau}) = (\tau + 0)^{k}f(\tau) = \tau^{k}f(\tau).
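Identities (I) and (II) can be tested numerically on a concrete example. The sketch below is not from the original post: it assumes the standard fact that the normalized Eisenstein series E_4(\tau) = 1 + 240\sum_{n \geq 1} \sigma_3(n) q^n, with q = e^{2\pi i \tau}, is weakly modular of weight 4. The function names, the truncation at 80 terms, and the sample point are illustrative choices.

```python
import cmath


def sigma3(n):
    """Sum of the cubes of the positive divisors of n."""
    return sum(d**3 for d in range(1, n + 1) if n % d == 0)


def eisenstein_e4(tau, terms=80):
    """Truncated q-expansion of E_4 (assumed formula, see lead-in)."""
    q = cmath.exp(2j * cmath.pi * tau)
    return 1 + 240 * sum(sigma3(n) * q**n for n in range(1, terms + 1))


tau = complex(0.3, 1.1)  # arbitrary sample point with Im(tau) > 0
k = 4

# Identity (I): f(tau + 1) = f(tau)
print(abs(eisenstein_e4(tau + 1) - eisenstein_e4(tau)))

# Identity (II): f(-1/tau) = tau^k f(tau)
print(abs(eisenstein_e4(-1 / tau) - tau**k * eisenstein_e4(tau)))
```

Both printed differences should be at the level of floating-point rounding. The truncation is harmless here because |q| is small at both \tau and -1/\tau.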

If k = 0, we obtain SL_{2}(\mathbb{Z}) invariance, and the case k = 2 is used in the proof of the modularity theorem. The first identity shows that weakly modular functions, and hence modular forms, are \mathbb{Z}-periodic. We will soon see that classical modular forms are weakly modular functions with a bit more structure. The main difference between weakly modular functions and modular forms is that modular forms are required to be holomorphic on \mathcal{H} and also at \infty. These modular forms are called holomorphic or classical modular forms. However, there are functions that are neither holomorphic nor weakly modular but exhibit symmetries similar to those of classical modular forms. The large umbrella term for classical modular forms and these other functions is automorphic forms. Let us investigate what holomorphic at \infty means.

There are two standard approaches for understanding the holomorphic at \infty condition. The first approach is local. Let \tau \in \mathcal{H} and change coordinates from \tau to q = e^{2 \pi i \tau}, which lies in the punctured unit disk D' = D \setminus \{0\}. Because f(\tau + 1) = f(\tau), the function g(q) = f(\log(q)/(2 \pi i)) is well defined on D' (different branches of \log(q) change \tau by an integer), and g is holomorphic on D' when f is holomorphic on \mathcal{H}. We view the point at \infty as the origin of the disk, and we say that f(\tau) is holomorphic at \infty if g(q) extends holomorphically to the origin.

The second approach is more global and looks at the Fourier expansion of f. As before, if q = e^{2 \pi i \tau}, then |q| = e^{-2\pi Im(\tau)}, so q \rightarrow 0 if and only if Im(\tau) \rightarrow \infty. Since g(q) is holomorphic on the punctured unit disk, it has a Laurent expansion there, which gives a Fourier expansion f(\tau) = \sum_{n \in \mathbb{Z}} a_{n}(f) q^{n}, and f is holomorphic at \infty exactly when a_{n}(f) = 0 for all n < 0, that is, f(\tau) = \sum_{n = 0}^{\infty} a_{n}(f) q^{n}. Because q \rightarrow 0 is equivalent to Im(\tau) \rightarrow \infty, we don’t have to compute the Fourier expansion of f to check this. Instead, it suffices to show that \lim_{Im(\tau)\to\infty} f(\tau) exists or that f(\tau) is bounded as Im(\tau) \rightarrow \infty. This discussion leads us to the definition of a holomorphic modular form.

Definition: Let k \in \mathbb{Z}. Then a function f: \mathcal{H} \rightarrow \mathbb{C} is a modular form of weight k if

(I) f is holomorphic on \mathcal{H}

(II) f is weakly modular of weight k

(III) f is holomorphic at \infty

Now that we have defined modular forms of weight k, it is natural to ask what structure the space of modular forms possesses. The set of modular forms of a fixed weight k, \mathcal{M}_{k}(SL_{2}(\mathbb{Z})), forms a vector space over \mathbb{C} under the usual addition and scalar multiplication of functions. Two properties are especially important. First, each of these vector spaces is finite-dimensional; this is a consequence of the holomorphic at \infty condition, and proving it requires some work. Second, given two modular forms f and g of weights k and \ell, respectively, the product fg is a modular form of weight k + \ell. Also, there are no nonzero modular forms of negative weight, so we only have to look at weight zero and above. Then to collect the modular forms of every weight k \geq 0, we take the direct sum of these complex vector spaces to obtain

\mathcal{M}(SL_{2}(\mathbb{Z})) = \bigoplus \limits_{k \geq 0} \mathcal{M}_{k}(SL_{2}(\mathbb{Z}))

which is a graded ring.
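The fact that weights add under multiplication, which is what makes the grading compatible with the ring structure, follows directly from the weakly modular condition. As a short worked computation, for f of weight k, g of weight \ell, \gamma = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in SL_{2}(\mathbb{Z}) and \tau \in \mathcal{H}:

\displaystyle (fg)(\gamma(\tau)) = f(\gamma(\tau))\, g(\gamma(\tau)) = (c\tau + d)^{k} f(\tau) \cdot (c\tau + d)^{\ell} g(\tau) = (c\tau + d)^{k + \ell} (fg)(\tau).

So fg is weakly modular of weight k + \ell, and it is holomorphic on \mathcal{H} and at \infty because f and g are.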

In the next post I will give explicit examples of modular forms and discuss congruence subgroups.


[1] D. Bump, Automorphic Forms and Representations, Cambridge Studies in Advanced Mathematics, 1998

[2] F. Diamond and J. Shurman, A First Course in Modular Forms, Springer GTM, 2005

[3] D. Goldfeld, Automorphic forms and L-functions for the group GL (n, R), Cambridge University Press, 2006

[4] D. Zagier, The 1-2-3 of Modular Forms: Lectures at a Summer School in Nordfjordeid, Norway, Springer, 2008

Unramified Extensions by Examples

In algebraic number theory, the most important topic (at least for me) is the ramification of primes in number fields, which are finite extensions of the field of rationals \mathbb{Q}. It is known that the ring of integers \mathcal{O}_{K} of a number field K is a Dedekind domain, an integral domain in which every nonzero proper ideal factors uniquely into a product of prime ideals. It is important to determine how a given prime ideal \mathfrak{p} of \mathcal{O}_{K} factors in a larger number field L/K. We say the prime ideal \mathfrak{p} \subset \mathcal{O}_{K} ramifies in L if its factorization contains a repeated factor, i.e. \mathfrak{p}\mathcal{O}_{L} = \mathfrak{q}_{1}^{e_{1}} \cdots \mathfrak{q}_{k}^{e_{k}} where e_i > 1 for some i. It is known that, for a given extension of number fields L/K, only finitely many prime ideals of \mathcal{O}_K ramify in L. Such prime ideals are exactly the prime factors of the special ideal \Delta_{L/K} \subseteq\mathcal{O}_{K} called the relative discriminant of L/K. We usually write \Delta_{K} for \Delta_{K/\mathbb{Q}}, which is generated by a single integer in this case. That integer is also denoted by \Delta_{K}.

In this post, we show that there’s no nontrivial unramified extension of {\mathbb{Q}(\sqrt{3})}. Before we start, let’s recall the easier case: {\mathbb{Q}}. If {K/\mathbb{Q}} is a finite extension, then Minkowski’s bound tells us that for any ideal class {A\in \mathrm{Cl}_{K}}, there’s a nonzero integral ideal {\mathfrak{a}\subseteq \mathcal{O}_{K}} in {A} such that

\displaystyle [\mathcal{O}_{K}:\mathfrak{a}] = \mathcal{N}(\mathfrak{a}) \leq \frac{n!}{n^{n}} \left( \frac{4}{\pi}\right)^{s} \sqrt{|\Delta_{K}|}

where {n = [K:\mathbb{Q}]}, {s} is the number of (pairs of) complex places, and {\Delta_K} is the discriminant of {K}. We know that the prime {p\in \mathbb{Q}} ramifies in {K} if and only if {p|\Delta_K}. Since {\mathcal{N}(\mathfrak{a})\geq 1}, we have

\displaystyle \sqrt{|\Delta_{K}|} \geq \frac{n^{n}}{n!}\left(\frac{\pi}{4}\right)^{s} \geq \frac{n^{n}}{n!} \left(\frac{\pi}{4}\right)^{n/2}

If we define RHS as {a_{n}}, then

\displaystyle \frac{a_{n+1}}{a_{n}} = \left( \frac{\pi}{4}\right)^{1/2}\left( 1 + \frac{1}{n}\right)^{n} \geq 2\sqrt{\frac{\pi}{4}} = \sqrt{\pi} > 1

and {a_{2} = \frac{\pi}{2} > 1}, so {a_{n} >1} for all {n\geq 2}. (Note that {a_{1} = \sqrt{\pi}/2 < 1}, but {n = 1} is the trivial extension.) This implies that for any nontrivial extension {K} of {\mathbb{Q}}, {|\Delta_{K}|>1}, so there exists a prime {p\in \mathbb{Q}} that divides {\Delta_{K}} and hence ramifies in {K}.

To obtain a similar result for {\mathbb{Q}(\sqrt{3})} or any other number field, we may need (global) class field theory. According to class field theory, for any number field {K}, there exists the Hilbert class field {H_K}, which is the maximal unramified abelian extension of {K}, and {\mathrm{Gal}(H_K/K) \simeq \mathrm{Cl}_{K}} canonically (via Artin’s reciprocity map). So if {K} has class number 1 (i.e. {\mathcal{O}_{K}} is a PID), then there’s no nontrivial unramified abelian extension of {K}. (Here unramifiedness includes archimedean places. For example, {L/\mathbb{Q}(\sqrt{3})} is unramified at real places if and only if {L} is a totally real field.)

But how do we show that there are no unramified extensions, including non-abelian ones? For a tower of number fields {M/L/K}, the relative discriminants satisfy the relation

\displaystyle \Delta_{M/K} = \mathcal{N}_{L/K}(\Delta_{M/L}) \Delta_{L/K}^{[M:L]}

where {\mathcal{N}_{L/K}:I_{L}\rightarrow I_{K}} is the ideal norm map. Now assume that there’s a nontrivial unramified extension {K} of {\mathbb{Q}(\sqrt{3})}. By applying the above relation, we get

\displaystyle \Delta_{K/\mathbb{Q}} = \Delta_{\mathbb{Q}(\sqrt{3})/\mathbb{Q}}^{[K:\mathbb{Q}(\sqrt{3})]} = 12^{n}

where {n = [K:\mathbb{Q}(\sqrt{3})]}. (We have {\Delta_{K/\mathbb{Q}(\sqrt{3})} = 1} since {K/\mathbb{Q}(\sqrt{3})} is unramified.) Now Minkowski’s bound gives

\displaystyle \frac{(2n)!}{(2n)^{2n}} \cdot 12^{n/2} \geq 1.

(We have {s = 0} since we are assuming that the archimedean places are also unramified.) We can show that this inequality fails for large {n}. In fact, if we denote the left-hand side by {b_{n}} then

\displaystyle \frac{b_{n+1}}{b_{n}} = \left( 1 + \frac{1}{n}\right)^{-2n} \frac{2n+1}{2n+2} \sqrt{12} \leq \frac{\sqrt{12}}{4} \leq 1

for any {n\geq 1}, and {b_{3} < 1}. So we get {n \leq 2}, and we already know that there’s no degree 2 unramified extension of {\mathbb{Q}(\sqrt{3})}, because every degree 2 extension is abelian and {\mathbb{Q}(\sqrt{3})} has class number 1!
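Both numerical claims in this argument are easy to check by machine. The following is a small sketch (the function names `a` and `b` simply mirror the sequences {a_{n}} and {b_{n}} defined in the text) that evaluates the two Minkowski-type quantities for small {n}.

```python
import math


def a(n):
    """a_n = n^n / n! * (pi/4)^(n/2), the lower bound for sqrt(|disc|) over Q."""
    return n**n / math.factorial(n) * (math.pi / 4) ** (n / 2)


def b(n):
    """b_n = (2n)! / (2n)^(2n) * 12^(n/2), which must be >= 1 for an unramified K."""
    return math.factorial(2 * n) / (2 * n) ** (2 * n) * 12 ** (n / 2)


for n in range(1, 7):
    print(n, round(a(n), 4), round(b(n), 4))
```

In the printed table, a(n) exceeds 1 from n = 2 onward, while b(n) is at least 1 only for n = 1 and n = 2 (with b(2) = 9/8), which matches the conclusion {n \leq 2}.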

What if we allow infinite places to be ramified? Then there’s such an extension. We will show that {K = \mathbb{Q}(\sqrt{3}, \sqrt{-1})} is one such extension. First, since {\Delta_{\mathbb{Q}(\sqrt{-1})} = -4}, the only prime {p\in \mathbb{Q}} that ramifies in {\mathbb{Q}(\sqrt{-1})} is 2. So if {p\neq 2}, then {p} is unramified in {\mathbb{Q}(\sqrt{-1})}, and this implies that any prime {\mathfrak{p}|p} in {\mathbb{Q}(\sqrt{3})} is unramified in {K = \mathbb{Q}(\sqrt{3}, \sqrt{-1})}. For {p = 2}, assume that the prime {\mathfrak{p}} lying over {2} ramifies in {K}. Then the ramification degree of {2} in {K} is {4} since {2} also ramifies in {\mathbb{Q}(\sqrt{3})}. However, this is impossible since {2} does not ramify in the subfield {\mathbb{Q}(\sqrt{-3}) = \mathbb{Q}(\zeta_{3})}, which has a discriminant {\Delta_{\mathbb{Q}(\sqrt{-3})} = -3}. Hence any finite prime in {\mathbb{Q}(\sqrt{3})} is unramified in {K}. But the infinite place ramifies in {K} since {K} has a complex place ({K} is not a totally real field).

We may ask if there is a number field which has class number 1 but has a nontrivial unramified extension. Surprisingly (or maybe not), this is true, and there is even a number field with class number 1 which has an infinite unramified extension! In [1], Yamamura constructed infinitely many real quadratic fields with unramified extensions of Galois group A_{5}, and he showed that \mathbb{Q}(\sqrt{36497}) is such a field with class number one. (This is a subfield of the splitting field of the polynomial f(x) = x^{5} - 2x^{4} - 3x^{3} + 5x^{2} + x - 1, which has Galois group S_{5} over {\mathbb{Q}}.) Also, in [2], Brink extended this result and proved that \mathbb{Q}(\sqrt{36497}, \sqrt{2819\cdot 103}) has class number one and has an infinite unramified extension! His discovery is based on the following theorem:

Theorem (Brink): Assume first that f\in \mathbb{Z}[x] is an irreducible polynomial of degree five with only real roots and whose discriminant l is a prime such that \mathbb{Q}(\sqrt{l}) has class number one. Assume further that q_{1} and q_{2} are primes such that \mathbb{Q}(\sqrt{q_{1}q_{2}}) has class number one and \mathbb{Q}(\sqrt{lq_{1}q_{2}}) has class number two. Assume finally that f has five simple roots modulo q_{1}, and that the tuple \mu of the degrees of the irreducible factors of f modulo q_{2} is (1, 1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 2) or (1, 1, 3). Then the field k = \mathbb{Q}(\sqrt{l}, \sqrt{q_{1}q_{2}}) has class number one and an infinite unramified extension.

Based on the theorem, with the aid of the PARI software, he found two such real biquadratic fields with prime discriminant l < 100000, including the above example. He also noted that one can prove similar results for the existence of such real quadratic fields, by finding a degree 20 polynomial that has 18 simple roots modulo some prime, which seems to occur very rarely.


[1] K. Yamamura, On Unramified Galois Extensions of Real Quadratic Number Fields,  Osaka Journal of Mathematics 23, no. 2 (1986): 471–478

[2] D. Brink, Remark on Infinite Unramified Extensions of Number Fields with Class Number One, Journal of Number Theory 130, no. 2 (February 1, 2010): 304–6

[3] Alex J Best, Answer to the question “Unramified nonabelian extension of number field with class number 1” by Seewoo Lee, Mathematics StackExchange,

Minkowski’s Lattice Point Theorem

We present a proof of a famous theorem of Hermann Minkowski which spawned a beautiful branch of number theory known as the Geometry of Numbers. This theorem gives a condition on the volume of a centrally symmetric convex body in n-dimensional Euclidean space that guarantees it contains at least one lattice point other than the trivial point \Vec{0}.

Definitions and Notations

Definition 1 In an n-dimensional vector space V over the field \mathbb{R}, a lattice is a free abelian subgroup of V of the following form

\displaystyle \Gamma=\mathbb{Z}v_1+\mathbb{Z}v_2+\cdots+\mathbb{Z}v_m

where v_1,v_2,\ldots,v_m are linearly independent vectors in V. Then \Gamma is a free abelian group of rank m. A lattice is said to be complete if m=n. Assuming \Gamma to be complete, we define the fundamental mesh in \Gamma with respect to the v_i to be

\displaystyle \Phi=\{x_1v_1+x_2v_2+\cdots+x_nv_n:x_i\in\mathbb{R},x_i\in[0,1),1\leq i\leq n\}

The volume of the lattice \Gamma is defined to be the volume of the fundamental mesh \Phi denoted as \mathrm{Vol}(\Phi), i.e. \mathrm{Vol}(\Gamma)=\mathrm{Vol}(\Phi). There are a few equivalent expressions for this volume. One expression convenient for computations is given by the following relation

\mathrm{Vol}(\Phi)=\sqrt{\det[\langle v_i,v_j\rangle]}

where \langle \cdot,\cdot\rangle is the usual inner-product (or a symmetric positive definite bilinear form) defined on the vector space V.
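The Gram determinant formula is straightforward to evaluate for small examples. The following is a minimal sketch using only the standard library and the usual dot product; the helper names `gram_matrix`, `det` and `lattice_volume` are illustrative, and the determinant is computed by cofactor expansion, which is adequate only for small n.

```python
import math


def gram_matrix(vectors):
    """Matrix of inner products <v_i, v_j> for the usual dot product."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def lattice_volume(vectors):
    """Vol(Phi) = sqrt(det[<v_i, v_j>]) for a basis v_1, ..., v_n of the lattice."""
    return math.sqrt(det(gram_matrix(vectors)))


print(lattice_volume([(5, 0), (2, 1)]))                    # Gram matrix [[25, 10], [10, 5]]
print(lattice_volume([(1, 0, 0), (0, 2, 0), (0, 0, 3)]))   # rectangular box with sides 1, 2, 3
```

For the basis (5,0), (2,1) the Gram determinant is 25 \cdot 5 - 10 \cdot 10 = 25, so the volume is 5, in agreement with the absolute value of the determinant of the basis matrix.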

Definition 2 (Convex and Centrally Symmetric Body)

A subset X of V is said to be centrally symmetric if for any point x\in X we have -x\in X, and the subset is convex if for any two distinct points x,y in X the line segment \{tx+(1-t)y:t\in[0,1]\} is contained in X. For example, a ball is convex in \mathbb{R}^3 but a solid torus is not.

Minkowski’s Lattice Point Theorem

Theorem: Let V be an n-dimensional Euclidean vector space (\mathbb{R}-vector space) and \Gamma be a complete lattice of V. Let X be a centrally symmetric convex subset of V such that

\displaystyle \mathrm{Vol}(X)>2^n\,\mathrm{Vol}(\Gamma).

Then X contains at least one point of \Gamma other than \vec{0}.

Proof: Consider the dilation \frac{1}{2}X=\left\{\frac{1}{2}x:x\in X\right\}. For any \gamma\in \Gamma, consider the translated sets \frac{1}{2}X+\gamma. We will show that there exist \gamma_1,\gamma_2 \in \Gamma such that \gamma_1\neq\gamma_2 and

\displaystyle \left(\frac{1}{2}X+\gamma_1\right)\cap\left(\frac{1}{2}X+\gamma_2\right)\neq\emptyset.

For the sake of contradiction, assume the sets \frac{1}{2}X+\gamma,\gamma\in\Gamma are pairwise disjoint. Then the intersections \Phi\cap\left(\frac{1}{2}X+\gamma\right),\gamma\in\Gamma are also disjoint for a fundamental mesh \Phi. This gives us the following inequality involving volumes

\mathrm{Vol}(\Phi)\geq\sum_{\gamma\in \Gamma}\mathrm{Vol}\left(\Phi\cap(\frac{1}{2}X+\gamma)\right)

We know that translation preserves volumes, so we consider the following translations

\displaystyle \left(\Phi\cap\left(\frac{1}{2}X+\gamma\right)\right)-\gamma=(\Phi-\gamma)\cap\frac{1}{2}X.

Hence for each \gamma\in \Gamma, we have

\displaystyle \mathrm{Vol}\left(\Phi\cap\left(\frac{1}{2}X+\gamma\right)\right)=\mathrm{Vol}\left((\Phi-\gamma)\cap\frac{1}{2}X\right).

We claim that the \Phi-\gamma cover V as \gamma varies over \Gamma. Let x\in V. Since \{v_1,v_2,\ldots,v_n\} is a set of n linearly independent vectors in V and \mathrm{dim}(V)=n, it follows that \{v_1,v_2,\ldots,v_n\} is a basis of V. Then there exist \lambda_1,\lambda_2,\ldots,\lambda_n\in\mathbb{R} such that x=\sum_{i=1}^{n}\lambda_iv_i. We know that every real number r can be written as r=\lfloor r\rfloor+\{r\} where \lfloor r\rfloor\in\mathbb{Z} and \{r\}\in[0,1). Then

x=\sum_{i=1}^{n}\lfloor\lambda_i\rfloor v_i+\sum_{i=1}^{n}\{\lambda_i\}v_i

Note that \sum_{i=1}^{n}\lfloor\lambda_i\rfloor v_i\in\Gamma and \sum_{i=1}^{n}\{\lambda_i\}v_i\in\Phi. Taking \gamma'=-\sum_{i=1}^{n}\lfloor\lambda_i\rfloor v_i, we observe that x\in\Phi-\gamma'. Hence, V\subset\bigcup_{\gamma\in\Gamma}(\Phi-\gamma). Since (\Phi-\gamma)\subset V for all \gamma\in\Gamma, we have \bigcup_{\gamma\in\Gamma}(\Phi-\gamma)\subset V. Hence \bigcup_{\gamma\in\Gamma}(\Phi-\gamma)=V, and so the sets (\Phi-\gamma)\cap\frac{1}{2}X cover \frac{1}{2}X. Therefore, we finally have

\displaystyle \mathrm{Vol}(\Phi)\geq\sum_{\gamma\in\Gamma}\mathrm{Vol}\left((\Phi-\gamma)\cap\frac{1}{2}X\right)\geq\mathrm{Vol}\left(\frac{1}{2}X\right)=\frac{1}{2^n}\mathrm{Vol}(X)

which contradicts the hypothesis \mathrm{Vol}(X)>2^n\,\mathrm{Vol}(\Gamma), since \mathrm{Vol}(\Gamma)=\mathrm{Vol}(\Phi). Hence, we can choose \gamma_1,\gamma_2\in\Gamma, \gamma_1\neq\gamma_2 such that

\displaystyle \left(\frac{1}{2}X+\gamma_1\right)\cap\left(\frac{1}{2}X+\gamma_2\right)\neq\emptyset.

Therefore, there exist x_1,x_2\in X, x_1 \neq x_2, such that

\displaystyle \frac{1}{2}x_1+\gamma_1=\frac{1}{2}x_2+\gamma_2.

Since X is centrally symmetric, -x_1\in X, and since X is convex, \gamma_0=\gamma_1-\gamma_2=\frac{1}{2}x_2-\frac{1}{2}x_1=\frac{1}{2}x_2+\frac{1}{2}(-x_1)\in X. Moreover, \gamma_0\neq\Vec{0} because \gamma_1\neq\gamma_2, so \gamma_0 is a nonzero point of \Gamma\cap X. This completes the proof of the theorem.


Since \gamma_0\neq\Vec{0} we also have -\gamma_0\in\Gamma\cap X and -\gamma_0\neq\gamma_0. This means X contains at least two distinct lattice points. What else can you find hidden in the proof?

A Useful Corollary

Taking V=\mathbb{R}^n and \Gamma=\mathbb{Z}^n, we observe that any convex centrally symmetric body in \mathbb{R}^n of volume strictly bigger than 2^n contains at least one point with integer coordinates other than \Vec{0}.

Minkowski’s Theorem in Action

Now we present a number theoretic application of Minkowski’s theorem. The following result was proved by Axel Thue using the pigeonhole principle. We give a proof using Minkowski’s lattice point theorem.

Theorem: Primes of the form 4k+1 can be expressed as a sum of two squares.

Proof. Let p be a prime of the form 4k+1. Then -1 is a quadratic residue modulo p or, equivalently, there exists a\in\mathbb{Z} such that a^2+1\equiv0\pmod{p}. Consider the two vectors v_1=(p,0) and v_2=(a,1) in \mathbb{R}^2. Let \alpha v_1+\beta v_2=(0,0) for some \alpha,\beta\in\mathbb{R}. This gives us p\alpha+a\beta=0,\beta=0 and hence \alpha=\beta=0. Therefore v_1,v_2 are linearly independent. Then \Gamma=v_1\mathbb{Z}+v_2\mathbb{Z} is a complete lattice in \mathbb{R}^2 with \mathrm{Vol}(\Gamma)=p.

Let (x,y)\in\Gamma. There exist A,B\in\mathbb{Z} such that (x,y)=Av_1+Bv_2. This implies x=Ap+Ba,y=B. Hence,

x^2+y^2=(Ap)^2+2ABpa+(Ba)^2+B^2\equiv B^2(a^2+1)\equiv0\pmod{p}

Consider the open disc D of radius \sqrt{2p} centered at the origin (0,0). We have \mathrm{Vol}(D)=\mathrm{area}(D)=\pi(\sqrt{2p})^2=2\pi p>4p=2^2\mathrm{Vol}(\Gamma).

Moreover, D is convex and centrally symmetric. By Minkowski’s theorem, there exists a lattice point apart from the origin in D. Let this point be (m,n). Then 0<m^2+n^2<(\sqrt{2p})^2=2p and p\mid (m^2+n^2), and hence m^2+n^2=p. We are done!
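The proof is constructive enough to turn into a small search. The sketch below is an illustration rather than part of the original argument: the function names are hypothetical, the square root of -1 modulo p is found by brute force, and the lattice \Gamma=v_1\mathbb{Z}+v_2\mathbb{Z} is scanned for a nonzero point inside the disc D. Every lattice point has the form (Ap+Ba,B), so for each B it is enough to try the representative of Ba modulo p with the smallest absolute value.

```python
def sqrt_minus_one(p):
    """Brute-force search for a with a^2 + 1 divisible by p."""
    for a in range(1, p):
        if (a * a + 1) % p == 0:
            return a
    raise ValueError("no square root of -1 modulo p; is p a prime of the form 4k+1?")


def two_squares(p):
    """Return (m, n) with m^2 + n^2 = p, found as a lattice point of Gamma inside D."""
    a = sqrt_minus_one(p)
    # A nonzero lattice point (x, B) in D has B != 0 and B^2 < 2p; by symmetry take B > 0.
    for B in range(1, int((2 * p) ** 0.5) + 1):
        r = (B * a) % p
        x = min(r, p - r)  # smallest |x| with x congruent to +-B*a modulo p
        if 0 < x * x + B * B < 2 * p:
            return x, B
    raise ValueError("no lattice point found in the disc")


for p in (5, 13, 17, 29, 37, 41, 97, 101):
    m, n = two_squares(p)
    print(p, "=", m, "^2 +", n, "^2")
```

Since every lattice point satisfies x^2+y^2\equiv 0 \pmod{p}, any point found with 0<x^2+y^2<2p must have x^2+y^2=p, exactly as in the last step of the proof.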


[1] Andreescu, T. and Dospinescu, G., 2008. Problems from the Book.

[2] Neukirch, J., 2013. Algebraic number theory (Vol. 322). Springer Science & Business Media.

[3] Cassels, J.W.S., 2012. An introduction to the geometry of numbers. Springer Science & Business Media.

Welcome to the Blog!

This blog is meant for students and faculty to learn or present interesting ideas in number theory and algebraic geometry. There will be two main types of posts, introductory learning posts, and expository research posts. The introductory learning posts will focus on developing a beginner’s understanding of a certain subject or area. Then the expository research posts are meant to expose people to modern mathematics and to allow readers to get a glimpse of new problems and ideas.

If you are a student or a professor who has done interesting reading, an REU, or a research project in number theory or algebraic geometry, we encourage you to post on this blog. If you want to post, please fill out the following form
