How to Find the Nth Prime Number

May 23, 2020


It seemed easier in my mind when I first thought about it. But, finding a random prime number fast isn't trivial. For example, the 5th prime number is 11, the 20th is 71. But, what about the 50th, 600th, or 20,000th prime number? I want to show you an efficient way to find the Nth prime number. We'll get help from an old algorithm known as the Sieve of Eratosthenes.

A prime number is an integer greater than 1, that is exactly divisible by 1 and itself. All non-prime integers are called composites. The Fundamental Theorem of Arithmetic says that every integer greater than 1, is either a prime or can be represented as the product of primes.

An easy way to determine if a number is prime is by trial division: divide the number n by all the integers less than n, and if no exact divisors–other than 1–are found, then n is prime. You can see how this becomes time-consuming as the value of n increases. Is 50,275 prime?

There are better ways to generate prime numbers. These methods generate lists of primes up to a given limit. An improvement over trial division is wheel factorization for example. Then there's a set of techniques called Sieves that include the Sieve of Atkin and the Sieve of Sundaram. But, the most famous of these is the Sieve of Eratosthenes. Probably because it has been around for a long time–a Greek mathematician called Eratosthenes of Cyrene created this algorithm at around 200 BC.


What is the Sieve of Eratosthenes and How Does it Work?

The Sieve of Eratosthenes is an efficient algorithm to generate prime numbers up to a given limit. To see how it works, let's follow an example. We want to find all prime numbers less than 13.

Initially, we have a list containing all the integers from 2 through 13–by definition 1 is not a prime so we discard it. Then starting from 2, we begin crossing out (sieving) all multiples of 2 that are smaller than 13. Any multiple of a prime number can't be prime–it will have more than two divisors, disqualifying it as a prime number. We eliminate all of these from the list. Then we check the list to find the next available number, which is 3–this is the next prime number. We repeat the process eliminating all multiples of 3. This continues until there are no more available numbers inside the list. Eventually, we will have eliminated all composite numbers and only primes will remain.

Let's illustrate this procedure, we start with the list of integers from 2 through 13:

$$2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13$$

Step 1) Take the first prime $p = 2$ and cross out all its multiples in the list, $2\times 2 = 4$, $2\times 3 = 6$, $2\times 4 = 8$, etc:

$$2, 3, \underline{4}, 5, \underline{6}, 7, \underline{8}, 9, \underline{10}, 11, \underline{12}, 13$$

Step 2) Find the next available number and cross out all its multiples. The next one is $p = 3$, so we cross out its multiples $3\times 2 = 6$, $3\times 3 = 9$, $3\times 4 = 12$ (notice that 6 and 12 had been crossed out already):

$$2, 3, \underline{4}, 5, \underline{6}, 7, \underline{8}, \underline{9}, \underline{10}, 11, \underline{12}, 13$$

Step 3) Continue the process with the next available number, $p = 5$. The only multiple of 5 that is less than 13 is $5\times 2 = 10$, which was crossed out in a previous step. So, we proceed.

The next number in the list is 7, but the first multiple of 7 ($7\times 2 = 14$) is greater than our limit 13. Any number above 7 will produce multiples greater than 13, so we are done. The remaining numbers in the list are all the primes up to 13:

$$2, 3, 5, 7, 11, 13$$

Optimizing the Process

In the previous exercise, we duplicated some work. For example, we crossed out several factors more than once. We can optimize the algorithm by recognizing that, for every prime number $p$, we only need to check the factors of $p$ that are greater than or equal to $p^2$. This is because all factors of $p$ less than $p^2$ are crossed out by the previous smaller primes. This will let us terminate the algorithm as soon as we find a prime $p$ where $p^2$ is greater than the size of our list.

To illustrate this, let's follow the same example as before. But, this time we'll increase the list up to 23. After step 1 we have:

$2, 3, \underline{4}, 5, \underline{6}, 7, \underline{8}, 9, \underline{10}, 11, \underline{12}, 13, \underline{14}, 15,$ $\underline{16}, 17, \underline{18}, 19, \underline{20}, 21, \underline{22}, 23$

notice that all factors of 2 have been crossed out. Using our heuristic, we should only check for factors of 3 that are greater than or equal to $3^2 = 9$. So, we cross out 9, 12, 15, 18, and 21:

$2, 3, \underline{4}, 5, \underline{6}, 7, \underline{8}, \underline{9}, \underline{10}, 11, \underline{12}, 13, \underline{14}, \underline{15},$ $\underline{16}, 17, \underline{18}, 19, \underline{20}, \underline{21}, \underline{22}, 23$

The next prime in the list is 5, but since $5^2 = 25$ is greater than 23, we end the process:

$$2, 3, 5, 7, 11, 13, 17, 19, 23$$

The remaining list contains all primes up to 23.

Notice that we still crossed out some factors more than once. However, we stopped the process at $p = 5$. So, even though we increased the list by 10–from 13 up to 23–we did the same amount of work as in the previous example. This is a small refinement, but it becomes significant as we increase the size of the list.


Using The Sieve of Eratosthenes to Find the Nth Prime Number

Going back to our problem, we now make use of the Sieve of Eratosthenes to find the Nth prime. As I described above, the algorithm finds all primes up to a given limit. However, in this case we don't have a predefined limit. This is because we don't know how big a list we need that will contain our Nth prime. The prime number theorem says that there are approximately $\frac{n}{log(n)}$ primes less than $n$. But, the gaps between one prime and the next one get arbitrarily large.

Due to this limitation, we are going to follow an iterative process. First, we pick an initial size for our list. We know that it has to be at least greater than $n$, our target prime number. For example we choose the size $s = 2n$. Then we generate all primes less than $s$ (using the Sieve of Eratosthenes). Finally, we count how many primes are inside the list. If the quantity of primes is less than $n$, we proceed to the next iteration, increasing the size of the list by $n$. The iterations stop when we find $n$ or more primes. The Nth prime will be the number in the Nth position in the list after removing all the composites.


Implementing the Algorithm in Python

The following python code implements this strategy to find the Nth prime number. It includes the optimized version of the Sieve of Eratosthenes. For convenience, I use a bytearray to represent our list of integers. The index of each element corresponds to the value of the integers in our list. All elements inside the bytearray are initialized to 1.

We "cross out" factors by toggling elements from 1 to 0 at the corresponding index. For example, if we want to cross out factor 4, we toggle the element at index 4. After crossing out all factors, and since we are not deleting them from the list, we need to count all 1's until we find the Nth 1. We return the index of this element, which is our Nth prime number.

#!/usr/bin/env python3

def get_nth_prime(nth):
  """ Returns the n-th prime number
  """
  total_primes = 0
  size_factor = 2
  s = (nth * size_factor)
  while total_primes < nth:
    primes = get_primes(s)
    total_primes = sum(primes[2:])
    size_factor += 1
    s = (nth * size_factor)
  nth_prime = count_primes(primes, nth)
  return nth_prime


def get_primes(s):
  """ Generates primes using the Sieve of Eratosthenes
      Includes the optimization where for every prime p, only factors p >= p^2
      are verified.
      The list of primes is represented with a bytearray. Each index corresponds
      to an integer in the list. A value of "1" at the index location indicates
      the integer is a prime.
  """
  primes = bytearray([1]*s)
  for i in range(2, s):
      if primes[i] == 1:
        for j in range(i, s):
          if i*j < s:
            primes[i*j] = 0
          else:
            break
  return primes


def count_primes(primes, nth):
  """ Returns the n-th prime represented by the index of the n-th "1" in the
      bytearray.
  """
  count = 0
  for k in range(2, len(primes)):
    count += primes[k]
    if count == nth:
      return k


def main():
    # Get the 20,000th prime number
    N_TH = 20000
    nth_prime = get_nth_prime(N_TH)
    print("The {}-th prime is: {}".format(N_TH, nth_prime))


if __name__ == "__main__":
    main()


Computational Complexity

Our code above executes the get_primes() function (the Sieve of Eratosthenes) iteratively until we find $n$ (or more) prime numbers. At each iteration, we increase the size of the bytearray. The total number of iterations is determined by the variable size_factor, which grows with the value of $n$. Finally, we go through the bytearray counting 1's, until we find the Nth 1. Let's analyze the runtimes of each of these three functions separately.

Looking at function get_nth_prime(), the number of iterations is determined by the variable size_factor - 1 (let's call this value $s$). Then the amount of work–using asymptotic notation–is $O(s)$.

Next, the count_primes() function traverses the bytearray counting 1's. It runs in linear time, determined by $n\times s$. The runtime is $O(n\cdot s)$, where $n = Nth$, our target prime number.

The get_primes() function implements the Sieve of Eratosthenes. The runtime of this algorithm is $O(N log log N)$, where N is the size of the list. To arrive at this result, think about what happens to the size of the list after crossing out the factors of each prime. A good approximation is as follows. We start with size N. After crossing out all the factors of the first prime, 2, the size becomes $\frac{N}{2}$. After crossing out the factors of 3, the size will be $\frac{N}{3}$. This continues until the size is reduced to $\frac{N}{N} = 1$. We can express this as: $N(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + ... + \frac{1}{N})$. Notice that the expression inside the parentheses is a Harmonic Prime Series, represented as:

$$\sum_{p\leq N} = \frac{1}{p}$$

where $p$ is a prime and $N$ is the size of our list. It has an asymptotic lower bound of $log(log N)$. Therefore, the runtime–including the factor $N$ outside the parentheses–is $O(N log log N)$.

Combining the runtimes of the three functions and remembering that the size of our bytearray is $n\times s$, we get the total runtime, $O(s)\cdot O(n\cdot s\cdot log log (n\cdot s)) + O(n\cdot s)$. This becomes:

$$O(s^2 \cdot n\cdot log log (n\cdot s))$$

While testing the program, I noticed that $s$ grows slowly when compared to $n$. The value of $s$ increased from 12 (with $n$ = 20,000) to 16 (with $n$ = 1,000,000).

Finally, the memory cost comes from the size of the bytearray, so the space complexity is:

$$O(n\cdot s)$$


How on Earth is This Useful?

Now you might be thinking, how is all this useful? Let me give you some examples. Prime numbers are important in hashing–a fundamental concept in computer science. From data structures and algorithms to encryption, hashing is useful in many areas.

Another example is Cryptography–a crucial technology used every day for secure data communication. Online banking and shopping transactions are examples of secure data communications.

Just as I first thought it was very easy to find a random prime number, at first glance we may not see it, but if we look closely, math is everywhere.