DSA Fundamentals #4: Essential Number Theory for Competitive Programming

This guide explains number theory concepts for modern algorithm design. It connects abstract math to working code. This text is for computer science students, competitive programmers, and software engineers who need a practical understanding of the topic.

Part I: Greatest Common Divisor and Least Common Multiple

This part covers the Greatest Common Divisor (GCD) and Least Common Multiple (LCM). It introduces the Euclidean Algorithm, a very fast tool for computation.

Section 1.1: The Greatest Common Divisor (GCD)

What is the GCD?

The Greatest Common Divisor (GCD) of two integers is the largest positive integer that divides both numbers without a remainder. People also call it the Highest Common Factor (HCF).

For example, the divisors of 24 are 24. The divisors of 60 are 60. The largest number in both lists is 12. So gcd(24, 60) is 12.

A Practical View: The Tiling Problem

You have a rectangular area of size a by b. The gcd(a, b) gives the side length of the largest square tile that can cover this area perfectly. There will be no gaps or overlaps. A 24x60 rectangle can be tiled with 12x12 squares.

You can also find the GCD using prime factors. The Fundamental Theorem of Arithmetic says that every integer greater than 1 is a unique product of prime numbers. The GCD of two numbers, a and b, is the product of their shared prime factors. You raise each prime factor to the lowest power it appears in either number's factorization.

For example:

1386 = 2¹ × 3² × 7¹ × 11¹
3213 = 3³ × 7¹ × 17¹

The shared primes are 3 and 7. The minimum power of 3 is 3². The minimum power of 7 is 7¹.

So gcd(1386, 3213) = 3² × 7¹ = 63.

Finding prime factors of large numbers is very hard. Many security systems depend on this difficulty. So factorization is not a practical way to find the GCD. We need a better method.

A diagram showing a 24x60 rectangle being perfectly tiled by 12x12 square tiles, demonstrating the concept of the Greatest Common Divisor (GCD).

The Euclidean Algorithm: A Fast Method

The Euclidean algorithm is an old and fast way to find the GCD. Let’s look at slower methods first.

Naive Method: A simple way is to start with the smaller number, min(a, b), and count down. The first number that divides both a and b is the GCD. This method is too slow for large inputs. Its time complexity is O(min(a, b)).
Subtraction Method: This method uses the rule gcd(a, b) = gcd(a - b, b), if a is greater than b. You keep subtracting the smaller number from the larger one. When the numbers are equal, you have the GCD. This is also slow if the numbers are far apart, like gcd(10⁹, 1).

The Core Idea

The modern Euclidean algorithm uses one powerful rule: gcd(a, b) = gcd(b, a mod b). The term a mod b means the remainder of a divided by b.

This rule works because any number that divides both a and b must also divide their remainder. This lets us swap a pair of large numbers for a pair of smaller ones. The GCD does not change. This is much faster than subtraction.

Algorithm Example: gcd(500, 150)

Call gcd(500, 150):
- Find the remainder of 500 divided by 150.
- 500 = 3 × 150 + 50. The remainder is 50.
- The problem is now gcd(150, 50).
Call gcd(150, 50):
- Find the remainder of 150 divided by 50.
- 150 = 3 × 50 + 0. The remainder is 0.
- The problem is now gcd(50, 0).
Call gcd(50, 0):
- This is the end condition. gcd(a, 0) is always a.
- The function returns 50.

The final answer is 50.

Implementation in C++ and Python

The algorithm can be coded with recursion or a loop. The loop version often uses less memory.

C++ Code

// Recursive Version
int gcdRecursive(int a, int b) {
    if (b == 0) {
        return a;
    }
    return gcdRecursive(b, a % b);
}

// Iterative Version
int gcdIterative(int a, int b) {
    while (b != 0) {
        int remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}

Python Code

# Recursive Version
def gcdRecursive(a, b):
    if b == 0:
        return a
    else:
        return gcdRecursive(b, a % b)

# Iterative Version
def gcdIterative(a, b):
    while b:
        a, b = b, a % b
    return a

Stop Scrolling, Start Achieving: Get Actionable Tech & Productivity Insights.

Join the inner circle receiving proven tactics for mastering technology, amplifying productivity, and finding deep focus. Delivered straight to your inbox – no fluff, just results.

Complexity: The Speed of Euclid's Algorithm

The time complexity of the Euclidean algorithm is O(log(min(a, b))). The number of steps grows very slowly with the size of the numbers. The algorithm finds the GCD of very large numbers very quickly.

The worst case uses consecutive Fibonacci numbers. Lamé's theorem shows the number of steps is never more than five times the number of digits in the smaller number. This logarithmic speed is a great example of a mathematical idea leading to a much faster algorithm.

The Extended Euclidean Algorithm

This version of the algorithm finds more than the GCD. It also finds two integers, x and y, that solve Bézout's Identity: ax + by = gcd(a, b).

The algorithm works backwards from the end of the standard algorithm. It calculates the x and y coefficients at each step. Its main use is finding the modular multiplicative inverse, which is a key tool discussed in Part II.

Section 1.2: The Least Common Multiple (LCM)

What is the LCM?

The Least Common Multiple (LCM) of two integers, a and b, is the smallest positive integer that is a multiple of both a and b.

For example, multiples of 12 are 12, 24, 36, 48, 60, .... Multiples of 18 are 18, 36, 54, 72, .... The smallest number in both lists is 36. So lcm(12, 18) is 36.

The GCD-LCM Connection

A simple and strong relationship connects the GCD and LCM.

The Formula

For any two positive integers a and b:

lcm(a, b) × gcd(a, b) = |a × b|

This formula gives a very fast way to compute the LCM. First, find gcd(a, b) with the Euclidean algorithm. Then rearrange the formula to find the LCM:

lcm(a, b) = (|a × b|) / gcd(a, b)

A Practical Tip: Avoid Overflow

In your code, calculate this as (|a| / gcd(a, b)) * |b|. This stops a possible integer overflow when |a × b| is a very large number. The fast Euclidean algorithm makes computing the LCM fast too.

Part II: Modular Arithmetic

This part explains "clock arithmetic" and gives you tools to solve problems with large numbers and cycles.

Section 2.1: Modular Systems

'Clock Arithmetic': A Simple Start

A 12-hour clock is a great example of modular arithmetic. If it is 10:00, an event in 4 hours will be at 2:00, not 14:00. Time "wraps around" at 12.

We write this as 14 ≡ 2 (mod 12). This reads "14 is congruent to 2 modulo 12." The number that sets the cycle, 12, is the modulus. This works for days of the week (modulus 7) and other cycles.

An illustration of a 12-hour clock demonstrating modular arithmetic. An arrow starts at 10, moves forward 4 hours, and lands on 2, showing 10 + 4 ≡ 2 (mod 12).

Congruence and Residue Classes

The idea of congruence makes this formal.

Two integers a and b are congruent modulo m if (a - b) is a multiple of m. Another way to say this is that a and b have the same remainder when divided by m.

For example, 38 ≡ 14 (mod 12) because 38 - 14 = 24, a multiple of 12. Both 38 and 14 also have a remainder of 2 when divided by 12.

Congruence splits all integers into m different sets. These sets are called residue classes. They match the m possible remainders: 0, 1, 2, ..., m-1. For modulus 5, the residue class for 2 is {..., -8, -3, 2, 7, 12, ...}.

This makes arithmetic simpler. We can work with a small set of m classes instead of infinite integers.

Section 2.2: Operations in Modular Arithmetic

Addition, subtraction, and multiplication work as you would expect in modular systems.

Addition: (a + b) mod m = ((a mod m) + (b mod m)) mod m
Subtraction: (a - b) mod m = ((a mod m) - (b mod m) + m) mod m
Multiplication: (a × b) mod m = ((a mod m) × (b mod m)) mod m

These rules are useful in algorithms. They keep all intermediate numbers small and prevent integer overflow. The + m in subtraction is a programming trick to make sure the result is positive.

Section 2.3: Advanced Modular Operations

Modular Exponentiation: Finding (aᵇ) mod m

We often need to find aᵇ mod m where b is very large.

A simple approach is to compute aᵇ first, then take the modulus. This does not work. The value aᵇ grows very quickly and overflows standard data types. A loop that multiplies by a and takes the modulus b times is also too slow. Its complexity is O(b).

A much faster O(log b) method is binary exponentiation, or exponentiation by squaring. This method uses the binary form of the exponent b. Any number b is a sum of powers of 2. For example, 13 is 1101 in binary, so 13 = 8 + 4 + 1. This means a¹³ = a⁸⁺⁴⁺¹ = a⁸ × a⁴ × a¹.

We find terms like a¹, a², a⁴, a⁸, ... by squaring repeatedly: a² = a¹ × a¹, a⁴ = a² × a², and so on. This lets us compute aᵇ mod m with few multiplications.

Modular Division and the Multiplicative Inverse

Division is different in modular arithmetic. We do not write (a/b) mod m. We multiply by a modular multiplicative inverse.

The modular inverse of an integer b modulo m, written b⁻¹, is an integer x where b × x ≡ 1 (mod m). The division (a/b) mod m is then (a × b⁻¹) mod m.

An inverse for b exists only if b and m are coprime. This means gcd(b, m) = 1. This connects back to our work in Part I. If the modulus m is a prime number, an inverse exists for every integer b from 1 to m-1.

There are two main ways to find the modular inverse:

Using the Extended Euclidean Algorithm: We know gcd(b, m) = 1. This means integers x and y exist where b·x + m·y = 1. If we take this equation modulo m, the m·y term is zero. This leaves b·x ≡ 1 (mod m). The x from the Extended Euclidean Algorithm is the modular inverse. This works for any modulus m.
Using Fermat's Little Theorem: This is a shortcut for a prime modulus p. The theorem says if p does not divide b, then bᵖ⁻¹ ≡ 1 (mod p). We can rewrite this as b × bᵖ⁻² ≡ 1 (mod p). This means b⁻¹ ≡ bᵖ⁻² (mod p). We can compute bᵖ⁻² (mod p) with binary exponentiation.

Section 2.4: Computer Science Applications

Hashing

Modular arithmetic is key to hashing. A hash function maps a large universe of keys to a small range of array indexes. The modulo operator is the simplest tool for this: index = hash(key) % table_size. This "wrap-around" action is how hash tables work.

Competitive Programming and Large Numbers

In programming contests, many problems use numbers too big for a 64-bit integer. The problems often ask for the answer modulo a large number to solve this.

The number 1,000,000,007 (or 10⁹ + 7) is a common modulus. It is useful for a few reasons:

It is prime. This is its best feature. Since it is prime, a modular inverse exists for any number from 1 to 10⁹ + 6. This means division is always possible.
It is large but fits in standard types. It fits in a 32-bit integer. The product of two smaller numbers, like (10⁹ + 6) × (10⁹ + 6), will not overflow a 64-bit integer. This makes calculations safe.

Modular arithmetic lets programmers solve complex problems without a custom big-integer library.

Part III: Prime Numbers and Primality Testing

This part introduces prime numbers and a basic algorithm to find them.

Section 3.1: The Building Blocks of Integers

A prime number is a natural number greater than 1 with only two divisors: 1 and itself. The first few primes are 2, 3, 5, 7, 11, and 13. A composite number is a natural number greater than 1 that is not prime. The number 1 is neither prime nor composite.

The Fundamental Theorem of Arithmetic states every integer greater than 1 is either a prime or a unique product of primes. Primes are the basic "building blocks" of all other integers.

Section 3.2: Trial Division for Primality Testing

The Algorithm and a Key Improvement

The simplest way to check if a number n is prime is trial division. The method tests if n is divisible by any integer from 2 to n-1. If a divisor is found, n is composite. If not, n is prime.

This is slow. A key improvement makes it much faster. We only need to test divisors up to the square root of n.

Why does this work?

Factors come in pairs. If a number n is composite, it can be n = a × b. It is not possible for both a and b to be larger than √n. If they were, their product a × b would be greater than √n × √n = n. This is a contradiction. So any composite number n has at least one factor less than or equal to √n. We only need to check for factors up to √n.

A graph comparing the number of operations for a primality test, showing the slow O(n) approach versus the much faster O(√n) optimized approach

Implementation and Complexity

Here is an optimized Python implementation of the trial division test.

import math

def is_prime(n):
    # Primes must be greater than 1.
    if n <= 1:
        return False
    # 2 and 3 are prime.
    if n <= 3:
        return True
    # Check divisibility by 2 or 3.
    if n % 2 == 0 or n % 3 == 0:
        return False
    # Check divisors of the form 6k ± 1 up to sqrt(n).
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

The time complexity of this algorithm is O(√n). This is much better than O(n). It is fast enough for numbers up to around 10¹² in programming contests.

Limitations and Future Topics

For very large numbers, like those in cryptography, an O(√n) algorithm is too slow. These fields use advanced algorithms:

Probabilistic Tests: The Miller-Rabin test is very fast and can find with high probability if a number is prime. It has a very small chance of error.
Deterministic Tests: The AKS primality test can prove if a number is prime. It is a polynomial-time algorithm, but it is often slower in practice than probabilistic tests.

The security of systems like RSA is based on the fact that multiplying two large primes is easy, but factoring their product is hard.

Part IV: Practice and Further Study

Solving problems is the best way to learn these ideas.

Section 4.1: Practice Problems

Euclidean Algorithm and GCD/LCM

GeeksforGeeks: "Program to find GCD or HCF of two numbers"
HackerEarth: "Number Theory - Part 1"
Codeforces: Problem 1203C - "Common Divisors"
Codeforces: Problem 582A - "GCD Table"
HackerRank: "Sherlock and GCD"
Codeforces: Problem 1285C - "Fadi and LCM"

Modular Arithmetic and Exponentiation

SPOJ: "The last digit" (LASTDIG)
CSES: "Exponentiation II"
GeeksforGeeks: "Modular Exponentiation for Large Numbers"
CodeChef: Practice problems tagged "modular-arithmetic"
UVA: Problem 10104 - "Euclid Problem"
HackerEarth: "Number Theory - Part 2"
Codeforces: Problem 1236B - "Alice and the List of Presents"

Primality Testing and Factorization

SPOJ: "Prime Generator" (PRIME1)
Codeforces: Problem 230B - "T-Primes"
Codeforces: Problem 1165D - "Almost All Divisors"

Section 4.2: Resources for Deeper Study

Books

Elementary Number Theory by David M. Burton. A classic, readable introduction.
An Introduction to Mathematical Cryptography by Hoffstein, Pipher, and Silverman. For cryptographic uses.

Online Tutorials

Khan Academy: The Number Theory section has good introductory videos and exercises.
CP-Algorithms: A great resource for competitive programmers with detailed articles.
GeeksforGeeks: A large library of articles and practice problems on number theory.

Online Judges

Consistent practice on platforms like Codeforces, HackerEarth, SPOJ, CodeChef, AtCoder, and Project Euler is the best way to build your skills.

Previous Chapter

← DSA Fundamentals #3

Next Chapter

Coming Soon →