Karatsuba’s algorithm

What’s the fastest way to multiply two n-digit numbers? Well, that depends on n, but maybe I’ll show you something that you didn’t know was possible. To multiply them with less operations that you would do in the digit-by-digit multiplication.

To multiply two n-digit numbers, you have to do n2 n^2 single digit multiplications. That’s obvious — you have to multiply each digit of one numba to each digit of the other one. There’s no way around it. When this factoid was mentioned by Andrei Kolomogorov in a seminar in 1960, one of his students was audacious enough to not take it for granted.

This student was named Anatoly Karatsuba and he managed to find a way to multiply numbers faster than it was assumed possible. He found the first algorithm to multiply numbers faster then multiplying them classicly thus opening the gates for algorithm development and improvements in an area that everyone until then had considered done and solved.

The breakthrough

So what did Karatsuba notice that no one else had?

Consider 26 and 43. To multiply them classically you would do the following:

26
* 43
-----------------------
6*3 =  18
6*4 =  24 
2*3 =  6 
2*4 =  8  
-----------------------
1118

That’s four single-digit multiplications as the counter on the left shows. This distribution can be expressed via the following formula where b=10b=10:

2643=(2b+6)(4b+3)=2b4b+2b3+64b+63 26 \cdot 43 = (2b+6)(4b+3) = 2b \cdot 4b + 2b \cdot 3 + 6 \cdot 4b + 6 \cdot 3

OK, let’s switch to something more abstract. Denote the first number by xx and its digits by x1x_1 and x0x_0. Similarly the second number will be represented by yy and we obtain this:

xy=(x1b+x0)(y1b+y0)=x1by1b+x1by0+x0y1b+x0y0=x1y1b2+x1y0b+x0y1b+x0y0 \begin{align*} xy & = (x_1 b + x_0)(y_1 b + y_0) \\ & = x_1 b \cdot y_1 b + x_1 b \cdot y_0 + x_0 \cdot y_1 b + x_0 \cdot y_0 \\ & = x_1 y_1 b^2 + x_1 y_0 b + x_0 y_1 b + x_0 y_0 \end{align*}

What Karatsuba noticed about this simple expression was that coefficients by the term bb can be expressed using one multiplication and the other coefficients:

x1y0+x0y1=(x1+x0)(y1+y0)x1y1x0y0 x_1 y_0 + x_0 y_1 = (x_1 + x_0) (y_1 + y_0) - x_1 y_1 - x_0 y_0

So we get this strange algorithm. First calculate 242 \cdot 4 and 636 \cdot 3. Then, instead of calculating 232 \cdot 3 and 646 \cdot 4 we multiply 878 \cdot 7 and reuse the previous results to get what we were missing:

23+64=(2+6)(3+4)2463 2 \cdot 3 + 6 \cdot 4 = (2 + 6) (3 + 4) - 2 \cdot 4 - 6 \cdot 3

And that’s the key. We did three multiplications instead of four.

The algo

In practice this is used for large numbers. x1x_1 and x0x_0 will not actually be single digits, but the first half and the second half digits of the numbers. And bb will be whatever it takes to make x=x1b+x0x = x_1 b + x_0 work. And y=y1b+y0y = y_1 b + y_0, of course.

With those definitions in mind, do three multiplications to calculate these three numbers:

z2=x1y1z0=x0y0z1=(x1+x0)(y1+y0)z2z0 \begin{align*} z_2 &= x_1 y_1 \\ z_0 &= x_0 y_0 \\ z_1 &= (x_1 + x_0) (y_1 + y_0) - z_2 - z_0 \end{align*}

And obtain the result:

xy=z2b2+z1b+z0 xy = z_2 b^2 + z_1 b + z_0

But wait! I forgot to mention that this algorithm should be used recursively. If the numbers involved to calculate those z’s above are large, you should do the involved multiplications using the Karatsuba algorithm.

Example

Let’s multiply 416001 by 719028. We will fall back to the classic multiplication algorithm for numbers below 10000.

We split 416001 into 416 and 001; and 719028 into 719 and 028. We will also drop the leading zeros right away turning 001 and 028 into 1 and 28 respectively.

The display starts by calculating the z0, z1 and z2 and putting them together to calculate the result. The values of involved multiplications are calculated below, you can use the orange links to highlight a calculation. You can see the counter for individual multiplications on the left hand side.

416001
* 719028
-----------------------
z0 = 1 * 28 = 0
z1 = 1 + 416 * 28 + 719 - z0 - z2 = 0   
z2 = 416 * 719 = 0      
-----------------------
z2*10^(2*3) + z1*10^3 + z0 = 0
1
* 28
-----------------------
1*8 =  8
1*2 =  2 
-----------------------
28
417
* 747
-----------------------
7*7 =  49
7*4 =  28 
7*7 =  49  
1*7 =  7 
1*4 =  4  
1*7 =  7   
4*7 =  28  
4*4 =  16   
4*7 =  28    
-----------------------
311499
416
* 719
-----------------------
6*9 =  54
6*1 =  6 
6*7 =  42  
1*9 =  9 
1*1 =  1  
1*7 =  7   
4*9 =  36  
4*1 =  4   
4*7 =  28    
-----------------------
299104

This multiplication took only 20 steps because we simplified some of the numbers with leading zeros. If we hadn’t done this step and just done multiplications with all the digits of 001, we’d have done nine single-digit multiplications in each of the sub-steps and need to do 27 multiplications in total. In the worst case we might’ve also gotten two 4-digit numbers in the z1 calculation which would’ve required 16 instead of 9 multiplications, bringing the total 34.

But even with these simplifications taken away, we still have a slight edge as the classic digit-by-digit multiplication would’ve required to do 36 multiplications.

You can try out the same visualization with different (larger!) numbers here.

Other algos

The classic digit-by-digit algorithm requires n2n^2 multiplications. And some additions as well, but it can be estimated to run in O(n2) \mathcal{O} (n^2) .

The number of operations (time complexity) of Karatsuba’s algorithm can be estimated to be Θ(nlog23) \Theta(n^{\log_2 3}) (the big theta is a bit more precise than the big o as it’s also bounded from below).

As Karatsuba revealed that the O(n2) \mathcal{O} (n^2) multiplication can be beaten, it opened a flood of other attempts.

Soon Andrei Toom proposed and Stephen Cook refined a variation where the numbers would be split in more than 2 parts. Under this interpretation Karatsuba’s algorithm becomes Toom-2, splitting into three parts is referred to as Toom-3 (or sometimes just Toom-Cook) and so on. The complexity evaluation is hard for arbitrary Toom-k algorithm, but Toom-3 runs in Θ(nlog35) \Theta(n^{\log_3 5}) .

In 1971 Arnold Schönhage and Volker Strassen came up with an idea to multiply integers by multiplying polynomials which can be sped up by then recently popularized fast Fourier transforms. Their algorithm works in O(nlognloglogn) \mathcal{O} (n \log n \log \log n) and it was the fastest multiplication algorithm for large numbers until 2007. They conjectured that multiplication can be done in O(nlogn) \mathcal{O} (n \log n) .

This bound was reached in 2019 by David Harvey and Joris van der Hoeven who used multi-dimensional (1729-dimensional to be precise, what a taxicab coincidence) FFTs. But it is still unknown whether O(nlogn) \mathcal{O} (n \log n) is truly the best we can do.

It must also be noted that all of these algorithms have significant overhead and the more complex algorithms are faster only for very large numbers. The Karatsuba and Toom-Cook algorithms are used for numbers consisting of hundreds or thousands of digits. The Schönhage–Strassen algorithm is usually used for numbers that have at least tens of thousands of digits. And it is used for billions of digits as well. The newer algorithms are so complex that they only reach the asymptotic complexity for impractically large numbers and are not used in any software, at least until futher polishing.

Publicēts 2023-01-08