xy=(x0+B*x1)(y0+B*y1)=x0y0 + (x0*y1+x1*y0)B + x1*y1*B²
there are four multiplications of smaller numbers, also here the calculation time is proportional to n². However
(x0*y1+x1*y0)=(x0+x1)(y0+y1)-x0*y0-x1*y1
why it's enough to calculate three smaller multiplications
x0*y0, x1*y1 and (x0+x1)(y0+y1).
The multiplication with B are fast left shifting and if the shifting and the addition where cost free, the recursive Karatsuba multiplication would be very efficient
but unfortunately the extra math (and in Forth also some stack juggling) takes a lot of time and the method is efficient only for rather big numbers. For very big numbers, however, it's very efficient.
Here is the way I implemented it in ANS Forth:
: bcells* \ big m -- big*C^m
cells top$ locals| n ad mb |
ad ad mb + n move
ad mb erase
mb bvp @ +! ;
\ C is the number of digits in a cell
: bcells/ \ big m -- big/C^m
cells top$ locals| n ad mb |
ad mb + ad n move
mb negate bvp @ +! ;
dup nextfree <
if bvp @ dup @ vp+ bvp @ ! !
else drop bzero
then ;
\ A big number is split on the big stack at address ad
: btransmul \ x y -- x0 x1 y0 y1 m B=2^bits
len1 len2 max cell + lcell 1+ rshift \ m
dup >r cells
>bx first over + bsplit
bx> first + bsplit r> ;
\ x=x0+x1*B^m y=y0+y1*B^m
0x84 value karalim \ break point byte length for termination.
: b* \ x y -- xy
len1 len2 max karalim <
if b* exit then
btransmul >r \ x0 x1 y0 y1
3 bpick 2 bpick recurse >bx \ bx: x0*y0
2 bpick 1 bpick recurse >bx \ bx: x0*y0 x1*y1
b+ >bx b+ bx> recurse \ (x0+x1)(y0+y1)
bx b- by b- r@ bcells* \ z1*C^m
bx> r> 2* bcells* bx> b+ b+ <top ;
\ Karatsuba multiplication