## Monday, December 28, 2015

### Prime tables and the prime counting function

The only possibility for a very fast prime counting function is to make tables. To make a straight forward table would take a lot of time and a lot of memory, but a table of primes is more compact, is loading faster and can be used for fast prime counting.

It's possible to store consecutive prime series in tables using only two bytes per prime number up to virtually any size (at least 100 decimals or so), with a technique using that the prime gaps for those numbers are less than 65536=2^16. What's stored in the main table are the prime numbers modulo 65536 - and in a minor table "break points" are stored: the number n when the next prime Pn+1 divided with 65536 is greater than the same for the previous prime. A fast search in the minor table gives the index number, that multiplied with 65536 should be added to the number in the main table to get the prime.

Such a table for all primes less than 2^24 is created in less than a second and can be used to compute π(x) for x<16777216 in a few microseconds. The memory needed for the main table is 2*π(16777216)=2*1077871, slightly more than 2 Mb, and for the minor table 1-2 kb if the break points are stored as single integers. The memory saved is near 50 % in this case, but for larger tables the memory saved will be 100 % or more.

Making a table for all 64 bit primes would need about 5 billion Gb. All 32 bit primes would need about 1 Gb. All primes less than 100,000,000 would need about 10 Mb and would take some time to load (about 5 seconds). For greater values it's more realistic to use other and slower methods, like files with records of primes and their order numbers, with one second computing time gap in between.

First some words that is used:

: prevprime ( numb -- prime )
dup 3 u< if drop 2 exit then
1- 1 or
begin dup fermat-rabin-miller 0=
while 2 -
repeat ;

: ?def ( -- flag )  bl word find nip 0= ;

?def under+ [if]

: under+ ( a b c -- a+c b )  rot + swap ; [then]

: 8/mod ( n -- r q )  dup 7 and swap 3 rshift ;

: 256/mod ( n -- r q )  dup 255 and swap 8 rshift ;
: 65536/mod ( n -- r q )  dup 65535 and swap 16 rshift ;

The idea is to use the sieve of Eratosthenes to create av very fast word prime_ ( n -- flag ) and bit arrays is used.

\ bit arrays

: adbit ( i ad -- j a )  swap 8/mod rot + ;

: setbit \ i ad --

adbit 1 rot lshift over c@ or swap c! ;

: tglbit \ i ad --

adbit 1 rot lshift over c@ xor swap c! ;

: clrbit \ i ad --

adbit 1 rot lshift 255 xor over c@ and swap c! ;

: bit@ \ i ad -- bit

adbit c@ swap rshift 1 and ;

: bit! \ bit i ad --

rot if setbit else clrbit then ;

: bit? ( i ad -- f )  bit@ negate ;

\ Defining named bitarrays i.e.
\ 1000000 bitarray megaset constant megasetbuffer
: bitarray \ bits -- ad | n -- bit
8/mod swap 0= 0= negate + allocate throw
create dup ,
does> @ swap 8/mod rot + bit@ ;

For storing numbers in i=2,3,... bytes spaces I use the words

mb! ( n ad i -- )

and

mb@ ( ad i -- n )

\ multiple byte read/write in arrays

variable ebuf
1 ebuf ! ebuf c@ negate
[if] \ little-endian, as in Windows and Android
: mb! ( n ad i -- )  2>r ebuf ! ebuf 2r> cmove ;
: mb@ ( ad i -- n )  ebuf 0 over ! swap cmove ebuf @ ;
[else] \ big-endian, as in MacOS
: mb! ( n ad i -- )  2>r ebuf ! ebuf cell + r@ - 2r> cmove ;
: mb@ ( ad i -- n )  ebuf 0 over ! cell + over - swap cmove ebuf @ ;
[then]

### The tables

The method below can be used for any number of primes. Here plim=2^24 and pi_plim=π(plim)=1077871:

\ the sieve of Eratosthenes

0xfffff constant plim
82025 constant pi_plim

\ 16777215 constant plim   \ Loads in
\ 1077871 constant pi_plim \ a few seconds

\ 100000000 constant plim  \ 100000000 takes 6 times
\ 5761455 constant pi_plim \ longer time to load

plim bitarray prime_ constant pbuf
\ prime_ ( n -- flag )  n<plim

: resetsieve ( -- )
pbuf plim 8/mod swap 0= 0= - + pbuf
do true i ! cell +loop ;

: sieve ( -- )

resetsieve
0 pbuf clrbit
1 pbuf clrbit
plim sqrtf 1+ 2
do i pbuf bit@
if plim 1+ i dup *
do i pbuf clrbit j +loop
then
loop ; \ takes approximate 3 seconds or less to load

Next improvement of the prime functions:

plim prevprime constant pmax_

: nextprime_ ( numb -- prime )  \ numb<pmax_
dup 3 u< if drop 2 exit then
1 or
begin dup prime_ 0=
while 2 +
repeat ;

: prevprime_ ( numb -- prime )
dup 3 u< if drop 2 exit then
1- 1 or
begin dup prime_ 0=
while 2 -
repeat ;

: prime ( n -- flag )

dup plim u> if prime else prime_ negate then ;

: nextprime ( numb -- prime )

dup pmax_ u< if nextprime_ else nextprime then ;

: prevprime ( numb -- prime )

dup plim u< if prevprime_ else prevprime then ;

It takes less than a second to store all primes less than plim=2^24 in the buffer pnrbuf.

plim 65536/mod swap 0= 0= - constant breaknumbers
pi_plim 2* allocate throw constant pnrbuf
breaknumbers cells allocate throw constant breaks

: init_pnr ( -- )
0 pad ! 0 breaks !
1 pi_plim 1+ 1
do nextprime_ dup                   \ p p
65536/mod pad @ = 0=             \ p r flag
if 1 pad +!                      \ p r
i 1- pad @ cells breaks + !   \ p r
then pnrbuf i 1- 2* + 2 mb! 1+   \ p+1
loop drop ; init_pnr

: newintbreaks ( i j x -- i' j' )     \ bisection search
>r 2dup + 2/ dup
cells breaks + @ r> u>              \ i j k flag
if -rot nip else nip then ;

: pnr@ ( n -- p )                     \ p is n:th prime

1- dup >r 2* pnrbuf + 2 mb@         \ n ≤ pi_plim
breaknumbers 0
begin r@ newintbreaks 2dup - 2 u<
until rdrop nip 16 lshift + ;

From n comes the modulo in pnrbuf and bisection search in breaks gives the index i. Then Pn=Pn(mod 2^16)+i*2^16, where 0≤i<breaknumbers.

: newintpnr ( i j x -- i' j' )
>r 2dup + 2/ dup pnr@ r> u>         \ i j k flag
if -rot nip else nip then ;

: pi ( x -- n ) >r pi_plim 1+ 0       \ x ≤ plim

begin r@ newintpnr 2dup - 2 u<      \ i j flag
until rdrop nip ;

### A conjecture

A multiplicative function is such that f(ab)=f(a)f(b). All multiplicative functions on the positive integers can be defined by the action on the primes in the prime factorization and any function defined that way is multiplicative. The function that maps primes p on π(p), that is, the nth prime on n, defines a multiplicative function f that maps positive integers on positive integers.

 Blue plot f(n) and red plot π(n)
The conjecture is that f(n)≤π(n) except for n=35,49 and there is a proof on mathematics stack exchange but it's unclear if the proof is verified. Testing the conjecture for all n≤plim takes a few minuts if plim=100000000:

\ conjecture: if f:p1^n1*...*pk^nk -> 1^n1*...*k^nk ==>
\ f(n)π(n) for n>49

: func_f ( n -- m )
oddpart nip pollard# sort 1 swap 0
do swap pi * loop ;

Since f(Pn)=π(Pn)=n it's enough to test non primes:

: conjecture ( N -- )  1
do i prime_ 0=
if i func_f i pi u>
if cr i .
then
then
loop ;

100000000 conjecture
35
49  ok

## Sunday, December 13, 2015

### Gaussian integers

The problem with Gaussian integers in Forth is the problem of overflow. It's a minor problem that multiplication of two gintegers might need double word length for the components of the result, but a major problem that the norm (the sum of the square of the components) of gintegers might need double word length, since the components of the result of division of two gintegers are results of division with the norm.
In Forth there are no standard words for division where the denominator is a double. It's possible to define of course, but this need to be done in assembler to obtain a fair speed. Perhaps it can be done in floating point with 20 decimals precision, but that seems tricky according to portability. Or there might be some other trick?

This first approach to Gaussian integers will be of poor range: just 3 decimals in 32 bit systems and 6 decimals in 64-bit systems. I might be able to double the range in the future but that's uncertain, but I will implement Gaussian integers in big integers, even if there might be problems with speed for big Gaussian integers.

The input of a ginteger is simply two integers on the stack, re im, where the imaginary component is at the top of the stack. But it's nice with a typographical output. The standard output of integers, like the dot, also include spaces, which don't allow desirable outputs like

10-7i

In 'Starting Forth' the output formatting procedures are described and the word for printing signed integers without spaces could be like this

: .sign-w/o-space \ n --
dup 0< swap abs 0 <# #s rot sign #> type ;

which can be used for typographic output of Gaussian integers:

: g. \ a b --
2dup 0. d= if d. exit then
swap dup                           \ b a a
if over
if dup .sign-w/o-space
else dup .
then
then swap dup 0< 0=                \ a b f
if dup \ a b b
if swap if ." +" then dup 1 >   \ b f
if .sign-w/o-space else drop then ." i" space
else 2drop
then
else nip dup                       \ b b
if dup -1 <                     \ b f
if .sign-w/o-space else ." -" drop then ." i" space
else drop
then
then ;

Now 1 -1 g. results in 1-i, 0 1 g. in i and 10 7 g. in 10+7i.

And so a word printing the numbers on the stack interpreted as gintegers:

: .gs depth 2 < if exit then 2>r recurse 2r> 2dup g. ;

To drop, swap, rot gintegers one have to use 2drop, 2swap, 2rot and so on.

### Arithmetics

: gnorm \ a b -- n
dup * swap dup * + ;

: g< \ a b c d -- flag

gnorm -rot gnorm u> ;

: g+ \ a b c d -- a+c b+d

under+ under+ ;

: g- \ a b c d -- a-c b-d

negate under+ negate under+ ;

: g* { a b c d -- ac-bd ad+bc }  \ (a+ib)(c+id)

a c * b d * - a d * b c * + ;

: g/ { a b c d -- (ac+bd)/(c^2+d^2) (bc-ad)/(c^2+d^2) }
c d gnorm
a c * b d * + over /
b c * a d * - rot / ;

There is a need for a more general interpretation of modulo for gintegers that guaranties that the rest of a division has a smaller norm than the norm of the denominator. This is well explained in this paper.

The main idea is to use a generalized modulo for integers: the rest of n/m should be the smallest non negative integer r such that n=mq±r. This correspond to a generalized modulo for Gaussian integers, where all division of components use "rounded" results instead of the floor:

: /' ( a b -- c ) dup 2/ under+ / ;   \ c=floor((a+b/2)/b)

: g/' { a b c d -- (ac+bd)/'(c^2+d^2) (bc-ad)/'(c^2+d^2) }

c d gnorm
a c * b d * + over /'
b c * a d * - rot /' ;

Thinking about it I found a way to use floats to increase the range:

\ --- alternative for about 50 % greater range -----

: g* { a b c d -- e f }
a c m* b d m* d- d>s a d m* b c m* d+ d>s ;

: g/ { a b c d -- e f }
c dup m* d dup m* d+ d>f
a c m* b d m* d+ d>f fover f/
b c m* a d m* d- d>f frot f/
fswap f>d d>s f>d d>s ;

: g/' { a b c d -- e f }
c dup m* d dup m* d+ d>f
a c m* b d m* d+ d>f fover f/ fround
b c m* a d m* d- d>f frot f/ fround
fswap f>d d>s f>d d>s ;

\ --------------------------------------------------

Using floats made a test running on my computer about 10 times slower, when I repeated testing gpollard2 (see below). But then I used numbers with random components in the range of 9 decimals, while in the first test a range of 6 decimals. The alternative g* g/ and g/' manage at least 4 decimals in 32-bit systems and 9 decimals in 64-bit systems.

: gmod ( a b c d -- e f )
2over 2over g/' g* g- ;

: ggcd ( a b c d -- e f ) \ gcd for Gaussian integers

0 0 loc{ a b c d s t }  \ Euclid's algorithm
a b c d g<
if a c to a to c
b d to b to d
then
begin c d or \ while c+id<>0
while c to s d to t
a b c d gmod to d to c
t to b s to a
repeat a b ;

Of course the divisor is up to multiplication with unit: 1, i, -1, -i. To bring some order and simplicity to the matter I use a normal form: the normal form of a Gaussian integer is the representant where the real component is greater or equal then zero and greater or equal then the imaginary component:

: gnormal ( a b -- c d )  \ c>=d and c>=0
2dup abs swap abs >
if 0 1 g* then over 0<
if negate swap negate swap then ;

Due to Wolfram Mathworld a Gaussian integer a+ib is a prime if and only if it is satisfying one of the following properties:
1. If both  and  are nonzero then,  is a Gaussian prime iff  is an ordinary prime;
2. If , then  is a Gaussian prime iff  is an ordinary prime and ;
3. If , then  is a Gaussian prime iff  is an ordinary prime and .
: gprime1 \ n --flag
abs dup prime swap 3 and 3 = and ;

: gprime \ a b -- flag

over 0= if nip gprime1 exit then
dup 0= if drop gprime1 exit then
gnorm prime ;

Since prime is only defined for single integers, gprime only works properly if a²+b² is a single cell integer.

To print the norm and the number a+ib in a table form:

: .normgp ( a b norm -- )
cr 8 .r space g. ;

The next word print the norm and the normal form of all primes with a certain norm.

: .gprime loc{ norm -- }
norm 2 = if 1 1 norm .normgp exit then
norm sqrtf 1+ 0
do i 0
?do i j gnorm norm =
if i j gprime
if j i norm .normgp then
then
loop
loop ;

: .gps ( n -- ) 1+ 2 do i .gprime loop ;

The word .gps lists all norms and primes on normal form up to numbers with the norm n.

Note, that if a+ib is a prime then so is (a±ib)i^k, k=0,1,2,3.

### Factorization

The Pollard rho factorization method seems to work for Gaussian integers with some minor changes:

: gfunc ( a b x y -- u v )
2dup g* -1 0 g+ 2swap gmod ;

First I tested with the function z²+1 which don't seems to factorize all Gaussian integers, for example not 4+3i. Then I tried z²-1 and with that function I've tested millions of cases without problem.

: gpollard1 ( a b c d -- p q flag )  \ a+ib not having factor 2
2dup loc{ a b alpha1 alpha2 beta1 beta2 } 0.
begin 2drop a b alpha1 alpha2 gfunc to alpha2 to alpha1
a b 2dup beta1 beta2 gfunc gfunc to beta2 to beta1
alpha1 alpha2 beta1 beta2 g-
a b ggcd 2dup a b d=
if false exit
then 2dup gnorm 1 = 0=
until true ;

: gpollard2 ( a b -- p q )
false loc{ flag }
2dup gnorm 1 = if exit then
2dup 2 0 gmod d0= if 2drop 1 1 exit then
2dup gprime if exit then -1 1
do i 0
do 2dup j i gpollard1
if true to flag leave then 2drop
loop flag if leave then
loop 2swap 2drop ;

: gfac ( a b -- p1 q1 ... pk qk )
2dup gpollard2 2over 2over gnorm -rot gnorm =
if 2drop exit
then 2dup 2>r g/ recurse 2r> recurse ;

795 649 2dup g. 795+649i  ok
gfac .gs 8-5i 4+9i -6+5i -1-i  ok
g* g* g* g. 795+649i  ok

7891 9785 2dup g. 7891+9785i  ok
gfac .gs -47-10i 19+184i -1+i  ok
g* g* g. 7891+9785i  ok