It's possible to store consecutive prime series in tables using only two bytes per prime number up to virtually any size (at least 100 decimals or so), with a technique using that the prime gaps for those numbers are less than 65536=2^16. What's stored in the main table are the prime numbers modulo 65536 - and in a minor table "break points" are stored: the number n when the next prime Pn+1 divided with 65536 is greater than the same for the previous prime. A fast search in the minor table gives the index number, that multiplied with 65536 should be added to the number in the main table to get the prime.

Such a table for all primes less than 2^24 is created in less than a second and can be used to compute π(x) for x<16777216 in a few microseconds. The memory needed for the main table is 2*π(16777216)=2*1077871, slightly more than 2 Mb, and for the minor table 1-2 kb if the break points are stored as single integers. The memory saved is near 50 % in this case, but for larger tables the memory saved will be 100 % or more.

Making a table for all 64 bit primes would need about 5 billion Gb. All 32 bit primes would need about 1 Gb. All primes less than 100,000,000 would need about 10 Mb and would take some time to load (about 5 seconds). For greater values it's more realistic to use other and slower methods, like files with records of primes and their order numbers, with one second computing time gap in between.

First some words that is used:

**: prevprime ( numb -- prime )**

**dup 3 u< if drop 2 exit then**

**1- 1 or**

**begin dup fermat-rabin-miller 0=**

**while 2 -**

**repeat ;**

: ?def ( -- flag ) bl word find nip 0= ;

: ?def ( -- flag ) bl word find nip 0= ;

?def under+ [if]

?def under+ [if]

**: under+ ( a b c -- a+c b ) rot + swap ; [then]**

: 8/mod ( n -- r q ) dup 7 and swap 3 rshift ;

: 8/mod ( n -- r q ) dup 7 and swap 3 rshift ;

**: 256/mod ( n -- r q ) dup 255 and swap 8 rshift ;**

**: 65536/mod ( n -- r q ) dup 65535 and swap 16 rshift ;**

The idea is to use the sieve of Eratosthenes to create av very fast word prime_ ( n -- flag ) and bit arrays is used.

\ bit arrays

**: adbit ( i ad -- j a ) swap 8/mod rot + ;**

: setbit \ i ad --

: setbit \ i ad --

**adbit 1 rot lshift over c@ or swap c! ;**

: tglbit \ i ad --

: tglbit \ i ad --

**adbit 1 rot lshift over c@ xor swap c! ;**

: clrbit \ i ad --

: clrbit \ i ad --

**adbit 1 rot lshift 255 xor over c@ and swap c! ;**

: bit@ \ i ad -- bit

: bit@ \ i ad -- bit

**adbit c@ swap rshift 1 and ;**

: bit! \ bit i ad --

: bit! \ bit i ad --

**rot if setbit else clrbit then ;**

: bit? ( i ad -- f ) bit@ negate ;

: bit? ( i ad -- f ) bit@ negate ;

\ Defining named bitarrays i.e.

\ 1000000 bitarray megaset constant megasetbuffer

**: bitarray \ bits -- ad | n -- bit**

**8/mod swap 0= 0= negate + allocate throw**

**create dup ,**

**does> @ swap 8/mod rot + bit@ ;**

For storing numbers in i=2,3,... bytes spaces I use the words

mb! ( n ad i -- )

and

mb@ ( ad i -- n )

\ multiple byte read/write in arrays

**variable ebuf**

**1 ebuf ! ebuf c@ negate**

**[if] \ little-endian, as in Windows and Android**

**: mb! ( n ad i -- ) 2>r ebuf ! ebuf 2r> cmove ;**

**: mb@ ( ad i -- n ) ebuf 0 over ! swap cmove ebuf @ ;**

**[else] \ big-endian, as in MacOS**

**: mb! ( n ad i -- ) 2>r ebuf ! ebuf cell + r@ - 2r> cmove ;**

**: mb@ ( ad i -- n ) ebuf 0 over ! cell + over - swap cmove ebuf @ ;**

**[then]**

### The tables

The method below can be used for any number of primes. Here plim=2^24 and pi_plim=π(plim)=1077871:\ the sieve of Eratosthenes

**0xfffff constant plim**

82025 constant pi_plim

\ 16777215 constant plim \ Loads in

\ 1077871 constant pi_plim \ a few seconds

\ 100000000 constant plim \ 100000000 takes 6 times

\ 5761455 constant pi_plim \ longer time to load

**plim bitarray prime_ constant pbuf**

\ prime_ ( n -- flag ) n<plim

**: resetsieve ( -- )**

**pbuf plim 8/mod swap 0= 0= - + pbuf**

**do true i ! cell +loop ;**

: sieve ( -- )

: sieve ( -- )

**resetsieve**

**0 pbuf clrbit**

**1 pbuf clrbit**

**plim sqrtf 1+ 2**

**do i pbuf bit@**

**if plim 1+ i dup ***

**do i pbuf clrbit j +loop**

**then**

**loop ; \ takes approximate 3 seconds or less to load**

Next improvement of the prime functions:

**plim prevprime constant pmax_**

**: nextprime_ ( numb -- prime ) \ numb<pmax_**

**dup 3 u< if drop 2 exit then**

**1 or**

**begin dup prime_ 0=**

**while 2 +**

**repeat ;**

**: prevprime_ ( numb -- prime )**

**dup 3 u< if drop 2 exit then**

**1- 1 or**

**begin dup prime_ 0=**

**while 2 -**

**repeat ;**

: prime ( n -- flag )

: prime ( n -- flag )

**dup plim u> if prime else prime_ negate then ;**

: nextprime ( numb -- prime )

: nextprime ( numb -- prime )

**dup pmax_ u< if nextprime_ else nextprime then ;**

: prevprime ( numb -- prime )

: prevprime ( numb -- prime )

**dup plim u< if prevprime_ else prevprime then ;**

**It takes less than a second to store all primes less than plim=2^24 in the buffer pnrbuf.**

**plim 65536/mod swap 0= 0= - constant breaknumbers**

**pi_plim 2* allocate throw constant pnrbuf**

**breaknumbers cells allocate throw constant breaks**

**: init_pnr ( -- )**

**0 pad ! 0 breaks !**

**1 pi_plim 1+ 1**

**do nextprime_ dup \ p p**

**65536/mod pad @ = 0= \ p r flag**

**if 1 pad +! \ p r**

**i 1- pad @ cells breaks + ! \ p r**

**then pnrbuf i 1- 2* + 2 mb! 1+ \ p+1**

**loop drop ; init_pnr**

**: newintbreaks ( i j x -- i' j' ) \ bisection search**

**>r 2dup + 2/ dup**

**cells breaks + @ r> u> \ i j k flag**

**if -rot nip else nip then ;**

: pnr@ ( n -- p ) \ p is n:th prime

: pnr@ ( n -- p ) \ p is n:th prime

**1- dup >r 2* pnrbuf + 2 mb@ \ n ≤ pi_plim**

**breaknumbers 0**

**begin r@ newintbreaks 2dup - 2 u<**

**until rdrop nip 16 lshift + ;**

**pnrbuf**and bisection search in

**breaks**gives the index i. Then Pn=Pn(mod 2^16)+i*2^16, where 0≤i<breaknumbers.

**: newintpnr ( i j x -- i' j' )**

**>r 2dup + 2/ dup pnr@ r> u> \ i j k flag**

**if -rot nip else nip then ;**

: pi ( x -- n ) >r pi_plim 1+ 0 \ x ≤ plim

: pi ( x -- n ) >r pi_plim 1+ 0 \ x ≤ plim

**begin r@ newintpnr 2dup - 2 u< \ i j flag**

**until rdrop nip ;**

### A conjecture

A multiplicative function is such that f(ab)=f(a)f(b). All multiplicative functions on the positive integers can be defined by the action on the primes in the prime factorization and any function defined that way is multiplicative. The function that maps primes p on π(p), that is, the nth prime on n, defines a multiplicative function f that maps positive integers on positive integers.Blue plot f(n) and red plot π(n) |

\ conjecture: if f:p1^n1*...*pk^nk -> 1^n1*...*k^nk ==>

\ f(n)≤π(n) for n>49

**: func_f ( n -- m )**

**oddpart nip pollard# sort 1 swap 0**

**do swap pi * loop ;**

Since f(Pn)=π(Pn)=n it's enough to test non primes:

**: conjecture ( N -- ) 1**

do i prime_ 0=

if i func_f i pi u>

if cr i .

then

then

loop ;

do i prime_ 0=

if i func_f i pi u>

if cr i .

then

then

loop ;

100000000 conjecture

35

49 ok

"...more than 100 %."

ReplyDeleteYou can't save more than 100%.

No, you are right! But it could takes more than 100 % more memory NOT using any squeezing method. Happy new year!

ReplyDeleteWhile the mechanics of doing table-based prime counts are very interesting, I don't agree with the opening thesis: "The only possibility for a very fast prime counting function is to make tables." There are very fast prime count methods that don't use tables.

ReplyDeleteI really would like to see that algorithm. The algorithm at Wikipedia is very slow. There is a linked paper with some complicated computations that might make it faster, but I can't imagine it's comparable with table-based counting.

DeleteThe methods of Legendre, Meissel, and Lehmer all work quite well. LMO faster yet. The first three are well described in Riesel's 1994 book. LMO is rather complicated and I haven't seen an exposition that easily leads to code (combining the papers and spending quite a bit of time does work, but it's nothing like the others).

DeleteSee benchmark results and LMO code at:

https://github.com/danaj/Math-Prime-Util/blob/master/lmo.c

All the other algorithms are also in that repo.

Also see:

https://github.com/kimwalisch/primecount

which also has implementations of all the mentioned methods.

Sparse tables are a good method for small inputs, and my code uses them to 60M. But I'm also making a generic library and nobody wants big tables. The nth prime page can use absurdly large tables since that is the one and only application, and every user doesn't have to store the files. Note the time up to 10 billion in these functions are in the single millisecond or lower range -- I use tables to optimize for the case of being called inside a tight loop. Kim doesn't do anything different for small values.

Assuming your goal was to stay competitive with those, some good use of indices or estimates, sparse tables, and a good sieve for the difference could probably keep up to 10^18 or so using under 10MB. At some point you'll have the same problem as a multi-petabyte dense table system -- you've run out of data that other people have calculated, and now you have to start sieving to expand the table. How long will it take to fill in the table from 1*10^18 to 2*10^18?

I'll look at it once again. It's the recursive computation of phi that made me suspicious. I can't understand that a recursion formula like

Deletephi(m,n)=phi(m,n-1)-phi(m/pn,n-1)

can be that fast as in your link.

Speeding up the phi function is a critical part of making it fast -- you can't naively use the recursive formula for large values. There are a number of ways to do it, each with different tradeoffs. Mapes' method is one way to handle small values. Another good method for dealing with small a values is to use small tables, which can give you almost instant results for phi(x,a) when a < 10. Caching can also be done to vastly cut down on the number of recursions (basically doing efficient memoization). I made a very different method that creates a sorted list as it walks down the a's -- this has very few function calls but can be memory intensive.

ReplyDeleteSage uses an implementation of Legendre's method, with interesting phi optimizations -- Mapes' type calcs for small values, some shortcuts for large values, and then tables. You can see the code here:

https://github.com/sagemath/sage/blob/2cc7f6ce9b270c1efcd07b541fe94e34a245918d/src/sage/functions/prime_pi.pyx

primecount now uses eLMO by default, but the other algorithms are still there. His Phi uses small tables with small a values, caching otherwise.

Perl/ntheory uses eLMO by default, but the other algorithms are still there (no longer compiled in by default). There are a few ways to do phi -- small a with small tables (~32k of shorts), caching (32MB cache), or my a-walker.