Monday, December 28, 2015

Prime tables and the prime counting function

The only possibility for a very fast prime counting function is to make tables. To make a straight forward table would take a lot of time and a lot of memory, but a table of primes is more compact, is loading faster and can be used for fast prime counting.

It's possible to store consecutive prime series in tables using only two bytes per prime number up to virtually any size (at least 100 decimals or so), with a technique using that the prime gaps for those numbers are less than 65536=2^16. What's stored in the main table are the prime numbers modulo 65536 - and in a minor table "break points" are stored: the number n when the next prime Pn+1 divided with 65536 is greater than the same for the previous prime. A fast search in the minor table gives the index number, that multiplied with 65536 should be added to the number in the main table to get the prime.

Such a table for all primes less than 2^24 is created in less than a second and can be used to compute π(x) for x<16777216 in a few microseconds. The memory needed for the main table is 2*π(16777216)=2*1077871, slightly more than 2 Mb, and for the minor table 1-2 kb if the break points are stored as single integers. The memory saved is near 50 % in this case, but for larger tables the memory saved will be 100 % or more.

Making a table for all 64 bit primes would need about 5 billion Gb. All 32 bit primes would need about 1 Gb. All primes less than 100,000,000 would need about 10 Mb and would take some time to load (about 5 seconds). For greater values it's more realistic to use other and slower methods, like files with records of primes and their order numbers, with one second computing time gap in between.

First some words that is used:

: prevprime ( numb -- prime )
  dup 3 u< if drop 2 exit then
  1- 1 or 
  begin dup fermat-rabin-miller 0=
  while 2 -
  repeat ;

: ?def ( -- flag )  bl word find nip 0= ;


?def under+ [if]

: under+ ( a b c -- a+c b )  rot + swap ; [then]

: 8/mod ( n -- r q )  dup 7 and swap 3 rshift ;

: 256/mod ( n -- r q )  dup 255 and swap 8 rshift ;
: 65536/mod ( n -- r q )  dup 65535 and swap 16 rshift ;

The idea is to use the sieve of Eratosthenes to create av very fast word prime_ ( n -- flag ) and bit arrays is used.

\ bit arrays

: adbit ( i ad -- j a )  swap 8/mod rot + ;

: setbit \ i ad --

  adbit 1 rot lshift over c@ or swap c! ;

: tglbit \ i ad --

  adbit 1 rot lshift over c@ xor swap c! ;

: clrbit \ i ad --

  adbit 1 rot lshift 255 xor over c@ and swap c! ;

: bit@ \ i ad -- bit

  adbit c@ swap rshift 1 and ;

: bit! \ bit i ad --

  rot if setbit else clrbit then ;

: bit? ( i ad -- f )  bit@ negate ;


\ Defining named bitarrays i.e.
\ 1000000 bitarray megaset constant megasetbuffer
: bitarray \ bits -- ad | n -- bit
  8/mod swap 0= 0= negate + allocate throw
  create dup ,
  does> @ swap 8/mod rot + bit@ ;

For storing numbers in i=2,3,... bytes spaces I use the words 

mb! ( n ad i -- ) 

and 

mb@ ( ad i -- n ) 

\ multiple byte read/write in arrays

variable ebuf
1 ebuf ! ebuf c@ negate
[if] \ little-endian, as in Windows and Android
: mb! ( n ad i -- )  2>r ebuf ! ebuf 2r> cmove ;
: mb@ ( ad i -- n )  ebuf 0 over ! swap cmove ebuf @ ;
[else] \ big-endian, as in MacOS
: mb! ( n ad i -- )  2>r ebuf ! ebuf cell + r@ - 2r> cmove ;
: mb@ ( ad i -- n )  ebuf 0 over ! cell + over - swap cmove ebuf @ ;
[then]


The tables

The method below can be used for any number of primes. Here plim=2^24 and pi_plim=π(plim)=1077871:

\ the sieve of Eratosthenes 

0xfffff constant plim
82025 constant pi_plim

\ 16777215 constant plim   \ Loads in 
\ 1077871 constant pi_plim \ a few seconds

\ 100000000 constant plim  \ 100000000 takes 6 times 
\ 5761455 constant pi_plim \ longer time to load

plim bitarray prime_ constant pbuf
\ prime_ ( n -- flag )  n<plim

: resetsieve ( -- )
  pbuf plim 8/mod swap 0= 0= - + pbuf 
  do true i ! cell +loop ;

: sieve ( -- )

  resetsieve
  0 pbuf clrbit 
  1 pbuf clrbit
  plim sqrtf 1+ 2
  do i pbuf bit@
     if plim 1+ i dup *
        do i pbuf clrbit j +loop
     then 
  loop ; \ takes approximate 3 seconds or less to load

Next improvement of the prime functions:

plim prevprime constant pmax_

: nextprime_ ( numb -- prime )  \ numb<pmax_
  dup 3 u< if drop 2 exit then
  1 or
  begin dup prime_ 0=
  while 2 +
  repeat ;

: prevprime_ ( numb -- prime )
  dup 3 u< if drop 2 exit then
  1- 1 or 
  begin dup prime_ 0=
  while 2 -
  repeat ;

: prime ( n -- flag )

  dup plim u> if prime else prime_ negate then ;

: nextprime ( numb -- prime )

  dup pmax_ u< if nextprime_ else nextprime then ;

: prevprime ( numb -- prime )

  dup plim u< if prevprime_ else prevprime then ;

It takes less than a second to store all primes less than plim=2^24 in the buffer pnrbuf.

plim 65536/mod swap 0= 0= - constant breaknumbers
pi_plim 2* allocate throw constant pnrbuf 
breaknumbers cells allocate throw constant breaks 

: init_pnr ( -- )
  0 pad ! 0 breaks ! 
  1 pi_plim 1+ 1 
  do nextprime_ dup                   \ p p
     65536/mod pad @ = 0=             \ p r flag
     if 1 pad +!                      \ p r
        i 1- pad @ cells breaks + !   \ p r
     then pnrbuf i 1- 2* + 2 mb! 1+   \ p+1
  loop drop ; init_pnr 

: newintbreaks ( i j x -- i' j' )     \ bisection search
  >r 2dup + 2/ dup
  cells breaks + @ r> u>              \ i j k flag 
  if -rot nip else nip then ; 

: pnr@ ( n -- p )                     \ p is n:th prime 

  1- dup >r 2* pnrbuf + 2 mb@         \ n ≤ pi_plim
  breaknumbers 0 
  begin r@ newintbreaks 2dup - 2 u< 
  until rdrop nip 16 lshift + ; 



From n comes the modulo in pnrbuf and bisection search in breaks gives the index i. Then Pn=Pn(mod 2^16)+i*2^16, where 0≤i<breaknumbers.

: newintpnr ( i j x -- i' j' ) 
  >r 2dup + 2/ dup pnr@ r> u>         \ i j k flag 
  if -rot nip else nip then ; 

: pi ( x -- n ) >r pi_plim 1+ 0       \ x ≤ plim

  begin r@ newintpnr 2dup - 2 u<      \ i j flag 
  until rdrop nip ;


A conjecture

A multiplicative function is such that f(ab)=f(a)f(b). All multiplicative functions on the positive integers can be defined by the action on the primes in the prime factorization and any function defined that way is multiplicative. The function that maps primes p on π(p), that is, the nth prime on n, defines a multiplicative function f that maps positive integers on positive integers.

Blue plot f(n) and red plot π(n)
The conjecture is that f(n)≤π(n) except for n=35,49 and there is a proof on mathematics stack exchange but it's unclear if the proof is verified. Testing the conjecture for all n≤plim takes a few minuts if plim=100000000:

\ conjecture: if f:p1^n1*...*pk^nk -> 1^n1*...*k^nk ==>
\ f(n)π(n) for n>49

: func_f ( n -- m )
  oddpart nip pollard# sort 1 swap 0
  do swap pi * loop ;

Since f(Pn)=π(Pn)=n it's enough to test non primes:

: conjecture ( N -- )  1
  do i prime_ 0=
     if i func_f i pi u>
        if cr i .
        then
     then
  loop ;


100000000 conjecture
35
49  ok

Sunday, December 13, 2015

Gaussian integers

The problem with Gaussian integers in Forth is the problem of overflow. It's a minor problem that multiplication of two gintegers might need double word length for the components of the result, but a major problem that the norm (the sum of the square of the components) of gintegers might need double word length, since the components of the result of division of two gintegers are results of division with the norm. 
In Forth there are no standard words for division where the denominator is a double. It's possible to define of course, but this need to be done in assembler to obtain a fair speed. Perhaps it can be done in floating point with 20 decimals precision, but that seems tricky according to portability. Or there might be some other trick?

This first approach to Gaussian integers will be of poor range: just 3 decimals in 32 bit systems and 6 decimals in 64-bit systems. I might be able to double the range in the future but that's uncertain, but I will implement Gaussian integers in big integers, even if there might be problems with speed for big Gaussian integers.

The input of a ginteger is simply two integers on the stack, re im, where the imaginary component is at the top of the stack. But it's nice with a typographical output. The standard output of integers, like the dot, also include spaces, which don't allow desirable outputs like 


10-7i 

In 'Starting Forth' the output formatting procedures are described and the word for printing signed integers without spaces could be like this

: .sign-w/o-space \ n --
  dup 0< swap abs 0 <# #s rot sign #> type ;

which can be used for typographic output of Gaussian integers:

: g. \ a b --
  2dup 0. d= if d. exit then
  swap dup                           \ b a a
  if over 
     if dup .sign-w/o-space
     else dup .
     then
  then swap dup 0< 0=                \ a b f
  if dup \ a b b
     if swap if ." +" then dup 1 >   \ b f
        if .sign-w/o-space else drop then ." i" space
     else 2drop
     then
  else nip dup                       \ b b 
     if dup -1 <                     \ b f
        if .sign-w/o-space else ." -" drop then ." i" space
     else drop
     then
  then ;

Now 1 -1 g. results in 1-i, 0 1 g. in i and 10 7 g. in 10+7i.

And so a word printing the numbers on the stack interpreted as gintegers:

: .gs depth 2 < if exit then 2>r recurse 2r> 2dup g. ;

To drop, swap, rot gintegers one have to use 2drop, 2swap, 2rot and so on.

Arithmetics


: gnorm \ a b -- n 
  dup * swap dup * + ;

: g< \ a b c d -- flag

  gnorm -rot gnorm u> ;

: g+ \ a b c d -- a+c b+d

  under+ under+ ;

: g- \ a b c d -- a-c b-d

  negate under+ negate under+ ;

: g* { a b c d -- ac-bd ad+bc }  \ (a+ib)(c+id)

  a c * b d * - a d * b c * + ;

: g/ { a b c d -- (ac+bd)/(c^2+d^2) (bc-ad)/(c^2+d^2) }
  c d gnorm
  a c * b d * + over /
  b c * a d * - rot / ;

There is a need for a more general interpretation of modulo for gintegers that guaranties that the rest of a division has a smaller norm than the norm of the denominator. This is well explained in this paper.

The main idea is to use a generalized modulo for integers: the rest of n/m should be the smallest non negative integer r such that n=mq±r. This correspond to a generalized modulo for Gaussian integers, where all division of components use "rounded" results instead of the floor:

: /' ( a b -- c ) dup 2/ under+ / ;   \ c=floor((a+b/2)/b)

: g/' { a b c d -- (ac+bd)/'(c^2+d^2) (bc-ad)/'(c^2+d^2) }

  c d gnorm
  a c * b d * + over /'
  b c * a d * - rot /' ;

Thinking about it I found a way to use floats to increase the range:

\ --- alternative for about 50 % greater range -----

: g* { a b c d -- e f } 
  a c m* b d m* d- d>s a d m* b c m* d+ d>s ;

: g/ { a b c d -- e f }
  c dup m* d dup m* d+ d>f
  a c m* b d m* d+ d>f fover f/
  b c m* a d m* d- d>f frot f/
  fswap f>d d>s f>d d>s ;

: g/' { a b c d -- e f }
  c dup m* d dup m* d+ d>f
  a c m* b d m* d+ d>f fover f/ fround
  b c m* a d m* d- d>f frot f/ fround
  fswap f>d d>s f>d d>s ;

\ --------------------------------------------------

Using floats made a test running on my computer about 10 times slower, when I repeated testing gpollard2 (see below). But then I used numbers with random components in the range of 9 decimals, while in the first test a range of 6 decimals. The alternative g* g/ and g/' manage at least 4 decimals in 32-bit systems and 9 decimals in 64-bit systems.

: gmod ( a b c d -- e f )
  2over 2over g/' g* g- ;

: ggcd ( a b c d -- e f ) \ gcd for Gaussian integers

  0 0 loc{ a b c d s t }  \ Euclid's algorithm
  a b c d g<
  if a c to a to c
     b d to b to d
  then
  begin c d or \ while c+id<>0
  while c to s d to t
     a b c d gmod to d to c
     t to b s to a
  repeat a b ;

Of course the divisor is up to multiplication with unit: 1, i, -1, -i. To bring some order and simplicity to the matter I use a normal form: the normal form of a Gaussian integer is the representant where the real component is greater or equal then zero and greater or equal then the imaginary component:

: gnormal ( a b -- c d )  \ c>=d and c>=0
  2dup abs swap abs >
  if 0 1 g* then over 0< 
  if negate swap negate swap then ;

Due to Wolfram Mathworld a Gaussian integer a+ib is a prime if and only if it is satisfying one of the following properties:
1. If both a and b are nonzero then, a+bi is a Gaussian prime iff a^2+b^2 is an ordinary prime;
2. If a=0, then bi is a Gaussian prime iff |b| is an ordinary prime and |b|=3 (mod 4);
3. If b=0, then a is a Gaussian prime iff |a| is an ordinary prime and |a|=3 (mod 4).
: gprime1 \ n --flag
  abs dup prime swap 3 and 3 = and ;

: gprime \ a b -- flag 

  over 0= if nip gprime1 exit then
  dup 0= if drop gprime1 exit then
  gnorm prime ;

Since prime is only defined for single integers, gprime only works properly if a²+b² is a single cell integer.

To print the norm and the number a+ib in a table form:

: .normgp ( a b norm -- )
  cr 8 .r space g. ;

The next word print the norm and the normal form of all primes with a certain norm.

: .gprime loc{ norm -- }
  norm 2 = if 1 1 norm .normgp exit then
  norm sqrtf 1+ 0
  do i 0
     ?do i j gnorm norm =
        if i j gprime 
           if j i norm .normgp then
        then
     loop
  loop ;

: .gps ( n -- ) 1+ 2 do i .gprime loop ;


The word .gps lists all norms and primes on normal form up to numbers with the norm n. 

Note, that if a+ib is a prime then so is (a±ib)i^k, k=0,1,2,3.

Factorization

The Pollard rho factorization method seems to work for Gaussian integers with some minor changes:

: gfunc ( a b x y -- u v )
  2dup g* -1 0 g+ 2swap gmod ;

First I tested with the function z²+1 which don't seems to factorize all Gaussian integers, for example not 4+3i. Then I tried z²-1 and with that function I've tested millions of cases without problem.

: gpollard1 ( a b c d -- p q flag )  \ a+ib not having factor 2
  2dup loc{ a b alpha1 alpha2 beta1 beta2 } 0.
  begin 2drop a b alpha1 alpha2 gfunc to alpha2 to alpha1
     a b 2dup beta1 beta2 gfunc gfunc to beta2 to beta1
     alpha1 alpha2 beta1 beta2 g-
     a b ggcd 2dup a b d=
     if false exit
     then 2dup gnorm 1 = 0=
  until true ;

: gpollard2 ( a b -- p q )
  false loc{ flag }
  2dup gnorm 1 = if exit then
  2dup 2 0 gmod d0= if 2drop 1 1 exit then
  2dup gprime if exit then -1 1
  do i 0
     do 2dup j i gpollard1
        if true to flag leave then 2drop
     loop flag if leave then
  loop 2swap 2drop ;

: gfac ( a b -- p1 q1 ... pk qk )
  2dup gpollard2 2over 2over gnorm -rot gnorm =
  if 2drop exit
  then 2dup 2>r g/ recurse 2r> recurse ;


795 649 2dup g. 795+649i  ok
gfac .gs 8-5i 4+9i -6+5i -1-i  ok
g* g* g* g. 795+649i  ok

7891 9785 2dup g. 7891+9785i  ok
gfac .gs -47-10i 19+184i -1+i  ok
g* g* g. 7891+9785i  ok

Monday, November 23, 2015

Some arithmetical functions

The radical of a natural number is the product of all it's distinct prime factors. An easy way to calculate the radical is to factorize and sort:

: radical1 ( n -- r )
  1 dup loc{ n radical prime }
  n pollard# sort 0 
  do dup prime = 0=
     if dup radical * to radical
     then to prime 
  loop radical ;

But instead of using the local variable radical to build up the product one can use the stack:

: radical2 ( n -- r )
  1 loc{ n prime }
  n pollard# sort 1 swap 0 \ 1 will be at tos in loop
  do over prime =
     if nip
     else over to prime *
     then 
  loop ; 

And if the code don't look for the first a occurrence of a prime factor but for the last occurrence of that factor, then there is no need for any local variables.

: undequ ( a b c -- a b c f )
  >r 2dup = r> swap ;

The word undequ tests if a=b and put a flag on top of stack.

: radical ( n -- r )
  1 swap                \ a non prime dummy at bottom for undequ
  pollard# sort
  1 swap 0              \ 1 will be at tos in loop
  do undequ if nip else * then
  loop nip ;            \ nip drops the dummy

The Euler totient function gives the number of positive coprimes less than the number. Euler's product formula implies that
: totients ( n -- t )
  1 swap                \ a non prime dummy at bottom for undequ
  pollard# sort
  1 swap 0              \ 1 will be at tos in loop
  do undequ if * else swap 1- * then
  loop nip ;            \ nip drops the dummy


Some words that are needed:


: drops ( a1...an n -- )
  0 ?do drop loop ;

: ** ( b e -- b^e )
  dup 0= if 2drop 1 exit then
  1 swap 0 do over * loop nip ;

: m** ( b e -- d )
  dup 0= if 2drop 1. exit then
  1 swap 1. rot 0
  do 2over m*/ loop 2nip ;

: -1** ( n -- i )     \ exponentiation of (-1)
  1 and if -1 else 1 then ;

: unit* ( i j -- k )  \ multiplication of units ±1
  xor 1+ ;

: ufaculty ( u -- u! )
  dup 2 u< if drop 1 exit then
  dup 1- recurse * ;

: umfaculty ( u -- ud )
  dup 2 u< if drop 1. exit then
  dup 1- recurse rot 1 m*/ ;

: oddpart ( a -- i m ) \ m is odd and a=m*2^i
  0 swap
  begin 2/mod swap 0=
  while 1 under+
  repeat 2* 1+ ;

The Möbius function is defined for positive integers and is zero if it contains a squared prime factor; otherwise 1 if the number of prime factors is even and else it is -1.

: mobius ( n -- my )
  dup 2 u< if drop 1 exit then
  1 swap
  pollard# sort
  dup true loc{ facts sqrfree } 0 
  do 2dup =
     if false to sqrfree
        facts i - drops leave 
     then drop
  loop sqrfree
  if facts 1 and
     if -1
     else 1
     then
  else 0
  then nip ;

The word bigomega counts the number of prime factors of a number, and smallomega counts the number of distinct prime factors.

: bigomega ( n -- b )
  dup 2 u< if drop 0 exit then
  pollard# dup >r drops r> ;

: smallomega ( n -- s )
  dup 2 u< if drop 0 exit then
  1 swap
  pollard# sort 0 swap 0
  do undequ 0= if 1+ then nip
  loop nip ;

The Legendre symbol (a/p), where p is an odd prime and a is positive, is defined as zero if p divides a; otherwise 1 if there exist a x with a=x^2(mod p) else -1. 

The Euler criterion implies that (a/p)=a^((p-1)/2)(mod p):

: legendre ( a p -- i )
  tuck dup 1- 2/ swap u**mod dup 1 u>
  if swap - else nip then ;

A generalization is the Jacobi symbol (a/n) defined for all odd numbers n and positive integers a:

: jacobi ( a n -- j )
  1 loc{ a n unit }
  a n ugcd 1 = 0= if 0 exit then 
  begin n 1 = a 1 = or if unit exit then
     a n mod ?dup 0= if 0 exit then
     oddpart to a 1 and
     if n dup * 1- 3 rshift -1** unit unit* to unit 
     then a n ugcd 1 = 0= if 0 exit then
     n a to n to a
     a 1- n 1- * 2 rshift -1** unit unit* to unit
  again ; 

And the Kronecker symbol (a/n) is a further generalization defined for all integers a and n:

: kronecker ( a n -- j )
  0 loc{ a n unit }
  n 0= if a abs 1 = if 1 else 0 then exit then
  n dup abs to n 0<
  if a 0< if -1 else 1 then
  else 1
  then to unit
  a dup abs to a                         \ old_a
  n oddpart to n dup                     \ old_a i i
  if a 1 and 0=                          \ old_a i f
     if 2drop 0 exit                     \ old_a
     else 1 and                          \ old_a f
        if a 7 and 1+ 4 and              \ old_a f
           if unit negate to unit then   \ old_a
        then
     then
  else drop
  then a n jacobi dup 0=                 \ old_a i f
  if nip exit then                       \ old_a
  unit unit* to unit 0<                  \ f
  if n 1- 2/ -1** else 1 then            \ ±1
  unit unit* ;


Thursday, October 29, 2015

Integer factoring

Factoring of large integers is a great challenge in computational mathematics.
When the numbers are very large, no efficient, non-quantum integer factorization algorithm is known; an effort by several researchers concluded in 2009, factoring a 232-digit number (RSA-768), utilizing hundreds of machines over a span of two years. Wikipedia
However, there are efficient and simple algorithms for cell size numbers (about 19 digit numbers in 64 bit systems) that immediately delivers the prime factorization. One such algorithm is Pollard rho method.

: func ( numb n -- m ) dup um* 1 0 d+ rot um/mod drop ; 


: pollard1 ( numb start -- pfac flag )
\ numb is an odd composite 

  dup dup loc{ numb start alpha beta } 0
  begin drop numb alpha func to alpha
     numb dup beta func func to beta
     alpha beta - abs        \ |alpha-beta|
     numb ugcd dup numb =    \ gcd flag 
     if false exit 
     then dup 1 = 0=         \ gcd<>1 
  until true ;

: sqrtf \ m -- n             \ floor

  0 d>f fsqrt f>s ;

: sqrtfslow \ m -- n

  dup 4 u< if dup
  if drop 1 then exit then
  dup 1- >r 1 rshift
  begin r@ over 0 swap um/mod nip
     over + 1 rshift tuck u> 0=
  until r> drop ;

: sqrtc \ m -- n             \ ceiling

  1- sqrtf 1+ ;

: pollard2 \ numb -- prime numb>1
  dup 1 and 0= if drop 2 exit then
  dup prime if exit then
  dup sqrtf dup * over = 
  if sqrtf exit then -1 2    \ numb 100 0
  do dup i pollard1          \ numb pfac flag
     if leave                \ numb pfac
     then drop               \ numb
  loop nip ;                 \ pfac

: pollard \ n -- p1 ... pk      

  dup pollard2 2dup =        \ n q f=
  if drop exit               \ n
  then dup >r 
  0 swap um/mod nip recurse  \ q n/q
  r> recurse ;

4267640728818788929 pollard .s 1234567891 3456789019  ok

12345678987654321 pollard cr .s 
333667 37 3 3 333667 37 3 3  ok

Sometimes the greatest prime factor is of interest and I guess there is no other way to find it than factoring and sorting the stack.

: lower \ m1 ... mn m n+1 -- m1 ... m ... mi n+1  

\ lower m on stack until next number not is greater
  dup 2 =
  if drop 2dup u>
     if swap
     then 2 exit
  then >r 2dup u> 0= 
  if r> exit
  then r> rot >r 
  1- recurse 1+ r> swap ;

: sort \ m1 ... mn n -- s1 ... sn n o(n²)

  dup 3 <
  if dup 2 =
     if drop 2dup u> 
        if swap 
        then 2 
     then exit
  then dup >r
  1- recurse roll 
  r> lower ;

But to use sort one have to know the number of factors to sort.

: pollard# \ numb -- p1 ... pk k
  depth >r
  pollard depth r> - 1+ ;

12345678987654321 pollard# cr .s
333667 37 3 3 333667 37 3 3 8  ok
sort .s 3 3 3 3 37 37 333667 333667 8  ok
over . 333667  ok

Monday, October 26, 2015

Prime number tests

A simple pseudo prime number test is using Fermat's little theorem
for any prime number p and integer a such that 0<a<p. To select a first there is a need for a pseudo random number generator.

variable rnd base @ hex 

: reset_seed  0ABCDEF1 rnd ! ; reset_seed 

: rand ( -- u )
  rnd @ 08088405 * 1+ dup rnd ! ;

: random ( u1 -- u2 ) 
  rand um* nip ;

base ! 

A good thing with fermat is that flag always is true if p is a prime.

: fermat ( numb -- flag ) 
  dup 2 - random 2 + 
  over 1- rot u**mod 1 = ; 

It could therefore be used as a rather fast first primality test to see whether it's necessary to take a closer look.

Rabin-Miller strong pseudoprime test

is an efficient probabilistic algorithm for determining if a given number is prime.
: 2/mod \ n -- r q
  dup 1 and swap 1 rshift ;

: get-rs 0 loc{ numb r -- r s }
  numb 1-
  begin dup 2/mod swap 0=            \ n qout rest=0
  while nip r 1+ to r
  repeat 2drop 
  r numb 1- r rshift ;

: get-a ( numb -- a )

  2 - random 2 + ;

: rabin-miller1 ( numb -- flag )
  dup dup get-rs rot get-a false loc{ numb  r s a flag }
  a s numb u**mod 1 = 
  if true exit
  then false to flag r 0 
  ?do a s i lshift numb u**mod numb 1- = 
     if true to flag leave then
  loop flag ;

Both fermat and rabin-miller1 depends on pseudo random numbers that varies from test to test and are not even combined accurate enough. A repeated test with all odd numbers between 3 and 1000000 gave the series 12 12 7 17 13 of number of errors. A second test series with fermat plus 2 times rabin-miller gave the series 1 1 2 1 3.

Due to OEIS A014233 it is enough to repeat the Rabin-Miller test a few times, depending of the word length, to be sure. Instead of random numbers one use the first prime numbers: 

2 3 5 7 11 13 17 19 23 29 31 37 ...

Repeated Rabin-Miller will reveal all composite numbers less than:

         2027 if a=2
      1373653 if a=2,3
     25326001 if a=2,3,5
   3215031751 if a=2,3,5,7
2152302898747 if a=2,3,5,7,11

Since 2^32 < 2152302898747 a=2,3,5,7,11 suffice for 32 bit integers. For an accurate test of all 64 bit integers it's enough to test for a=2,3,5,7,11,13,17,19,23,29,31,37.

create pnr 
2 c, 3 c, 5 c, 7 c, 11 c, 13 c, 17 c, 19 c, 23 c, 29 c, 31 c, 37 c,

The word create defines the word pnr which leaves an address on tos, the address where the bytes 2 to 37 are allocated.

: rabin-miller2 loc{ numb a | s r flag -- flag } \ odd numb>a>1
  numb get-rs to s to r
  a s numb u**mod 1 =
  if true exit then
  false to flag r 0
  ?do a s i lshift numb u**mod numb 1- =
      if true to flag leave then
  loop flag ;

: rmx ( numb -- n )   \ 32 bit integers
  dup     2047 u< if drop 1 exit then
  dup  1373653 u< if drop 2 exit then
  dup 25326001 u< if drop 3 exit then
    3215031751 u< if 4 else 5 then ;

: rmx ( numb -- n )   \ 64 bit integers
  dup            2047 u< if drop 1 exit then
  dup         1373653 u< if drop 2 exit then
  dup        25326001 u< if drop 3 exit then
  dup      3215031751 u< if drop 4 exit then
  dup   2152302898747 u< if drop 5 exit then
  dup   3474749660383 u< if drop 6 exit then
  dup 341550071728321 u< if drop 8 exit then
  3825123056546413051 u< if 11 else 12 then ;

: rabin-miller ( numb -- flag )
  dup rmx 0
  do dup i pnr + c@ rabin-miller2 0=
     if 0= leave then
  loop 0= 0= ;

Or faster when composite:

: fermat-rabin-miller ( numb -- flag )   \ numb odd
  dup fermat
  if rabin-miller
  else 0=
  then ;

suitable to define 

: nextprime ( numb -- prime )  \ numb unsigned integer
  dup 3 u< if drop 2 exit then
  1 or

  begin dup fermat-rabin-miller 0=
  while 2 +
  repeat ;

: prime ( numb -- flag )
  dup 3 u< if 2 = exit then
  dup 1 and 0= 
  if drop false exit 
  then rabin-miller ;