here is(1)a corrected elliptic curve # points(2) remarks on factzn
The Number of Points on an Elliptic Curve Modulo a Prime
by
A.O.L. Atkin
(This is an expanded and improved version of a 9-page ms dated
January 1988. )
0.^ The elliptic curve we consider is
E: y**2=x**3 + A*x +B (modulo P) (0)
where P is a (large) prime,certainly not 2 or 3,and usually bigger than
any auxiliary prime q,where currently 2.le.q.le.311 . We summarize briefly
some routine material.
^1. Matters of Proof.
In the method we finally describe,it is often convenient not to prove
some of the intermediate steps; we arrive at a putative number of points,
certainly correct,but legally speculative.Thus in this section we address
the problem:
Prove that the number of points on (0) is P + 1 + c, where
c is given. (1.1)
First use trial divisors on P + 1 +/- c(simultaneously) up to some
moderate bound B(perhaps 10**7,depending on the size of P). The general area
of interest here is with P from 30 to 100 decimal digits,so after each
(found factor AND 10000 primes)a Selfridge may be performed. This will
(in the worst case-often better with obvious consequences)leave us with
P +1 + c = F1 * U1 , P + 1 -c = F2 * U2 ,
where F1 and F2 are (factorized) products of known primes,and U1 and U2 are
composite numbers with no prime factor less than B. We also have
(F1,U1)=(F1,U2)=(F2,U1)=(F2,U2)=(U1,U2)=1,
of which the last needs a gcd; and we hope that
(F1,F2) is rather small.
Now find + and - twists of E and points (a1 and a2) on them. By the usual
techniques,if necessary repeating with a new point/twist,we expect to show
that there is a subgroup of the group of points on E mod P which
has order F1*X1 on E,where X1 divides U1 and is not 1,and similarly
an order F2*X2 on E' ,the twist of E by (*/ P). If the true numbers
of points on E and E' are P + 1 +/- C,then we have shown that (C-c) is
divisible by lcm(F1,F2) *X1 *X2 . But the absolute value of (C-c) is less
than 4*sqrt(P),so that our c is certainly proved correct when this is
less than our proven divisor. In particular,since X1 and X2 are both
grater than B,we have that
lcm(F1,F2)*B*B . gt. 4*sqrt(P) (1.2)
is sufficient to prove that c is correct.
We can improve on this with rather more work. Let us ostensibly
factorize P by the elliptic curve method,using a first stage with a1**F1,
and a second stage up to B2(.gt.B) . Notice that,since P is prime and we
can identify a match without a gcd,we may use the Shanks baby-giant steps
with sort and match,as in Shanks's quadratic forms factorization method. Using
the modification introduced in SPAR saves a factor of 6 or 7,and it is quite
fast to work up to B2=10**10,and 10**12 only takes a few minutes. This will
either produce a factor of U1,or else prove that X1 is greater than B2.
A similar computation on E' will prove that X2 is greater than B2,and we
can now replace B by B2 in (1.2) above.
In the unfortunate (and unlikely) event that these expedients fail,
we resort to ECM and possibly sieve factorizations of U1 or U2. Factorization
takes considerably longer than the techniques described above. Of course,the
congruences modulis the small primes q described below are in many cases
proved without reservation,and so give rise to further conditions on
(C-c). The difficulty is that knowing (C-c) is,say,0 or 4 mod 13 and
divisible by an UNKOWN large prime,is unhelpful: 0 is all right,but not
4.
2^. The Construction of Modular Equations
2.1.For our present purpose,we take q to be an odd prime,and seek an algebraic
equation connecting j =x**-1 + 744 + 196884*x+... (x=exp(2*pie*i*tau))
and some suitable function f on the subgroup GZ(q) ("gamma sub zero q"). We
prefer this function to have as low a valence v as possible,so that the
equation has degree (q+1) in f and degree v in j. One canonically available
function is f= eta**s(tau)/ eta**s(q*tau) ,where s is the smallest even
integer such that 24 divides s*(q-1). Unfortunately its valence is
unreasonably large when q is not congruent to 1(mod 12). Another less
numerically canonical function is the function of lowest valence v
on GZ*(q) =(GZ(q),W(q)),where W(q) is the involution (0,-1 ; q,0).
This has valence 2v on GZ(q),of course,but v is usually very small,since
GZ*(q) almost always has a Weierstrass point at infinity(I conjecture
that this is so for q.gt.389). My own package has a series of special
purpose routines for different q,but there is one universal method,
involving hypergeometric series and Berlekamp factorization,which gives
total conviction but no proof.
The learned reader will recognize supersingular equations,Hasse
invariants,&c,in what follows,but I shall not allow myself to start explaining
anything.
(for seq: see Weber III or author's 1975 Michigan/Japan talk)
When q=1(mod 4),consider the Gauss hypergeometric series
F(1/12,5/12; 1; 1728/j) ,
(which happens to be the fourth root of E4). Modulo q this series is congruen
t
to H(1/j)*H(1/j**q)*H(1/j**q**2)* .... (for this & more,see Carlitz)
where H has degree d; then j**d *H(1/j) is a monic polynomial SSE(j) of
degree d= approx. q/12 .
When q=3(mod 4),we employ F(7/12,11/12; 1; 1728/j) similarly.
SSE(j) has only nonrepeated linear and quadratic factors. Remove the
linear factors to produce SSE*(j) of degree 2g,where g is the genus of
GZ*(q),and SSE* has precisely g quadratic factors modulo q. We do NOT
wish to find these,however.
It is easy to prove that every function with a pole of order m
at infinity and leading term k*x**-m+...on GZ*(q) is congruent modulo q
to a polynomial in j (i.e.,the corresponding Fourier coefficients are
termwise congruent). This polynomial has the following property: it has
constant remainder(no j-term)when divided by ANY of the quadratic factors
of SSE*. That is to say,the (infinite)ring of such polynomials is precisely
that of the Berlekamp polynomials in Berlekamp's factorization method. Thus we
ostensibly factorize SSE* by B.,but never proceed to find the constants and
the factors;we stop when the Berlekamp matrix has been echelonized,and now the
(columns as I do it)are the polynomials in j(and,though we do not need it
here,the rows *dj are the differentials)modulo q.
There is however one caveat which I have never been able to overcome
(nor,I believe,have Serre and many other people). If one finds that,in
terms of x,the polynomial in j of smallest degree begins(for example)
x**-3(1,1,1,2,2,2,3,3,4,5,6,7,9,10,.....) ,
it is tempting to suppose that one has PROVED that 3 is the smallest
possible valence on GZ*(q). We are in fact sure,and even that 1,1,1,2,..
&c are right over Z,since how could it go 1,1,1+179,2-2*179,...? But
in theory we may have been misled. The least degree function over Z
(over C directly,but deep theorems of Shimura imply over Z)may be
x**-4(179,1,1,1,2,...) and of course a compensating differential
may start x**3(179,1,-1,...). Thus the "mod q gap series"may not
be the true gap series over C at infinity. I have of course checked
numerous cases,and it always has been,but the annoying theoretical
possiblity remains that they may differ. However,it is clear that the
mod q situation is more Weierstrassy than the "true" one,so that when
we have no WP mod q,we have no WP a fortiori over C. In this way(using
the full SSE)one can show that infinity(and hence also 0)is never a WP
of GZ(q).
( There are many other interesting corollaries and issues related to the
supersingular equation,which we defer.)
Now even if we are prepared to forgo proof,we cannot afford to be
wrong. The speculative inference of an exact function
x**-21(1,1,2,2,4,5,8,10,14,....
from its reduction modulo q for,say,q=701,starts off very securely,but
what happens as the presumed terms become larger? Our position is
analogous to that of the tax evader,led step by step to wilder and
more extravagant lies and guesses until finally the IRS gives him a
wrong coefficient. Our remedy should be sought,not from the ineptitudes
of Government,but from the technology of organized crime,in the
process LAUNDR described in the next paragraph: for a small commission
we can have our coefficients laundered.
The reader might wish to try her hand at the following. We can
be sure that the 125 after 681 is really 826,the 287 998,&c,but at
some point discrepancy is inevitable. The'inevitability'of Bach's
next chord given the last chord and the next note of the chorale
does not enable us to write his works ourselves.
Q= 701 FUNCTION X**- 21(1, 1 2 2 4 5 8 10 14 18 25
31 42 52 68 84 108 133 168 205 256 311 384 463 567 681
125 287 489 14 294 607 289 19 540 426 430 524 67 435 296
321 596 387 498 197 302 83 384 474 521 503 618 147 31 259
413 501 153 98 35 11 500 188 335 373 269 208 284 63 481
494 527 344 579 453 142 448 452 469 640 160 279 528 571 333
67 212 309 698 365 358 654 389 687 337 458 382 48 323 246
(Solution at end of this essay)
Assume now that the valence v of our function f is less than
(q+1)/24,as is almost always the case(163 is the biggest
exception.) Then f*eta(tau)*eta(q*tau) is a CUSPform of
weight one and multiplier system (*/q)*some 12-th root of unity ,and
hence by Serre-Deligne of prescribed type and with very small coefficients
after we subtract if needed some constant multiple of eta(tau)*(eta(q*tau
).
These can be identified with certainty(and are almost always based on
definite quadratic forms). Now modulo any of our Chinese primes we can
divide back by our weight one multiplier,to get a laundered or sanitized
function.
Now I have to confess that I do not do this myself. For once one
has used the SSE and the above to identify the function as a ratio of
forms of weight one,the numerators can be readily identified as binary
theta series,and the algorithm expressed in that form. This is also
much faster than computing j modulo q out to several thousand terms;
however the POWMAT and GEGAJ described below take so much longer
relatively that the cost of computing the function is immaterial.
2.2^. The Modular Equation(Star Type) (routine GEGAJ)
We have an algebraic relation
Sigma a(r,s) . f**r . j**s =0
(0.le.s.le.2*v and 0.le.r.le.q+1)
For fixed f,both j and j(q*tau) are among the solutions;for fixed j,
the solutions are f and f((tau+l)/q),l=0,q-1. If we plan most of the
work using j(q*tau),we see that a(r,s) multiplies a power series
with leading term x**-(r+q*s). Two such terms are only equal if
r=r'(mod q). Also the coefficient of f**q as a polynomial in j is
plainly equal to -f +constant +O(terms in x and higher powers),which
enables us to commence our equation as
f**(q+1) + f**q *(-j**v +...+const) + f**(q-1)*...... =0 :
This start achieved,we have r=r' and s=s' in the above,so that
every remaining coefficient can be found by solving echelon-form
using f**r and j(q*tau)**s. For this we need all the series
f**r (1.le.r.le.q+1) up to(at most) q*(2v+1) terms; unfortunately
the succesive use does not enable us to save much on storage here.
POWMAT is a general purpose program which produces all powers
from 1 to k of a given power series up to N terms in each power. It
uses the fast FFTLIB which is part of LMA4064V,and is too long to describ
e here. Basically we find d of size sqrt(k )+a little,and get the powers
from 1 to d and store their FFTs. Then successively the FFTs of f**2d,
f**3d,...,are found,and each is used to find d powers of f using
only one inverse FFT per power. This is often quicker than
the subsequent echelon solving. The latter could be speeded up by
the usual "block"technique,but in the present case that was too much
of a pain to program.
2.3^. The Modular Equation (Eta Type) (routine GEGEN3)
Here we have f=eta**s/eta(q*tau)**s of valence v,say,and j. Again,
the involution shows that q**(s/2) / f and j(q*tau) satisfy the same
relation as f and j; this would enable a"both ends to the middle"
type of echelon solving to be applied. But we chose instead to use the
relation between j and g=f(tau/q): for fixed j the roots are
f((tau+l)/q),l=0,q-1, and q**(s/2)/ f, and we then used routines
for multn/divn by eta series,for powers of sparse coefficient
series,and Newton's formula.
3^. The Use of the Modular Equation
Our indifference above to the precise function used to define GZ(q)
with j will now be explained. Given our original curve E modulo P,
we compute the usual j modulo P as
j = 6912*A**3 / ( 4*A**3 + 27* B**2)
Substitution of this value of j in the modular equation yields an
equation f**(q+1)....=0 of degree q+1 modulo P. (In practice,we
leave the modular equation as a set of equations modulis enough
Chinese primes,and reduce modulo P at this point.)
We are concerned ONLY with the DEGREES of the irreducible factors
of this last equation modulo P,and not with the factors themselves.
More precisely,recalling always that q and P are distinct,we have:
(Definition: the "split" modulo P is the permutation of (q+1) letters
obtained by treating each irreducible factor of degree d as a cycle
of length d;"alternating" means split in A(q+1);"symmetric" means
split not in A(q+1). )
(3.1) If (P/q)=1,the split is alternating.
(3.2) If (P/q)=-1,the split is symmetric.
If the number of points on E mod P is P + 1 + c,and
the roots of t**2 -t*c + P =0 (mod q) are a and b,then:
(3.3) if a=b(or c**2=4*P mod q),then the split is (1q) or (111..11)
(3.3.1) The last case (all factors linear) occurs only when c**2 -4*P
is divisible by q**2.
Now assume a and b are distinct mod q.
(3.4) If a and b are in the field of q elements(so that (c**2-4*P)/q)
=1),then the split is (11rrrr..rr),where rs=q-1 . (r.gt.1)
(3.5) If a and b are not in the field of q elements(so that
(c**2-4*P)/q)=-1),then the split is (rrrrr.rr) with rs=q+1.(r.gt.1)
In the last two cases,r is the EXACT ORDER of (a/b) in the fields
of q,q**2,elements,respectively.
Conversely,splits as prescribed above imply the appropriate
conditions on a and b,and hence on c. Note that we cannot
distinguish +/- c by these criteria,which is to be expected since
we have only used j mod P,and not A and B, and j is invariant by
the twist of E.
The number of possible c mod q remaining is equal
to (Euler)phi(r). This is always less than q/2 and in some cases
we hope to find very small values of r.
Computations with Polynomials Modulo P
For large degree (q+1) and large P(here perhaps up to 100 decimal
digits)we run into serious time. Thus efficiency is important,even
constant improvements like 20%.
Our prime P consists of P(1) 32-bit words starting at P(2).
For degree N,we have three kinds of object,each represented by a
string of N*P(1) words: the monic polynomial x**N+ a(1)*x**(N-1)+..
...+a(N),the same as x**N=a(1)*x**(N-1)+...+a(N),and finally the
expression a(1)*...without the x**N.
Our first task is to form x**P-x modulis (P,f(x)); for this
we do the usual left-to-right square and multiply. "Multiply" is
trivial(only by x). For squaring we construct an auxiliary table
of powers x**N,...x**(2N-2) modulis (P,f),in an (N-1)by N BLNMAT
matrix of P(1)-word numbers. We also add an N-th row(see below),
which requires a total storage of P(1)*N*N words.
To square a(1)*x**(N-1)+...+a(N) we proceed in two stages.
First,we have to square to get a polynomial of degree 2N-2(FFT is
pretty useless here). Assuming that the basic number multiplication
is about twice the squaring time for P(1).gt.4,say,it is beneficial
to regard a(i)*a(j) as ((a(i)+a(j))**2-a**2(i)-a**2(j))/2 ,once N
is over about 10 or so. The same remark applies to the next stage
(essentially forming a vector as vector*matrix): we have to find
N dot products,and at the overhead cost of working out 1 extra
sum of squares(N were already done in BLNMAT),we halve the dot
product time.
At this point,the gcd (x**P-x,f) is trivial,and after
removal of the linear factors we need to contemplate x**(P**a) - x .
We know that at some point(namely a=r)this will suddenly vanish. P
is so large that exponentiation with square and(now long)multiply
would be absurd. We therefore form another matrix BLQMAT which
contains the powers x**P,x**2P,...,x**((N-1)*P),as polynomials of
degree.le.N-1 in x (and again an N-th row for sums of squares.)
This is done by the usual "square and halve and square again"
algorithm for forming a list of powers,when FFT is not relevant,
and when squaring is cheaper than multiplying. Briefly,we compute
powers as follows 1,2,4,8,16,3,6,12,24,5,10,20,7,14,9,18,11,22,
13,15,17,19,21,23,(for N-1=25),with only the 25-th power needing
a multiply. For,e.g.,3= half( (1+2)**2-2-4),17=half((8+9)**2-16-18).
&c. It should be said that all these devices have been programmed,
and do lead in practice to substantial improvements. This is not
computer science.
The auxiliary storage used(which really needs to be in core for
time efficiency)is in an array with a safe counter. When it is clear
that BLQMAT(which needs BLNMAT)would overrun storage,a trap PNOMAT
is set,and BLNMAT is not used,at the cost of greater execution time.
(Note that the storage for computing the modular equation can now be
thrown away. In a perfect world one would store all the modular
equations on disk: in practice we check legally once-for-all that
the equation is correct if we use w Chinese primes,and only store w
for later purposes,recomputing the resiudes.)
5^. Some Examples of the Method before the Sieve.
5.1^. P= 1 87668 40755 53521 23529 ,
E is Y**2=X**3 + 105*X + 78153
q= degree of (x**P-x,f) Possible r Actual r List of c (+/-)
13 1 .... ... 3
17 0 3,9 3 1
19 2 2,6,18 6 3
23 0 2,3,4,6,12 6 5
29 2 4,28 28 1,2,5,7,8,14
31 2 3,5,15 15 9,10,13,14
37 2 2,3,6,9,18 9 5,10,15
41 0 3,7,21 7 6,9,11
For example,when q=37,the second column shows that we are in
case (3.4 ). Since (P/37)=1,we need (rrr..rr)to be alternating,so
that r=4,12,and 36,are impossible. At x**(P**9) we found our running
x**(P**a) was reduced to x,giving r=9; the final c calculation is
trivial. Note that in the cases q=29 and q=31 the value of r was the largest
possible. In theory this can be deduced at the penultimate stage
(when q=31 and a=5 does NOT give x**(P**5)=x,we know r must be 15:
one can even stop after one non zero entry is found as we begin the
a=5 computation.) However,once BLQMAT has been done,the cost of
continuing to the end is relatively small,and any error in any
single excluding list is so disastrous,that we often do so as a check.
Finally notice that as a matter of global strategy,we are
interested in retaining those q and corresponding lists of c for which
phi(r)/q is small. Thus we might very well leave q=29 at a=8,since we
know we do not wish to use the list of values of c even if we double
check it.
5.2^. P = 10**99 + 17**2
E is Y**2 = X**3 + 105* X + 78153
(At present(May 17th,1991)there is no reasonable prospect that our methods
will work for a 100-digit prime. The information modulis q can be
gathered,but the sieve is not feasible.)
For q=211(and in fact v=7 in (2.2))the modular equation has
2333 terms. We need 21 Chinese primes @ 29.5" each,for a total of
10.24',to compute it. It took 14.60' to find X**P-X,with a linear part of
degree 2. At this point
r=3,5,7,15,21,35,and 105, were still possible.
BLQMAT and BLNMAT together would have needed storage beyond the 4096K
available,so that we did BLQMAT without BLNMAT,at some cost in terms of
time,which was 23.02'. In fact r=7 was now found after only 10.54"
more,giving
c= +/- 47,95,105 (mod 211) for total 48' CPU time on an IBM3090.
(With this prime,the largest q for which BLQMAT was possible with BLNMAT
was 197: the timings there were 357" for modular equation,12.71' for
X**P - X,and 460" for BLQMAT. In this case r was 196,so that the check
took substantial time,namely 302").
6^ . Other methods.
Before describing the sieve,sort,and match,we first discuss
what else is available,and what are the criteria for using various method
s.
We continue to regard the operation of our group of points as
multiplication; many parts of the program are the same for quadratic
forms and finite fields,where multiplication is standard.
6.1 ^ Modular forms.
If the modular form corresponding to E taken over Q can be identifi
ed,
then its P-th Fourier coefficient will be -c. Obviously this is an absurd way
of getting one particular value of c for one P,if P is small(under 10**5),and
an impossible way if P is medium or larger. But if one wants the values of c
for all P up to some limit,it is probably the best method when available.
6.2^ Direct Computation of the Character Sum
Although this takes time P,the implied constant is very good,and with
treble differencing of x**3 +a*x +b,double differencing to create a table
of the Legendre symbol,and careful use of all tricks(e.g. on the IBM 3093
one can difference y**2/2 down to zero using BCTR,&c),the time is only
a few milliseconds up to P=10**5.
6.3 ^ Baby-Giant Steps as in Shanks.
In what follows r and s are integers to be suitably chosen
later; r*s should be minimally greater than sqrt(P).
Find a twist +/- of E with an initial point A. Form and"store"the powers
of A from 1 to r. If r*P is small enough,this presents no problem. If
storage becomes tight,the following technique is used(we shall not repeat
thse remarks in section 7,and other analogous places.) We store the
penultimate word of X,where A**r=(X,Y). We thus have an array of r
32-bit words. We now perform a keysort(very fast) on this so as to make
the array in ascending"logical"order(i.e. as unsigned 32-bit integers).
If necessary,the key can be sent off to secondary storage. We now
successively compute A**(P+1+i*r) and match the penultimate word of its
"X" against our sorted list,and similarly A**(P+1-i*r) ,where i runs
from 1 through odd integers up to 2*s-1.(Remember that,using the Weierstra
form,we always know when any power we compute happens to become the group
identity).
If a match is found on the penultimate word,we find r from our key
,and directly recompute A**r for a full comparison of the "X"s. If it is a true
match,then the "Y"s are equal or equal and opposite,and we have
found a nontrivial power of A which is equal to the group identity.One
expects in large cases to find a number of bogus matches; the situation
is similar to the birthday problem(we have 2**32 days in the year,so that
once we have 2**16 or more students in our class two may well have the
same birthday.) However,there are not likely to be enough bogus matches
to cause serious inefficiency. Notice that we prefer the penultimate
word; the last word may have some 2-adic congruences,and the first
word may be small. (If first=penultimate we have only two words in
all,when storage is unlikely to be a problem).
Now P + 1 + c,the order of our group of points,lies between P +1
+ 2*Sqrt(P) and P + 1 - 2*sqrt(P). Provided that r*(2*s) is greater than
or equal to the integer part of 2*sqrt(P),or more or less r*s.ge.sqrt(P),
then every power of A which is equal to the identity,and which lies betwe
en the given bounds,will be found by the above process. In particular,if
we carry out the full search,and find exactly one match,that must be the
group order. It is probably best,however,to leave the sieve once the
first match is found,and to make a quick attempt to prove it correct
by the methods of section 1; if we fail,come back to the sieve. Again,
if the matches are too numerous,and the order of the point A is
disastrously low,we should try again with a new point.
Finally,excerpts of data from different places can often
be combined logically to advantage. For example,we may quickly
find some information on c modulo 60. Now if our first success S
is NOT the true order of the group,then our point has order d where
d divides S,and the possible orders of the group are S (mod d). Als
any number which is S (mod d) in the range over which we have
already searched would have been a success,and it was not; this
plainly restricts d.
(Example: suppose that our first success is P+1 +k*sqrt(P)=S (k.gt.0),
following the algorithm described above. Then we know d.ge.2k*sqrt(P).
If we know that c is odd,and in fact S is odd,the nearest possible
multiple of d is at least 6k*sqrt(P) away. Thus if 6k.gt.k+2,or
k.gt.2/5,we have shown that S is the true order. )
It remains to choose r and s. They are both P**.25
apart from constants,so that the match time is fixed.(a single
match is O(log r). ) The major time is thus the computation
of the powers of A. If we are going to go through the whole sieve,
that time is proportional to r+2*s (For each new power to be
matched,the time to find its (X,Y) is very much greater than the
match itself,so we do not bother with (2+ .01*log r) *s ). This
is minimal when r=2*s approximately. However,we know the distribution
of c/(2*sqrt(P)): if this is cos(theta) with 0.le.theta.le.pie,then
the element probability is sin**2(theta).d(theta). We find that,assuming
there is only one multiple of the order of A in our range(as is usually
the case),the expected value of i where we find it is 4*(2*s)/(3*pie),
in time 4*s/(3*pie),recalling that i runs over odd values only. Thus
our expected effort now becomes r+ 4*s/(3*pie),which is minimal whe
-n r=P**.25 *sqrt(4/3pie). It is certainly worth distinguishing
the case when one intends to leave the sieve after the first success;
it is problematical whether it is worth bothering with the exact
distribution of c; the assumption of uniform distribution leads to
sqrt(1/2) in place of sqrt(4/3pie),and makes the program slightly
easier to write.
6.4^ The Ostensible Factorization of P by ECM.
The method of 6.3 is certain to succeed,but is not feasible once r and
s reach about 10**6,suggesting that P of 25 decimal digits is the limit.
For P between 25 and 35 decimal digits(with decreasing chance of success
as P increases)we may have recourse to the following method. Find a point
A on some twist of E,and raise A to the usual "Large Product of Small
Primes" power L(however,the algorithm should perhaps keep A**(iL/10),
for i =1 through 9,even though this might normally be slightly inefficien
t.)
A**L may itself be the identity. If not,we perform a second stage as in
SPAR using sort and match,and perhaps find a large(often prime)B such tha
t A**(L*B) is the identity. It is now quite easy to disentangle the exact
order of A,which almost always defines the group order uniquely. We
have a second chance(but no more)if we fail the first time,using a
different twist of E with different quadratic character.
6.5 ^ The Method of Schoof.
Schoof's beautiful paper (Math.Comp.1985) is so well known that I do not
here summarize it. It is an effective polynomial time algorithm,and has
many elegant corollaries. I did in fact attempt to implement it in 1987,
and was led to the method of the present paper,but I do not regard
my method as any kind of "implementation" of Schoof. My Chinese
grandfather here is rather Lehmer,with his Delay Line Sieve,who had
partial information modulis an arbitrary number of Chinese primes,
but a problem in its effective application. Naturally the idea of
using Frobenius with auxiliary primes q I learned from Schoof. More
detailed comparisons are best left until after my method has been
fully described.
7 ^ Determination of the number of points on E modulo P.
The method we now describe is almost certain to work(though I cannot
prove this)for P up to about 50 decimal digits. From there up to 65
digits(my current record),the chances of success decrease,or strictly speaking
an excessive time is needed in "unlucky" cases. We suppose that by the
methods described above in section 3,we have found a number of
limitations on c of the form c congruent to (list) modulo q,for various
primes q(perhaps including q=8 or 9 by special purpose programs.)
For practical purposes,we collate this information in the form
c = L3 (modulo M3) (L3 is usually zero)
c= one of LIST1(L1) (modulo M1) (N1 values of L1)
c = one of LIST2(L2) (modulo M2) (N2 values of L2).
(We suppose now w.l.o.m.g. that L3=0). Write
c = M3 * (M1*R2 + M2*R1),
where we shall also suppose that
M1*M2*M3 .gt.4*sqrt(P) .
There certainly exists an R1 with abs(R1).le.M1/2,and (since we have
that abs(c).le.M3*M1*M2/2 )the corresponding R2 has abs(R2).le.M2 .
The congruences for c modulis M1 and M2 are equivalent to congruences for
R1 modulo M1 and R2 modulo M2.
We now find some point A on some twist of E and form A2=A**(M3*m1).
The N2 possible powers A2**R2 are found,and the penultimate words of the
relevant X are stored together with the corresponding values of R2.Finall
y the X-values are sorted by size and a key to the R2 values is retained.
Next we form A1 =A**(M3*M2),and form successively the powers
A**(P+1) * A1**R1, where R1 runs through N1 possible residues modulo M1,
and at each stage we match the penultimate word of X against our sorted
list. When a match is found,the list word is wholly recomputed using the
key and a further check made for a true match on X,which implies a match
on +/-Y. (As in section 6.3,a number of bogus matches is to be expected,
perhaps two or three hundred in a run of several hundred thousand
comparisons,but not enough to cause any serious inefficiency.) Once the
match is found,we have essentially "won",and revert to section 1,or
possibly carry out the search to the end as previously described.
7.1^ More Technical Details .
7.1.2. ^ The Sorted List.
The preference here(for "large" P of 40 or more decimal digits;
everything is quite fast for less)is to make M2 as large as possible
(but of double IBM word size,i.e.,under 2**64)for a given N2,by using
for M2 a product of those q which have very few possible values of c.
In that case the gaps between successive values of R2 are large,and
we speed up the computation of A2**R2 by the following device.
Choose a parameter SIZ such that 2*SIZ**2 exceeds almost all the
successive differences of the R2(SIZ =200 is often suitable). Form a
table of A2,A2**2,...,A2**SIZ,and A2**(2*SIZ*L),L=1,SIZ .
Then almost every power A2**R2 can be computed from the previous one
by two multiplications(of points on the elliptic curve); recall that
division is multiplication by (X,-Y). The very few large gaps are
computed by applying the usual power algorithm.
Since we always have +/-c mod q together in the method,a value R2
occurs if and only if M2-R2 occurs. Thus we apply the above only to
the R2 under M2/2; after that we complete the table at the cost of a
single division(the point A2**M2 having been worked out once for all.)
7.1.2. The Matching List.
We find it expedient here to write M1 as a product M1K1*M1K2,and
to rewrite the typical residue R1 as LISTK1(K1) +M1K1*LISTK2(K2),
where the residues LISTK1 modulo M1K1 form an outer loop,and the
residues LISTK2 modulo M1K2 the inner loop. Both our M1K1 and M1K2
are currently single length,and certainly we require M1K2 to be so.
We prefer there to be a large number of residues modulo M1K2,so that
the possible differences betweeen successive ones are limited in number
to at most 2000.
For different LISTK1 in the outer loop,the precise residues in
the inner loop are different. However,their differences modulo M1K2
form a small set. Thus before commencing the double loop,we prepare a
table of our relevant point raised to the powers of all these differences
.. Then in the inner loop,each successive power is obtained by a single
multiplication of points on the elliptic curve.
7.2 ^ The Global Strategy .
It is now clear that we require a set of values of q ,with their
corresponding numbers of possible c,which satisfy two criteria. First,
the product of all the q must be at least 4*sqrt(P). Second,the product
of the numbers of c for all these q must not exceed the time and storage
constraints of our sort and match. For the sorted list,if we have N2
distinct residues,we need storage of 3*N2 words(list plus double word key
). One might send off the key to secondary storage,but it is a mild
nuisance. At present,N2 of around 300000 is the most that I use.
With the two-tier matching list,the critical constraint is the
time; so long as the inner loop has about 10000 residues it is very
efficient and needs only moderate storage for the powers of differences
table. But the outer loop is there,and the number of residues it
contains is a multiplier of the total time.
The entire program is now automatic except for the
assignment of q between M2,M1(and M1K1 and M1K2). As we proceed
with the computation of q,any q with c=0 is automatically assigned
to M3,and is pure good. Otherwise,we rank the qs by the ratio
(q/ number of c),and as soon as the product of the leading q
exceeds 4*sqrt(P) and the product of the numbers of c is less
than some bound depending on P(with 50 digits,we might carry on
a bit to make a shorter second stage; with 70,we are only too glad
to have found something),and on the storage and time constraints
of our program,we decide to go into the sort and match stage.
At this point,a hand assignment of q is made.
7.3 ^ Prospects.
As things stand,it is quite feasible to do all q up to 200 for a 100-
digit prime in about 4 hours. The apparent restriction of information
(measured as the product of (possible c / all c ) modulo (product q))
is enormous. For every q it is less than 1/2,and usually much better.
But there seems to be no good way of using this information effectively.
In primality proving one can in effect show very quickly for
numerous q that any prime factor of a putative prime is congruent to
one of (q-1)/2 values modulo q,but the information becomes unwieldy
and the Delay Line Sieve to find the least number satisfying the
conditions is needed,and is not effective beyond a certain size. The
situation here is analogous.
Of course Schoof's method does not have this drawback,and
actually knowing c uniquely modulo each q gives an unique answer
as soon as we have enough q. But the price he pays for this is
(in effect)that he must work on the principal congruence subgroup
Gamma(q) versus our GZ(q),and deal with equations of degree q**2
versus our (q+1). The equivalent computer effort that we need to
get up to q=251 would only give at most q=17 for Schoof's
algorithm.
These remarks are in no way intended to detract from a
very remarkable piece of mathematics,merely to point out that with
our programming skills his method does not work(and,by all accounts,
the programming skills of our complement in the world do no better.)
However,a combination of the two methods might yield another
10 decimal digits;we could start off with an unique c modulo
8*9*5*7*11*13*17,and then use the method above with a much bigger M3.
It may thus be hoped that someone will now be enterprising enough
to program Schoof's method up to,say,17; the fact that by themselves
the results would have no practical application need no longer
act as a deterrent.
8. ^ Various appendices and numerical data.
8.1 ^ Examples(q=173 and q=257) of SSE-derived functions,showing the
laundry process. Notice that the two Fourier series have the same
coefficients(though a different leading power of x)for a large number
of terms. Why is this? (Ramanujan worshippers please note what the
master did himself,not the fumblings of his apostolic succession.)
Q= 173 FUNCTION X**- 4(1, 1 1 2 3 3 5 6 8 10 13
16 21 25 31 38 47 56 69 82 99 118 141 166 26 59
102 148 32 92 168 76 1 108 62 28 22 28 68 129 56
8 12 47 143 110 150 71 86 166 12 120 14 16 12 140
122 99 141 45 69 170 103 6 153 162 163 105 136 42 152
83 21 88 154 12 61 98 53 56 76 93 97 86 111 158
169 164 120 82 119 106 17 127 130 0 74 165 46 134 122
(multiply by eta(tau) *eta(173*tau) )
Q= 173 X**( 13/ 4)*(1, 0 -1 0 0 -1 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 -1 1 -1 0 0
0 1 0 0 -1 0 -1 0 0 0 0 0 0 0 -1
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 0 0 -1 0 0 -1
0 0 1 1 0 -1 0 0 0 0 0 0 -1 0 -1
0 -1 0 0 0 -1 0 0 0 0 1
(above in sparse notation: coeff*x**(power)/4)
-1( 21) -1( 33) 1( 57) 1( 81) -1( 113) 1( 117) -1( 121) 1( 137)
-1( 149) -1( 157) -1( 189) 1( 209) 1( 213) 1( 261) 1( 269) -1( 297)
-1( 309) 1( 321) 1( 325) -1( 333) -1( 361) -1( 369) -1( 377) -1( 393)
1( 413)
LAUNDERED 1 1 2 3 3 5 6 8
10 13 16 21 25 31 38 47 56 69
82 99 118 141 166 199 232 275 321 378
438 514 595 693 800 927 1066 1233 1412 1625
1859 2132 2430 2780 3161 3603 4089 4648 5261 5968
6740 7624 8597 9702 10915 12295 13807 15519 17399 19517
21843 24462 27331 30551 34087 38040 42374 47219 52524 58437
64917 72120 80009 88770 98352 108971 120593 133444 147494 163019
179976 198680 219111 241605 266160 293173 322630 354992 390279 428987
471161 517389 567719 622817 682785 748355 819674 897598 982286 1074722
1175150 1284647
(end 173 begin 257)
Q= 257 FUNCTION X**- 6(1, 1 1 2 3 3 5 6 8 10 13
16 21 25 31 38 47 56 69 82 99 118 141 166 199 233
18 65 122 183 2 84 182 34 162 46 214 139 97 78 97
143 240 115 49 31 84 196 144 162 34 1 100 57 181 189
139 12 124 199 56 178 140 180 128 222 63 134 52 55 24
202 243 123 36 235 153 48 175 12 105 217 145 186 197 212
147 79 216 147 173 151 200 240 199 47 93 76 172 215 220
(multiply by eta*eta(257))
Q= 257 X**( 19/ 4)*(1, 0 -1 0 0 -1 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0
0 0 0 0 0 0 -1 -1 1 0 0 0 0 1 0
0 -1 0 0 0 1 0 0 0 0 0 -1 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 -1 0 0 -1 0 0 0 0 0 0
0 0 0 0 0 -1 0 0 0 0 0
(sparse notation)
-1( 27) -1( 39) 1( 63) 1( 87) -1( 127) -1( 163) -1( 167) 1( 171)
1( 191) -1( 203) 1( 219) -1( 243) 1( 263) 1( 267) 1( 323) -1( 339)
-1( 351) -1( 399)
LAUNDERED 1 1 2 3 3 5 6 8
10 13 16 21 25 31 38 47 56 69
82 99 118 141 166 199 233 275 322 379
440 516 598 696 805 933 1074 1242 1424 1639
1877 2153 2456 2810 3199 3647 4143 4710 5336 6055
6844 7744 8739 9866 11108 12517 14067 15816 17745 19913
22301 24985 27934 31237 34875 38935 43398 48379 53847 59933
66618 74040 82185 91221 101124 112088 124109 137391 151935 167996
185566 204934 226120 249435 274919 302943 333541 367143 403826 444055
487933 536018 588424 645784 708275 776596 850974 932232 1020623 1117094
1221993 1336363
8.2 Examples of application of method.
8.2.1. Smallest prime of 65 digits(copy of notice in NMBRTHRY net.)
(February 21st,1991)
In early 1988 I found a method for finding the number of points on
an elliptic curve modulo a prime P,and gave an example where P had 44
decimal digits; this was described and distributed in a 9-page memorandum.
Recently,without any essentially new ideas,but applying greater technic
-al competence at each stage of the method,I have found that:
The number of points on the INRIA curve y**2=x**3 + 105*x + 78153
modulo the smallest prime of sixtyfive decimal digits,which is 10**64+57,
is equal to P + 1 + C,where C is MINUS the following
179 72908 11790 97384 74641 25900 78222 .
The method of finding this number leads to its own proof of correctness
(" only one possible number between P + 1 +/- 2*root(P) "),but two more direct
proofs can be obtained by factorizing the numbers of points on the curve and
on its twist modulo P. The latter number is 120 times an easily
provable prime; for the former one must work harder,and I am indebted to
Francois Morain for using his quadratic sieve to factorize it for me(since
at the moment the authorities will not give me enough storage to use SPAR);
the factors are 92,
27 70973 85393 30085 85690 21436 13511,
and
3 92265 16706 26008 12655 88031 99403.
***********************FINIS *************************************
*******************************************
(Q=701 extrapolation solution)
Q= 701 FUNCTION X**- 21(1, 1 2 2 4 5 8 10 14 18 25
31 42 52 68 84 108 133 168 205 256 311 384 463 567 681
125 287 489 14 294 607 289 19 540 426 430 524 67 435 296
321 596 387 498 197 302 83 384 474 521 503 618 147 31 259
413 501 153 98 35 11 500 188 335 373 269 208 284 63 481
494 527 344 579 453 142 448 452 469 640 160 279 528 571 333
67 212 309 698 365 358 654 389 687 337 458 382 48 323 246
**** multiply by eta * eta(701)*******
Q= 701 X**( 33/ 4)*(1, 0 0 -1 0 0 0 0 -1 0 0 0 0 0 0 0 0 1 0
0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0
0 -1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 -1 0 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 0 0 1 0 0
sparse notation
-1( 45) -1( 65) 1( 101) 1( 141) -1( 201) -1( 261) -1( 289) 1( 297)
1( 325) 1( 345) -1( 349) -1( 405) 1( 425)
LAUNDERED 1 2 2 4 5 8 10 14
18 25 31 42 52 68 84 108 133 168
205 256 311 384 463 567 681 826 988 1190
1416 1696 2009 2392 2823 3344 3931 4636 5431 6376
7445 8708 10135 11812 13706 15920 18423 21332 24618 28424
32720 37674 43264 49688 56928 65224 74565 85234 97239 110911
126278 143740 163344 185564 210488 238675 270258 305905 345801 390741
440992 497490 560593 631427 710457 799018 897733 1008180 1131161 1268561
1421396 1591910 1781401 1992521 2226904 2487719 2776994 3098487 3454740 3850201
Berlekamp and Lehmer Factorization of Polynomials Modulo q
A.O.L.Atkin
To simplify our exposition,we shall make some unessential assumptions.
Consider a polynomial f(x) without repeated factors,of degree n,
monic. Our finite field will be the prime field modulo q(prime),where
q is large in that finding roots of an equation mod q by trial and
error is not feasible,but not very large in that storage becomes a
problem. Typically we are interested in "Chinese" size,q just less
than positive single word size,thus under 2^31 and often for convenience
under 2^30 on IBM machines. Our first task is to compute
x^q - x (modulis q,f(x))
which is done by the usual power algorithm left to right. Multiplication
by x is trivial; squaring is best done with an auxiliary table of powers
from x^n to x^(2n-2) as linear combinations of 1,x,x2,...,x^(n-1).
(If multiplication mod q is slower than squaring mod q further considerat
-ions apply; see my Ellpts Essay). Then
gcd(x^q -x ,f(x))
will yield the linear factors of f(x) as a block. If what remains has
degree.gt.3 we proceed further with the leftover part,which we rename as
f(x),of degree n.
We now create a matrix of the powers x^q,x^(2q),...,x^((n-1)*q) ,
expressed as linear combinations of 1,x,x2,...,x^(n-1). f^q requires
only a reduction from the previous computation. We form the remaining
powers in the order(e.g.,for n=37) :: 2,4,8,16,32,3( as (1+2)^2
-2 -4 divided by 2),6,12,24,5( as (2+3)^2 -4 -6 divided by 2),10,20,
7,14,28,9,18,36,11,22,13,26,15,30,17,34,19,21,23,25,27,29,31,33,35,and
finally 37 which needs a multiplication. At this point the methods
diverge substantially,and we describe Berlekamp's first.
It should first be said that much of the original literature
assumes q small(especially 2),and n relatively large; plainly one
must modify some of our algorithms in this case. Berlekamp finds the
nullspace of this (n-1)-by-n matrix,in the form of a basis consisting of
1 and polynomials B1(x),.,Br(x),which nullspace we call the"Berlekamp polynomial
B(x) with the property that B^q = B (modulis q,f(x))and 0.lt.deg(g).lt.n
If r=0,so that the matrix has rank n-1,then f(x) is irreducible.
Otherwise f(x) is the product of (r+1) distinct factors. Observe that
the polynomials B form not merely a vector space,but a ring(with
multiplication defined modulis q,f(x)),and for fixed B(x) and each
factor fi(x) of f(x) there is a constant c such that fi(x) divides
B(x)-c. The product of all (B(x)-c) must be zero modulis q,f(x); but
it is also a polynomial in B of degree.le.r+1(usually=) ,which can be
found directly from an (r+1-by-n) matrix of B,B^2,...B^(r+1),as linear
combinations of 1,x,...x^(n-1). Thus subject to FINDING THE LINEAR
FACTORS OF A POLYNOMIAL MODULO q(to which we return later),we may
choose some B(usually the Bj of least degree) and gcd all its
appropriate (B-c) with f(x). If the number of c is less than (r+1),
we must repeat partially with another B,&c.
For a fair comparison with the other method,one should program
a less theoretically elegant but practically faster algorithm using
"first out",where the matrix is not processed fully to find its rank,
but the first element of the nullspace is taken as B(x). This is especial
ly advantageous when the number of factors is large(in the extreme super-
singular case,they all have degree 2,and this is known); but ad hoc
methods may be needed to PROVE that all the factors found are irreducible
.. In a random case,the miniscule saving in matrix processing is not worth
the bother involved.
In the Lehmer method,we compute successively x^(q^a)-x,for a=2,3,
... For each a,we form the gcd with f(x),which necessarily extracts those
factors of f(x) of precisely degree a. The powers are computed using the
matrix(it is extremely inefficient to use the power algorithm for large
q),which in general we do not modify even when a factor is found.We do
however peel each successive gcd off f(x),and when 2*a exceeds the
degree of the remaining f(x),it must be irreducible. Notice that in our
gcd algorithm,which is likely to proceed as in Euclid with decreasing
degrees,we can stop when our degree falls below a.
The problem remains as to how to separate a block of factors
all of which have the same degree a. This is of infrequent occurrence
in a random factorization,and usually the number of factors is small.
We perform a subsidiary Berlekamp on the block,but with the advantage
that we already know some members of the Berlekamp nullspace,namely the
symmetric functions of x,x^q,x^(q^2),...x^(q^(a-1)),which have already
been computed and (when potentially needed)saved; usually the sum is
sufficient to split the block.
A refinement which I have recently(1988) adopted with some success is
as follows. The time of the gcd is significant compared to the time of
creating the next x^(q^a); thus for some values of a we defer the gcd
and proceed directly to the next power. At the end,when 2*a exceeds
the degree of the remaining f(x),we backtrack and clean up("Coda"). An array
DEGLIS(a) keeps the degree down to which we must take our gcd algorithm
at a(a=-1 conventionally says no gcd). The choice of DEGLIS is a matter
of judgment; an example follows for degree n=101(the default value is
DEGLIS(a)=a,and we exhibit only the other values):
DEGLIS(a)=-1 for a=4,5,6,7,8,9,10,11,13,14,15,17,19,21,23,24,25
DEGLIS(a)=b for (a,b)= (12,4) (16,8) (18,9) (20,5) (22,11)
(26,13) (28,7) (30,15) (34,17) (38,19) (42,21) (46,23) (48,24) (50,25).
Thus three factors of degrees 4,6,and 12,would show up as a block
of degree 22 at a=12; but they are easily separated using the (saved)
x^(q^4)-x and x^(q^6)-x.
In a typical example with n=99 we found factors of degrees 12,9,
and 10,at stages a=12,18,and 20,respectively. The stages were continued
up to a=34,at which point a "logic" subprogram determined what splits
were still possible. For as the program proceeds,the array DEGLIS is
changed; once a=12 is done,then factors of degree 4,6,and 12,are no
longer possible,so we reset DEGLIS(a)=a for a=4,6,12. Now a search
for all a with DEGLIS(a).lt.a may require some actual work,in the shape
of an actual gcd with the saved x^(q^a)-x,or sometimes an inference
from the implied cofactor.
In fact at the coda stage we have a residual polynomial
and a number of remaining a with DEGLIS(a)=-1. Any irreducible
factor of our residual polynomial must have degree one of these a,and
its cofactor must have degree in the semigroup of nonnegative addends of
these a. The second condition is sometimes enough to eliminate cases
without further work; otherwise judicious gcds are performed,and DEGLIS
constantly updated.
In the above example work was required for
a=19,21,23,and 24. Thus compared with the standard method of taking gcds
for each a,we saved gcds for a=4,5,6,7,8,9,10,11,13,14,15,but lost
by carrying degree 9 for 9 stages and degree 10 for 10 stages,in our f(x)
.. It would be foolish to break one's head trying to assess these effects
for a random split of n into degrees of factors of f(x),especially
since they are machine dependent. Far better to run both programs,and
let the machine tell which is better(we give some data at the end).
Whether or not one adopts our modification, it is
certainly inefficient to recompute the matrix for a smaller f(x)
except very near the beginning(I do it only if factors are found at
a=2 or a=3). On balance the saving on gcds seems to exceed the loss
on carrying a larger f(x) for longer in most cases,and of course
for irreducible f(x) the saving is considerable.
Finally,I have found that,even doing all gcds,Lehmer is
faster than Berlekamp. However it is fair to say that an optimal
matrix processing routine in which one proceeds with,say,10-by-10
blocks and does scalar products in the registers before division
by q might speed up the Berlekamp by 20%. A postfinal observation:
since the first common part of the program is 60% or more of the whole
in many cases,the overall saving provided by the modification may
be only 5%. But 5% seems a lot in the State of Illinois.
Obtaining Linear Factors Modulo q .
Suppose without loss of generality that we have f(x) of degree n modulo q
and that it is known that all the factors of f(x) are linear. As before,
we think of q as large,possibly Chinese of 10 digits,primality proving
of 500-1000 digits,factorizing of 100-150 digits,&c. We describe below
the principles; the exact algorithm depends on the precise size of q and
the machine,which influence the relative cost of finding x^(approximately
q) (modulis q,f(x)) and that of performing a gcd with two polynomials
of degree n mod q.
The first task is to find all the small factors of (q-1),where
"small" may be up to 1000 if q has 500 digits. If for example q-1=
4*167*rough and n=9,we would form and store x^rough. Then gcd f(x)
successively with x^(rough*167)-w,for w=i,-1,-i,1 mod q . (The cost
of finding fourth roots mod q is small compared with the cost of
polynomial operations.) With reasonable luck we should now have split
f(x) into at most quadratic pieces,which can be finished using only a
square root algorithm(and our previously found i becomes useful again.)
I have in fact got a special purpose n=3 and n=4 routine,using Tartaglia
and Euler,but these are probably not worth writing in most contexts.
If however there remains an unfactorized residual polynomial,say of
degree 5,it would be a mistake for large q to do what many textbooks
recommend,namely to form y^((q-1)/4) for various translates y=x+small
(some authors actually recommend using (q-1)/2 only!). For this
involves a repeat of the longest part of the algorithm. It is better
to gcd with x^((q-1)/167)-w,where w runs over 167-th roots of unity
modulo q,with various options as to one's action after partial success
(depending on relative cost of square roots & expected remaining gcds).
Obviously whether it is specifically worth using 167 depends on n and q and the
machine; it is a nice exercise in probability. But my observation has been
that the "intuitive" programmer,including myself,does not calculate
this strategy precisely unless forced to do so.
Some Timings
Comparison of Berlekamp and Lehmer(Unmodified)
(Modulo 2^30-35 on the IBM 3090 with previous "slow"processor)
Degree of Polynomial Time in " for Berlekamp Lehmer Split
x^q & matrix ending ending
125 2.63 2.00 1.29 71,34,14,2,2,2
124 2.50 1.88 1.44 102,13,5,3,1
225 14.44 9.08 6.73 153,45,26,1
224 14.48 9.16 8.73 200,13,6,5
350 58.78 45.41 25.78 125,105,83,15,6,3,
3,1
Comparison of Lehmer(unmodified) and Lehmer(modified)
(modulo 2^30-35 on IBM 3090 with fast processor June 1991)
Degree x^q & mat Unmodified Modified Split
170 2.44 1.157 0.834 13,47,110
169 2.37 0.854 0.923 1,31,37,41,59
168 2.30 1.378 1.125 1,1,2,3,3,15,143
167 2.30 0.799 0.798 1,6,7,17,21,31,84
166 2.27 1.360 1.135 29,137
165 2.19 1.186 0.888 1,2,44,118
164 2.17 1.057 0.816 1,2,7,8,10,22,114
Only in one case does the modification lose,when the extremely "unlucky"
decision to have DEGLIS(a)=-1 for 31,37,41, involved that these factors
were all found in the Coda. On balance,it is clearly superior,and there
remains the possibility of improving the actual DEGLIS strategy.