1. 08 Aug, 1999 12 commits
  2. 07 Aug, 1999 3 commits
  3. 06 Aug, 1999 3 commits
  4. 05 Aug, 1999 5 commits
  5. 04 Aug, 1999 1 commit
  6. 03 Aug, 1999 6 commits
  7. 02 Aug, 1999 7 commits
  8. 01 Aug, 1999 2 commits
  9. 31 Jul, 1999 1 commit
      Extra i386+gcc bn_div.c tune-up featuring inline division and saving
      the remainder left in %edx. · 4c22909e
      Andy Polyakov authored
      Here is the resulting performance improvement
      matrix (improvement as a result of this *and* the previous tune-up committed
      two days ago). The results were obtained by profiling the "div" part of
      crypto/bn/bnspeed.c.
      
      CPU	BN_div	bn_div_words	overall	comment
      ------------------------------------------------------------------------
      PII	+16%	accumulated by	+2-3%	PII multiplies damn fast! Taking
      		inlining		multiplication out of the loop
      					didn't make too much difference.
      					Eliminating the multiplication
      					involved in remainder calculation
      					is the major factor.
      
      Pentium	+45%	accumulated by	+7-9%	mull isn't that fast and replacing
      		inlining		multiplications with additions in
      					the loop has more visible effect:-)
      
      MIPS	+75%	+12%		+20-25%	In addition to taking mults
      R10000					out of the loop (giving 12% in the
      					asm/mips3.s) three mults were
      					eliminated in BN_div.
      
      Alpha	+30%	+50%		+10-15%	Same as above. But remember that
      EV4					bn_div_words is a C implementation.
      					It takes 4 Alpha mults in C to do
      					the same thing as 1 MIPS mult in
      					assembler does. So the effect (50%)
      					is more impressive. But not the
      					overall one... Well, if Alpha
      					bn_mul_add were implemented in
      					assembler, the overall improvement
      					would be closer to MIPS...
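The two ideas named in the commit message (keeping the remainder that the divide instruction leaves in %edx, and taking the multiplication out of the quotient-correction loop) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the helper names `div_with_rem` and `estimate_q` are hypothetical, 32-bit words are assumed, and the inline assembly is only used when compiling with a GCC-compatible compiler on x86. Other targets fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the OpenSSL source): divide the two-word value
 * hi:lo by d, return the quotient and store the remainder in *rem.
 * Requires hi < d so that the quotient fits in one 32-bit word. */
static uint32_t div_with_rem(uint32_t hi, uint32_t lo, uint32_t d,
                             uint32_t *rem)
{
    assert(hi < d);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl divides %edx:%eax by its operand and leaves the quotient in
     * %eax and the remainder in %edx, so no multiply is needed for r. */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "a"(lo), "d"(hi), "rm"(d)
            : "cc");
    *rem = r;
    return q;
#else
    /* Portable fallback: let the compiler do the 64-by-32 division. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}

/* Hypothetical quotient-digit estimate in the style of schoolbook long
 * division: n0:n1:n2 are the top three dividend words, d0:d1 the top two
 * divisor words. Requires n0 < d0. */
static uint32_t estimate_q(uint32_t n0, uint32_t n1, uint32_t n2,
                           uint32_t d0, uint32_t d1)
{
    uint32_t rem;
    uint32_t q = div_with_rem(n0, n1, d0, &rem);

    /* One multiplication before the loop; each correction step then
     * only adds d0 to the remainder and subtracts d1 from the product. */
    uint64_t t2 = (uint64_t)q * d1;
    while (t2 > (((uint64_t)rem << 32) | n2)) {
        q--;
        rem += d0;
        if (rem < d0)   /* rem overflowed one word: rem:n2 now exceeds t2 */
            break;
        t2 -= d1;
    }
    return q;
}
```

Calling `estimate_q(n0, n1, n2, d0, d1)` returns the largest `q` for which `q * (d0:d1)` does not exceed `n0:n1:n2`, starting from the one-word estimate. The remainder comes back from the same divide that produced the quotient, which is the saving the PII row attributes to eliminating the multiplication in the remainder calculation.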