  1. Aug 08, 1999
  2. Aug 07, 1999
  3. Aug 06, 1999
  4. Aug 05, 1999
  5. Aug 04, 1999
  6. Aug 03, 1999
  7. Aug 02, 1999
  8. Aug 01, 1999
  9. Jul 31, 1999
    • Extra i386+gcc bn_div.c tune-up featuring inline division and saving · 4c22909e
      Andy Polyakov authored
      the remainder left in %edx. Here is the resulting performance improvement
      matrix (the combined improvement from this *and* the previous tune-up,
      committed two days ago). The results were obtained by profiling the "div"
      part of crypto/bn/bnspeed.c.
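The saving described above relies on the i386 `divl` instruction, which divides %edx:%eax by its operand and leaves the quotient in %eax and the remainder in %edx. A minimal sketch, assuming GCC-style inline assembly on i386/x86-64 with a portable fallback elsewhere; `div_words_hw` and `div_words_mul` are hypothetical names for illustration, not functions from bn_div.c. The second function shows the extra multiply needed when only the quotient is available.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide the two-word value hi:lo by d, return the
 * quotient and store the remainder in *rem. Requires hi < d so the
 * quotient fits in one word (x86 divl raises a divide error otherwise).
 */
static uint32_t div_words_hw(uint32_t hi, uint32_t lo, uint32_t d,
                             uint32_t *rem)
{
    assert(hi < d);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl divides %edx:%eax by its operand, leaving the quotient in
       %eax and the remainder in %edx; keep both instead of discarding
       the remainder. */
    __asm__("divl %4"
            : "=a"(q), "=d"(r)
            : "0"(lo), "1"(hi), "rm"(d)
            : "cc");
    *rem = r;
    return q;
#else
    /* Non-x86 fallback: let the compiler do the 64/32 division. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}

/*
 * Hypothetical helper: the same division when only the quotient is
 * available, so the remainder costs one extra multiply. lo - q*d is
 * computed mod 2^32, which is exact because the true remainder is < d.
 */
static uint32_t div_words_mul(uint32_t hi, uint32_t lo, uint32_t d,
                              uint32_t *rem)
{
    assert(hi < d);
    uint32_t q = (uint32_t)((((uint64_t)hi << 32) | lo) / d);
    *rem = lo - q * d;
    return q;
}
```

Both functions return the same quotient and remainder; the first simply keeps the value the hardware already produced, which is the multiplication the commit message says was eliminated from the remainder calculation.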
      
      CPU      BN_div  bn_div_words    overall   comment
      ------------------------------------------------------------------------
      PII      +16%    accumulated by  +2-3%     PII multiplies damn fast! Taking
                       inlining                  multiplication out of the loop
                                                 didn't make much difference.
                                                 Eliminating the multiplication
                                                 involved in the remainder
                                                 calculation is the major factor.

      Pentium  +45%    accumulated by  +7-9%     mull isn't that fast, so replacing
                       inlining                  multiplications with additions in
                                                 the loop has a more visible effect :-)

      MIPS     +75%    +12%            +20-25%   In addition to taking mults
      R10000                                     out of the loop (giving 12% in
                                                 asm/mips3.s), three mults were
                                                 eliminated in BN_div.

      Alpha    +30%    +50%            +10-15%   Same as above. But remember that
      EV4                                        bn_div_words is a C implementation.
                                                 It takes 4 Alpha mults in C to do
                                                 what 1 MIPS mult does in assembler,
                                                 so the effect (50%) is more
                                                 impressive. But not the overall
                                                 one... Well, if Alpha bn_mul_add
                                                 were implemented in assembler, the
                                                 overall improvement would be closer
                                                 to MIPS...
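The table credits much of the gain to taking multiplications out of BN_div's loop and replacing them with additions. A minimal sketch of that idea, assuming a Knuth Algorithm D style quotient-digit estimate with 32-bit words: the naive loop recomputes q*d1 on every pass, while the reduced one multiplies once and then subtracts d1 each time q is decremented. `estimate_q_naive` and `estimate_q_reduced` are hypothetical names, and the loop illustrates the technique rather than reproducing the actual bn_div.c code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate one quotient word: n0:n1:n2 are the top three words of the
 * normalized numerator, d0:d1 the top two words of the normalized
 * divisor (top bit of d0 set). Precondition n0 < d0 (the n0 == d0 case
 * is assumed to be handled separately).
 */

/* Naive: the correction loop multiplies q * d1 on every iteration. */
static uint32_t estimate_q_naive(uint32_t n0, uint32_t n1, uint32_t n2,
                                 uint32_t d0, uint32_t d1)
{
    assert(n0 < d0);
    uint64_t top = ((uint64_t)n0 << 32) | n1;
    uint64_t q = top / d0;
    uint64_t rem = top % d0;
    /* Decrement q while q*d1 > rem:n2, stopping once rem reaches 2^32. */
    while (rem < ((uint64_t)1 << 32) && q * d1 > ((rem << 32) | n2)) {
        q--;
        rem += d0;
    }
    return (uint32_t)q;
}

/* Reduced: one multiply before the loop, only additions inside it. */
static uint32_t estimate_q_reduced(uint32_t n0, uint32_t n1, uint32_t n2,
                                   uint32_t d0, uint32_t d1)
{
    assert(n0 < d0);
    uint64_t top = ((uint64_t)n0 << 32) | n1;
    uint32_t q = (uint32_t)(top / d0);
    uint32_t rem = (uint32_t)(top % d0); /* e.g. the remainder divl leaves in %edx */
    uint64_t t2 = (uint64_t)d1 * q;      /* the only multiply */
    for (;;) {
        if (t2 <= (((uint64_t)rem << 32) | n2))
            break;
        q--;
        rem += d0;
        if (rem < d0)   /* rem wrapped past 2^32: the estimate is now good */
            break;
        t2 -= d1;       /* keeps t2 == q * d1 without multiplying */
    }
    return q;
}
```

Both functions return the same estimate; the reduced form trades one multiply per iteration for a subtraction, which pays off most on CPUs with slow multiplication, as in the Pentium row above.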
  10. Jul 30, 1999