Commit 4c22909e authored Jul 31, 1999 by Andy Polyakov

Extra i386+gcc bn_div.c tune-up featuring inline division and saving

the remainder left in %edx. Here is the resulting performance improvement
matrix (improvement as a result of this *and* previous tune-up committed
two days ago). The results were obtained by profiling the "div" part of
the crypto/bn/bnspeed.c.

CPU	BN_div	bn_div_words	overall	comment
------------------------------------------------------------------------
PII	+16%	accumulated by	+2-3%	PII multiplies damn fast! Taking
		inlining		multiplication out of the loop
					didn't make too much difference.
					Eliminating of the multiplication
					involved in remainder calculation
					is the major factor.

Pentium	+45%	accumulated by	+7-9%	mull isn't that fast and replacing
		inlining		multiplications with additions in
					the loop has more visible effect:-)

MIPS	+75%	+12%		+20-25%	In addition to the taking mults
R10000					out of the loop (giving 12% in the
					asm/mips3.s) three mults were
					eliminated in BN_div.

Alpha	+30%	+50%		+10-15%	Same as above. But remember that
EV4					bn_div_words is a C implementation.
					It takes 4 Alpha mults in C to do
					the same thing as 1 MIPS mult in
					assembler does. So the effect (50%)
					is more impressive. But not the
					overall one... Well, if Alpha
					bn_mul_add would be implemented
					in assembler overall improvement
					would be closer to MIPS...

parent 8d85b33e

Show whitespace changes

Inline Side-by-side

Please to comment