Some Fast Math Functions
When MinGW's pow() function became 10x slower in release 3.0, some of my
codes that use it heavily slowed down considerably, so I started
investigating ways to implement faster math functions. I first patched the
3.0 pow() function to behave as it did in 2.0, but then I decided to
be more aggressive.
The floating point unit in most modern Intel and AMD CPUs (e.g. Pentiums
and Athlons) has many built-in transcendental functions such as sine,
cosine, and arc-tangent. The Microsoft C run-time library DLL, which MinGW
links to by default, uses these built-ins automatically, but each call
into the DLL typically incurs significant overhead.
You can use the header file below (x87inline.h) to in-line some of these
functions for faster performance on Pentiums and Athlons. It requires
the -ffast-math compile flag. I took some of the code from Chapter 14
(pp. 807-808) of the Art of Assembly Language, linked below. Note that
the in-line exp() and atan2() versions are actually slower when compiled
for a 64-bit Opteron (SuSE Linux 8.0). Also note that these in-line
functions do no error checking or trapping of any kind.
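To illustrate the in-lining idea, here is a minimal sketch that wraps a single x87 instruction (fsin) with GCC extended in-line assembly. It is not taken from x87inline.h: the name x87_sin is hypothetical, and the fallback to the library sin() on non-x86 targets is an assumption added only so the snippet compiles anywhere. It does no range or error checking.

```c
#include <math.h>

/* Hypothetical example, not the code from x87inline.h. */
static inline double x87_sin(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double result;
    /* "=t" is the top of the x87 register stack; "0" places x in the
     * same register on entry. fsin replaces ST(0) with sin(ST(0)).
     * For |x| >= 2^63 the instruction leaves ST(0) unchanged, and
     * this sketch does not detect that case. */
    __asm__ ("fsin" : "=t" (result) : "0" (x));
    return result;
#else
    /* Assumed fallback for targets without an x87 unit. */
    return sin(x);
#endif
}
```

Because the function is static inline, the compiler can place the fsin instruction directly at the call site, which avoids the call into the run-time library.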
NOTE! My in-line pow() function now returns correct
results when the first argument is zero (Rev 1.01).
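To see why a zero first argument needs care: a common x87 formulation computes pow(x, y) as 2^(y*log2(x)), and log2(0) is minus infinity, which turns into NaN further down the chain. The sketch below guards that case explicitly. It shows the general technique under stated assumptions and is not the code in x87inline.h: the name x87_pow_sketch is hypothetical, only x >= 0 is handled, and the pow() fallback on non-x86 targets is an assumption added for portability.

```c
#include <math.h>

/* Hypothetical sketch of pow(x, y) = 2^(y * log2(x)); assumes x >= 0. */
static inline double x87_pow_sketch(double x, double y)
{
    if (x == 0.0) {
        /* Guard: log2(0) is -infinity, so handle a zero base up front. */
        if (y == 0.0)
            return 1.0;
        return (y > 0.0) ? 0.0 : HUGE_VAL;
    }
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double z, n, f, m, r;
    /* fyl2x: ST(1) * log2(ST(0)), popping the stack once. */
    __asm__ ("fyl2x" : "=t" (z) : "0" (x), "u" (y) : "st(1)");
    /* frndint: round z to an integer n, so that |z - n| <= 0.5. */
    __asm__ ("frndint" : "=t" (n) : "0" (z));
    f = z - n;
    /* f2xm1: 2^f - 1, valid only for -1 <= f <= 1. */
    __asm__ ("f2xm1" : "=t" (m) : "0" (f));
    m += 1.0;
    /* fscale: ST(0) * 2^trunc(ST(1)), i.e. 2^f * 2^n. */
    __asm__ ("fscale" : "=t" (r) : "0" (m), "u" (n));
    return r;
#else
    /* Assumed fallback for targets without an x87 unit. */
    return pow(x, y);
#endif
}
```

With the guard in place, x87_pow_sketch(0.0, 3.0) returns 0 directly instead of feeding minus infinity through the rest of the computation.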
NOTE 2! GCC v4.0 will include a more complete set of fast math
intrinsics for x87-compatible processors, including fsincos.
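For reference, fsincos computes the sine and cosine of the same argument with one instruction. Until the compiler provides it, a hand-written wrapper might look like the sketch below. This is an illustration only, not GCC's intrinsic and not part of x87inline.h: the name x87_sincos is hypothetical, and the library fallback on non-x86 targets is an assumption.

```c
#include <math.h>

/* Hypothetical wrapper; no range or error checking. */
static inline void x87_sincos(double x, double *s, double *c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double sin_val, cos_val;
    /* fsincos replaces ST(0) with sin(x) and then pushes cos(x),
     * so the cosine ends up in ST(0) ("=t") and the sine in
     * ST(1) ("=u"). */
    __asm__ ("fsincos" : "=t" (cos_val), "=u" (sin_val) : "0" (x));
    *s = sin_val;
    *c = cos_val;
#else
    /* Assumed fallback for targets without an x87 unit. */
    *s = sin(x);
    *c = cos(x);
#endif
}
```

This is useful when both values are needed at once, for example when building a rotation.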
x87inline.h | x87test.c
Art of Assembly | In-line Assy How-To | In-line Assy Linux Docs | GNU C In-line Assy docs
Results: PIII | P4 Xeon | Opteron (32-bit) | Opteron (64-bit)