Willus.com Home   |   Archive   |   About  

Willus.com's MinGW/Gnu C Tips

MinGW Tips
Overview
Install Notes
Benchmark
Starter Links
Compile Flags
Fast Math Funcs
DLLs
Globbing
Console
Binary Stdout
Predefined Macros
Linker Map
Show All Tips
 
  Some Fast Math Functions

[Note 2011: My inline math functions are essentially not necessary anymore with recent versions of gcc, but I leave this page up out of historical interest. See Note 3 below.]

When MinGW's pow function became 10x slower in release 3.0 and caused some of my codes which used it heavily to become much slower, I started investigating ways to implement some faster math functions. I first patched the 3.0 pow() function to go back to how it was in 2.0, but then I decided to be more aggressive. The floating point unit in most modern Intel and AMD CPU's (e.g. Pentiums and Athlons) has many built-in transcendental functions such as sine, cosine, arc-tangent, etc. These built-ins are automatically used by the Microsoft C run-time library DLL which MinGW links to by default, but making calls to the DLL typically incurs significant overhead. You can use the header file here to in-line some of these functions for faster performance on Pentiums and Athlons. It requires use of the -ffast-math compile flag. I took some of the code from Chapter 14 (pp. 807-808) of the Art of Assembly Language link below. Note that the exp() and atan2() in-line versions are actually slower on a 64-bit Opteron compile (SuSE Linux 8.0). Also note that these in-line functions do not do any error checking or trapping of any kind.

NOTE! My in-line pow() function now returns correct results if the first argument is zero (Rev 1.01).

NOTE 2! GCC v4.0 will include a more complete set of fast math intrinsics for x87-compatible processors, including fsincos.

NOTE 3! (4-11-2010) I've noticed lately that the difference between my in-lines and the gcc 3.x/4.x defaults depends significantly on what arguments are sent to the functions. Sometimes mine are faster; sometimes the gcc defaults are faster. In general, with gcc 4.x, I've found that only my sincos in-line gives me any benefit over the gcc default on Core 2 processors, and it's not by much.

x87inline.h   |   x87test.c   |   Art of Assembly
In-line Assy How-To   |   In-line Assy Linux Docs   |   Gnu C In-line Assy docs

Results:    PIII   |   P4 Xeon   |   Opteron (32-bit)   |   Opteron (64-bit)


 
MinGW Links
MinGW64 Home
MinGW Home
Users Mail Archive
MinGW Bug List
Gnu C Home

Willus.com Links
Willus.com Home
Win32/64 Compilers
Win32/64 Software
Compiler Benchmark

This page last modified
Saturday, 13-Jun-2020 11:54:52 MDT