
Willus.com's 2011 Win32/64 C Compiler Benchmarks:
I. BACKGROUND

3. Compiler Options

For each compiler, I tried to use every option that maximizes the performance of the compiled executable, including math shortcuts, no frame pointers, etc. I pulled the Intel compiler options from those Intel commonly uses in its own SPEC CPU 2006 results submissions. There are four significant additional options that I wanted to try out (where possible): 1. 32-bit vs. 64-bit compiles, 2. inter-procedural optimizations (IPO--gcc calls this link-time optimization and Microsoft calls it link-time code generation), 3. profiled compiles (i.e. compile the code with the profile-generation option, run it, and then compile it again using the data generated by the run), and 4. automatic generation of parallel (multi-threaded) code (denoted by // in the results tables). I tried out all of these options in various combinations on Intel and gcc 4.6.3, and I included Microsoft VC++ 2010 on the first two options (64-bit and IPO). I discuss the effects of each of these in the Summary. Some of these options are also available on the other compilers (e.g. gcc 3.4.2), but on the rest of the compilers I used only a baseline set of optimization flags.
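To make the profiled-compile workflow concrete, here is a minimal sketch in C. It is an illustration only, not one of the benchmark programs: the function name sum_clamped, the file name hot.c, and the gcc command lines in the comment are assumptions, built from the -fprofile-generate and -fprofile-use flags listed for gcc 4.6.3. The idea is that the compiler cannot tell from the source which branch is the common one, but a training run can record it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hot function. For typical input the "clamp" branch is
   rare, but nothing in the source says so. A profiled compile lets the
   compiler learn this from a training run and lay out the code for the
   common path. */
long sum_clamped(const int *v, size_t n, int limit)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (v[i] > limit)
            total += limit;   /* cold path: value is clamped */
        else
            total += v[i];    /* hot path: value is used as-is */
    }
    return total;
}

/* Assumed two-pass build (hot.c would also need a main() that calls
   sum_clamped on representative data):

     gcc -Ofast -fprofile-generate hot.c -o hot     pass 1: instrumented
     ./hot                                          training run, writes profile data
     gcc -Ofast -fprofile-use hot.c -o hot          pass 2: uses the profile
*/
```

The profile is only as good as the training run: if the training input does not resemble the real workload, the second compile may optimize for the wrong path.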

Company Version Command-line Options
MinGW (gcc 4.6.3) v4.6.3
Dec 9, 2011
(pre-release)
All Compiles
   -Ofast  maximum optimization level (-O3) plus -ffast-math
   -march=native  generate instructions for the highest instruction set available on the host CPU. Note that generally you just want to use -mtune=native, since an executable built with -march=native may not run on lesser CPUs, but I wanted absolute max performance.
   -fomit-frame-pointer  remove frame pointer for all functions
   -momit-leaf-frame-pointer  don't keep frame pointer in a register for leaf functions
   -Wall  enable the commonly used set of compiler warnings
   -std=gnu99  C99 with GNU extensions (for x264 only)

Inter-procedural Optimizations (IPO)
   -flto  link-time optimizations (I tried -fwhole-program also, but it had virtually no effect.)

Profiled
   -fprofile-generate/use  generate/use profile data

Multi-threaded
   -fgraphite-identity  enable identity transformation for GRAPHITE
   -floop-interchange  perform loop-interchange transformations on loops
   -floop-block  perform loop-blocking transformations on loops
   -floop-parallelize-all  identify loops that can be parallelized
   -ftree-loop-distribution  perform loop distribution
   -ftree-parallelize-loops=n  parallelize loops, splitting the work across n threads
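As a rough illustration of what the multi-threaded options look for, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence from one iteration to the next. The function names are hypothetical and the code is not from the benchmark programs. Whether gcc actually parallelizes a given loop depends on the compiler version and its own cost model, so treat this as a picture of the concept rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and y[i], and
   "restrict" promises the two arrays do not overlap. A loop like this
   is a candidate for automatic parallelization. */
void scale_add(double *restrict y, const double *restrict x,
               double a, size_t n)
{
    size_t i;

    assert((x != NULL && y != NULL) || n == 0);
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: v[i] needs the value of v[i - 1] computed
   by the previous iteration, so the iterations cannot simply be split
   across threads. */
void prefix_sum(double *v, size_t n)
{
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Even when a loop qualifies, the thread start-up cost has to be recovered, so short loops may run slower in parallel than in serial.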
Intel 2011 v12.1.1.259
Oct 11, 2011
All Compiles
   -QxHOST  generate instructions for the highest instruction set available on the host CPU (not used on x264--did not work)
   -O3  max optimization level
   -Qprec-div-  don't improve precision of float divides
   -Qopt-prefetch  enable pre-fetch insertion optimization
   -Qauto-ilp32  shrink 64-bit pointers/longs to 32-bit when safe to do so
   /F1000000000  reserve 1 GB of stack
   -fp:fast=2  most aggressive optimizations on floating point data
   -Qstd=c99  C99 compliance (for x264 only)

Inter-procedural Optimizations (IPO)
   -Qipo  interprocedural optimizations (affects linker and librarian also)

Profiled
   -Qprof-gen/use  generate/use profile data

Multi-threaded
   -Qparallel  enable multi-threaded (parallel) code generation
Tiny CC v0.9.25
May 29, 2009
All Compiles
   (No flags specified)  
Digital Mars v8.52
2004
All Compiles
   -o+all  run optimizer with "all" flag
   -6  generate P6 code
   -mn  Win32 memory model
   -ff  fast in-line 8087 code
Microsoft Visual C/C++ 2010 v16.00.40219.01
for 80x86/x64
All Compiles
   /Ox  maximum optimizations (includes /Ob2 /Og /Oi /Ot /Oy)
   /GS-  disable security checks
   /fp:fast  "fast" floating point model (less predictable results)
   /Qfast_transcendentals  generate inline FP intrinsics
   /arch:SSE2  generate code that uses SSE2 instructions (not available on 64-bit compiles!)

Inter-procedural Optimizations (IPO)
   /GL  enable link-time code generation
MinGW (gcc 3.4.2) v3.4.2
Sept 6, 2004
All Compiles
   -O3  maximum optimization level
   -ffast-math  fast floating point algorithms
   -fomit-frame-pointer  remove frame pointer for all functions
   -momit-leaf-frame-pointer  don't keep frame pointer in a register for leaf functions
   -Wall  enable the commonly used set of compiler warnings
   -std=gnu99  C99 with GNU extensions (for x264 only)
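The "math shortcuts" enabled by -ffast-math (and therefore by -Ofast) include treating floating-point addition as associative, which is what lets the compiler split a sum into several partial sums for vectorization. The sketch below is my own illustration, with hypothetical function names, of why that can change results: it adds the same four numbers in source order and in the two-lane order a vectorizer might use. The volatile accumulators are there only to force each intermediate to be rounded to double.

```c
#include <assert.h>
#include <stddef.h>

/* Sum in source order, as strict IEEE semantics require. */
double sum_sequential(const double *x, size_t n)
{
    volatile double acc = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Same terms, but accumulated in two independent "lanes" (even and odd
   indices) that are combined at the end. This mimics the reordering a
   compiler is allowed to do once addition is treated as associative. */
double sum_two_lanes(const double *x, size_t n)
{
    volatile double lane0 = 0.0, lane1 = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i += 2) {
        lane0 = lane0 + x[i];
        if (i + 1 < n)
            lane1 = lane1 + x[i + 1];
    }
    return lane0 + lane1;
}
```

For the input {1e17, 1.0, -1e17, 1.0}, the sequential sum loses the first 1.0 when it is added to 1e17 and returns 1.0, while the two-lane sum cancels the large terms first and returns 2.0. Neither answer is "wrong" for a fast-math build, which is why results from these flags can differ slightly between compilers.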


      

 
This page last modified
Saturday, 22-Sep-2012 08:54:01 MDT