Mar 06 2008

I found myself in a lull between storms a few days ago, so I decided to do some housecleaning in our makefile before the next code rush hit.  And therein lies a story…

We use one makefile to build all versions of PicLens for all browsers on all platforms.  It’s loosely patterned after the Google Gears project makefile – all source files are listed in one place, and make recursively invokes itself with different params to work its way through the various phases of the build process – checking dependencies, compiling source to obj, building the install bundles, etc.  We use Gnu make to run the build as it is available on all the platforms we need – and then some!

As the number of PicLens source files has grown rapidly over the past few months our build times have been getting longer and longer, and burning away a lot of our development productivity.  It was time to quit whining about it and just fix it!

One data point that bugged me about our build was CPU utilization – or lack thereof.  We all have multicore CPUs, but during the C++ compile phase the average CPU load was rarely more than one core (50% on dual core, 25% on quad).  This may be partly because gcc and msvc compilers are not aggressively multithreaded, and partly a reflection of the fact that C++ source compilation is largely disk I/O bound rather than CPU bound.

If you’ve got a lot of independent work items to do and spare CPU cores sitting idle, then you need to work on more than one item at a time.  One tantalizing prospect to improve build times was the Gnu make -j option. Make defaults to issuing one operation (invoke the compiler to compile this source file) at a time, and then waits for that to finish before issuing another operation.  With -j, you can tell make to issue multiple operations simultaneously as parallel processes. The idea is, if one instance of your compiler is only making use of one CPU core, then running two instances of the compiler to compile different source files should make use of two CPU cores, and finish compiling the two files in about the time it would take to compile just one file.

We added -j to our makefile for the source compilation phase, and with our first build test on an OSX Mac saw a nice 40% reduction in build time (dual core)! However, when we ran the build test on a Windows PC, we got a snippy little remark from make:

make: Do not specify -j or --jobs if sh.exe is not available.
make: Resetting make for single job mode.

Well!  How rude!

Consulting the Gnu make docs, I found section 5.3.1 Choosing the Shell, conveniently located right next to section 5.4 Parallel Execution. Unfortunately, the Gnu docs state right up front:

On MS-DOS, the ‘-j’ option has no effect, since that system doesn’t support multi-processing.

Well that’s a bummer.  And a little confusing, since Windows has been doing multi-processing for a very long time. I chose to ignore the docs as possibly outdated (MS-DOS!  How quaint!) and pressed on.

“Choosing the Shell” talks a lot about make wanting a Unix style shell, like sh.exe.  It also talks a lot about how special and delicate the shell selection process is for “MS-DOS and Windows,” including checks for the COMSPEC environment variable.

We have a basket of Unix utils built for Win32 in our toolbox, and one of them is named zsh.exe.  It looks like a Bash shell, it smells like a Bash shell, but it doesn’t fork processes like a Bash shell.  Configuring make to use zsh as its shell sent it crashing to the floor. Hmm.  Not much better with the msys shell, and don’t get me started about cygwin.

I scoured the Internet looking for clues.  Lots of articles about Gnu make and -j, but nothing about how to make it work in Windows.  Then purely by chance, I stumbled onto a page that offered a solution to a different make problem:  set make’s SHELL environment variable to CMD.EXE, the Win32 shell, to resolve a “CreateProcess() failed” error.


Could it be that simple?

Yes! Telling make to use the Win32 shell magically enables make to fork processes and execute multiple compiler instances in parallel!
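
In makefile terms the fix is tiny.  Here is a minimal sketch; the OS check is an assumption on my part (Windows sets OS=Windows_NT in the environment), added so the same makefile keeps the default shell on other platforms.

```makefile
# Assumption: Windows NT-family systems export OS=Windows_NT.
ifeq ($(OS),Windows_NT)
# Use the native Win32 shell so make can spawn parallel jobs.
SHELL := cmd.exe
endif
```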

Now my question is this: Why is this setting needed at all?  cmd.exe is the default Win32 shell, has been for decades, and it’s the shell from which make is invoked.  Why doesn’t make figure this out on its own?  Having to set the SHELL variable in make seems so ridiculously redundant and deliberately obtuse.  “Hey Mr Make, I’d like you to drive my bus.  Yes, this bus.  The bus you rode in on.  The bus you’re still standing in.  Yes, that bus.”


I’ll take the 40% faster build times and leave the drama, thanks.

Concurrency Notes

Set the number of concurrent jobs equal to or less than the number of execution cores on your system.  Trying to bake 5 cakes in 4 pans at the same time is just silly.

How many cores do you have?  In Win32, it’s reported in the NUMBER_OF_PROCESSORS environment variable.  make -j $(NUMBER_OF_PROCESSORS) will do the trick (from within the makefile spawning a recursive make). 
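
A rough sketch of that recursive invocation follows.  The target names (all, compile) and the object list are hypothetical stand-ins, not our real makefile; the fallback to one job is an assumption for hosts that do not define NUMBER_OF_PROCESSORS.

```makefile
# Environment variables count as already defined, so this default only
# applies where NUMBER_OF_PROCESSORS is missing (for example, off Windows).
NUMBER_OF_PROCESSORS ?= 1

# Hypothetical object list, for illustration only.
OBJS := a.obj b.obj

.PHONY: all compile

# Top-level entry point: re-invoke make with one job per core
# for the compile phase.
all:
	$(MAKE) -j $(NUMBER_OF_PROCESSORS) compile

# The compile phase; the objects are built by whatever rules
# the rest of the makefile provides.
compile: $(OBJS)
```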

Just as with adding multithreading to existing source code, don’t expect your existing makefile logic to work on the first try after adding -j concurrency to your build.  Sequential make can hide implicit dependencies that will break when everything happens at once.

For example: if you use precompiled headers, you know that you should build the precompiled header target before you build any source files that use it.  For a sequential make, it’s enough to just put your precompiled header build instruction in the makefile in front of your source code build instruction.  For parallel make, though, that will simply start the precompiled header compile in one CPU core and spin up a source compile in another core, and they’ll soon collide. 

For parallel make, you need to be more explicit about dependencies and flows than sequential make will let you get by with.  You need to tell make that the precompiled header output is a prerequisite of all source files, for example.  Make is smart enough to know that it can’t generate A and B concurrently if B depends upon the output of A.
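
Here is a minimal sketch of what that looks like with msvc.  The file names (stdafx.h, stdafx.cpp, stdafx.pch, a.obj, b.obj) are hypothetical, and the compile lines are pared down to the precompiled header switches (/Yc to create, /Yu to use, /Fp to name the .pch file); a real build would carry many more flags.

```makefile
# Hypothetical names, for illustration only.
PCH  := stdafx.pch
OBJS := a.obj b.obj

# The important line: every object file waits for the precompiled header,
# so parallel make will not start a source compile before the .pch exists.
$(OBJS): $(PCH)

# Create the precompiled header (/Yc) and write it to $(PCH).
$(PCH): stdafx.cpp stdafx.h
	cl /nologo /c /Ycstdafx.h /Fp$(PCH) stdafx.cpp

# Compile each source file using (/Yu) the precompiled header.
%.obj: %.cpp
	cl /nologo /c /Yustdafx.h /Fp$(PCH) /Fo$@ $<
```

Because the recipe comes from the pattern rule, $< still names the .cpp file even though the .pch has been added as an extra prerequisite.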

Besides the .PCH, we discovered we had been lax in specifying the dependencies for some of our .RES resource files.  We had an IDL producing a .TLB, and a .RC including the .TLB to produce a .RES.  This worked fine for ages as a sequential process, but ran into trouble when parallel make tried to start compiling the .RC before MIDL was finished generating the TLB from the IDL.  (We have now exhausted our TLA quota.)  Fixed by listing the TLB as a prerequisite of the .RES, which it should have been all along anyway.
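
The shape of that fix, sketched with hypothetical file names (app.idl, app.tlb, app.rc, app.res) and bare-bones midl and rc command lines rather than our actual ones:

```makefile
# MIDL generates the type library from the IDL.
app.tlb: app.idl
	midl /nologo /tlb $@ $<

# The .rc includes the .tlb, so the .tlb must be listed as a prerequisite.
# Without it, parallel make may start rc before midl has finished.
app.res: app.rc app.tlb
	rc /fo $@ app.rc
```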

Be careful about processes that write to shared files.  If A appends a message to log.txt and B appends a message to log.txt, A and B are likely to collide when executed in parallel unless they are careful to use appropriate file locking.

An example of that is compiler error output.  If you’ve got a syntax error in a header file used by A and B, then the compile processes for A and B will both fail and both report the syntax error to stderr.  More often than not, you’ll see the text output of compile process A interlaced letter by letter with B.  When each process is reporting the same error, it looks quite comical – your compiler has acquired a stutter, or perhaps is shivering with cold!

To get coherent error messages, temporarily set your jobs count to one.  Shortcut:  “make NUMBER_OF_PROCESSORS=1”.  Command line arguments take precedence over environment variables.

I’ve heard nebulous stories that msvc++ doesn’t work with parallel make – that outputting debug symbols to a shared .PDB file will result in a corrupted .PDB.  I’ve looked for evidence of such corruption, but after a month of building a project with 10 MB of PDB symbol info I have yet to find any PDB corruption.  Consider that a myth busted.