I was having performance problems with make-pl. It took a full second to plan a build distributed across ten directories. This was small compared to the time the build actually took, but it would take the whole second to plan even if it determined that it didn't need to build anything. At first I thought that this sloth was the inevitable result of using an interpreted language instead of a compiled one, but I set out to see if I could squeeze a bit more speed out of Perl.
My first angle of attack was to replace hashes with arrays as my primary data structure, because hashes are somewhat slower than arrays and are often cited as one of the main things that slow scripting languages down. However, this made no noticeable difference in speed.
My next candidate was the heavy use of calls to chdir, since each one requires a kernel syscall. So I built a test program that did nothing but change directories. Then I implemented an optimization: when the program went to change directories, it would skip the call to chdir if the new directory was the same as the current working directory. Surprisingly, this optimization made the test program multiple orders of magnitude slower!
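For illustration, the guard described above might have looked roughly like the following. This is a minimal sketch rather than the actual make-pl code: the name guarded_chdir is hypothetical, and it assumes the current directory is looked up with Cwd::cwd on every call.

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Hypothetical helper: skip chdir when we are already in the target directory.
# The check itself has to ask the system where we currently are, every time.
sub guarded_chdir {
    my ($dir) = @_;
    # Turn a relative path into an absolute one so it can be compared.
    my $target = File::Spec->rel2abs($dir);
    # Plain string comparison: symlinks and ".." are not normalized here.
    return 1 if Cwd::cwd() eq $target;   # already there, nothing to do
    return chdir $target;                # true on success, false on failure
}
```

Every call pays for one Cwd::cwd lookup in the hope of saving one chdir, so the guard only helps if the lookup is cheaper than the call it avoids.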
It turns out that calling Cwd::cwd (the portable way to get the current directory in Perl) is far, far slower than calling chdir. This is because cwd actually spawns a whole new process to find the current directory. The Cwd module also provides a function called fastcwd, with the warning "It might conceivably chdir you out of a directory that it can't chdir you back into." Needless to say, even fastcwd is much slower than a single chdir. With this new information, I made make-pl use its own variable to keep track of the current directory, reducing the number of calls to cwd to one. And thus the time it took to plan a build dropped from one second to a tenth of a second.
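The bookkeeping approach can be sketched as follows. This is not the real make-pl code. The names tracked_chdir and current_dir are hypothetical, and the sketch assumes every directory change in the program goes through the wrapper, since otherwise the recorded directory goes stale.

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Ask the system once, at startup; afterwards we trust our own record.
my $current_dir = Cwd::cwd();

# Hypothetical wrapper around chdir that keeps $current_dir up to date.
sub tracked_chdir {
    my ($dir) = @_;
    # Resolve relative paths against the tracked directory, not a fresh cwd().
    my $target = File::Spec->rel2abs($dir, $current_dir);
    # Plain string comparison: symlinks and ".." are not normalized here.
    return 1 if $target eq $current_dir;   # already there, no syscall needed
    chdir $target or return 0;             # on failure, leave the record untouched
    $current_dir = $target;
    return 1;
}

# Read the tracked directory without calling Cwd::cwd again.
sub current_dir { return $current_dir }
```

The comparison that used to cost a call to cwd is now a string equality test against a variable, so skipping a redundant chdir is cheap rather than expensive.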