A friend of mine recently sent me a link to an interesting article, “Not Your Father’s Von Neumann Machine: A Crash Course in Modern Hardware”.
One of the interesting elements discussed in this article is the relative cost of memory access. An access satisfied from L1 cache (closest to the processor) might take just 3 clock cycles, but one that has to go all the way out to main memory might take 200 clock cycles.
That’s a massive difference in performance.
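To get a feel for what those numbers mean in practice, here is a rough back-of-the-envelope sketch. It assumes a simplified two-level model in which every access is either an L1 hit (3 cycles) or a trip to main memory (200 cycles), ignoring the intermediate cache levels a real processor has. The function name `average_access_cycles` and the hit rates tried are made up for illustration.

```python
def average_access_cycles(hit_rate, hit_cycles=3, miss_cycles=200):
    """Expected cycles per memory access in a simplified two-level model.

    hit_rate is the fraction of accesses served from L1 cache (0.0 to 1.0).
    The 3 and 200 cycle defaults are the illustrative figures quoted above.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only to show the trend.
    for rate in (1.0, 0.99, 0.9, 0.5):
        cycles = average_access_cycles(rate)
        print(f"hit rate {rate:.2f}: about {cycles:.1f} cycles per access")
```

Even in this crude model, a small drop in the hit rate multiplies the average cost of an access, because each miss is so much more expensive than a hit.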
I began to wonder whether locality of reference was affecting the performance of my Mandelbrot Screensaver. Ever since the first public release in 2001, my constant quest has been to improve performance - to generate each new Mandelbrot frame in less time.
After a bit of work to rearrange processing to maximize locality of reference, I ran some benchmarks.
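As a generic illustration of what rearranging for locality can look like (this is not the screensaver's actual code, and the buffer layout is an assumption), consider a frame stored as one flat, row-by-row buffer. The sketch below compares two visiting orders for the same pixels. The helper names are hypothetical, and since Python itself won't show the timing difference well, it measures how far each order jumps around in the buffer instead.

```python
def row_major_order(width, height):
    """Buffer indices visited when the inner loop walks along each row."""
    return [y * width + x for y in range(height) for x in range(width)]


def column_major_order(width, height):
    """Buffer indices visited when the inner loop walks down each column."""
    return [y * width + x for x in range(width) for y in range(height)]


def total_stride(order):
    """Sum of the distances between consecutive accesses in the buffer.

    A smaller total means neighbouring accesses sit closer together in
    memory, which is what gives the cache a chance to help.
    """
    return sum(abs(b - a) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    width, height = 8, 4  # tiny hypothetical frame, just for illustration
    rows = row_major_order(width, height)
    cols = column_major_order(width, height)
    # Both orders touch exactly the same pixels, so the work is identical.
    assert sorted(rows) == sorted(cols)
    print("row-major total stride:   ", total_stride(rows))
    print("column-major total stride:", total_stride(cols))
```

Both orders do the same amount of work. The row-major walk always steps to the adjacent element, while the column-major walk jumps a full row width each time, which on real hardware tends to mean more trips out to slower memory.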
Current Public Release (v3.1): Single 1280x1024 frame in 12.7 seconds
Current Development Version: Single 1280x1024 frame in 2.6 seconds
In a word, WOW!
The new code does the same amount of work as the old, but does it almost 5 times faster.
This implies that over 10 seconds of my original run-time was wait time - time spent by the processor waiting for memory to provide required information.
Clearly, not every application can be sped up to this extent - but it’s a useful thing to know for those rare times when performance is the #1 priority, bar none.