A friend of mine recently sent me a link to an interesting article, “Not Your Father’s Von Neumann Machine: A Crash Course in Modern Hardware”.

One of the interesting points discussed in the article is the relative cost of memory access. An access satisfied from L1 cache (closest to the processor) might take just 3 clock cycles, but one that has to go all the way to main memory might take 200 clock cycles.

That’s a massive difference in performance.

I began to wonder if locality of reference was affecting the performance of my Mandelbrot Screensaver. Since the first public release in 2001, my constant quest has been to improve performance - to generate each new Mandelbrot frame in less time.

After a bit of work to rearrange processing to maximize locality of reference, I ran some benchmarks.
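
To give a flavour of what locality of reference means in code, here is a minimal sketch. It is not the screensaver's code: the flat pixel buffer, the frame size and the function names are all made up for illustration. Both functions do exactly the same work, but the first walks the buffer in the order it is laid out in memory, while the second jumps a whole row ahead on every step.

```python
from array import array

# Hypothetical frame size, chosen only for illustration.
WIDTH, HEIGHT = 640, 480


def make_frame(width, height):
    """Build a flat, contiguous buffer of doubles stored row by row."""
    return array("d", (float(i % 7) for i in range(width * height)))


def sum_row_major(frame, width, height):
    """Visit pixels in the order they sit in memory (good locality)."""
    total = 0.0
    for y in range(height):
        base = y * width
        for x in range(width):
            total += frame[base + x]
    return total


def sum_column_major(frame, width, height):
    """Visit pixels column by column: each step skips `width` elements."""
    total = 0.0
    for x in range(width):
        for y in range(height):
            total += frame[y * width + x]
    return total


if __name__ == "__main__":
    frame = make_frame(WIDTH, HEIGHT)
    # Same data, same arithmetic, different traversal order.
    print(sum_row_major(frame, WIDTH, HEIGHT))
    print(sum_column_major(frame, WIDTH, HEIGHT))
```

Wrapping each call in `timeit` is an easy way to compare the two orders on your own machine. In Python the gap is likely to be modest, because interpreter overhead tends to swamp the memory cost; in a compiled language the same change of loop order usually shows up far more clearly.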

Current Public Release (v3.1): Single 1280x1024 frame in 12.7 seconds

Current Development Version: Single 1280x1024 frame in 2.6 seconds

In a word, WOW!

The new code does the same amount of work as the old, but does it almost 5 times faster (12.7 / 2.6 ≈ 4.9).

This implies that over 10 of the original 12.7 seconds were wait time - time the processor spent waiting for memory to deliver the data it needed.

Clearly, not every application can be sped up to this extent - but it’s a useful thing to know for those rare times when performance is the #1 priority, bar none.
