CPU clock and L3 cache for programming


I'm interested in how much different hardware choices affect the following use cases:

  • Programming in Python: lots of heavy mathematical computations using numpy arrays
  • Data Applications in Python and Pandas, using several GB of data

I will be able to parallelise a minority of these applications using the threading module; the logic of the majority will not allow this.
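A side note on the threading module: it only helps when the heavy lifting happens inside numpy's C routines, which release the GIL; pure-Python logic stays serialised on one core. For the parts that can be split into independent chunks, multiprocessing sidesteps the GIL entirely. A minimal sketch, where heavy_computation is a hypothetical placeholder for one independent unit of work:

    import numpy as np
    from multiprocessing import Pool

    def heavy_computation(chunk):
        # Hypothetical stand-in for one independent unit of work.
        return np.linalg.norm(chunk)

    if __name__ == "__main__":
        # Eight independent chunks of data; sizes are illustrative.
        chunks = [np.random.rand(1000000) for _ in range(8)]
        with Pool(processes=4) as pool:
            results = pool.map(heavy_computation, chunks)
        print(results)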

How important are the following two hardware configurations/extensions for my purposes?

  • 2.3 GHz vs 2.7 GHz
  • 6 MB L3 vs 8 MB L3

FooBar

Posted 2014-09-25T01:12:27.380

Reputation: 131

As it stands, this is essentially a purchase recommendation. However, if you generalised the question somewhat, perhaps asking about the advantages of more L3 cache, for example, it might end up being more useful to a wider audience. – Journeyman Geek – 2014-09-25T01:29:52.820

Answers


Assuming you're talking about a current-generation MacBook or MacBook Pro, the difference in performance between the two configurations you cited (2.3 GHz with 6 MB L3 cache vs. 2.7 GHz with 8 MB L3 cache) will be roughly 2% to 15%, depending on the exact workload. It's definitely nothing earth-shattering; what matters is how long your computations take in absolute terms.

My 2% to 15% figure comes from reviews and benchmarks of modern (Ivy Bridge and Haswell) laptop-class processors of the same generation at different clock rates. The single-threaded gap between the slowest and fastest parts is around 25% in extremely specific synthetic benchmarks; 10-15% in average cases; and 2% or less in benchmarks that don't come close to taxing single-thread performance at all (or that bottleneck elsewhere in the system, e.g. I/O).

To use a ridiculous example: if it took 1 million years for your numpy array computations to finish on the 2.3 GHz processor, shaving off 15% would save you 150,000 years, or about as long as Homo sapiens sapiens has been mucking around.

Obviously, if you had a lifespan of, say, 2 million years, shaving off 150,000 would make a huge difference. You might even be able to run the calculation twice before you land on your deathbed.

On the other hand, if your calculations run nearly instantaneously on most modern CPUs, adding 15% performance won't matter much at all. Take something like running Google Chrome: would you notice any perceptible difference in the speed at which webpages and videos load between current-gen MacBooks with those two processors? I seriously doubt it. But if you started loading a 24 GB HTML file that took several hours to parse, the difference would start to stack up into measurable time.

In the end, you're going to sacrifice either time or money whenever you run computationally intensive algorithms that take more than a few milliseconds to complete. Get the slow processor and you'll wait longer as a consequence of paying less; get the fast processor and you'll pay more as a consequence of waiting less.
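The cleanest way to decide is to time a representative slice of the actual workload on both machines: if the runtimes scale roughly with clock speed (2.7/2.3 ≈ 1.17), the workload is clock-bound and the faster chip buys real time. A rough sketch, where the matrix multiply is a hypothetical stand-in for one iteration of the real computation:

    import timeit
    import numpy as np

    # Hypothetical stand-in for one iteration of the real workload;
    # substitute the actual numpy computation here.
    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    runs = 5
    elapsed = timeit.timeit(lambda: np.dot(a, b), number=runs) / runs
    print("seconds per iteration: %.3f" % elapsed)

If the per-iteration time barely changes between the two processors, the bottleneck is elsewhere (memory bandwidth, I/O) and the cheaper chip is the better buy.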

allquixotic

Posted 2014-09-25T01:12:27.380

Reputation: 32 256

Could you perhaps distinguish between the two dimensions, clock speed versus L3 cache? For example, I'm not sure how much effect the extra 2 MB of L3 cache really has, given that my matrices are all hundreds of MB at least: how much could it matter when everything has to be read in from RAM anyway? These percentages are actually quite interesting for me; a simulation takes about 5-6 days on my 2.4 GHz i5, so 10% would already be half a day. – FooBar – 2014-09-25T01:44:40.973

Honestly, if you're running simulations that take that long, I would suggest seriously looking into some kind of server platform, or at least a high-end desktop, to do your computations on. You could communicate with it over ssh or some kind of remote desktop. Laptop CPUs are significantly scaled down compared to what servers can do. You'd do well with an overclocked Core i7-4960X or so. – allquixotic – 2014-09-25T18:51:13.440

For clock speed vs. cache, I'd say it depends tremendously on the cache locality of your algorithms. I'm not sure how configurable python/numpy is, or whether you can directly control the order of data access, but if you can get good cache locality and repeatedly bang on the same data over and over, more cache could matter significantly more than more clock speed. On the other hand, if your algorithm's cache locality is poor, the raw number of instructions per second (which scales with clock speed) will matter more. – allquixotic – 2014-09-25T18:53:49.200

To describe cache locality: imagine two different people eating at two separate buffets, which are laid out as long tables with varied dishes on them. They're both really hungry and will eventually clear out the buffet table. One of them eats 100% of the potatoes before touching any of the other food, while the other takes 1 spoonful of each item on their plate, eats that, and repeats. The guy who eats exclusively one food at a time before clearing it out would have excellent cache locality in his "algorithm". The one who eats a bit of everything has poor cache locality. – allquixotic – 2014-09-25T18:55:30.803
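To see the buffet effect in numbers, one can compare the same total amount of arithmetic done on a cache-resident array versus a single sweep over an array far larger than any L3. A minimal sketch; the array sizes are illustrative assumptions:

    import timeit
    import numpy as np

    small = np.ones(32 * 1024)          # 256 KB: fits comfortably in L2/L3
    big = np.ones(32 * 1024 * 1024)     # 256 MB: far larger than any L3

    def hot():
        # 1024 passes over the small array: 32M adds total, and after the
        # first pass the data stays resident in cache (eating all the
        # potatoes before moving on).
        for _ in range(1024):
            np.add(small, 1.0, out=small)

    def cold():
        # One pass over the big array: also 32M adds, but every element
        # must be streamed in from RAM (a spoonful of everything).
        np.add(big, 1.0, out=big)

    print("cache-resident: %.3fs" % timeit.timeit(hot, number=1))
    print("RAM-bound:      %.3fs" % timeit.timeit(cold, number=1))

The gap between the two timings is roughly the price of missing cache; an algorithm whose working set fits in the extra 2 MB of L3 is exactly the case where the 8 MB chip pays off.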