Power issues aside, I've seen similar sorts of performance gains for MR
workloads - around 15-20%.
I think a fair bit of it is due to poor CPU cache utilization in various
parts of Hadoop - hyperthreading gets some extra parallelism there while
the core is waiting on round trips to DRAM.
On Tue, Feb 5, 2013 at 10:03 AM, Brad Sarsfield <[EMAIL PROTECTED]> wrote:
> Hate to say it, but HyperThreading can have either positive or negative
> performance characteristics. It all depends on your workload. You have to
> measure very carefully; it may not even be a bottleneck(!) :)
> I hit a pretty significant power issue when I enabled HyperThreading at
> multi-thousand node scale. We hit a ~8-10% power utilization increase,
> which, if rolled out to the entire cluster, would put me a few percent
> over our max spec power. In this case, for our workload, we actually saw
> a 15% improvement in processing throughput / job latency. We ended up
> literally turning off machines and enabling HyperThreading on the
> remaining ones, and saw an overall ~10% efficiency gain in the cluster,
> with a few fewer machines, but running hot on power.
> -----Original Message-----
> From: Terry Healy [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 5, 2013 7:20 AM
> To: [EMAIL PROTECTED]
> Subject: HyperThreading in TaskTracker nodes?
> I would like to get some opinions / recommendations about the pros and
> cons of enabling HyperThreading on TaskTracker nodes. Presumably memory
> could be an issue, but is there anything to be gained, perhaps because of
> I/O wait? My small cluster is made up of relatively slow, old systems,
> most of which are quite slow to/from disk, if that matters.
Software Engineer, Cloudera
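
As a rough back-of-envelope check on the power trade-off Brad describes, here is a minimal sketch. It assumes that power draw and throughput both scale linearly with the number of nodes, and it takes "a few percent over max spec" to mean somewhere in the 3-5% range; both are assumptions for illustration, not figures from the thread. The function name and parameters are hypothetical. Only the 15% per-node speedup comes from the quoted message.

```python
def cluster_throughput_with_ht(per_node_speedup, overshoot):
    """Estimate cluster throughput after enabling HyperThreading under a power cap.

    per_node_speedup: throughput of an HT node relative to a non-HT node
                      (1.15 for the 15% figure quoted in the thread).
    overshoot:        fraction by which the cluster would exceed its power cap
                      if HT were enabled on every node (assumed, e.g. 0.03).

    Returns (fraction of nodes kept on, cluster throughput relative to the
    original all-nodes, HT-off cluster).
    Assumes power and throughput scale linearly with node count.
    """
    # Shed just enough nodes to bring total power back down to the cap.
    nodes_kept = 1.0 / (1.0 + overshoot)
    # Each remaining node does per_node_speedup times the work.
    relative_throughput = per_node_speedup * nodes_kept
    return nodes_kept, relative_throughput


if __name__ == "__main__":
    for overshoot in (0.03, 0.05):  # assumed readings of "a few percent"
        kept, tput = cluster_throughput_with_ht(1.15, overshoot)
        print(f"overshoot {overshoot:.0%}: keep {kept:.1%} of nodes, "
              f"throughput x{tput:.3f}")
```

Under these assumptions the net gain comes out at about 9.5% to 11.7% with roughly 3-5% of the machines switched off, which is in line with the ~10% gain and "a few fewer machines" reported above. Real clusters will deviate from the linear model, so this is only a sanity check, not a substitute for measuring.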