Re: benchmark choices
I just read the MalStone report. They report times for a Java version that
is about 5x slower than a streaming implementation. That single fact
indicates that the Java code is so appallingly bad that this is a very
bad benchmark.

On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout <[EMAIL PROTECTED]> wrote:

> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
> data and the queries, if not the query generator. There is a Jira issue in
> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
> don't remember the issue number offhand.
> -----Original Message-----
> From: Shrinivas Joshi [mailto:[EMAIL PROTECTED]]
> Sent: Friday, February 18, 2011 3:32 PM
> Subject: benchmark choices
> Which workloads are used for serious benchmarking of Hadoop clusters? Do
> you care about any of the following workloads:
> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
> sample apps shipped with the Hadoop distro such as PiEstimator, dbcount, etc.
> Thanks,
> -Shrinivas