This bounced since I wasn't subscribed. It should have been moderated through...
---------- Forwarded message ----------
From: Owen O'Malley <[EMAIL PROTECTED]>
Date: Fri, Jun 19, 2009 at 10:03 AM
Subject: Re: A simple performance benchmark for Hadoop, Hive and Pig
To: [EMAIL PROTECTED], [EMAIL PROTECTED],
On Thu, Jun 18, 2009 at 9:29 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
> Yuntao Jia, our intern this summer, did a simple performance benchmark for Hadoop, Hive and Pig based on the queries in the SIGMOD 2009 paper: A Comparison of Approaches to Large-Scale Data Analysis
It should be noted that no one on the Pig team was involved in setting
up the benchmarks and the queries don't follow the Pig cookbook
suggestions for writing efficient queries, so these results should be
considered *extremely* preliminary. Furthermore, I can't see any way
that Hive should be able to beat raw map/reduce, since Hive uses
map/reduce to run the job.
In the future, it would be better to involve the respective
communities (mapreduce-dev and pig-dev) well before pushing benchmark
results out to the user lists. The Hadoop project, which includes all
three subprojects, needs to be a cooperative community that is trying
to build the best software we can. Getting benchmark numbers is good,
but it is better done in a collaborative manner.