Saurabh Nanda 2009-07-10, 04:45
Amr Awadallah 2009-07-10, 04:56
Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Jeff Hammerbacher 2009-07-10, 19:06
Also see some benchmarks run by the Hive team at
On Thu, Jul 9, 2009 at 9:56 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:
> see this thread:
> Also, Hive and Pig are both official Apache Hadoop projects with larger
> user/developer communities than Cloudbase (which is GPL2-licensed, as opposed
> to Apache-licensed).
> -- amr
> Saurabh Nanda wrote:
>> Does anyone have any pearls of wisdom around this?
>> I'm spending part of my work time on developing a scalable weblog analysis
>> system running on a 4 to 6 node cluster (standard desktop-class machines). I
>> don't have much time to try and benchmark all three tools (Cloudbase, Hive,
>> and Pig) and would really appreciate it if someone could give me a heads-up on
>> what to spend my time on. Some specifics:
>> a) Which tool can give me the best performance for this problem? (e.g., best
>> use of indexes, data partitioning, etc.)
>> b) Which tool has the most efficient data storage so that I can store more
>> days' worth of data into the cluster for ad-hoc analysis.
>> c) Which tool is more mature and will not crash (for example, the
>> on the Hive Wiki really scared me --