Hadoop >> mail # general >> Weblog analysis -- Cloudbase vs Hive vs Pig?

Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Also see some benchmarks run by the Hive team at

On Thu, Jul 9, 2009 at 9:56 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:

> see this thread:
> http://markmail.org/thread/wzekarj5vpylj3qc
> Also, Hive and Pig are both official Apache Hadoop projects with larger
> user/developer communities than Cloudbase (which is under the GPL2 license,
> as opposed to the Apache license).
> -- amr
> Saurabh Nanda wrote:
>> Hi,
>> Does anyone have any pearls of wisdom around this?
>> I'm spending part of my work time on developing a scalable weblog analysis
>> system running on a 4 to 6 node cluster (standard desktop class machines). I
>> don't have much time to try and benchmark all three tools (Cloudbase, Hive,
>> and Pig) and would really appreciate if someone can give me a heads-up on
>> what to spend my time on. Some specifics:
>> a) Which tool can give me the best performance for this problem? (e.g. best
>> use of indexes, data partitioning, etc.)
>> b) Which tool has the most efficient data storage, so that I can store more
>> days' worth of data in the cluster for ad-hoc analysis?
>> c) Which tool is more mature and will not crash? (for example, the disclaimer
>> on the Hive Wiki really scared me --
>> http://wiki.apache.org/hadoop/Hive/GettingStarted)
>> Thanks,
>> Saurabh.