Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Also see some benchmarks run by the Hive team at
https://issues.apache.org/jira/browse/HIVE-396.

On Thu, Jul 9, 2009 at 9:56 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:

> See this thread:
>
> http://markmail.org/thread/wzekarj5vpylj3qc
>
> Also, Hive and Pig are both official Apache Hadoop projects with larger
> user/developer communities than Cloudbase (which is licensed under GPL2
> rather than the Apache license).
>
> -- amr
>
>
> Saurabh Nanda wrote:
>
>> Hi,
>>
>> Does anyone have any pearls of wisdom on this?
>>
>> I'm spending part of my work time developing a scalable weblog analysis
>> system running on a 4- to 6-node cluster (standard desktop-class
>> machines). I don't have much time to benchmark all three tools
>> (Cloudbase, Hive, and Pig) and would really appreciate it if someone
>> could give me a heads-up on what to spend my time on. Some specifics:
>>
>> a) Which tool can give me the best performance for this problem (e.g.,
>> best use of indexes, data partitioning, etc.)?
>> b) Which tool has the most efficient data storage, so that I can keep
>> more days' worth of data in the cluster for ad-hoc analysis?
>> c) Which tool is more mature and less likely to crash? (For example, the
>> disclaimer on the Hive Wiki really scared me:
>> http://wiki.apache.org/hadoop/Hive/GettingStarted)
>>
>> Thanks,
>> Saurabh.
>>
>>
>