Saurabh Nanda 2009-07-10, 04:45
Amr Awadallah 2009-07-10, 04:56
Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Jeff Hammerbacher 2009-07-10, 19:06
Also see some benchmarks run by the Hive team at
https://issues.apache.org/jira/browse/HIVE-396.

On Thu, Jul 9, 2009 at 9:56 PM, Amr Awadallah <[EMAIL PROTECTED]> wrote:

> see this thread:
>
> http://markmail.org/thread/wzekarj5vpylj3qc
>
> Also, Hive and Pig are both official Apache Hadoop projects with larger
> user/developer communities than Cloudbase (which is GPL2 license as opposed
> to Apache license).
>
> -- amr
>
> Saurabh Nanda wrote:
>
>> Hi,
>>
>> Does anyone have any pearls of wisdom around this?
>>
>> I'm spending part of my work time on developing a scalable weblog analysis
>> system running on a 4 to 6 node cluster (standard desktop class machines). I
>> don't have much time to try and benchmark all three tools (Cloudbase, Hive,
>> and Pig) and would really appreciate if someone can give me a heads-up on
>> what to spend my time on. Some specifics:
>>
>> a) Which tool can give me the best performance for this problem? (eg. best
>> use of indexes, data partitioning, etc.)
>> b) Which tool has the most efficient data storage so that I can store more
>> days' worth of data into the cluster for ad-hoc analysis.
>> c) Which tool is more mature and will not crash (for example, the disclaimer
>> on the Hive Wiki really scared me --
>> http://wiki.apache.org/hadoop/Hive/GettingStarted)
>>
>> Thanks,
>> Saurabh.