Pig >> mail # dev >> Slow Group By operator


Benjamin Jakobus 2013-08-20, 09:27
Re: Slow Group By operator
Hi Benjamin,

Can you describe which step of the group by is slow: the mapper side or the
reducer side?

What does your query look like? Can you share it? Do you call any algebraic UDF
after the group by? I am wondering whether the combiner matters in your test.
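
To illustrate why the combiner question matters, here is a minimal Python sketch of map-side pre-aggregation for a count, which is the kind of work an algebraic function allows. It is not Pig's implementation; the function names and the two sample splits are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def map_side_combine(records):
    """Pre-aggregate (key, value) pairs from one mapper into partial counts."""
    partial = defaultdict(int)
    for key, _value in records:
        partial[key] += 1
    return dict(partial)


def reduce_side_merge(partials):
    """Merge the partial counts from all mappers into final counts per key."""
    total = defaultdict(int)
    for partial in partials:
        for key, count in partial.items():
            total[key] += count
    return dict(total)


# Two hypothetical mapper input splits (assumed data, not from the benchmark).
split_a = [("x", 1), ("y", 2), ("x", 3)]
split_b = [("x", 4), ("y", 5)]

partials = [map_side_combine(split_a), map_side_combine(split_b)]
final_counts = reduce_side_merge(partials)

# Records sent to the reducers with and without map-side combining.
shuffled_with_combiner = sum(len(p) for p in partials)
shuffled_without_combiner = len(split_a) + len(split_b)
```

With combining, each mapper emits at most one record per distinct key, so the saving grows with the number of rows per key. If nothing algebraic follows the group by, every row has to be shuffled to the reducers.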

Thanks,
Cheolsoo
On Tue, Aug 20, 2013 at 2:27 AM, Benjamin Jakobus <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> After benchmarking Hive and Pig, I found that the Group By operator in Pig
> is drastically slower than Hive's. I was wondering whether anybody has
> experienced the same, and whether people have any tips for improving
> the performance of this operation. (Adding a DISTINCT, as suggested by an
> earlier post on here, doesn't help. I am currently re-running the benchmark
> with LZO compression enabled.)
>
> Regards,
> Ben
>
Benjamin Jakobus 2013-08-21, 10:52
Cheolsoo Park 2013-08-22, 00:07
Benjamin Jakobus 2013-08-22, 11:01
Alan Gates 2013-08-22, 15:38
Benjamin Jakobus 2013-08-24, 10:11
Cheolsoo Park 2013-08-22, 15:33
Benjamin Jakobus 2013-08-24, 10:27
Cheolsoo Park 2013-08-25, 01:27
Benjamin Jakobus 2013-08-25, 16:11
Benjamin Jakobus 2013-08-25, 17:10
Cheolsoo Park 2013-08-25, 17:57
Benjamin Jakobus 2013-08-25, 18:14
Cheolsoo Park 2013-08-25, 19:31
Benjamin Jakobus 2013-08-25, 20:01