Re: Slow Group By operator
Hi Benjamin,

Can you describe which step of the group by is slow? The mapper side or the
reducer side?

What does your query look like? Can you share it? Do you call any algebraic
UDF after the group by? I am wondering whether the combiner matters in your test.
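
To illustrate why the combiner question matters: when the function applied
after a group by is algebraic (COUNT and SUM are typical examples), each
mapper can pre-aggregate its own records, so fewer records cross the shuffle.
The sketch below is plain Python, not Pig code. The function names and the
input splits are hypothetical, and it only counts records per key to show
the effect, assuming a simple two-mapper job.

```python
from collections import Counter, defaultdict


def shuffle_without_combiner(splits):
    """Each mapper emits one (key, 1) pair per input record."""
    return [(key, 1) for split in splits for key in split]


def shuffle_with_combiner(splits):
    """Each mapper pre-aggregates its split and emits one pair per distinct key."""
    shuffled = []
    for split in splits:
        shuffled.extend(Counter(split).items())
    return shuffled


def reduce_counts(shuffled):
    """Reducer: sum the partial counts for each key."""
    totals = defaultdict(int)
    for key, partial in shuffled:
        totals[key] += partial
    return dict(totals)


# Hypothetical input: the grouping keys seen by two mappers.
splits = [["a", "b", "a", "a"], ["b", "b", "a", "c"]]

plain = shuffle_without_combiner(splits)
combined = shuffle_with_combiner(splits)

# Both paths give the same per-key counts; only the shuffle volume differs.
print(len(plain), len(combined))
print(reduce_counts(plain) == reduce_counts(combined))
```

With few distinct keys per mapper the saving is large; with mostly unique
keys the combiner saves little, which is why the shape of the query and the
data matters here.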

On Tue, Aug 20, 2013 at 2:27 AM, Benjamin Jakobus <[EMAIL PROTECTED]> wrote:

> Hi all,
> After benchmarking Hive and Pig, I found that the Group By operator in Pig
> is drastically slower than Hive's. I was wondering whether anybody has
> experienced the same? And whether people may have any tips for improving
> the performance of this operation? (Adding a DISTINCT as suggested by an
> earlier post on here doesn't help. I am currently re-running the benchmark
> with LZO compression enabled).
> Regards,
> Ben