Pig, mail # user - One file with sorted results.


sonia gehlot 2012-07-02, 23:59
Re: One file with sorted results.
Alan Gates 2012-07-03, 14:56
You can set different levels of parallelism for different parts of your script by attaching a parallel clause to individual operations. For example:

Y = join W by a, X by b parallel 100;   -- run the join with 100 reducers
Z = order Y by a parallel 1;            -- one reducer, so one sorted output file
store Z into 'onefile';

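If your output is large, I would also suggest running the order in parallel and then concatenating the part files with HDFS's cat command in a separate pass, and checking whether that is faster. Pig's order by range-partitions the keys across reducers, so the part files are sorted relative to one another and concatenating them in file-name order still yields one sorted file. The data gets written twice, but no single reducer is flooded with all of it.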
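To see why the parallel-order-then-cat approach still gives a single sorted file, here is a small Python simulation. It is a toy model under stated assumptions, not Pig's implementation: `choose_boundaries`, `parallel_order`, `cat_parts` and the `part-r-NNNNN` names are hypothetical, and the random sample is a simplified stand-in for the sampling pass Pig runs before a parallel order. Each simulated reducer receives a contiguous key range, so sorting the partitions independently and concatenating them in name order matches a single-reducer sort.

```python
import bisect
import random


def choose_boundaries(keys, num_reducers, sample_size=100, seed=0):
    """Pick num_reducers - 1 split points from a random sample of the keys.
    Hypothetical, simplified stand-in for Pig's sampling pass."""
    if not keys:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sample become the range boundaries.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def parallel_order(records, num_reducers):
    """Range-partition records by key, sort each partition on its own
    (one per simulated reducer), and return {part-file name: sorted records}."""
    boundaries = choose_boundaries(records, num_reducers)
    partitions = [[] for _ in range(num_reducers)]
    for r in records:
        # bisect_right counts boundaries <= r, which is this key's reducer index,
        # so every key in partition i is smaller than every key in partition i + 1.
        partitions[bisect.bisect_right(boundaries, r)].append(r)
    # Zero-padded names make lexicographic order equal reducer order.
    return {f"part-r-{i:05d}": sorted(p) for i, p in enumerate(partitions)}


def cat_parts(part_files):
    """Mimic a separate cat pass that reads the part files in name order."""
    merged = []
    for name in sorted(part_files):
        merged.extend(part_files[name])
    return merged


if __name__ == "__main__":
    rng = random.Random(42)
    records = [rng.randint(0, 10_000) for _ in range(5_000)]
    parts = parallel_order(records, num_reducers=100)
    assert cat_parts(parts) == sorted(records)
    print(len(parts), "part files; their concatenation is globally sorted")
```

The sort work is spread over many reducers, and the only serial step left is a sequential copy of already-sorted bytes. That copy is the extra write mentioned above.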

Alan.

On Jul 2, 2012, at 4:59 PM, sonia gehlot wrote:

> Hi Guys,
>
> I have a use case where I need to generate a data feed using a Pig script.
> The data feed totals about 12 GB.
>
> I want the Pig script to generate one file, and the data in that file should
> be sorted as well. I know I can run it with one reducer, but since the
> dataset is big and has a lot of joins, it takes forever to finish.
>
> What other options are there to get one sorted file with better performance?
>
> Thanks in advance,
>
> Sonia
sonia gehlot 2012-07-03, 19:18
Duckworth, Will 2012-07-03, 01:57