

Re: One file with sorted results.
You can set different levels of parallelism in different parts of your script by attaching a parallel clause to individual operations.  For example:

Y = join W by a, X by b parallel 100;
Z = order Y by a parallel 1;
store Z into 'onefile';

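If your output is big, I would suggest also trying the order in parallel and then concatenating the part files with HDFS's cat command in a separate pass, to see if that is faster.  Pig's order produces a total ordering across its part files, so concatenating them in part-number order keeps the result sorted.  The data gets written twice, but one reducer isn't flooded with all of it.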
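
A minimal sketch of that two-pass variant, reusing the relations from the example above. The parallel value of 20, the 'sorted_parts' directory name, and the part-* file pattern are assumptions for illustration, not tested settings; check the actual part file names in your output directory before concatenating.

```pig
-- Pass 1: join and sort with many reducers; Pig writes one part file per reducer.
Y = join W by a, X by b parallel 100;
Z = order Y by a parallel 20;        -- 20 is an illustrative value, tune for your cluster
store Z into 'sorted_parts';         -- hypothetical output directory

-- Pass 2 (run from the shell after the script finishes, not from Pig):
-- concatenate the part files in name order into a single HDFS file.
--   hadoop fs -cat 'sorted_parts/part-*' | hadoop fs -put - onefile
```

If the single file is wanted on the local filesystem instead, hadoop fs -getmerge sorted_parts onefile may do the same job in one command; compare a few lines of the result against the part files to confirm the order is preserved.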

Alan.

On Jul 2, 2012, at 4:59 PM, sonia gehlot wrote:

> Hi Guys,
>
> I have a use case where I need to generate a data feed using a Pig script. The
> data feed is about 12 GB in total.
>
> I want the Pig script to generate one file, and the data in that file should be
> sorted as well. I know I can run it with one reducer, but as the dataset is big,
> with a lot of joins, it takes forever to finish.
>
> What are the other options to get one sorted file with better performance?
>
> Thanks in advance,
>
> Sonia