Pig >> mail # user >> One file with sorted results.

sonia gehlot 2012-07-02, 23:59
Re: One file with sorted results.
You can set different levels of parallelism in different parts of your script by attaching a parallel clause to the individual operations.  For example:

Y = join W by a, X by b parallel 100;
Z = order Y by a parallel 1;
store Z into 'onefile';

If your output is big, I would suggest also trying the order with a higher parallel setting and then concatenating the resulting part files with HDFS's cat command in a separate pass, to see if that is faster.  The data gets written twice, but no single reducer is flooded with all of it.
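
A rough sketch of that two-pass approach, in case it helps. The input paths, schemas, the output names 'sorted_parts' and onefile.txt, and the reducer count of 20 are all placeholders of mine, not values from this thread. I have also swapped the cat step for fs -getmerge, which does the concatenation in one command but writes the merged file to the local filesystem instead of HDFS. It is worth checking on your Hadoop version that the part files are merged in name order.

```pig
-- Hypothetical inputs: paths and schemas are placeholders.
W = load 'input_w' as (a:chararray, wv:chararray);
X = load 'input_x' as (b:chararray, xv:chararray);

Y = join W by a, X by b parallel 100;

-- Sort with several reducers instead of one. Pig samples the keys and
-- range-partitions them, so the first part file holds the smallest keys,
-- the second part file the next range, and so on.
Z = order Y by a parallel 20;
store Z into 'sorted_parts';

-- Separate pass: concatenate the part files into a single file.
-- Assumption: getmerge is used here in place of cat; it writes to the
-- local filesystem, not back to HDFS.
fs -getmerge sorted_parts onefile.txt
```

Because each part file covers its own key range and is sorted internally, concatenating them in name order should give one fully sorted file. If you need the result on HDFS, the same idea works from the shell by piping hadoop fs -cat of the part files into hadoop fs -put.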


On Jul 2, 2012, at 4:59 PM, sonia gehlot wrote:

> Hi Guys,
> I have a use case where I need to generate a data feed using a Pig script.
> The data feed is about 12 GB in total.
> I want the Pig script to generate 1 file, and the data in that file should
> be sorted as well. I know I can run it with one reducer, but as the dataset
> is big, with a lot of joins, it takes forever to finish.
> What are the other options to get one sorted file with better performance?
> Thanks in advance,
> Sonia
sonia gehlot 2012-07-03, 19:18
Duckworth, Will 2012-07-03, 01:57