Pig >> mail # user >> Changing pig.maxCombinedSplitSize dynamically in single run

Re: Changing pig.maxCombinedSplitSize dynamically in single run
Unfortunately, no. Settings like pig.maxCombinedSplitSize are script-wide. Can you add an
ORDER BY before storing your output and set its PARALLEL clause to a smaller
number? That will force a reduce phase and combine the small files. Of course,
it will add extra MR jobs.
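A minimal sketch of that workaround in Pig Latin (the relation names and paths here are hypothetical, not from the original script):

```pig
-- Keep combined splits small script-wide so many mappers are launched.
SET pig.maxCombinedSplitSize 67108864;  -- 64 MB

raw      = LOAD '/input/data' USING PigStorage('\t');
filtered = FILTER raw BY $0 IS NOT NULL;

-- ORDER BY forces a reduce phase; PARALLEL 10 caps the reducer count,
-- and therefore the number of output part files, at 10 instead of
-- one file per mapper.
ordered  = ORDER filtered BY $0 PARALLEL 10;

STORE ordered INTO '/output/combined';
```

The trade-off is as described above: ORDER BY adds extra MR jobs (including a sampling job), so the script runs longer in exchange for fewer, larger output files.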
On Sat, Nov 30, 2013 at 9:20 AM, Something Something <

> Is there a way in Pig to change this configuration
> (pig.maxCombinedSplitSize) at different steps inside the *same* Pig script?
> For example, when I am LOADing the data I want this value to be low so that
> we use the block size effectively and many mappers get triggered. (Otherwise,
> the job takes too long.)
> But later, when I SPLIT my output, I want the split size to be large so we don't
> create 4000 small output files. (SPLIT is a mapper-only task.)
> Is there a way to accomplish this?