Re: Changing pig.maxCombinedSplitSize dynamically in single run
Cheolsoo Park 2013-12-02, 04:31
Unfortunately, no. The settings are script-wide. Can you add an ORDER BY
before storing your output and set its PARALLEL to a small number? That
will force a reduce phase, so each reducer writes one combined output file
instead of each mapper writing a small one. Of course, it will add extra
MR jobs.
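
For illustration, here is a rough sketch of that workaround. The paths, the
schema, the split conditions, the 128 MB split size, and the PARALLEL value
of 10 are all made up, so adjust them for your data. Note that each SPLIT
branch needs its own ORDER BY:

```pig
-- Script-wide setting: keep combined input splits small (128 MB here,
-- a hypothetical value) so the LOAD fans out to many mappers.
SET pig.maxCombinedSplitSize 134217728;

-- Hypothetical input path and schema.
raw = LOAD 'input_path' USING PigStorage('\t')
      AS (key:chararray, value:long);

-- Hypothetical split conditions.
SPLIT raw INTO low IF value < 100, high IF value >= 100;

-- ORDER BY forces a reduce phase; PARALLEL sets the number of reducers,
-- and each reducer writes one output file.
low_sorted  = ORDER low  BY key PARALLEL 10;
high_sorted = ORDER high BY key PARALLEL 10;

STORE low_sorted  INTO 'output_path/low';
STORE high_sorted INTO 'output_path/high';
```

With this, each STORE should write at most 10 part files, one per reducer,
rather than one per mapper.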
On Sat, Nov 30, 2013 at 9:20 AM, Something Something <
[EMAIL PROTECTED]> wrote:
> Is there a way in Pig to change this configuration
> (pig.maxCombinedSplitSize) at different steps inside the *same* Pig script?
> For example, when I am LOADing the data I want this value to be low so that
> we use the block size effectively & many mappers get triggered. (Otherwise,
> the job takes too long).
> But later when I SPLIT my output, I want split size to be large so we don't
> create 4000 small output files. (SPLIT is a mapper only task).
> Is there a way to accomplish this?