Pig, mail # user - Changing pig.maxCombinedSplitSize dynamically in single run


Re: Changing pig.maxCombinedSplitSize dynamically in single run
Cheolsoo Park 2013-12-02, 04:31
Unfortunately, no. The settings are script-wide. Could you add an ORDER BY
before storing your output and set its PARALLEL to a smaller number? That
will force a reduce phase and combine the small files. Of course, it will
add an extra MR job.
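The suggested workaround might look like the following sketch; the aliases, paths, and the PARALLEL value are hypothetical placeholders:

```pig
-- Script-wide setting: applies to every LOAD in this script.
set pig.maxCombinedSplitSize 134217728;  -- 128 MB

data = LOAD '/input/path' USING PigStorage('\t');
-- ... transformations ...

-- ORDER BY forces a reduce phase; a small PARALLEL value means few
-- reducers, so the output is coalesced into few, larger files.
sorted = ORDER data BY $0 PARALLEL 10;
STORE sorted INTO '/output/path';
```

The trade-off is the extra sort/shuffle cost of the added MR job, in exchange for avoiding thousands of small output files.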
On Sat, Nov 30, 2013 at 9:20 AM, Something Something <
[EMAIL PROTECTED]> wrote:

> Is there a way in Pig to change this configuration
> (pig.maxCombinedSplitSize) at different steps inside the *same* Pig script?
>
> For example, when I am LOADing the data I want this value to be low so that
> we use the block size effectively & many mappers get triggered. (Otherwise,
> the job takes too long).
>
> But later when I SPLIT my output, I want split size to be large so we don't
> create 4000 small output files.  (SPLIT is a mapper only task).
>
> Is there a way to accomplish this?
>
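Since the setting is script-wide, another option (not from the thread, a hedged sketch) is to break the pipeline into two scripts and pass a different value to each run via a Java system property on the command line; the script names and byte values here are hypothetical:

```shell
# Small combined-split size while loading, so many mappers are triggered.
pig -Dpig.maxCombinedSplitSize=134217728 load_and_transform.pig

# Large combined-split size for the map-only SPLIT/STORE stage,
# so far fewer output files are produced.
pig -Dpig.maxCombinedSplitSize=1073741824 split_and_store.pig
```

The cost is materializing the intermediate data between the two scripts, e.g. via a STORE at the end of the first and a LOAD at the start of the second.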