Re: Changing pig.maxCombinedSplitSize dynamically in single run
Unfortunately, no. The settings are script-wide. Can you add an order-by
before storing your output and set its parallel to a smaller number? That
will force a reduce phase and combine small files. Of course, it will add
extra MR jobs.
On Sat, Nov 30, 2013 at 9:20 AM, Something Something <
[EMAIL PROTECTED]> wrote:

> Is there a way in Pig to change this configuration
> (pig.maxCombinedSplitSize) at different steps inside the *same* Pig script?
>
> For example, when I am LOADing the data I want this value to be low so
> that we use the block size effectively and many mappers get triggered.
> (Otherwise, the job takes too long.)
>
> But later when I SPLIT my output, I want the split size to be large so we
> don't create 4000 small output files.  (SPLIT is a map-only task.)
>
> Is there a way to accomplish this?
>
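
A minimal sketch of the suggested workaround, in Pig Latin. All relation names, paths, field names, and the 128 MB value below are hypothetical; the point is that SET applies to the entire script, so the combined-split size is fixed once, and ORDER BY ... PARALLEL is what controls the output file count.

-- Hypothetical script: SET is script-wide, so pig.maxCombinedSplitSize
-- cannot differ between the LOAD and the SPLIT steps.
SET pig.maxCombinedSplitSize 134217728;   -- 128 MB for the whole script

raw = LOAD '/data/input' AS (id:chararray, val:int);

SPLIT raw INTO small IF val < 100, large IF val >= 100;

-- Forcing a reduce phase via ORDER BY lets PARALLEL cap the number of
-- output files, instead of getting one file per map task.
small_sorted = ORDER small BY id PARALLEL 10;
STORE small_sorted INTO '/data/output/small';

large_sorted = ORDER large BY id PARALLEL 10;
STORE large_sorted INTO '/data/output/large';

As the reply notes, each ORDER BY adds an extra MR job, so this trades a longer job chain for fewer output files.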