Re: different mapred.min.split.size within one pig script?
Correct; I don't think there is a good way to do that except perhaps
by inserting "exec" statements to separate parts of the script that
you need to execute with the different settings.
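A rough, untested sketch of what that could look like. The paths, aliases, the UPPER placeholder step, and both split-size values are made up for illustration, and whether the second SET is really kept separate from the first job may depend on the Pig version:

```pig
-- Part 1: the slow per-record step, run with a small split size.
SET mapred.min.split.size 1048576;    -- hypothetical value (1 MB)

raw   = LOAD '/tmp/example/input' USING PigStorage('\t')
        AS (id:chararray, payload:chararray);
-- UPPER stands in for whatever the expensive per-record work is.
heavy = FOREACH raw GENERATE id, UPPER(payload) AS payload;
STORE heavy INTO '/tmp/example/stage1';

-- Ask Pig to execute everything above before continuing.
exec;

-- Part 2: the cheap steps, run with a larger split size.
SET mapred.min.split.size 268435456;  -- hypothetical value (256 MB)

stage1  = LOAD '/tmp/example/stage1' USING PigStorage('\t')
          AS (id:chararray, payload:chararray);
grouped = GROUP stage1 BY id;
counts  = FOREACH grouped GENERATE group AS id, COUNT(stage1) AS n;
STORE counts INTO '/tmp/example/output';
```

The intermediate STORE/LOAD pair is what lets the two halves run as separate jobs; without it Pig may still collapse them into one plan.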


On Wed, Jun 13, 2012 at 11:08 PM, Yang <[EMAIL PROTECTED]> wrote:
> Thanks,
> I tried, but it does not seem to work. Even after I put the second
> SET of the split size at the very end of the script, it is the second
> SET that takes effect in both places where I used SET.
> Yang
> On Tue, Jun 12, 2012 at 3:56 PM, Alex Rovner <[EMAIL PROTECTED]> wrote:
>> Yes. Use the "set" keyword right before the operation that needs this
>> setting. Since pig will optimize certain statements and collapse them into
>> a single job, you would have to move your statement up a couple
>> instructions in order for it to take effect.
>> On Jun 10, 2012, at 10:06 PM, Yang <[EMAIL PROTECTED]> wrote:
>> > I need to set mapred.min.split.size for one part of my Pig script,
>> > because the map job for the first part of the script takes much
>> > longer per input record than the other parts of the script.
>> >
>> > So I have to set the split size very small to handle that particular
>> > part, but then the later parts of the script also use this value and
>> > end up with too many splits.
>> >
>> > Is it possible to set mapred.min.split.size to different values within
>> > the same script?
>> >
>> > Thanks
>> > Yang