Pig >> mail # user >> different mapred.min.split.size within one pig script?


Re: different mapred.min.split.size within one pig script?
Correct; I don't think there is a good way to do that except perhaps
by inserting "exec" statements to separate parts of the script that
you need to execute with the different settings.
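A rough sketch of what that could look like is below. It assumes a bare "exec;" makes Pig run everything above it before parsing the rest, so each part is planned with the SET value in force at that point. The paths, aliases, and split sizes are hypothetical, and I have not verified that the second SET stays out of the first part's jobs on every Pig version.

```pig
-- Part 1: slow per-record mapper, so request a small split size.
-- All paths, aliases, and byte values here are made up for illustration.
SET mapred.min.split.size 1048576;
raw   = LOAD 'input/part1' AS (line:chararray);
heavy = FOREACH raw GENERATE UPPER(line) AS line;
STORE heavy INTO 'tmp/part1_out';

-- Assumption: a bare exec forces the statements above to execute
-- before Pig continues with the rest of the script.
exec;

-- Part 2: cheap per-record work, so request larger splits.
SET mapred.min.split.size 134217728;
stage2  = LOAD 'tmp/part1_out' AS (line:chararray);
grouped = GROUP stage2 BY line;
counts  = FOREACH grouped GENERATE group, COUNT(stage2);
STORE counts INTO 'output/part2_out';
```

The cost is that the first part has to STORE its result and the second part has to LOAD it again, since aliases defined before the exec may not carry over into the second part.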

D

On Wed, Jun 13, 2012 at 11:08 PM, Yang <[EMAIL PROTECTED]> wrote:
> thanks,
>
> I tried, but it does not seem to work. Even when I put the second
> SET of mapred.min.split.size at the very end of the script, it is that
> second SET that takes effect in both places where I used SET.
>
> Yang
>
> On Tue, Jun 12, 2012 at 3:56 PM, Alex Rovner <[EMAIL PROTECTED]> wrote:
>
>> Yes. Use the "set" keyword right before the operation that needs this
>> setting. Since pig will optimize certain statements and collapse them into
>> a single job, you would have to move your statement up a couple
>> instructions in order for it to take effect.
>>
>> On Jun 10, 2012, at 10:06 PM, Yang <[EMAIL PROTECTED]> wrote:
>>
>> > I need to set mapred.min.split.size for one part of my pig script
>> > because the mapper job corresponding to the first part of the script
>> > takes much longer time per input record than other parts of the script.
>> >
>> > so I have to set the split size very small to take care of that
>> > particular script,
>> >
>> > but then later parts of the script also used this value and used too many
>> > splits,
>> >
>> > is it possible to set min.split.size value to different values within the
>> > same script?
>> >
>> > Thanks
>> > Yang
>>