-
different mapred.min.split.size within one pig script?
Yang 2012-06-11, 02:06
I need to set mapred.min.split.size for just one part of my Pig script, because the map job for the first part of the script takes much longer per input record than the other parts.
So I have to set the split size very small to accommodate that part of the script,
but then the later parts of the script also pick up this value and use too many splits.
Is it possible to set min.split.size to different values within the same script?
Thanks Yang
-
Re: different mapred.min.split.size within one pig script?
Alex Rovner 2012-06-12, 22:56
Yes. Use the "set" keyword right before the operation that needs this setting. Since Pig optimizes certain statements and collapses them into a single job, you may have to move the "set" statement up a few statements for it to take effect.
-
Re: different mapred.min.split.size within one pig script?
Yang 2012-06-14, 06:08
Thanks,
I tried, but it does not seem to work. Even when I put the second set split.size= at the very end of the script, it is the second SET that takes effect in both places where I used SET.
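A layout of the kind described here might look like the following sketch. The relation names, paths, and byte values are hypothetical placeholders, not taken from the actual script.

```pig
-- Hypothetical sketch; aliases, paths, and values are placeholders.
set mapred.min.split.size 1048576;       -- intended only for the slow first part

raw  = LOAD 'input/events' AS (id:chararray, payload:chararray);
slow = FOREACH raw GENERATE id, payload; -- stands in for the expensive per-record step
STORE slow INTO 'output/slow';

set mapred.min.split.size 268435456;     -- intended only for the later parts

other = LOAD 'input/other' AS (id:chararray, val:long);
grp   = GROUP other BY id;
cnt   = FOREACH grp GENERATE group, COUNT(other);
STORE cnt INTO 'output/counts';
```

With a script shaped like this, the behavior reported above is that the later value ends up applied to both jobs.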
Yang
-
Re: different mapred.min.split.size within one pig script?
Dmitriy Ryaboy 2012-06-15, 07:41
Correct; I don't think there is a good way to do that, except perhaps by inserting "exec" statements to separate the parts of the script that need to run with different settings.
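A rough sketch of that idea, assuming a bare "exec;" makes Pig run everything above it before it reads the rest of the script. The relation names, paths, and byte values are hypothetical, and whether each part really picks up its own setting should be verified on the Pig version in use.

```pig
-- Hypothetical sketch; aliases, paths, and values are placeholders.

-- Part 1: the slow, per-record-heavy step with a small split size.
set mapred.min.split.size 1048576;
raw  = LOAD 'input/events' AS (id:chararray, payload:chararray);
slow = FOREACH raw GENERATE id, payload; -- stands in for the expensive step
STORE slow INTO 'tmp/slow';

-- Assumption: this forces the statements above to execute now,
-- before the next "set" is read.
exec;

-- Part 2: the remaining steps with a larger split size.
set mapred.min.split.size 268435456;
slow2 = LOAD 'tmp/slow' AS (id:chararray, payload:chararray);
grp   = GROUP slow2 BY id;
cnt   = FOREACH grp GENERATE group, COUNT(slow2);
STORE cnt INTO 'output/counts';
```

The second part reloads the stored output of the first rather than reusing its aliases, so it does not depend on anything from before the "exec" still being defined. The cost is an extra write and read of the intermediate data.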
D