Pig, mail # user - mapred.min.split.size

Re: mapred.min.split.size
Corbin Hoenes 2011-04-18, 20:40
I've upgraded to Pig 0.8 and am still not able to set the input split
size correctly. It still defaults to the DFS block size.

Here are the parameters I set on the command line:
-Dmapred.min.split.size=512MB -Dpig.maxCombinedSplitSize=512MB
-Dpig.splitCombination=false
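As far as I know, Hadoop reads these size properties as plain byte counts (a long), so a suffixed value such as 512MB may not be interpreted as 512 megabytes; whether a suffix is accepted depends on the Hadoop version in use. The sketch below is a hypothetical helper (to_bytes and split_flags are illustrative names, not part of Pig or Hadoop) that turns human-readable sizes into the byte values to pass with -D. It assumes binary units (1 MB = 1024 × 1024 bytes).

```python
# Hypothetical helper, not part of Pig or Hadoop: convert sizes like "512MB"
# into plain byte counts, on the assumption that the size properties
# expect integers rather than suffixed strings.
UNIT_FACTORS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert "512MB", "512M", "1G" or "1048576" to a byte count (binary units)."""
    text = size.strip().upper()
    if text.endswith("B"):
        text = text[:-1]  # drop the trailing "B" of "MB" / "GB"
    unit = text[-1] if text and text[-1] in "KMG" else ""
    number = text[:-1] if unit else text
    return int(number) * UNIT_FACTORS[unit]


def split_flags(min_split: str, max_combined: str) -> list:
    """Build the size-related -D options with byte values instead of suffixes."""
    return [
        f"-Dmapred.min.split.size={to_bytes(min_split)}",
        f"-Dpig.maxCombinedSplitSize={to_bytes(max_combined)}",
    ]


if __name__ == "__main__":
    print(" ".join(split_flags("512MB", "512MB")))
```

With these inputs the size options become -Dmapred.min.split.size=536870912 -Dpig.maxCombinedSplitSize=536870912.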

I'm starting to wonder whether the ChukwaLoader isn't respecting the split settings.
Has anyone actually gotten this working?

On Thu, Aug 5, 2010 at 9:34 PM, Corbin Hoenes <[EMAIL PROTECTED]> wrote:

> Thanks, guys, this is the issue. I need to move to Pig 0.7 and, while I'm at it,
> upgrade to the latest Chukwa.
>
> On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:
>
> > Pig 0.6 implements its own splits (called slices), each equal in size to the
> > block size. That explains why the setting doesn't work.
> >
> > Thanks,
> > -Richard
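To make the point above concrete, here is a small model (not Pig's actual code) of slicing as described in this message: every input file is cut into block-sized slices, no slice spans two files, and mapred.min.split.size never enters the calculation. The function name is illustrative, and the model assumes non-empty files.

```python
# Model of the Pig 0.6 behavior described in this thread (not Pig source):
# each file is cut into block-sized slices, and slices never span files.


def pig06_slice_count(file_lengths, block_size):
    """Number of map tasks: ceil(length / block_size) summed over non-empty files."""
    total = 0
    for length in file_lengths:
        # Integer ceiling division; no split-size property plays any role here.
        total += (length + block_size - 1) // block_size
    return total


if __name__ == "__main__":
    mib = 1024 * 1024
    print(pig06_slice_count([5 * 1024 * mib], 64 * mib))  # one 5 GB file
    print(pig06_slice_count([10 * mib] * 3, 64 * mib))    # three small files
```

Assuming 64 MB blocks, a single 5 GB file always becomes 80 map tasks under this model, and each small file gets a task of its own.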
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, August 05, 2010 5:06 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: mapred.min.split.size
> >
> > FYI, Chukwa support for Pig 0.7.0 was just committed last week:
> >
> > https://issues.apache.org/jira/browse/CHUKWA-495
> >
> > The patch was built on Chukwa 0.4.0, but you could try applying it
> > against Chukwa 0.3.0. I don't think the relevant code changed much
> > between 0.3 and 0.4.
> >
> >
> > On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
> >
> >> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
> >> Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size
> >> property should work.
> >>
> >> But judging from its release date, Chukwa 0.3 does not seem to be built for Pig 0.7.
> >>
> >> Thanks,
> >> -Richard
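For comparison with the message above, here is a sketch of how FileInputFormat sizes splits, roughly following the Hadoop 0.20-era new-API code: the split size is max(minSize, min(maxSize, blockSize)), where minSize comes from mapred.min.split.size (default 1) and maxSize from mapred.max.split.size (default Long.MAX_VALUE), and the last chunk of a file may be up to 10% larger than the split size (the SPLIT_SLOP factor). This is a Python transcription for illustration, not the Hadoop source.

```python
SPLIT_SLOP = 1.1        # the last split may be up to 10% larger than split_size
LONG_MAX = 2 ** 63 - 1  # stand-in for the default of mapred.max.split.size


def compute_split_size(block_size, min_size, max_size):
    """Mirror of max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length, block_size, min_size=1, max_size=LONG_MAX):
    """Lengths of the splits that one file would be cut into."""
    split_size = compute_split_size(block_size, min_size, max_size)
    lengths = []
    remaining = file_length
    # Keep cutting full splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining != 0:
        lengths.append(remaining)  # final split, possibly shorter or slightly longer
    return lengths


if __name__ == "__main__":
    mib = 1024 * 1024
    five_gib = 5 * 1024 * mib
    print(len(split_lengths(five_gib, 64 * mib)))
    print(len(split_lengths(five_gib, 64 * mib, min_size=512 * mib)))
```

Because minSize wins over the block size in the max(), raising mapred.min.split.size above the block size is what produces larger splits, provided the loader actually goes through FileInputFormat.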
> >>
> >> -----Original Message-----
> >> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >> Sent: Thursday, August 05, 2010 3:50 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: mapred.min.split.size
> >>
> >> I am using the ChukwaStorage loader from Chukwa 0.3. Is it the loader's
> >> responsibility to deal with input splits?
> >>
> >> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
> >>
> >>> I misunderstood your earlier question. If you have one large file, setting the
> >>> mapred.min.split.size property will help increase the file split size.
> >>> Pig passes system properties to Hadoop. What loader are you using?
> >>>
> >>> Thanks,
> >>> -Richard
> >>>
> >>> -----Original Message-----
> >>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >>> Sent: Thursday, August 05, 2010 1:22 PM
> >>> To: [EMAIL PROTECTED]
> >>> Subject: Re: mapred.min.split.size
> >>>
> >>> So what does Pig do when I have a 5 GB file? Does it simply hardcode
> >>> the split size to the block size? Is there no way to tell it to operate
> >>> on a larger split size?
> >>>
> >>>
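A worked example for the 5 GB file in the question above, assuming 64 MB blocks (the Hadoop default of that era), 1 GB = 1024 MB, and a loader that uses FileInputFormat's rule splitSize = max(minSize, min(maxSize, blockSize)); here 1 is the default minimum and ∞ stands for the default maximum. Per this thread, the rule does not apply to Pig 0.6, which always slices at the block size.

```latex
\begin{aligned}
\text{default:}\quad & \max(1,\ \min(\infty,\ 64\,\text{MB})) = 64\,\text{MB}
  && \Rightarrow\ \tfrac{5 \cdot 1024\,\text{MB}}{64\,\text{MB}} = 80 \text{ splits} \\
\text{mapred.min.split.size} = 512\,\text{MB}:\quad & \max(512\,\text{MB},\ \min(\infty,\ 64\,\text{MB})) = 512\,\text{MB}
  && \Rightarrow\ \tfrac{5 \cdot 1024\,\text{MB}}{512\,\text{MB}} = 10 \text{ splits}
\end{aligned}
```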
> >>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
> >>>
> >>>> For Pig loaders, each split can contain at most one file, regardless of
> >>>> the split size.
> >>>>
> >>>> You can concatenate the input files before loading them.
> >>>>
> >>>> Thanks,
> >>>> -Richard
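Since a split never spans more than one file, the suggestion above to concatenate inputs can be planned with a simple greedy grouping. The sketch walks the files in order and starts a new group whenever adding the next file would push the current group past a target size. Each group would then be concatenated into one file before loading. The helper name, the target size, and the greedy strategy are illustrative assumptions, and the concatenation step itself is left out.

```python
# Hypothetical planning helper: group input files so that each group can be
# concatenated into one file of at most roughly target_bytes. A single file
# larger than the target still forms its own group.


def plan_concatenation(files, target_bytes):
    """files: list of (path, length) pairs. Returns a list of path groups."""
    groups = []
    current, current_size = [], 0
    for path, length in files:
        # Close the current group if this file would overflow it.
        if current and current_size + length > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += length
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    mib = 1024 * 1024
    small_files = [(f"part-{i:05d}", 5 * mib) for i in range(20)]
    for group in plan_concatenation(small_files, 64 * mib):
        print(len(group), group[0], "..", group[-1])
```

With twenty 5 MB files and a 64 MB target this yields two groups (12 and 8 files), so two map tasks instead of twenty. Greedy packing in input order keeps related files together but is not size-optimal.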
> >>>> -----Original Message-----
> >>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >>>> Sent: Tuesday, July 27, 2010 2:09 PM
> >>>> To: [EMAIL PROTECTED]
> >>>> Subject: mapred.min.split.size
> >>>>
> >>>> Is there a way to set the mapred.min.split.size property in Pig? I set
> >>>> it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ
> >>>> counter. My mappers are finishing in ~10 seconds, and I have ~20,000 of them.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>