Pig >> mail # user >> mapred.min.split.size


Re: mapred.min.split.size
I've upgraded to Pig 0.8 and am still unable to set the input split
size correctly.  It still defaults to the DFS block size.

Here are the params I set via the cmd line:
-Dmapred.min.split.size=512MB -Dpig.maxCombinedSplitSize=512MB
-Dpig.splitCombination=false

I'm starting to wonder if the ChukwaLoader isn't respecting the splits.
Anyone actually got this working?
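For reference, Hadoop's FileInputFormat (which the Pig 0.7+ loaders delegate to, per Richard's note below) computes the split size as max(minSize, min(maxSize, blockSize)). One thing that may be worth double-checking: mapred.min.split.size is normally given as a raw byte count (e.g. 536870912 for 512 MB), and a suffixed value like "512MB" may not parse as intended. A minimal Python sketch of that formula (the numbers below are illustrative, not from this thread):

```python
def compute_split_size(block_size, min_split, max_split):
    """Mirror of Hadoop FileInputFormat's computeSplitSize:
    max(minSize, min(maxSize, blockSize))."""
    return max(min_split, min(max_split, block_size))

MB = 1024 * 1024

# With the defaults (min ~= 1 byte, max ~= Long.MAX_VALUE),
# the split size falls back to the block size.
assert compute_split_size(128 * MB, 1, 2**63 - 1) == 128 * MB

# Raising the minimum to 512 MB (expressed in bytes) wins over
# the block size and forces larger splits.
assert compute_split_size(128 * MB, 512 * MB, 2**63 - 1) == 512 * MB
```

This only applies when the loader actually uses FileInputFormat; a loader that computes its own splits (as discussed below) will ignore these properties entirely.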

On Thu, Aug 5, 2010 at 9:34 PM, Corbin Hoenes <[EMAIL PROTECTED]> wrote:

> Thanks guys, this is the issue.  I need to move to Pig 0.7 and, while I'm at
> it, upgrade to the latest Chukwa.
>
> On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:
>
> > Pig 0.6 implements its own splits (called slices) with size equal to the
> > block size, which explains why the setting doesn't work.
> >
> > Thanks,
> > -Richard
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, August 05, 2010 5:06 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: mapred.min.split.size
> >
> > FYI, Chukwa support for Pig 0.7.0 was just committed last week:
> >
> > https://issues.apache.org/jira/browse/CHUKWA-495
> >
> > The patch was built on Chukwa 0.4.0, but you could try applying it
> > against Chukwa 0.3.0.  I don't think the relevant code changed much
> > between 0.3 and 0.4.
> >
> >
> > On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
> >
> >> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
> >> Hadoop FileInputFormat to generate splits, so the mapred.min.split.size
> >> property should work.
> >>
> >> But judging from the release dates, Chukwa 0.3 doesn't appear to be built
> >> against Pig 0.7.
> >>
> >> Thanks,
> >> -Richard
> >>
> >> -----Original Message-----
> >> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >> Sent: Thursday, August 05, 2010 3:50 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: mapred.min.split.size
> >>
> >> I am using the ChukwaStorage loader from chukwa 0.3.  Is it the loader's
> >> responsibility to deal with input splits?
> >>
> >> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
> >>
> >>> I misunderstood your earlier question. If you have one large file,
> >>> setting the mapred.min.split.size property will help increase the file
> >>> split size. Pig will pass system properties to Hadoop. What loader are
> >>> you using?
> >>>
> >>> Thanks,
> >>> -Richard
> >>>
> >>> -----Original Message-----
> >>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >>> Sent: Thursday, August 05, 2010 1:22 PM
> >>> To: [EMAIL PROTECTED]
> >>> Subject: Re: mapred.min.split.size
> >>>
> >>> So what does Pig do when I have a 5 GB file?  Does it simply hardcode
> >>> the split size to the block size?  Is there no way to tell it to operate
> >>> on a larger split size?
> >>>
> >>>
> >>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
> >>>
> >>>> For Pig loaders, each split can contain at most one file, no matter
> >>>> what the split size is.
> >>>>
> >>>> You can concatenate the input files before loading them.
> >>>>
> >>>> Thanks,
> >>>> -Richard
> >>>> -----Original Message-----
> >>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
> >>>> Sent: Tuesday, July 27, 2010 2:09 PM
> >>>> To: [EMAIL PROTECTED]
> >>>> Subject: mapred.min.split.size
> >>>>
> >>>> Is there a way to set the mapred.min.split.size property in Pig? I set
> >>>> it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ
> >>>> counter.  My mappers are finishing in ~10 secs.  I have ~20,000 of them.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>
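A side note on Richard's point that each Pig split can contain at most one file: with ~20,000 small input files, the job produces ~20,000 mappers regardless of how large mapred.min.split.size is set, which is why concatenating the inputs first helps. A rough sketch of that arithmetic (the 5 MB per-file size below is a made-up illustration, not a figure from this thread):

```python
MB = 1024 * 1024

def num_splits_one_file_per_split(file_sizes, split_size):
    """If a split can hold at most one file (as with the older Pig
    slicers), each file contributes at least ceil(size / split_size)
    splits, and small files can never be packed together."""
    return sum(max(1, -(-size // split_size)) for size in file_sizes)

# 20,000 files of ~5 MB each, with a 512 MB target split size:
small_files = [5 * MB] * 20_000
print(num_splits_one_file_per_split(small_files, 512 * MB))  # 20000

# Concatenated into a single file of the same total size,
# the data needs only ceil(100000 / 512) splits:
concatenated = [sum(small_files)]
print(num_splits_one_file_per_split(concatenated, 512 * MB))  # 196
```

In other words, raising the split size only helps once the loader can actually span (or combine) files; otherwise the mapper count is pinned to the file count.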