Re: mapred.min.split.size
Corbin Hoenes, 2010-08-06, 03:34
Thanks, guys, this is the issue. I need to move to Pig 0.7, and while I'm at it, upgrade to the latest Chukwa.
On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:

> Pig 0.6 implements its own splits (called slices), each equal in size to the
> block size. That explains why the setting doesn't work.
>
> Thanks,
> -Richard
>
> -----Original Message-----
> From: Bill Graham [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 05, 2010 5:06 PM
> To: [EMAIL PROTECTED]
> Subject: Re: mapred.min.split.size
>
> FYI, Chukwa support for Pig 0.7.0 was just committed last week:
>
> https://issues.apache.org/jira/browse/CHUKWA-495
>
> The patch was built on Chukwa 0.4.0, but you could try applying it
> against Chukwa 0.3.0. I don't think the relevant code changed much between
> 0.3 and 0.4.
>
> On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
>
>> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
>> Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size
>> property should work.
>>
>> But judging from the release date, Chukwa 0.3 doesn't seem to be built
>> on Pig 0.7.
>>
>> Thanks,
>> -Richard
>>
>> -----Original Message-----
>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, August 05, 2010 3:50 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: mapred.min.split.size
>>
>> I am using the ChukwaStorage loader from Chukwa 0.3. Is it the loader's
>> responsibility to deal with input splits?
>>
>> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
>>
>>> I misunderstood your earlier question. If you have one large file, setting
>>> the mapred.min.split.size property will increase the file split size.
>>> Pig passes system properties through to Hadoop. What loader are you using?
>>>
>>> Thanks,
>>> -Richard
>>>
>>> -----Original Message-----
>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>>> Sent: Thursday, August 05, 2010 1:22 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: mapred.min.split.size
>>>
>>> So what does Pig do when I have a 5 GB file? Does it simply hardcode
>>> the split size to the block size? Is there no way to tell it to operate
>>> on a larger split size?
>>>
>>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
>>>
>>>> For Pig loaders, each split can contain at most one file, no matter
>>>> what the split size is.
>>>>
>>>> You can concatenate the input files before loading them.
>>>>
>>>> Thanks,
>>>> -Richard
>>>>
>>>> -----Original Message-----
>>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>>>> Sent: Tuesday, July 27, 2010 2:09 PM
>>>> To: [EMAIL PROTECTED]
>>>> Subject: mapred.min.split.size
>>>>
>>>> Is there a way to set the mapred.min.split.size property in Pig? I set
>>>> it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ
>>>> counter. My mappers finish in ~10 seconds, and I have ~20,000 of them.
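The split arithmetic discussed in this thread can be sketched as follows. This is a minimal Python approximation, not Pig, Chukwa, or Hadoop code: it assumes the formula used by Hadoop's new-API `FileInputFormat.computeSplitSize` (`max(minSize, min(maxSize, blockSize))`) and the 1.1 "slop" rule `getSplits` applies to the last split of a file. The function names and the 64 MB block size are illustrative assumptions.

```python
# Hypothetical sketch of the split-size arithmetic in Hadoop's FileInputFormat.
# Function names are illustrative, not Hadoop APIs. Sizes are in bytes.

MB = 2**20
GB = 2**30

# Hadoop lets the last split of a file be up to 10% larger than splitSize.
SPLIT_SLOP = 1.1


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Mirror of splitSize = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, split_size: int) -> int:
    """Splits for ONE non-empty file; a split never spans two files."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder exceeds 1.1x splitSize.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1x splitSize) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def total_map_tasks(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """One map task per split, summed over all input files."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return sum(count_splits(size, split_size) for size in file_sizes)


if __name__ == "__main__":
    # One 5 GB file, assumed 64 MB blocks, default minimum split size.
    print(total_map_tasks([5 * GB], 64 * MB))  # 80
    # Same file with mapred.min.split.size raised to 512 MB.
    print(total_map_tasks([5 * GB], 64 * MB, min_size=512 * MB))  # 10
    # 20,000 small files: the minimum split size cannot merge files.
    print(total_map_tasks([10 * MB] * 20000, 64 * MB, min_size=512 * MB))  # 20000
```

Under these assumptions, raising the minimum only helps for large files; many small files still produce one map task each, which is why concatenating the inputs was suggested. Pig 0.6's block-sized slices behave like the first case regardless of the property.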