|
Corbin Hoenes
2010-07-27, 21:09
Richard Ding
2010-07-27, 21:41
Corbin Hoenes
2010-08-05, 20:21
Richard Ding
2010-08-05, 22:14
Corbin Hoenes
2010-08-05, 22:50
Richard Ding
2010-08-05, 23:40
Bill Graham
2010-08-06, 00:06
Richard Ding
2010-08-06, 00:38
Corbin Hoenes
2010-08-06, 03:34
Corbin Hoenes
2011-04-18, 20:40
|
-
mapred.min.split.size - Corbin Hoenes, 2010-07-27, 21:09
Is there a way to set the mapred.min.split.size property in Pig? I set it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ counter. My mappers are finishing in ~10 secs, and I have ~20,000 of them.
-
RE: mapred.min.split.size - Richard Ding, 2010-07-27, 21:41
For Pig loaders, each split can contain at most one file, regardless of the split size.
You can concatenate the input files before loading them.
Thanks,
-Richard
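As a rough illustration of the point above, here is a minimal Python sketch (not Pig or Hadoop code) of why a larger split size does not reduce the mapper count when the input consists of many small files. The function name and the 10 MB file size are assumptions made for the example; the only rule modelled is that a split never spans more than one file.

```python
MB = 1024 * 1024


def estimate_mappers(file_sizes, split_size):
    """Estimate the number of map tasks when a split cannot span files.

    Hypothetical helper: file_sizes and split_size are in bytes.
    """
    total = 0
    for size in file_sizes:
        # Ceiling division: number of split_size chunks needed for this file.
        splits_for_file = -(-size // split_size)
        # Assumption: every file, even an empty one, counts as one split.
        total += max(1, splits_for_file)
    return total


# Assumed workload: 20,000 files of 10 MB each (the file size is made up).
small_files = [10 * MB] * 20_000
for split_size in (64 * MB, 512 * MB):
    mappers = estimate_mappers(small_files, split_size)
    print(split_size // MB, "MB split size ->", mappers, "mappers")
```

Under these assumptions the estimate stays at one mapper per file for both split sizes, which is why concatenating the inputs is the suggested way to cut the mapper count.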
-
Re: mapred.min.split.size - Corbin Hoenes, 2010-08-05, 20:21
So what does Pig do when I have a 5 gig file? Does it simply hardcode the split size to the block size? Is there no way to tell it to just operate on a larger split size?
-
RE: mapred.min.split.size - Richard Ding, 2010-08-05, 22:14
I misunderstood your earlier question. If you have one large file, setting the mapred.min.split.size property will help increase the file split size. Pig passes system properties on to Hadoop. What loader are you using?
Thanks,
-Richard
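To make the effect of the minimum split size on a single large file concrete, here is a small Python sketch. It assumes the rule generally attributed to Hadoop's FileInputFormat, split size = max(minSize, min(maxSize, blockSize)); the function names are illustrative, and the 64 MB block size is an assumption, not something stated in this thread.

```python
MB = 1024 * 1024
GB = 1024 * MB
LONG_MAX = 2 ** 63 - 1  # stands in for an unset maximum split size


def compute_split_size(block_size, min_size, max_size=LONG_MAX):
    """Split size under the assumed rule max(min, min(max, block))."""
    return max(min_size, min(max_size, block_size))


def estimate_splits(file_size, split_size):
    """Approximate split count for one file (ceiling division).

    This ignores any slack the real implementation allows on the last split.
    """
    return -(-file_size // split_size)


block = 64 * MB      # assumed DFS block size
five_gig = 5 * GB    # the 5 gig file from the question

default_size = compute_split_size(block, min_size=1)
raised_size = compute_split_size(block, min_size=512 * MB)

print(estimate_splits(five_gig, default_size))  # 80
print(estimate_splits(five_gig, raised_size))   # 10
```

With these assumed numbers, raising the minimum to 512 MB takes the estimate from 80 splits to 10, provided the loader actually delegates split generation to FileInputFormat.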
-
Re: mapred.min.split.size - Corbin Hoenes, 2010-08-05, 22:50
I am using the ChukwaStorage loader from Chukwa 0.3. Is it the loader's responsibility to deal with input splits?
-
RE: mapred.min.split.size - Richard Ding, 2010-08-05, 23:40
What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size property should work.
But judging from the release date, Chukwa 0.3 does not seem to be on Pig 0.7.
Thanks,
-Richard
-
Re: mapred.min.split.size - Bill Graham, 2010-08-06, 00:06
FYI, Chukwa support for Pig 0.7.0 was just committed last week:
https://issues.apache.org/jira/browse/CHUKWA-495
The patch was built on Chukwa 0.4.0, but you could try applying it against Chukwa 0.3.0. I don't think the relevant code changed much between 0.3 and 0.4.
-
RE: mapred.min.split.size - Richard Ding, 2010-08-06, 00:38
Pig 0.6 implements its own splits (called slices), each with a size equal to the block size. This explains why the setting doesn't work.
Thanks,
-Richard
-
Re: mapred.min.split.size - Corbin Hoenes, 2010-08-06, 03:34
Thanks guys, this is the issue. I need to move to Pig 0.7 and, while I'm at it, upgrade to the latest Chukwa.
-
Re: mapred.min.split.size - Corbin Hoenes, 2011-04-18, 20:40
I've upgraded to Pig 0.8 and am still not able to set the input split size correctly. It still defaults to the DFS block size. Here are the params I set via the command line:
-Dmapred.min.split.size=512MB
-Dpig.maxCombinedSplitSize=512MB
-Dpig.splitCombination=false
I'm starting to wonder if the ChukwaLoader isn't respecting the splits. Has anyone actually got this working?
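One thing that may be worth checking, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: both mapred.min.split.size and pig.maxCombinedSplitSize are, as far as I know, read as plain byte counts, so a value such as 512MB might not be interpreted as intended. The hypothetical Python helper below (the function names are made up for illustration) converts a human-readable size into bytes and prints the corresponding -D flags.

```python
import re

# Binary multipliers; these helpers are illustrative, not part of Pig or Hadoop.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text):
    """Convert a string such as '512MB' into an integer number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def split_size_flags(size_text):
    """Build the two -D options from the message above with a numeric value."""
    n = to_bytes(size_text)
    return [
        f"-Dmapred.min.split.size={n}",
        f"-Dpig.maxCombinedSplitSize={n}",
    ]


print(" ".join(split_size_flags("512MB")))
# -Dmapred.min.split.size=536870912 -Dpig.maxCombinedSplitSize=536870912
```

Separately, -Dpig.splitCombination=false would presumably turn split combination off, in which case pig.maxCombinedSplitSize may have no effect; that is also an assumption to verify against the Pig documentation for the version in use.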