Re: mapred.min.split.size
Thanks, guys, this is the issue. I need to move to Pig 0.7 and, while I'm at it, upgrade to the latest Chukwa.

On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:

> Pig 0.6 implements its own splits (called slices), each the size of one block. That explains why the setting doesn't work.
>
> Thanks,
> -Richard
>
> -----Original Message-----
> From: Bill Graham [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 05, 2010 5:06 PM
> To: [EMAIL PROTECTED]
> Subject: Re: mapred.min.split.size
>
> FYI, Chukwa support for Pig 0.7.0 was just committed last week:
>
> https://issues.apache.org/jira/browse/CHUKWA-495
>
> The patch was built on Chukwa 0.4.0, but you could try applying the patch
> against Chukwa 0.3.0. I don't think the relevant code changed much between
> 0.3 and 0.4.
>
>
> On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
>
>> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
>> Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size
>> property should work.
>>
>> But judging from its release date, Chukwa 0.3 does not seem to be on Pig 0.7.
>>
>> Thanks,
>> -Richard
>>
>> -----Original Message-----
>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, August 05, 2010 3:50 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: mapred.min.split.size
>>
>> I am using the ChukwaStorage loader from Chukwa 0.3.  Is it the loader's
>> responsibility to deal with input splits?
>>
>> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
>>
>>> I misunderstood your earlier question. If you have one large file, setting
>>> the mapred.min.split.size property will help to increase the file split size.
>>> Pig will pass system properties to Hadoop. What loader are you using?
>>>
>>> Thanks,
>>> -Richard
>>>
>>> -----Original Message-----
>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>>> Sent: Thursday, August 05, 2010 1:22 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: mapred.min.split.size
>>>
>>> So what does Pig do when I have a 5 GB file?  Does it simply hardcode
>>> the split size to the block size?  Is there no way to tell it to just operate
>>> on a larger split size?
>>>
>>>
>>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
>>>
>>>> For Pig loaders, each split can contain at most one file, no matter
>>>> what the split size is.
>>>>
>>>> You can concatenate the input files before loading them.
>>>>
>>>> Thanks,
>>>> -Richard
>>>> -----Original Message-----
>>>> From: Corbin Hoenes [mailto:[EMAIL PROTECTED]]
>>>> Sent: Tuesday, July 27, 2010 2:09 PM
>>>> To: [EMAIL PROTECTED]
>>>> Subject: mapred.min.split.size
>>>>
>>>> Is there a way to set the mapred.min.split.size property in Pig? I set
>>>> it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ counter.
>>>> My mappers are finishing in ~10 secs.  I have ~20,000 of them.
>>>>
>>>>
>>>>
>>>
>>
>>
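
For readers following the thread, the effect of mapred.min.split.size on a loader that goes through Hadoop's FileInputFormat can be sketched with a little arithmetic. This is a rough illustration, not Pig or Hadoop code. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is my understanding of FileInputFormat in the newer mapreduce API. The 64 MB block size and the 512 MB minimum split size are assumed values picked for the example; only the 5 GB file comes from the thread. The function names are hypothetical, and the small slack Hadoop allows on the final split of a file is ignored.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB

# Assumed defaults for illustration only (not taken from the thread).
BLOCK_SIZE = 64 * MB          # assumed HDFS block size
NO_MAX = 2**63 - 1            # stands in for "no maximum split size configured"


def compute_split_size(block_size, min_size, max_size):
    """Assumed FileInputFormat rule: clamp the block size between min and max."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits (and so map tasks) for one file, ignoring last-split slack."""
    return math.ceil(file_size / split_size)


file_size = 5 * GB  # the single large file mentioned in the thread

# Without a minimum, the split size falls back to the block size.
default_split = compute_split_size(BLOCK_SIZE, 1, NO_MAX)
default_count = count_splits(file_size, default_split)

# With a hypothetical mapred.min.split.size of 512 MB, splits grow past one block.
larger_split = compute_split_size(BLOCK_SIZE, 512 * MB, NO_MAX)
larger_count = count_splits(file_size, larger_split)

print(f"block-sized splits: {default_count} mappers of {default_split // MB} MB each")
print(f"512 MB min split:   {larger_count} mappers of {larger_split // MB} MB each")
```

Under these assumptions the 5 GB file drops from 80 map tasks to 10. As noted above, this only applies when the loader delegates split generation to FileInputFormat (the Pig 0.7 case described in the thread); Pig 0.6 builds block-sized slices itself, so the property has no effect there.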