Re: How to use CombineFileInputFormat in Pig
Which load function are you using? If it implements some of the
interfaces specified here, split combination is turned off:
http://pig.apache.org/docs/r0.9.1/perf.html#combine-files

-Thejas
On 1/11/12 11:07 PM, Marcel Holle wrote:
> My pig.properties specifies only these parameters: log4jconf,
> fs.default.name, and mapred.job.tracker. So it should use
> CombineFileInputFormat by default. I have 100,000 files of around 16 KB each.
>
> 2012/1/11 Prashant Kommireddi <[EMAIL PROTECTED]>
>
>> Hi Marcel,
>>
>> You might not find "pig.splitCombination" in your configuration if it was
>> not set manually. Pig internally defaults it to true.
>>
>> What is the value of "pig.maxCombinedSplitSize"? If you are not setting it
>> manually, it should be equal to your block size. What is the size of each
>> of the small files?
>>
>> Thanks,
>> Prashant
>>
>>
>> On Wed, Jan 11, 2012 at 3:18 PM, Marcel Holle <[EMAIL PROTECTED]> wrote:
>>
>>> If I got it right, I should see output like "Total input paths (combined)
>>> to process : 7" when I run a Pig script, but I'm missing the "(combined)"
>>> part, so CombineFileInputFormat is not used? Where can I find the Pig
>>> configuration? I think I have to check the "pig.splitCombination" value.
>>>
>>> 2012/1/11 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> Check PIG-1518.
>>>>
>>>> Daniel
>>>>
>>>> On Wed, Jan 11, 2012 at 11:01 AM, Marcel Holle <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> How can I verify this? Could you point me to a config or the
>>>>> source code?
>>>>>
>>>>> 2012/1/11 Daniel Dai <[EMAIL PROTECTED]>
>>>>>
>>>>>> It is the default in 0.8 as well.
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> On Wed, Jan 11, 2012 at 10:43 AM, Marcel Holle <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Is there also a way to activate CombineFileInputFormat in Pig 0.8.1?
>>>>>>>
>>>>>>> 2012/1/10 Alex Rovner <[EMAIL PROTECTED]>
>>>>>>>
>>>>>>>> In versions 0.9+, the default is CombineFileInputFormat.
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 8:10 PM, Marcel Holle <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> How can I use CombineFileInputFormat in Pig? I have a performance
>>>>>>>>> issue with lots of small files which I want to get rid of. I think
>>>>>>>>> FileInputFormat is used by default.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
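
For a rough sense of what split combination should change in the case described in this thread (100,000 files of around 16 KB), the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate, not Pig's actual split logic: the function name is hypothetical, the 64 MB and 128 MB values are assumed block sizes that do not come from the thread, and the model simply assumes one map task per file without combination and files packed up to "pig.maxCombinedSplitSize" with it.

```python
import math

KIB = 1024
MIB = 1024 * KIB


def estimate_map_tasks(num_files, file_size_bytes, max_combined_split_bytes=None):
    """Rough estimate of map tasks for many small, equally sized files.

    Hypothetical helper, not part of Pig. Assumes each file is smaller than
    a block, so without combination every file becomes its own split.
    """
    if max_combined_split_bytes is None:
        # No split combination: one split (and one map task) per small file.
        return num_files
    total_bytes = num_files * file_size_bytes
    # With combination: files are packed until the size limit is reached,
    # so the total size divided by the limit gives a lower bound.
    return max(1, math.ceil(total_bytes / max_combined_split_bytes))


# Numbers from the thread: 100,000 files of around 16 KB.
uncombined = estimate_map_tasks(100_000, 16 * KIB)
# Assumed block sizes (the thread does not state the cluster's block size).
combined_64 = estimate_map_tasks(100_000, 16 * KIB, 64 * MIB)
combined_128 = estimate_map_tasks(100_000, 16 * KIB, 128 * MIB)

print("without combination:", uncombined)
print("combined, 64 MB limit:", combined_64)
print("combined, 128 MB limit:", combined_128)
```

If the job still launches roughly one map task per file, that suggests split combination is not taking effect, for example because of the load function, as noted at the top of this reply.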