Re: How to use CombineFileInputFormat in Pig
Thejas Nair, 2012-01-13, 01:12
What load function are you using? If it implements some of the
interfaces listed here, split combination is turned off:
http://pig.apache.org/docs/r0.9.1/perf.html#combine-files

-Thejas

On 1/11/12 11:07 PM, Marcel Holle wrote:
> My pig.properties specifies only these parameters: log4jconf,
> fs.default.name, mapred.job.tracker. So it should use
> CombineFileInputFormat by default. I have 100,000 files of around 16 KB each.
>
> 2012/1/11 Prashant Kommireddi <[EMAIL PROTECTED]>
>
>> Hi Marcel,
>>
>> You might not find "pig.splitCombination" in your configuration if it is
>> not set manually. Pig internally defaults it to true.
>>
>> What is the value of "pig.maxCombinedSplitSize"? If you are not setting it
>> manually, it should be equal to your block size. What is the individual
>> file size of the small files?
>>
>> Thanks,
>> Prashant
>>
>> On Wed, Jan 11, 2012 at 3:18 PM, Marcel Holle
>> <[EMAIL PROTECTED]> wrote:
>>
>>> If I got it right, I should see output like "Total input paths (combined)
>>> to process : 7" when I run a Pig script, but I'm missing the "(combined)"
>>> part, so CombineFileInputFormat is not being used? Where can I find the
>>> Pig configuration? I think I have to check the "pig.splitCombination" value.
>>>
>>> 2012/1/11 Daniel Dai <[EMAIL PROTECTED]>
>>>
>>>> Check PIG-1518.
>>>>
>>>> Daniel
>>>>
>>>> On Wed, Jan 11, 2012 at 11:01 AM, Marcel Holle
>>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> How can I verify this? Could you point me to a config or the
>>>>> source code?
>>>>>
>>>>> 2012/1/11 Daniel Dai <[EMAIL PROTECTED]>
>>>>>
>>>>>> It is the default in 0.8 as well.
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> On Wed, Jan 11, 2012 at 10:43 AM, Marcel Holle
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Is there also a way to activate CombineFileInputFormat in Pig 0.8.1?
>>>>>>>
>>>>>>> 2012/1/10 Alex Rovner <[EMAIL PROTECTED]>
>>>>>>>
>>>>>>>> In versions 0.9+ the default is CombineFileInputFormat.
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 8:10 PM, Marcel Holle
>>>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> How can I use CombineFileInputFormat in Pig? I have a performance
>>>>>>>>> issue with lots of small files that I want to get rid of. I think
>>>>>>>>> FileInputFormat is used by default.
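To see why split combination matters for an input like Marcel's (100,000 files of about 16 KB), here is a rough back-of-the-envelope sketch. It assumes a 64 MB maximum combined split size, standing in for "pig.maxCombinedSplitSize defaulting to the block size" as described in the thread; your actual block size may differ. The helper `combine_splits` is hypothetical and uses a plain in-order greedy packing. It is not Pig's actual combining algorithm, which may also take data locality into account, so real split counts can differ.

```python
# Hypothetical sketch: estimate how many map splits a small-file input yields
# with and without split combination. Not Pig's real algorithm: files are
# packed greedily in input order and data locality is ignored.

def combine_splits(file_sizes, max_split_size):
    """Greedily pack files, in order, into splits of at most max_split_size bytes.

    A file larger than max_split_size simply gets a split of its own here
    (a real input format would further divide it; that is ignored).
    """
    splits = []
    current = []
    current_size = 0
    for size in file_sizes:
        # Start a new split if adding this file would exceed the limit.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


KB = 1024
MB = 1024 * KB

file_sizes = [16 * KB] * 100_000   # "100,000 files of around 16 KB each"
max_combined = 64 * MB             # ASSUMED max combined split size (block size)

uncombined = len(file_sizes)       # one split (one map task) per small file
combined = len(combine_splits(file_sizes, max_combined))

print(f"without combination: {uncombined} splits")
print(f"with combination:    {combined} splits")
```

Under these assumptions each combined split holds 4,096 files (64 MB / 16 KB), so the script reports 100000 splits without combination and 25 with it. The exact numbers depend on the real block size and on how Pig groups splits, but the orders of magnitude show why a missing "(combined)" in the log output is worth tracking down.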