Re: WholeFileInputFormat format
Hello Harsh,

         Thank you so much for the quick response. My use case is to
compare values coming from two mappers in one reducer, and I am
planning to use the MultipleInputs class for that. The first mapper
reads text files (each may contain 100,000 to 200,000 lines), and I
have to extract bytes 2-13, 20-25, 32-38 and so on from each line.
The second mapper reads values from an HBase table. The columns of
this table correspond to the fields that I read from the text files
in the first mapper.
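
A minimal sketch of the per-line extraction described above, in plain Java so it runs without a cluster. The class name FixedWidthFields, the helper extract, and the reading of the ranges as 1-based and inclusive are assumptions for illustration; the post does not say how the positions are counted, and the fields are assumed to be single-byte ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FixedWidthFields {
    // Hypothetical layout: the first three ranges named in the post,
    // interpreted here as 1-based, inclusive byte positions (an assumption).
    static final int[][] RANGES = { {2, 13}, {20, 25}, {32, 38} };

    // Cuts each range out of the first `length` bytes of `line`.
    // A range that starts past the end of the line yields an empty field;
    // a range that runs past the end is truncated.
    static List<String> extract(byte[] line, int length, int[][] ranges) {
        List<String> fields = new ArrayList<>();
        for (int[] r : ranges) {
            int start = r[0] - 1;               // 1-based inclusive -> 0-based offset
            int end = Math.min(r[1], length);   // exclusive end, clamped to the line
            if (start >= end) {
                fields.add("");
            } else {
                fields.add(new String(line, start, end - start, StandardCharsets.US_ASCII));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] line = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef"
                .getBytes(StandardCharsets.US_ASCII);
        // prints [123456789ABC, JKLMNO, VWXYZab]
        System.out.println(extract(line, line.length, RANGES));
    }
}
```

Inside a mapper, the same helper could be fed from the Text value of each line. Text.getBytes() returns a backing array in which only the first getLength() bytes are valid, which is why the sketch takes the length as a separate argument.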
        In the reducer I have to compare the results coming from both
mappers and generate the final result. I need your guidance. Many
thanks.
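
One common way to do this kind of comparison is to have each mapper prefix its output value with a source tag, so the reducer can tell the two inputs apart for a given key. The sketch below shows only that reducer-side logic in plain Java. The tags "F|" and "H|", the class TaggedCompare, and the MATCH / MISMATCH labels are all hypothetical; the post does not specify how the results should be compared or reported.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TaggedCompare {
    // Hypothetical tags: each mapper would prepend one of these to its value.
    static final String FILE_TAG = "F|";
    static final String HBASE_TAG = "H|";

    // Mirrors what a reduce() call would see: one key plus all values
    // emitted for it by either mapper. Returns one tab-separated result line.
    static String compare(String key, Iterable<String> taggedValues) {
        List<String> fromFile = new ArrayList<>();
        List<String> fromHBase = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith(FILE_TAG)) {
                fromFile.add(v.substring(FILE_TAG.length()));
            } else if (v.startsWith(HBASE_TAG)) {
                fromHBase.add(v.substring(HBASE_TAG.length()));
            }
        }
        if (fromFile.isEmpty()) return key + "\tMISSING_IN_FILE";
        if (fromHBase.isEmpty()) return key + "\tMISSING_IN_HBASE";
        // Values for a key arrive in no guaranteed order, so sort before comparing.
        Collections.sort(fromFile);
        Collections.sort(fromHBase);
        return key + "\t" + (fromFile.equals(fromHBase) ? "MATCH" : "MISMATCH");
    }

    public static void main(String[] args) {
        System.out.println(compare("row1", List.of("H|abc", "F|abc")));
        System.out.println(compare("row2", List.of("F|abc", "H|xyz")));
        System.out.println(compare("row3", List.of("H|abc")));
    }
}
```

For this to work, both mappers must emit the same key for records that are meant to be compared, for example the HBase row key and the matching field extracted from the text line.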

Regards,
    Mohammad Tariq
On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> It depends on what you need. If your file is not splittable, or if you
> need to read the whole file from a single mapper itself (i.e. you do
> not _want_ it to be split), then use WholeFileInputFormat. Otherwise,
> you get more parallelism with regular splitting.
>
> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello list,
>>
>>        What is the approximate maximum size of a file that can be
>> handled using WholeFileInputFormat? I mean, if the file is very
>> big, is it feasible to use WholeFileInputFormat, given that the
>> entire load will go to one mapper? Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
> --
> Harsh J
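
As a rough illustration of the parallelism trade-off discussed in this thread, the sketch below estimates how many map tasks one file produces with and without splitting. The class SplitEstimate and the 64 MB split size are assumptions for illustration. The real split size depends on the cluster's block size and job settings, and the estimate ignores how Hadoop sizes the final split.

```java
final class SplitEstimate {
    // Approximate number of map tasks for a single input file.
    // A non-splittable (whole-file) input always goes to one mapper.
    static long mapTasks(long fileSizeBytes, long splitSizeBytes, boolean splittable) {
        if (fileSizeBytes <= 0) return 0;
        if (!splittable) return 1;
        // Ceiling division: a partial trailing split still needs its own task.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;   // a 1 GB file
        long splitSize = 64 * mb;    // assumed split size
        System.out.println("splittable: " + mapTasks(fileSize, splitSize, true));   // 16
        System.out.println("whole file: " + mapTasks(fileSize, splitSize, false));  // 1
    }
}
```

With a whole-file input format, the single mapper must process the entire file by itself, so the practical size limit is set by that one task's memory and run time rather than by the cluster as a whole.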