MapReduce >> mail # user >> WholeFileInputFormat format


Re: WholeFileInputFormat format
Hello Harsh,

         Thank you so much for the quick response. I have a use case
where I need to compare, in a single reducer, values coming from two
mappers, and I am planning to use the MultipleInputs class for that.
The first mapper reads a text file (these files may contain 100,000
to 200,000 lines), and I have to extract bytes 2-13, 20-25, 32-38
and so on from each line. The second mapper reads values from an
HBase table. The columns of this table correspond to the fields that
I am reading from the text file in the first mapper.
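
For illustration, the per-line byte extraction described above could be sketched roughly as follows. This is plain Java with no Hadoop dependency; in a real job the same logic would sit inside the first mapper's map() method. The class name FixedWidthFields, the 1-based inclusive reading of the ranges, and the ASCII encoding are all assumptions, since the message does not specify them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedWidthFields {
    // Byte ranges taken from the message: 2-13, 20-25, 32-38.
    // ASSUMPTION: positions are 1-based and inclusive.
    static final int[][] RANGES = { {2, 13}, {20, 25}, {32, 38} };

    // Returns one string per range; a range that starts past the end of the
    // line yields an empty string, and a range that runs past it is truncated.
    static List<String> extract(String line) {
        // ASSUMPTION: the file is single-byte ASCII, so bytes map to characters.
        byte[] bytes = line.getBytes(StandardCharsets.US_ASCII);
        List<String> fields = new ArrayList<>();
        for (int[] range : RANGES) {
            int from = range[0] - 1;                    // convert to 0-based offset
            int to = Math.min(range[1], bytes.length);  // exclusive end, clamped
            if (from >= to) {
                fields.add("");
            } else {
                fields.add(new String(bytes, from, to - from, StandardCharsets.US_ASCII));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, only for demonstration.
        String line = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefgh";
        System.out.println(extract(line));
    }
}
```

Clamping short lines instead of throwing is a design choice made for the sketch; a real job might prefer to count or reject malformed lines.
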
        In the reducer I have to compare the results coming from both
mappers and generate the final result. I would appreciate your
guidance. Many thanks.
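
One common way to do this kind of comparison is a reduce-side join in which each mapper tags its output with its source. The sketch below simulates only the reducer logic in plain Java, without Hadoop. The class TaggedCompare, the tags "F" and "H", the tab-separated tagging, and the idea that both sources share a record key are assumptions for illustration; the message does not say how records from the file and the HBase table are matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedCompare {
    // Hypothetical source tags: "F" for the text file, "H" for the HBase table.
    static final String FILE_TAG = "F";
    static final String HBASE_TAG = "H";

    // What each mapper would emit as its value: the source tag plus the payload.
    static String tag(String source, String value) {
        return source + "\t" + value;
    }

    // Reducer logic for one key: split each value into tag and payload,
    // then compare the payload from the file with the one from HBase.
    static String reduce(String key, List<String> taggedValues) {
        String fromFile = null;
        String fromHBase = null;
        for (String tagged : taggedValues) {
            int tab = tagged.indexOf('\t');
            String source = tagged.substring(0, tab);
            String value = tagged.substring(tab + 1);
            if (FILE_TAG.equals(source)) {
                fromFile = value;
            } else if (HBASE_TAG.equals(source)) {
                fromHBase = value;
            }
        }
        if (fromFile == null) return key + "\tMISSING_IN_FILE";
        if (fromHBase == null) return key + "\tMISSING_IN_HBASE";
        return key + "\t" + (fromFile.equals(fromHBase) ? "MATCH" : "MISMATCH");
    }

    public static void main(String[] args) {
        // Simulate the shuffle: group tagged values by a shared record key.
        Map<String, List<String>> grouped = new TreeMap<>();
        grouped.computeIfAbsent("row1", k -> new ArrayList<>()).add(tag(FILE_TAG, "abc"));
        grouped.computeIfAbsent("row1", k -> new ArrayList<>()).add(tag(HBASE_TAG, "abc"));
        grouped.computeIfAbsent("row2", k -> new ArrayList<>()).add(tag(FILE_TAG, "xyz"));
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

In an actual job the two tagged streams would come from the two mappers registered through MultipleInputs, and the grouping by key would be done by the framework's shuffle rather than by the TreeMap used here.
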

Regards,
    Mohammad Tariq
On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> It depends on what you need. If your file is not splittable, or if you
> need to read the whole file from a single mapper itself (i.e. you do
> not _want_ it to be split), then use WholeFileInputFormat. Otherwise,
> you get more parallelism with regular splitting.
>
> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello list,
>>
>>        What is the approximate maximum size of the files that
>> can be handled using WholeFileInputFormat? I mean, if the file
>> is very big, is it feasible to use WholeFileInputFormat, given that
>> the entire load will go to one mapper? Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
> --
> Harsh J