Re: WholeFileInputFormat format
Hello Harsh,

          Does the Hadoop API have Avro support?

    Mohammad Tariq
On Wed, Jul 11, 2012 at 1:57 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Hello Harsh,
> I am sorry to pester you with questions. Actually I am kinda
> stuck. I have to write my MapReduce job such that the outputs from
> both the mappers are compared in order. I mean I
> have to read one line from the file and extract the desired fields
> from the line in one mapper, and in the second mapper I have to read
> the values from Hbase table and compare those values with the fields
> read in the first mapper. I am wondering how to achieve that since
> reducer phase will not start until all the mappers are done.
>           Maybe a bit of elaboration of my use case would be helpful
> in understanding the problem in a better fashion. I have a file that
> contains several fields. I have created columns for these fields in my
> Hbase table. After that I am extracting value of each field from the
> file and storing it in the corresponding Hbase column. Now, I have a
> 'support file' for the same file whose values are already stored in
> Hbase, but with a totally different format. But the order of fields in
> the original file and the order of lines (containing corresponding
> fields) in the support file are exactly the same. So I am trying to read
> one line from the support file, extract the field of interest in one
> mapper and read the same field from the Hbase table in second mapper
> and send these values to the reducer where the comparison will be made
> to conclude the test.
>          Please help me out by providing your able guidance, as being
> a novice I am not able to tackle with the situation.(Pardon my
> ignorance)
> May thanks.
> Regards,
>     Mohammad Tariq
> On Tue, Jul 10, 2012 at 8:34 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> I don't see why you'd have to use the WholeFileInputFormat for such a
>> task. Your task is very similar to joins, and you can see the section
>> "General reducer-side join" for what your overall logic should look
>> like, under Ricky's
>> http://horicky.blogspot.in/2010/08/designing-algorithmis-for-map-reduce.html
>> article.
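The reducer-side join Harsh points to boils down to: each mapper tags its output with a source marker, the shuffle groups both sources' records under the shared key, and the reduce call does the comparison. A plain-Java simulation of that flow (no Hadoop on the classpath; the keys, tags, and values here are invented for illustration):

```java
import java.util.*;

public class JoinSketch {
    // Tagged value: which "mapper" (source) it came from, plus the payload.
    record Tagged(String source, String value) {}

    // Simulates the shuffle: group tagged values by key, as the framework
    // does before calling reduce(key, values).
    static Map<String, List<Tagged>> shuffle(List<Map.Entry<String, Tagged>> emitted) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (var e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Simulates the reducer: values from both sources meet under one key,
    // and the per-key comparison happens here.
    static Map<String, Boolean> reduce(Map<String, List<Tagged>> grouped) {
        Map<String, Boolean> result = new TreeMap<>();
        for (var e : grouped.entrySet()) {
            String fileVal = null, hbaseVal = null;
            for (Tagged t : e.getValue()) {
                if (t.source().equals("file")) fileVal = t.value();
                else hbaseVal = t.value();
            }
            result.put(e.getKey(), fileVal != null && fileVal.equals(hbaseVal));
        }
        return result;
    }

    public static void main(String[] args) {
        // "Mapper 1" emits fields parsed from the support file;
        // "Mapper 2" emits the same fields read back from HBase.
        List<Map.Entry<String, Tagged>> emitted = List.of(
                Map.entry("row1", new Tagged("file", "ABC")),
                Map.entry("row1", new Tagged("hbase", "ABC")),
                Map.entry("row2", new Tagged("file", "XYZ")),
                Map.entry("row2", new Tagged("hbase", "QRS")));
        System.out.println(reduce(shuffle(emitted)));
        // row1 matches, row2 does not
    }
}
```

In an actual job, MultipleInputs.addInputPath() wires a different mapper class to each input path, TableMapReduceUtil.initTableMapperJob() can supply the HBase-side mapper, and the grouping that shuffle() fakes here is what the shuffle phase gives you for free.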
>> On Tue, Jul 10, 2012 at 7:46 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>> Hello Harsh,
>>>          Thank you so much for the quick response. Actually I have a
>>> use case wherein I have to compare values that are coming from 2
>>> mappers to one reducer. For that I am planning to use MultipleInputs
>>> class. In one mapper I have a text file (these files may contain
>>> 100,000 to 200,000 lines), and I have to extract bytes 2-13,
>>> 20-25, 32-38, and so on from each line of this file. In the second
>>> mapper I have to read values from an Hbase table. The columns of this
>>> table correspond to the fields which I am reading from the text file
>>> in the first mapper.
>>>         In the reducer I have to compare the results coming from both
>>> the mappers and generate the final result. Need your guidance. Many
>>> thanks.
>>> Regards,
>>>     Mohammad Tariq
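The fixed-width extraction described above is ordinary substring arithmetic; the one trap is that Java's String.substring() is 0-based and end-exclusive, while the positions in the mail (2-13, 20-25, 32-38) read like 1-based inclusive offsets. A small sketch under that 1-based assumption (the sample line and field contents are invented for illustration; adjust the ranges to the real record layout):

```java
import java.util.Arrays;

public class FieldExtractor {
    // Fixed-width column spec: 1-based, inclusive positions as given in the
    // mail (2-13, 20-25, 32-38). Change these if the layout is 0-based.
    static final int[][] RANGES = { {2, 13}, {20, 25}, {32, 38} };

    static String[] extract(String line) {
        String[] fields = new String[RANGES.length];
        for (int i = 0; i < RANGES.length; i++) {
            // substring() is 0-based and end-exclusive, so convert:
            // 1-based inclusive [a, b] becomes substring(a - 1, b).
            int begin = RANGES[i][0] - 1;
            int end = Math.min(RANGES[i][1], line.length());
            fields[i] = (begin < line.length())
                    ? line.substring(begin, end).trim()
                    : "";
        }
        return fields;
    }

    public static void main(String[] args) {
        // 40-char sample line: filler around the three fields of interest.
        String line = "AHELLOWORLD12......FOOBAR......BAZQUX7XX";
        System.out.println(Arrays.toString(extract(line)));
        // prints [HELLOWORLD12, FOOBAR, BAZQUX7]
    }
}
```

In the job itself this logic would sit inside the map() method of the file-side mapper, with the extracted fields emitted against whatever key the HBase-side mapper also emits.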
>>> On Tue, Jul 10, 2012 at 6:55 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>> It depends on what you need. If your file is not splittable, or if you
>>>> need to read the whole file from a single mapper itself (i.e. you do
>>>> not _want_ it to be split), then use WholeFileInputFormat. Otherwise,
>>>> you get more parallelism with regular splitting.
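Harsh's parallelism point is easy to see in code. WholeFileInputFormat is not a class shipped with Hadoop; the sketch below follows the commonly circulated pattern (as in Hadoop: The Definitive Guide) against the org.apache.hadoop.mapreduce API. It needs Hadoop on the classpath, so it is illustrative rather than standalone-runnable. Note that nextKeyValue() reads the entire file into one byte array, which is exactly why a very large file is a poor fit: one mapper gets the whole file, and the whole file must fit in that mapper's heap.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // the whole point: never split, one mapper per file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader
            extends RecordReader<NullWritable, BytesWritable> {
        private FileSplit fileSplit;
        private TaskAttemptContext context;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.fileSplit = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) return false;
            // Reads the entire file into memory as a single record -- the
            // file's full length is buffered in this one mapper's heap.
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.readFully(in, contents, 0, contents.length);
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}
```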
>>>> On Tue, Jul 10, 2012 at 6:31 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>> Hello list,
>>>>>        What could be the approximate maximum size of the files that
>>>>> can be handled using WholeFileInputFormat? I mean, if the file
>>>>> is very big, is it feasible to use WholeFileInputFormat, as the
>>>>> entire load will go to one mapper? Many thanks.
>>>>> Regards,
>>>>>     Mohammad Tariq
>>>> --
>>>> Harsh J
>> --
>> Harsh J