MapReduce >> mail # user >> reading a binary file


Francesco Silvestri 2012-09-03, 14:56
Mohammad Tariq 2012-09-03, 15:01
Francesco Silvestri 2012-09-03, 15:08
Re: reading a binary file
Hi Francesco

By default, TextInputFormat reads its input line by line, splitting on '\n'.
The key is the byte offset of the line and the value is the line's contents.
In your case, though, the file is binary and is just a sequence of integers,
and you need the offset of each integer rather than the offset of each line.
I believe you will have to write your own custom RecordReader to get this
done.
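
To make the idea concrete, here is a minimal sketch of the core loop such a
reader would need. It is plain Java with no Hadoop dependency, and it assumes
the integers are stored as 4-byte big-endian values (what DataOutputStream
writes) and that the key you want is the byte offset. The class name
IntOffsetReader and the method readPairs are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Hadoop, only illustrates the record logic.
public final class IntOffsetReader {
    // Assumption: each record is one 4-byte big-endian int.
    private static final int INT_BYTES = 4;

    /** Returns (byte offset, value) pairs for every complete int in the stream. */
    public static List<Map.Entry<Long, Integer>> readPairs(InputStream raw)
            throws IOException {
        List<Map.Entry<Long, Integer>> pairs = new ArrayList<>();
        DataInputStream in = new DataInputStream(raw);
        long offset = 0;
        while (true) {
            int value;
            try {
                value = in.readInt();
            } catch (EOFException e) {
                break; // end of input; a trailing partial int is dropped
            }
            pairs.add(new AbstractMap.SimpleEntry<>(offset, value));
            offset += INT_BYTES;
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Build a small in-memory "binary file" holding three ints.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int v : new int[] {7, -1, 42}) {
            out.writeInt(v);
        }
        for (Map.Entry<Long, Integer> p
                : readPairs(new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(p.getKey() + "\t" + p.getValue());
        }
    }
}
```

In an actual job, this loop would presumably live in the nextKeyValue()
method of your RecordReader, with the starting offset taken from the input
split. You would also need to make sure splits begin on a 4-byte boundary,
or mark the file as non-splittable. If you want the index of the integer
rather than its byte offset, divide the key by 4.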

On Mon, Sep 3, 2012 at 8:38 PM, Francesco Silvestri <[EMAIL PROTECTED]> wrote:

> Hi Mohammad,
>
> SequenceFileInputFormat<http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html> requires
> the file to be a sequence of key/value pairs stored in binary (i.e., the key is
> stored in the file). In my case, the key is implicitly given by the
> position of the value within the file.
>
> Thank you,
> Francesco
>
>
>
> On Mon, Sep 3, 2012 at 5:01 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hello Francesco,
>>
>>         Have a look at SequenceFileInputFormat :
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I have a binary file of integers and I would like an input format that
>>> generates pairs <key,value>, where value is an integer in the file and key
>>> the position of the integer in the file. Which class should I use? (i.e.
>>> I'm looking for a kind of TextInputFormat for binary files)
>>>
>>> Thank you for your consideration,
>>>
>>> Francesco
>>>
>>
>>
>