MapReduce >> mail # user >> Streaming value of (200MB) from a SequenceFile


Jerry Lam 2013-03-30, 17:52
Rahul Bhattacharjee 2013-04-01, 03:32
Re: Streaming value of (200MB) from a SequenceFile
Sorry for the multiple replies.

There is one more thing that can be done (I guess) for streaming the values
rather than constructing the whole object: we can store the value in HDFS
as a separate file and pass its location as the value to the mapper. The
mapper can then open a stream using the specified location.

Not sure if a 200 MB file would qualify as a small file wrt Hadoop, or if
too many 200 MB files would have an impact on the NN.
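The path-as-value pattern above can be sketched as follows. This is a minimal stand-in, not working Hadoop code: a local file and java.nio replace Hadoop's FileSystem.open(new Path(location)), and the class name, chunk size, and per-chunk "processing" (byte counting) are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamByReference {
    // Process a large value in fixed-size chunks instead of materializing
    // it in memory. In a real mapper, the record's value would carry the
    // HDFS location and the stream would come from FileSystem.open();
    // here a local file stands in for HDFS.
    static long processInChunks(Path location, int chunkSize) throws IOException {
        long total = 0;
        byte[] buf = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(location)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;  // stand-in for real per-chunk processing
            }
        }
        return total;  // only one chunk is ever held in memory
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("large-value", ".bin");
        Files.write(tmp, new byte[1 << 20]);  // 1 MB stand-in for a 200 MB value
        long seen = processInChunks(tmp, 64 * 1024);
        System.out.println(seen);  // prints 1048576
        Files.delete(tmp);
    }
}
```

The point is that memory use is bounded by the chunk size, not the value size, which is why passing a reference instead of the bytes sidesteps the 200 MB problem (at the cost of extra NN metadata per file).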

Thanks,
Rahul

On Mon, Apr 1, 2013 at 9:02 AM, Rahul Bhattacharjee <[EMAIL PROTECTED]> wrote:

> Hi Sandy,
>
> I am also new to Hadoop and have a question here.
> A Writable is given a DataInput stream so that the object can be
> constructed from the byte stream.
> Are you suggesting saving the stream for later use? But later we cannot
> ascertain the state of the stream.
> For a large value, I think we can actually take the useful part and emit
> it out of the mapper; we might also have a custom input format to do
> this so that the large value doesn't even reach the mapper.
>
> Am I missing anything here?
>
> Thanks,
> Rahul
>
>
>
> On Sat, Mar 30, 2013 at 11:22 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
>> Hi everyone,
>>
>> I'm having trouble streaming individual key-value pairs of 200 MB to 1 GB
>> from a MapFile.
>> I need to stream the large value to an OutputStream instead of reading
>> the entire value before processing, because it potentially uses too much
>> memory.
>>
>> I read the API for MapFile; next(WritableComparable key, Writable
>> val) does not return an input stream.
>>
>> How can I accomplish this?
>>
>> Thanks,
>>
>> Jerry
>>
>
>
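The limitation Jerry describes (next(key, val) fully deserializes the value) could in principle be worked around with a value type whose deserialization pipes bytes to an OutputStream in chunks rather than buffering them. The sketch below is a stand-in, not Hadoop's actual API: a real Writable implements org.apache.hadoop.io.Writable.readFields(DataInput), and the length-prefixed byte layout here is an assumption for illustration, not MapFile's real record format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamingValue {
    // Sketch: deserialization copies the value to a sink in chunks,
    // so at most one chunk is in memory at a time.
    private final OutputStream sink;

    public StreamingValue(OutputStream sink) {
        this.sink = sink;
    }

    // Analogue of Writable.readFields(DataInput); assumes the value
    // is stored as a 4-byte length prefix followed by the raw bytes.
    public void readFields(DataInput in) throws IOException {
        int remaining = in.readInt();       // assumed length prefix
        byte[] buf = new byte[64 * 1024];
        while (remaining > 0) {
            int n = Math.min(buf.length, remaining);
            in.readFully(buf, 0, n);
            sink.write(buf, 0, n);          // stream out, never buffer whole value
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Serialize a payload in the assumed length-prefixed layout.
        byte[] payload = new byte[300000];
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(raw);
        dos.writeInt(payload.length);
        dos.write(payload);

        // Deserialize, streaming the bytes into a sink.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        new StreamingValue(sink).readFields(
            new DataInputStream(new ByteArrayInputStream(raw.toByteArray())));
        System.out.println(sink.size());  // prints 300000
    }
}
```

Whether this works against a real MapFile depends on the reader handing readFields a stream positioned at the raw value bytes, which the thread itself leaves open; hence the suggestions above to instead keep the bytes out of the record entirely (path-as-value or a custom input format).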
Sandy Ryza 2013-03-31, 18:10
Jerry Lam 2013-03-31, 18:51