MapReduce >> mail # user >> Problem in reading Map Output file via RecordReader<ImmutableBytesWritable, Put>

Hi All,

I am using HBase 0.92.1. I am trying to break HBase bulk loading into
multiple MR jobs, since I want to populate more than one HBase table from a
single csv file. I have looked into the MultiTableOutputFormat class, but it
doesn't solve my purpose because it does not generate HFiles.

I modified the HBase bulk loader job and removed the reducer phase, so
that I can generate <ImmutableBytesWritable, Put> output for multiple
tables in one MR job (phase 1).
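To show what I mean by "multiple tables from one csv line", here is a simplified, stand-alone model of my phase-1 routing logic. The table names and column positions are made up for illustration (not my real schema), and plain strings stand in for rowkeys and Puts:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the phase-1 mapper logic: one csv line is split
// into one (rowkey, value) pair per target table. "table1"/"table2" and
// the column positions are illustrative, not my real schema.
public class CsvRouter {
    static Map<String, String[]> route(String csvLine) {
        String[] f = csvLine.split(",");
        Map<String, String[]> perTable = new LinkedHashMap<>();
        // table1 is keyed by the first column, table2 by the second;
        // both carry the third column as the value
        perTable.put("table1", new String[] { f[0], f[2] });
        perTable.put("table2", new String[] { f[1], f[2] });
        return perTable;
    }

    public static void main(String[] args) {
        Map<String, String[]> out = route("key1,key2,value");
        System.out.println(out.get("table1")[0]); // prints key1
        System.out.println(out.get("table2")[0]); // prints key2
    }
}
```

In the real mapper each pair is emitted as <ImmutableBytesWritable, Put> tagged for its table; this sketch only shows the fan-out of one record.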
Now I have ended up writing an input format that reads <ImmutableBytesWritable,
Put> pairs, so that phase 2 can consume the mapper output of phase 1 and
generate the HFiles for each table.

I implemented a RecordReader assuming that I can use
readFields(DataInput) to read the ImmutableBytesWritable and the Put in turn.
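To make that assumption concrete, here is a minimal stand-alone model of the Writable contract I am relying on: a 4-byte length followed by the raw bytes, written with write(DataOutput) and read back with readFields(DataInput). The class below is my own toy (plain java.io, no Hadoop), not the real ImmutableBytesWritable, but it serializes the same way:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of the Writable contract: a byte[] serialized as a 4-byte
// big-endian length followed by the raw bytes. readFields() only works if
// the stream is positioned exactly at the start of a record produced by
// write() -- any misalignment turns arbitrary bytes into the length.
public class LengthPrefixedBytes {
    private byte[] bytes = new byte[0];

    public void set(byte[] b) { bytes = b; }
    public byte[] get() { return bytes; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();   // garbage if the stream is misaligned
        bytes = new byte[len];    // OOM risk when len is bogus
        in.readFully(bytes);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        LengthPrefixedBytes w = new LengthPrefixedBytes();
        w.set("rowkey-01".getBytes("US-ASCII"));
        w.write(new DataOutputStream(buf));

        LengthPrefixedBytes r = new LengthPrefixedBytes();
        r.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(new String(r.get(), "US-ASCII")); // prints rowkey-01
    }
}
```

My RecordReader does the equivalent of calling readFields for the key and then for the Put, back to back, directly on the open file stream.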

As per my understanding, the format of the input file (the output files of
the phase-1 mappers) is <serialized ImmutableBytesWritable><serialized Put>.
However, when I try to read the file that way, the size read for the
ImmutableBytesWritable is wrong, and it throws an OOM because of that. The
size of the ImmutableBytesWritable (the rowkey) should not be greater than
32 bytes for my use case, but as per the input it is 808460337 bytes. I am
pretty sure that either my understanding of the file format is wrong or my
RecordReader implementation has a problem.
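One clue I noticed: 808460337 is 0x30302031 in hex, which is the ASCII characters "00 1" read as a big-endian int. So my reader may be sitting on text (or on non-record bytes such as a file header) and interpreting them as the length field. A quick stand-alone check (plain java.io, no Hadoop; the class name is just for this demo):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Demonstrates how text bytes misread as a binary length field produce
// exactly the huge value from my stack trace: the ASCII bytes "00 1"
// (0x30 0x30 0x20 0x31) decoded as a big-endian int are 808460337.
public class BogusLengthDemo {
    // Same decoding as DataInput.readInt, which Writable readFields
    // implementations use for the length prefix.
    static int readLength(byte[] raw) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(raw)).readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "00 1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readLength(text)); // prints 808460337
    }
}
```

That matches my observed "size" exactly, which is why I suspect the stream is not positioned at the start of a serialized record when readFields runs.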

Can someone tell me the correct way to deserialize the output file of a
mapper? Or is there some problem with my code?
Here is the link to my initial stab at RecordReader:
Thanks & Regards,
Anil Gupta