Did you get a solution for your question below?
One option would be to sort your input file, split it the same way your
regions are split, and run a MapReduce job. Each mapper reads one split,
checks the corresponding region for each entry, and emits a result.
You will need to make sure that each slice of the file is placed on the same
region server that hosts the matching region. That way there will be little
or no network traffic, and each task will access the region server locally.
Will that be an option?
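To illustrate the splitting step, here is a minimal Java sketch of how you
could assign each sorted input key to a region. It assumes you already have
the table's sorted region start keys (in a real job you would fetch them from
the cluster, e.g. via the region locator); the class and method names here
are hypothetical, not part of any HBase API:

```java
import java.util.Arrays;

public class RegionPartitioner {
    // Sorted start keys of the table's regions. The first region's start key
    // is the empty string. These sample boundaries are made up for the demo.
    private final String[] regionStartKeys;

    public RegionPartitioner(String[] regionStartKeys) {
        this.regionStartKeys = regionStartKeys;
    }

    // Returns the index of the region that would hold the given row key:
    // the last region whose start key is <= rowKey.
    public int regionFor(String rowKey) {
        int idx = Arrays.binarySearch(regionStartKeys, rowKey);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent;
        // the owning region is the one just before the insertion point.
        return idx >= 0 ? idx : -(idx + 1) - 1;
    }

    public static void main(String[] args) {
        // Three regions: [ "", "g" ), [ "g", "n" ), [ "n", +inf )
        RegionPartitioner p = new RegionPartitioner(new String[] {"", "g", "n"});
        System.out.println(p.regionFor("apple")); // 0
        System.out.println(p.regionFor("grape")); // 1
        System.out.println(p.regionFor("zebra")); // 2
    }
}
```

Once each input record is tagged with its region index like this, you can
write one file slice per region and schedule each map task next to the region
server holding that region.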
On Thursday, September 26, 2013, Ramasubramanian Narayanan wrote:
> Dear All,
> I need suggestion on the below....
> I have a huge HBase table, and I need to do a lookup in that table for every
> record in the input file (around 1 million records).
> Please suggest whether it is advisable to load the table content into a Java
> HashMap and avoid hitting HBase for every record of the input file.
> Please send sample code showing how to use a Java HashMap with HBase.
> If you have any other suggestion, please share it.
> Thanks and Regards,