Re: MapReduce to load data in HBase
One correction. If your datatype is only going to be used as a value, it
doesn't actually need to be comparable; being a Writable is enough. But if
you need it to be a key as well, then it must be both Writable and Comparable.
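
To make the value/key distinction concrete, here is a minimal sketch, not a definitive implementation. The two interfaces are local stand-ins that mirror Hadoop's org.apache.hadoop.io.Writable and WritableComparable so the example compiles without Hadoop on the classpath; in a real job you would import the Hadoop ones instead. JsonRecord, RecordKey and WritableDemo are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-in for org.apache.hadoop.io.Writable (same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in for org.apache.hadoop.io.WritableComparable.
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical value type: used only as a map output VALUE, so Writable is enough.
class JsonRecord implements Writable {
    byte[] json = new byte[0];

    JsonRecord() {} // no-arg constructor: the framework creates instances reflectively

    JsonRecord(String jsonText) {
        this.json = jsonText.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(json.length); // length prefix, then the raw bytes
        out.write(json);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        json = new byte[in.readInt()];
        in.readFully(json);
    }

    String text() {
        return new String(json, StandardCharsets.UTF_8);
    }
}

// Hypothetical key type: keys are sorted during the shuffle, so it must also be Comparable.
class RecordKey implements WritableComparable<RecordKey> {
    String id = "";

    RecordKey() {}

    RecordKey(String id) {
        this.id = id;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(id);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();
    }

    @Override
    public int compareTo(RecordKey other) {
        return id.compareTo(other.id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RecordKey && id.equals(((RecordKey) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // hash-based partitioning relies on a stable hashCode
    }
}

// Helpers that round-trip a Writable through a byte[] to exercise write/readFields.
class WritableDemo {
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static <T extends Writable> T fromBytes(T target, byte[] bytes) {
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The value class carries the JSON as a length-prefixed byte[], which also fits the earlier suggestion of storing the whole JSON document in a single cell. The key class adds compareTo, equals and hashCode because keys are sorted and partitioned, while values are only serialized.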

Warm Regards,
On Thu, Feb 7, 2013 at 4:58 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Panshul,
> My answers:
> 1- You can serialize the entire JSON into a byte[] and store it in a
> cell. (Is it important for you to extract individual values from your JSON
> and then put them into the table?)
> 2- You can write your own datatype to pass your object to the reducer.
> But, it must be a Writable+Comparable. Alternatively you can use Avro.
> 3- For generating unique keys, you can use MR counters.
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>> Hello,
>> I am trying to write MapReduce jobs to read data from JSON files and load
>> it into HBase tables.
>> Please suggest an efficient way to do it. I am trying to do it using
>> Spring Data HBase Template to make it thread safe and enable table locking.
>> I use the Map methods to read and parse the JSON files. I use the Reduce
>> methods to call the HBase Template and store the data into the HBase tables.
>> My questions:
>> 1. Is this the right approach, or should I do all of the above in the Map
>> method?
>> 2. How can I pass the Java object I create, holding the data read from the
>> JSON file, to the Reduce method, where it needs to be saved to the HBase
>> table? I can only pass the built-in data types to the reduce method from
>> my mapper.
>> 3. I thought of using the distributed cache for the above problem, to
>> store the object in the cache and pass only the key to the reduce method.
>> But how do I generate a unique key for each object I store in the
>> distributed cache?
>> Please help me with the above. Please tell me if I am missing or
>> overlooking some important detail.
>> Thanking You,
>> --
>> Regards,
>> Panshul Whisper
>> 010101010101