Re: MapReduce to load data in HBase
You might find these links helpful:
http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026
http://stackoverflow.com/questions/13877077/how-do-i-set-an-object-as-the-value-for-map-output-in-hadoop-mapreduce/13877688#13877688
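
For quick reference, here is a minimal sketch of the kind of custom type those
links discuss. The class and field names are made up for illustration; note
that a map output value strictly only needs to implement Writable, while a key
must be a WritableComparable.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Hypothetical value type carrying a few fields parsed out of the JSON.
public class JsonRecordWritable implements WritableComparable<JsonRecordWritable> {

    private String recordId = "";
    private String payload  = "";

    public JsonRecordWritable() {}                  // Hadoop needs a no-arg constructor

    public JsonRecordWritable(String recordId, String payload) {
        this.recordId = recordId;
        this.payload  = payload;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(recordId);                     // serialize the fields in a fixed order
        out.writeUTF(payload);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        recordId = in.readUTF();                    // deserialize in exactly the same order
        payload  = in.readUTF();
    }

    @Override
    public int compareTo(JsonRecordWritable other) {
        return recordId.compareTo(other.recordId);  // ordering used by the sort phase
    }

    @Override
    public int hashCode() {
        return recordId.hashCode();                 // keeps partitioning consistent with equals
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof JsonRecordWritable
                && recordId.equals(((JsonRecordWritable) o).recordId);
    }

    public String getRecordId() { return recordId; }
    public String getPayload()  { return payload;  }
}

It would then be registered on the job with job.setMapOutputValueClass(JsonRecordWritable.class),
or setMapOutputKeyClass if you use it as the key.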

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
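
For the reduce-side write into HBase that comes up in the thread below, a rough
sketch using the stock TableReducer/TableOutputFormat (rather than the Spring
HbaseTemplate mentioned there) could look like this. The table, column family
and qualifier names are made up, and it assumes the mapper emits
(row key, JsonRecordWritable) pairs like the type sketched above.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Sketch only: writes one Put per incoming record through TableOutputFormat.
// "d" and "payload" are example column family / qualifier names.
public class HBaseLoadReducer
        extends TableReducer<Text, JsonRecordWritable, ImmutableBytesWritable> {

    private static final byte[] CF  = Bytes.toBytes("d");
    private static final byte[] COL = Bytes.toBytes("payload");

    @Override
    protected void reduce(Text rowKey, Iterable<JsonRecordWritable> records, Context context)
            throws IOException, InterruptedException {
        for (JsonRecordWritable record : records) {
            Put put = new Put(Bytes.toBytes(rowKey.toString()));
            // one cell per extracted value (addColumn on newer HBase versions)
            put.add(CF, COL, Bytes.toBytes(record.getPayload()));
            context.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }
}

The job would then be wired up with TableMapReduceUtil.initTableReducerJob("mytable",
HBaseLoadReducer.class, job), which configures TableOutputFormat and the target table.
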
On Thu, Feb 7, 2013 at 5:05 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Thank you for the reply.
> 1. I cannot serialize the JSON and store it as a whole. I need to extract
> the individual values and store them separately, since I later need to query
> the stored values in various aggregation algorithms.
> 2. Can you please point me in the direction where I can find out how to
> write a data type that is Writable+Comparable? I will look into Avro, but I
> prefer to write my own data type.
> 3. I will look into MR counters.
>
> Regards,
>
>
> On Thu, Feb 7, 2013 at 12:28 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hello Panshul,
>>
>>     My answers :
>> 1- You can serialize the entire JSON into a byte[] and store it in a
>> cell. (Is it important for you to extract the individual values from your
>> JSON and then put them into the table?)
>> 2- You can write your own datatype to pass your object to the reducer.
>> But it must be a Writable+Comparable. Alternatively, you can use Avro.
>> 3- For generating unique keys, you can use MR counters.
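>>
>> A rough sketch of one way to get such keys (the names here are purely
>> illustrative): combine the task ID with a per-task running counter, so the
>> keys are unique across the whole job without any coordination, and use an
>> MR counter just to track how many records were loaded.
>>
>> import java.io.IOException;
>>
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Mapper;
>>
>> // Sketch only: emits (unique row key, raw JSON line); parsing is elided.
>> public class JsonParseMapper
>>         extends Mapper<LongWritable, Text, Text, Text> {
>>
>>     private long seq = 0;  // running counter, unique within this map task
>>
>>     @Override
>>     protected void map(LongWritable offset, Text line, Context context)
>>             throws IOException, InterruptedException {
>>         // The task ID is unique per map task and seq is unique within the
>>         // task, so the combination is unique across the whole job.
>>         String taskId = context.getTaskAttemptID().getTaskID().toString();
>>         String rowKey = taskId + "_" + (seq++);
>>         context.getCounter("load", "records").increment(1);
>>         context.write(new Text(rowKey), line);
>>     }
>> }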
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I am trying to write MapReduce jobs to read data from JSON files and
>>> load it into HBase tables.
>>> Please suggest an efficient way to do it. I am trying to do it using the
>>> Spring Data HbaseTemplate to make it thread-safe and enable table locking.
>>>
>>> I use the Map methods to read and parse the JSON files, and the Reduce
>>> methods to call the HbaseTemplate and store the data in the HBase tables.
>>>
>>> My questions:
>>> 1. Is this the right approach, or should I do all of the above in the Map
>>> method?
>>> 2. How can I pass the Java object I create, holding the data read from
>>> the JSON file, to the Reduce method so it can be saved to the HBase table?
>>> I can only pass the built-in data types from my mapper to the reduce
>>> method.
>>> 3. I thought of using the distributed cache for the above problem: store
>>> the object in the cache and pass only the key to the reduce method. But
>>> how do I generate a unique key for all the objects I store in the
>>> distributed cache?
>>>
>>> Please help me with the above, and tell me if I am missing or overlooking
>>> some important detail.
>>>
>>> Thanking You,
>>>
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>>
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>