-
MapReduce to load data in HBase
Panshul Whisper 2013-02-07, 11:22
Hello,

I am trying to write MapReduce jobs that read data from JSON files and load it into HBase tables. Please suggest an efficient way to do it. I am trying to do it using the Spring Data Hbase Template to make it thread safe and enable table locking.

I use the Map methods to read and parse the JSON files. I use the Reduce methods to call the HBase Template and store the data into the HBase tables.

My questions:
1. Is this the right approach, or should I do all of the above in the Map method?
2. How can I pass the Java object I create, which holds the data read from the JSON file and needs to be saved to the HBase table, to the Reduce method? I can only pass the built-in data types from my mapper to the reduce method.
3. I thought of using the distributed cache for the above problem: store the object in the cache and pass only the key to the reduce method. But how do I generate a unique key for every object I store in the distributed cache?

Please help me with the above. Please tell me if I am missing or overlooking some important detail.

Thanking you,
--
Regards,
Ouch Whisper
010101010101
-
Re: MapReduce to load data in HBase
Mohammad Tariq 2013-02-07, 11:28
Hello Panshul,

My answers:
1- You can serialize the entire JSON into a byte[] and store it in a cell. (Is it important for you to extract individual values from your JSON and then put them into the table?)
2- You can write your own datatype to pass your object to the reducer. But it must be Writable+Comparable. Alternatively, you can use Avro.
3- For generating unique keys, you can use MR counters.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
-
Re: MapReduce to load data in HBase
Mohammad Tariq 2013-02-07, 11:34
One correction. If your datatype is only going to be used as a value, it doesn't actually need to be comparable. But if you need it to be a key as well, then it must be both Writable and Comparable.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
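The distinction above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SensorReading` class, its fields, and the `toBytes`/`fromBytes` helpers are hypothetical, and the sketch uses only `java.io` so that it runs without Hadoop on the classpath. In an actual job the class would presumably declare `implements Writable` (value only) or `implements WritableComparable<SensorReading>` (key), both from `org.apache.hadoop.io`, with the same `write`/`readFields` methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record holding values parsed from one JSON line.
// Assumption: in a real Hadoop job this would implement Writable
// (or WritableComparable<SensorReading> if it is used as a key).
class SensorReading implements Comparable<SensorReading> {
    String deviceId = "";
    long timestamp;
    double value;

    // A no-arg constructor is kept because the framework creates
    // instances itself before calling readFields.
    SensorReading() {}

    SensorReading(String deviceId, long timestamp, double value) {
        this.deviceId = deviceId;
        this.timestamp = timestamp;
        this.value = value;
    }

    // Serialization half: enough for a type used only as a value.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(deviceId);
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    // Deserialization half: fields are read in the order they were written.
    public void readFields(DataInput in) throws IOException {
        deviceId = in.readUTF();
        timestamp = in.readLong();
        value = in.readDouble();
    }

    // Only needed when the type is used as a key: defines the sort order.
    @Override
    public int compareTo(SensorReading other) {
        int byDevice = deviceId.compareTo(other.deviceId);
        return byDevice != 0 ? byDevice : Long.compare(timestamp, other.timestamp);
    }

    // Helper for the demo: serialize to a byte array.
    static byte[] toBytes(SensorReading r) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            r.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: rebuild a record from a byte array.
    static SensorReading fromBytes(byte[] bytes) {
        try {
            SensorReading r = new SensorReading();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        SensorReading original = new SensorReading("dev-1", 42L, 3.5);
        SensorReading copy = SensorReading.fromBytes(SensorReading.toBytes(original));
        System.out.println(copy.deviceId + " " + copy.timestamp + " " + copy.value);
    }
}
```

If such a type is used as a key, it is usually also worth overriding `equals` and `hashCode`, since the default partitioner relies on `hashCode` to assign keys to reducers.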
-
Re: MapReduce to load data in HBase
Panshul Whisper 2013-02-07, 11:35
Hello,

Thank you for the reply.
1. I cannot serialize the JSON and store it as a whole. I need to extract individual values and store them, because later I need to query the stored values in various aggregation algorithms.
2. Can you please point me to where I can find out how to write a data type that is Writable+Comparable? I will look into Avro, but I prefer to write my own data type.
3. I will look into MR counters.

Regards,
--
Regards,
Ouch Whisper
010101010101
-
Re: MapReduce to load data in HBase
Mohammad Tariq 2013-02-07, 11:40
You might find these links helpful:

http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026
http://stackoverflow.com/questions/13877077/how-do-i-set-an-object-as-the-value-for-map-output-in-hadoop-mapreduce/13877688#13877688

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
-
Re: MapReduce to load data in HBase
Damien Hardy 2013-02-07, 11:55
Hello,

Why not use a Pig script for that?
Make the JSON file available on HDFS.
Load with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
Store with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

http://pig.apache.org/docs/r0.10.0/

Cheers,
--
Damien
-
Re: MapReduce to load data in HBase
Panshul Whisper 2013-02-07, 14:24
I am using the MapReduce approach. I was looking into Avro to create my own custom data types to pass from Mapper to Reducer. With Avro I need to maintain a schema for every type of JSON file I am receiving, and since there will be many different MapReduce jobs running, that means a different schema for every type.

1. The JSON schema might change very frequently, almost 3 times every month. Is it advisable to use Avro to create custom data types? Or can I use the distributed cache, store the Java object in the cache, and pass the key of the object to the Reducer?
2. Will there be any performance issues with using the distributed cache, since the data will be very large and very high performance is required?

Thanking you,
Regards,

On Thu, Feb 7, 2013 at 2:23 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Size is not a problem; a frequently changing schema might be.
>
> On Thu, Feb 7, 2013 at 6:25 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Thank you for the replies.
> >
> > I have not used Pig yet. I am looking into it. I wanted to implement both
> > the approaches.
> > Are Pig scripts maintainable? Because the JSON structure that I will be
> > receiving will be changing quite often, almost 3 times a month.
> > I will be processing 24 million JSON files per month.
> > I am getting one big file with almost 3 million JSON files aggregated,
> > one JSON per line. I need to process this file and store all values into
> > HBase.
> >
> > Thanking you,
> >
> > On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> > > Good point sir. If Pig fits into Panshul's requirements then it's a
> > > much better option.

--
Regards,
Ouch Whisper
010101010101