Re: MapReduce to load data in HBase
Panshul Whisper 2013-02-07, 14:24
I am using the MapReduce approach. I was looking into Avro to create my
own custom data types to pass from mapper to reducer.
With Avro I would need to maintain a schema for every type of JSON file I
receive, and since many different MapReduce jobs will be running, that
means a different schema for each type.
1. Since the JSON schema might change frequently, almost three times a
month, is it advisable to use Avro to create custom data types? Or could
I use the distributed cache, store the Java object in the cache, and pass
the object's key to the reducer?
2. Will there be any performance issues with using the distributed cache?
The data will be very large and very high throughput is required.
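
For illustration, here is a minimal sketch of the Avro route from
question 1 as I understand it. With generic records the schema is parsed
at runtime, so a schema change means shipping a new .avsc file rather
than recompiling custom types. The schema file name and field names
below are only placeholders:

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    // Build a record against a schema loaded at runtime, so the monthly
    // schema changes do not require new compiled classes.
    public class AvroSketch {
        public static void main(String[] args) throws Exception {
            // "event.avsc" and the field names are placeholders.
            Schema schema = new Schema.Parser().parse(new File("event.avsc"));
            GenericRecord record = new GenericData.Record(schema);
            record.put("userId", "u-42");
            record.put("amount", 9.99);
            System.out.println(record);
        }
    }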

Thanking You,
Regards,
On Thu, Feb 7, 2013 at 2:23 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Size is not a problem; a frequently changing schema might be.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 6:25 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > Thank you for the replies.
> >
> > I have not used Pig yet. I am looking into it. I wanted to implement
> > both approaches.
> > Are Pig scripts maintainable? The JSON structure I will be receiving
> > will change quite often, almost three times a month.
> > I will be processing 24 million JSON files per month.
> > I am getting one big file with almost 3 million JSON objects
> > aggregated, one JSON object per line. I need to process this file and
> > store all the values into HBase.
> >
> > Thanking You,
> >
> >
> >
> >
> > On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Good point, sir. If Pig fits Panshul's requirements then it's a much better option.
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hello,
> > > > Why not use a Pig script for that?
> > > > Make the JSON file available on HDFS.
> > > > Load with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> > > > Store with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> > > >
> > > > http://pig.apache.org/docs/r0.10.0/
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Damien
> > > >
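
(As I understand Damien's suggestion, it boils down to something like the
following sketch; here the two Pig statements are driven from Java via
PigServer, and the input path, field names, table name, and column
family are only placeholders:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    // Load one JSON object per line with JsonLoader, then write the
    // tuples into HBase with HBaseStorage. The first field of each
    // tuple becomes the HBase row key.
    public class JsonToHBasePig {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.registerQuery(
                "events = LOAD '/data/events.json' USING JsonLoader(" +
                "'id:chararray, user:chararray, amount:double');");
            pig.store("events", "hbase://events",
                "org.apache.pig.backend.hadoop.hbase.HBaseStorage(" +
                "'cf:user cf:amount')");
        }
    }

The schema string would be the part to update when the JSON structure
changes each month.)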
> > >
> >
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
> >
>
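
For concreteness, the map-only job I have in mind looks roughly like the
sketch below: one JSON object per input line, one Put per line into
HBase. It assumes Jackson for the JSON parsing, and the table name
"events", column family "cf", and field names are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.map.ObjectMapper;

    // Map-only job: parse each JSON line and write a Put straight to
    // HBase through TableOutputFormat (no reducer needed).
    public class JsonToHBase {

        static class JsonMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
            private final ObjectMapper json = new ObjectMapper();

            @Override
            protected void map(LongWritable key, Text line, Context ctx)
                    throws IOException, InterruptedException {
                JsonNode node = json.readTree(line.toString());
                // "id" is assumed to be a usable row key.
                byte[] row = Bytes.toBytes(node.get("id").getTextValue());
                Put put = new Put(row);
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("user"),
                        Bytes.toBytes(node.get("user").getTextValue()));
                ctx.write(new ImmutableBytesWritable(row), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "json-to-hbase");
            job.setJarByClass(JsonToHBase.class);
            job.setMapperClass(JsonMapper.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Sets up TableOutputFormat for the "events" table; passing a
            // null reducer keeps this a map-only job.
            TableMapReduceUtil.initTableReducerJob("events", null, job);
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }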

--
Regards,
Ouch Whisper
010101010101