HBase >> mail # user >> Loading data, hbase slower than Hive?


Austin Chungath 2013-01-17, 16:44
Michael Segel 2013-01-17, 16:48
Anoop John 2013-01-17, 17:00
ramkrishna vasudevan 2013-01-17, 17:09
Mohammad Tariq 2013-01-17, 17:46
praveenesh kumar 2013-01-18, 17:57
Doug Meil 2013-01-18, 18:00
Asaf Mesika 2013-01-19, 19:50
Mohammad Tariq 2013-01-19, 21:12
Doug Meil 2013-01-20, 15:13
Vikas Jadhav 2013-01-20, 18:04
Austin Chungath 2013-01-21, 05:45
Anoop Sam John 2013-01-21, 05:54
Austin Chungath 2013-01-21, 06:16
Mohammad Tariq 2013-01-21, 06:31
Anoop Sam John 2013-01-21, 06:36
Re: Loading data, hbase slower than Hive?
Thank you so much for pointing out the mistake, sir.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Jan 21, 2013 at 12:06 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> @Mohammad
> As he is using HFileOutputFormat, there is no put call happening on
> HTable. In this case the MR job creates the HFiles directly, without using
> the normal HBase write path. The HFiles are then loaded into the table
> regions using the HRS API.
> In this case the number of reducers equals the number of table regions. So
> Austin, you can check whether the table is properly presplit.
>
> -Anoop-
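Since the reducer count follows the region count in this setup, the table's split points decide how much parallelism the bulk load gets. The following is only a minimal sketch of computing evenly spaced split keys. It assumes the first byte of each row key is spread evenly over 0x00..0xFF (for example because keys are hashed or salted), and the class and method names are hypothetical. A `byte[][]` of this shape is what `HBaseAdmin.createTable(HTableDescriptor, byte[][])` accepts as split keys.

```java
final class PresplitSketch {

    /**
     * Returns numRegions - 1 single-byte split keys that divide the
     * first-byte range 0x00..0xFF into numRegions roughly equal slices.
     * Assumption: row keys start with a uniformly distributed byte.
     */
    static byte[][] splitKeys(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 1..256");
        }
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between slice i-1 and slice i.
            keys[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the split points for a 4-region table as hex bytes.
        for (byte[] key : splitKeys(4)) {
            System.out.printf("%02x%n", key[0] & 0xff);
        }
    }
}
```

If the row keys are not uniformly distributed in their first byte, split points would have to be chosen from a sample of the real keys instead.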
> ________________________________________
> From: Mohammad Tariq [[EMAIL PROTECTED]]
> Sent: Monday, January 21, 2013 12:01 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Loading data, hbase slower than Hive?
>
> Apart from this, you can make some additional tweaks to improve
> put performance, such as creating pre-split tables and using
> put(List<Put> puts) instead of single put calls.
>
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Mon, Jan 21, 2013 at 11:46 AM, Austin Chungath <[EMAIL PROTECTED]> wrote:
>
> > Anoop,
> >
> > I am using HFileOutputFormat. I am doing nothing but splitting the data
> > in each row by the delimiter and sending each field to its respective
> > column.
> > Is there some kind of preprocessing or other step that I should do before
> > this?
> > As suggested, I will look into the above solutions and let you guys know
> > what the problem was. I might have to rethink the row key design.
> >
> > Regards,
> > Austin.
> >
> > On Mon, Jan 21, 2013 at 11:24 AM, Anoop Sam John <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Austin,
> > >         You are using HFileOutputFormat or TableOutputFormat?
> > >
> > > -Anoop-
> > > ________________________________________
> > > From: Austin Chungath [[EMAIL PROTECTED]]
> > > Sent: Monday, January 21, 2013 11:15 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Loading data, hbase slower than Hive?
> > >
> > > Thank you Tariq.
> > > > I will let you know how things went after I implement these
> > > > suggestions.
> > >
> > > Regards,
> > > Austin
> > >
> > > On Sun, Jan 20, 2013 at 2:42 AM, Mohammad Tariq <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hello Austin,
> > > >
> > > >           I am sorry for the late response.
> > > >
> > > > Asaf has made a very valid point. Rowkwey design is very crucial.
> > > > Specially if the data is gonna be sequential(timeseries kinda thing).
> > > > You may end up with hotspotting problem. Use pre-splitted tables
> > > > or hash the keys to avoid that. It'll also allow you to fetch the
> > results
> > > > faster.
> > > >
> > > > Warm Regards,
> > > > Tariq
> > > > https://mtariq.jux.com/
> > > > cloudfront.blogspot.com
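One common way to "hash the keys" is to prepend a small, deterministic salt so that sequential keys land in different regions. The following is a rough sketch of that idea only, not something posted in the thread. The class name, the method name, the one-byte prefix layout and the bucket count are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltedKeySketch {

    /**
     * Prepends one salt byte derived from a hash of the key, so that
     * sequential keys (e.g. timestamps) spread over `buckets` prefixes.
     * The salt is deterministic, so a reader can recompute it for a get.
     */
    static byte[] saltedKey(String key, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the bucket non-negative even for negative hashes.
        int bucket = Math.floorMod(Arrays.hashCode(raw), buckets);
        byte[] salted = new byte[raw.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(raw, 0, salted, 1, raw.length);
        return salted;
    }

    public static void main(String[] args) {
        // Sequential, timestamp-like keys (made-up sample values).
        for (int i = 1; i <= 5; i++) {
            String key = String.format("20130121-%06d", i);
            System.out.println(key + " -> bucket " + (saltedKey(key, 16)[0] & 0xff));
        }
    }
}
```

The trade-off is that a scan over the original key order then has to be issued once per bucket and merged on the client side.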
> > > >
> > > >
> > > > On Sun, Jan 20, 2013 at 1:20 AM, Asaf Mesika <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Start by telling us your row key design.
> > > > > > Check for pre-splitting your table regions.
> > > > > > I managed to get to 25 MB/sec write throughput in HBase using 1
> > > > > > region server. If your data is evenly spread you can get around
> > > > > > 7 times that in a 10 region server environment. Should mean that
> > > > > > 1 gig should take 4 sec.
> > > > >
> > > > >
> > > > > On Friday, January 18, 2013, praveenesh kumar wrote:
> > > > >
> > > > > > Hey,
> > > > > > > Can someone throw some pointers on what would be the best
> > > > > > > practice for bulk imports in HBase?
> > > > > > That would be really helpful.
> > > > > >
> > > > > > Regards,
> > > > > > Praveenesh
> > > > > >
> > > > > > > On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]>
> > > > > > > wrote:
> > > > > >
> > > > > > > > Just to add to whatever all the heavyweights have said above,
> > > > > > > > your MR job may not be as efficient as the MR job corresponding
> > > > > > > > to your Hive query.
> > > > >