HBase >> mail # user >> Loading data, hbase slower than Hive?


Re: Loading data, hbase slower than Hive?
Apart from this, you can apply some additional tweaks to improve
put performance, like creating pre-split tables and making use of
put(List<Put> puts) instead of a normal put for each row.
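
A rough sketch of the two tweaks above, in plain Java with no HBase
dependency so it runs on its own. The class LoadSketch, its method names,
the 16-bucket count and the "NN-" key prefix are all assumptions made up
for illustration, not anything from Austin's job. In a real client the
split keys would be handed to createTable(desc, splitKeys) and each batch
would become one put(List<Put> puts) call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of HBase.
final class LoadSketch {

    // Assumed number of salt buckets, one per pre-split region.
    static final int BUCKETS = 16;

    // Prefix the row key with a two-digit bucket derived from its hash,
    // so that sequential keys are spread over all regions.
    static String saltedKey(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    // Region boundaries "01-", "02-", ..., "15-": BUCKETS regions need
    // BUCKETS - 1 split keys.
    static byte[][] splitKeys() {
        byte[][] splits = new byte[BUCKETS - 1][];
        for (int i = 1; i < BUCKETS; i++) {
            splits[i - 1] = String.format(Locale.ROOT, "%02d-", i)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    // Group items into batches of at most batchSize, so that each batch
    // can be sent in a single call instead of one call per item.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("20130117-0001", "20130117-0002", "20130117-0003");
        for (List<String> batch : batches(keys, 2)) {
            List<String> salted = new ArrayList<>();
            for (String k : batch) {
                salted.add(saltedKey(k));
            }
            System.out.println(salted);
        }
        System.out.println(splitKeys().length + " split keys");
    }
}
```

Note that a salted key has to be rebuilt the same way at read time, and
a range scan over the original key order then needs one scan per bucket.
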
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Jan 21, 2013 at 11:46 AM, Austin Chungath <[EMAIL PROTECTED]> wrote:

> Anoop,
>
> I am using HFileOutputFormat. I am doing nothing but splitting the data
> from each row by the delimiter and sending it into their respective
> columns.
> Is there some kind of preprocessing or steps that I should do before this?
> As suggested I will look into the above solutions and let you guys know
> what the problem was. I might have to rethink the Rowkey design.
>
> Regards,
> Austin.
>
> On Mon, Jan 21, 2013 at 11:24 AM, Anoop Sam John <[EMAIL PROTECTED]>
> wrote:
>
> > Austin,
> >         Are you using HFileOutputFormat or TableOutputFormat?
> >
> > -Anoop-
> > ________________________________________
> > From: Austin Chungath [[EMAIL PROTECTED]]
> > Sent: Monday, January 21, 2013 11:15 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Loading data, hbase slower than Hive?
> >
> > Thank you Tariq.
> > I will let you know how things went after I implement these suggestions.
> >
> > Regards,
> > Austin
> >
> > On Sun, Jan 20, 2013 at 2:42 AM, Mohammad Tariq <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hello Austin,
> > >
> > >           I am sorry for the late response.
> > >
> > > Asaf has made a very valid point. Rowkey design is crucial,
> > > especially if the data is going to be sequential (time series kind of
> > > thing). You may end up with a hotspotting problem. Use pre-split tables
> > > or hash the keys to avoid that. It'll also allow you to fetch the
> > > results faster.
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Sun, Jan 20, 2013 at 1:20 AM, Asaf Mesika <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Start by telling us your row key design.
> > > > Check for pre-splitting your table regions.
> > > > I managed to get to 25 MB/sec write throughput in HBase using 1 region
> > > > server. If your data is evenly spread you can get around 7 times that
> > > > in a 10 region server environment. Should mean that 1 gig should take
> > > > 4 sec.
> > > >
> > > >
> > > > On Friday, January 18, 2013, praveenesh kumar wrote:
> > > >
> > > > > Hey,
> > > > > > Can someone give some pointers on what would be the best practice
> > > > > > for bulk imports in HBase?
> > > > > > That would be really helpful.
> > > > >
> > > > > Regards,
> > > > > Praveenesh
> > > > >
> > > > > > On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]>
> > > > > > wrote:
> > > > >
> > > > > > > Just to add to whatever all the heavyweights have said above, your MR
> > > > > > > job may not be as efficient as the MR job corresponding to your Hive
> > > > > > > query. You can enhance the performance by setting the mapred config
> > > > > > > parameters wisely and by tuning your MR job.
> > > > > >
> > > > > > Warm Regards,
> > > > > > Tariq
> > > > > > https://mtariq.jux.com/
> > > > > > cloudfront.blogspot.com
> > > > > >
> > > > > >
> > > > > > > On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan
> > > > > > > <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > > Hive is more for batch processing and HBase is more for real-time data.
> > > > > > >
> > > > > > > Regards
> > > > > > > Ram
> > > > > > >
> > > > > > > > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > In case of Hive, data insertion means placing the file under the
> > > > > > > > > table path in HDFS. HBase needs to read the data and convert it
> > > > > > > > > into its format (HFiles). MR is doing this work. So this makes it
> > > > > > > > > clear that HBase