HBase, mail # user - Loading data, hbase slower than Hive?


Re: Loading data, hbase slower than Hive?
Asaf Mesika 2013-01-19, 19:50
Start by telling us your row key design.
Look into pre-splitting your table's regions.
I managed to get 25 MB/sec write throughput in HBase using 1 region
server. If your data is evenly spread, you can get around 7 times that in a
10-region-server environment. That should mean 1 GB takes roughly 6 sec.
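
For anyone who wants to redo the arithmetic, here is a rough sketch in Python. The 25 MB/sec and 7x figures are the ones quoted above, and the 20 GB in 1 hr 14 min figure is from the original post quoted below. The helper names (`load_seconds`, `observed_mb_per_sec`) are made up for illustration, and 1 GB is assumed to be 1024 MB.

```python
# Back-of-the-envelope load-time estimate; not a benchmark.
MB_PER_GB = 1024  # assumption: binary gigabytes


def load_seconds(size_gb, mb_per_sec):
    """Seconds needed to write size_gb gigabytes at mb_per_sec MB/sec."""
    return size_gb * MB_PER_GB / mb_per_sec


def observed_mb_per_sec(size_gb, minutes):
    """Average throughput implied by loading size_gb in the given minutes."""
    return size_gb * MB_PER_GB / (minutes * 60)


single_rs = 25.0          # MB/sec on one region server (figure from this message)
cluster = single_rs * 7   # "around 7 times that" with 10 region servers

print(f"cluster estimate: {cluster:.0f} MB/sec")
print(f"1 GB at that rate:  {load_seconds(1, cluster):.1f} sec")
print(f"20 GB at that rate: {load_seconds(20, cluster) / 60:.1f} min")
print(f"reported load (20 GB in 74 min): {observed_mb_per_sec(20, 74):.1f} MB/sec")
```

The reported load works out to under 5 MB/sec for the whole cluster, which is below even the single-region-server figure. That gap is why the row key design and region pre-splitting are the first things to check.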
On Friday, January 18, 2013, praveenesh kumar wrote:

> Hey,
> Can someone give some pointers on what would be the best practice for bulk
> imports in HBase?
> That would be really helpful.
>
> Regards,
> Praveenesh
>
> On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
> > Just to add to whatever all the heavyweights have said above, your MR job
> > may not be as efficient as the MR job corresponding to your Hive query. You
> > can enhance the performance by setting the mapred config parameters wisely
> > and by tuning your MR job.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
> >
> > > Hive is more for batch processing, and HBase is more for real-time data.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]> wrote:
> > >
> > > > In the case of Hive, data insertion means placing the file under the table
> > > > path in HDFS. HBase needs to read the data and convert it into its own
> > > > format (HFiles). MR is doing this work. So this makes it clear that HBase
> > > > will be slower. :) As Michael said the read operation...
> > > >
> > > >
> > > >
> > > > -Anoop-
> > > >
> > > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > > Problem: Hive took 6 mins to load a data set, HBase took 1 hr 14 mins.
> > > > > It's a 20 GB data set, approx 230 million records. The data is in HDFS,
> > > > > in a single text file. The cluster is 11 nodes, 8 cores.
> > > > >
> > > > > I loaded this into Hive, partitioned by date, bucketed into 32 and sorted.
> > > > > Time taken is 6 mins.
> > > > >
> > > > > I loaded the same data into HBase, in the same cluster, by writing
> > > > > MapReduce code. It took 1 hr 14 mins. The cluster wasn't running anything
> > > > > else, and assuming that the code I wrote is good enough, what is it that
> > > > > makes HBase slower than Hive in loading the data?
> > > > >
> > > > > Thanks,
> > > > > Austin
> > > > >
> > > >
> > >
> >
>