HBase, mail # user - Loading data, hbase slower than Hive?


Re: Loading data, hbase slower than Hive?
Mohammad Tariq 2013-01-17, 17:46
Just to add to what all the heavyweights have said above: your MR job may
not be as efficient as the MR job that Hive generates for your query. You
can improve performance by setting the mapred configuration parameters
carefully and by tuning your MR job.
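As a rough illustration of why the mapred settings matter, the sketch below estimates how many map tasks and scheduling "waves" a single ~20 GB text file produces. It assumes a 64 MB split size (the Hadoop 1.x default `dfs.block.size`) and compares 2 map slots per node (the Hadoop 1.x default for `mapred.tasktracker.map.tasks.maximum`) with 8, reading the "8 cores" in the question as 8 cores per node, which the thread does not confirm. `map_waves` is a hypothetical helper; the model ignores data locality and the HBase write path entirely.

```python
import math

# Hypothetical inputs: only the ~20 GB file size and the 11 nodes come from
# the thread; the split size and slot counts are assumptions.
FILE_SIZE_MB = 20 * 1024   # ~20 GB single text file (approximate)
SPLIT_SIZE_MB = 64         # assumed: Hadoop 1.x default dfs.block.size
NODES = 11


def map_waves(file_size_mb, split_size_mb, nodes, map_slots_per_node):
    """Return (map tasks, waves) for one splittable file, ignoring locality."""
    map_tasks = math.ceil(file_size_mb / split_size_mb)
    total_slots = nodes * map_slots_per_node
    return map_tasks, math.ceil(map_tasks / total_slots)


if __name__ == "__main__":
    # Default of 2 map slots per TaskTracker versus one slot per core (assumed 8).
    for slots_per_node in (2, 8):
        tasks, waves = map_waves(FILE_SIZE_MB, SPLIT_SIZE_MB, NODES, slots_per_node)
        print(f"{slots_per_node} slots/node: {tasks} map tasks in {waves} waves")
```

Under these assumptions the file yields 320 map tasks, run in 15 waves with 2 slots per node and 4 waves with 8. Fewer waves only help if map scheduling is the bottleneck; if the RegionServers are saturated, extra slots may not speed up the load.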

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:

> Hive is more for batch processing, and HBase is more for real-time data.
>
> Regards
> Ram
>
> On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]> wrote:
>
> > In the case of Hive, data insertion means placing the file under the table
> > path in HDFS. HBase needs to read the data and convert it into its own
> > format (HFiles), and an MR job does this work. So it is clear that HBase
> > will be slower. :) As Michael said, the read operation...
> >
> >
> >
> > -Anoop-
> >
> > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> > > It's a 20 GB data set of approximately 230 million records. The data is
> > > in HDFS as a single text file. The cluster has 11 nodes, 8 cores.
> > >
> > > I loaded this into Hive, partitioned by date, bucketed into 32 buckets,
> > > and sorted. Time taken: 6 mins.
> > >
> > > I loaded the same data into HBase, on the same cluster, by writing a
> > > MapReduce job. It took 1 hr 14 mins. The cluster wasn't running anything
> > > else, so assuming that the code I wrote is good enough, what makes HBase
> > > slower than Hive at loading the data?
> > >
> > > Thanks,
> > > Austin
> > >
> >
>
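A quick back-of-the-envelope check of the figures in the original question, treating the reported ~20 GB and ~230 million records as exact (they are approximate) and taking 1 GB as 1024 MB. `rates` is a hypothetical helper; the numbers are cluster-wide averages and say nothing about where the time went.

```python
# Figures from the original question; record count and size are approximate.
RECORDS = 230_000_000
DATA_MB = 20 * 1024
HIVE_SECONDS = 6 * 60             # 6 mins
HBASE_SECONDS = (60 + 14) * 60    # 1 hr 14 mins


def rates(seconds):
    """Return (records per second, MB per second) for the whole cluster."""
    return RECORDS / seconds, DATA_MB / seconds


hive_rps, hive_mbps = rates(HIVE_SECONDS)
hbase_rps, hbase_mbps = rates(HBASE_SECONDS)
slowdown = HBASE_SECONDS / HIVE_SECONDS

if __name__ == "__main__":
    print(f"Hive:  {hive_rps:,.0f} records/s, {hive_mbps:.1f} MB/s")
    print(f"HBase: {hbase_rps:,.0f} records/s, {hbase_mbps:.2f} MB/s")
    print(f"HBase load took {slowdown:.1f}x as long")
```

That works out to roughly 639,000 records/s for Hive versus about 52,000 records/s for HBase, so the HBase load took about 12.3 times as long.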
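To make the contrast in Anoop's reply concrete, here is a simplified local-filesystem analogy, not HDFS or HBase code. A Hive-style load is modeled as a single rename of the file into a table directory, whose cost does not depend on the file's contents. An HBase-style load is modeled as reading every record, splitting it into (row key, column, value) cells and sorting them by key, roughly the work needed before data can be stored in key-ordered HFiles. The comma-separated layout, the use of the first field as the row key, and the `cf:cN` column names are assumptions for illustration; `hive_style_load` and `hbase_style_load` are hypothetical helpers.

```python
import os
import tempfile


def hive_style_load(src_path, table_dir):
    """Move the file under the table directory; no record is read or parsed."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    os.replace(src_path, dest)  # a rename: metadata-only on the same filesystem
    return dest


def hbase_style_load(src_path, family="cf"):
    """Read every record and turn it into key-sorted (row, column, value) cells."""
    cells = []
    with open(src_path) as f:
        for line in f:
            # Assumed layout: first field is the row key, the rest are values.
            row_key, *values = line.rstrip("\n").split(",")
            for i, value in enumerate(values):
                cells.append((row_key, f"{family}:c{i}", value))
    cells.sort()  # HFiles hold cells in key order, so the cells must end up sorted
    return cells


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "data.csv")
    with open(src, "w") as f:
        f.write("r2,b,y\nr1,a,x\n")
    print(hbase_style_load(src))  # must run before the file is moved
    print(hive_style_load(src, os.path.join(workdir, "table")))
```

The rename costs the same whether the file holds two lines or 230 million, while the conversion grows with every record; that asymmetry is the point of Anoop's explanation.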