HBase, mail # user - Loading data, hbase slower than Hive?


Re: Loading data, hbase slower than Hive?
Michael Segel 2013-01-17, 16:48
The writes take longer in HBase.

Just how much longer may depend on how well you tuned HBase.

Now, having said that... suppose you want to find a single record in either HBase or Hive.
Which do you think will be faster? ;-)
On Jan 17, 2013, at 10:44 AM, Austin Chungath <[EMAIL PROTECTED]> wrote:

>  Hi,
> Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> It's a 20 GB data set, approx. 230 million records. The data is in HDFS, as a
> single text file. The cluster is 11 nodes, 8 cores.
>
> I loaded this into Hive, partitioned by date, bucketed into 32, and sorted.
> Time taken was 6 mins.
>
> I loaded the same data into HBase, on the same cluster, by writing a
> MapReduce job. It took 1 hr 14 mins. The cluster wasn't running anything else.
> Assuming that the code I wrote is good enough, what is it that
> makes HBase slower than Hive at loading the data?
>
> Thanks,
> Austin
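
For a sense of scale, the figures quoted above can be turned into rough throughput numbers. This is only a back-of-the-envelope sketch in Python: it treats the approximate figures from the message (230 million records, 20 GB, 6 mins, 1 hr 14 mins) as exact and assumes 1 GB = 1024 MB. The function and variable names are made up for illustration.

```python
# Figures taken from the quoted message; all are approximate in the original.
RECORDS = 230_000_000        # approx. 230 million records
DATA_GB = 20                 # approx. 20 GB of input
HIVE_SECONDS = 6 * 60        # Hive load: 6 mins
HBASE_SECONDS = 74 * 60      # HBase load: 1 hr 14 mins


def throughput(seconds):
    """Return (records per second, MB per second) for a load lasting `seconds`.

    Assumes 1 GB = 1024 MB, which is a convention chosen here, not something
    stated in the thread.
    """
    return RECORDS / seconds, DATA_GB * 1024 / seconds


hive_rps, hive_mbps = throughput(HIVE_SECONDS)
hbase_rps, hbase_mbps = throughput(HBASE_SECONDS)

# How many times longer the HBase load took than the Hive load.
slowdown = HBASE_SECONDS / HIVE_SECONDS

print(f"Hive : {hive_rps:,.0f} records/s, {hive_mbps:.1f} MB/s")
print(f"HBase: {hbase_rps:,.0f} records/s, {hbase_mbps:.1f} MB/s")
print(f"HBase load took {slowdown:.1f}x as long as the Hive load")
```

These numbers describe only the two loads reported in the thread; they say nothing about read latency, which is the point of the single-record question in the reply.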