HBase >> mail # user >> Why HBase integration with Hive makes Hive slow


Re: Why HBase integration with Hive makes Hive slow
You need to set scanner caching; otherwise each call to next() will be a network round trip (RTT).
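
As a rough sketch, the caching could be raised from the Hive session before re-running the query. The value 1000 is only an illustrative assumption, not a tuned recommendation, and whether a session-level SET actually reaches the HBase scanner depends on the Hive and HBase versions in use. The same property can instead be set in hbase-site.xml on the client side.

```sql
-- Assumption: 1000 rows per scanner RPC is an illustrative value, not a tuned one.
-- hbase.client.scanner.caching is the HBase client property that controls
-- how many rows are fetched per next() round trip.
SET hbase.client.scanner.caching=1000;

-- Re-run the slow query from the original message and compare the timing.
SELECT count(*) FROM hbase_table;
```

Larger values mean fewer round trips but more memory held per scanner on both the client and the region server, so it is worth increasing the value gradually.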

________________________________
 From: Hao Ren <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, August 1, 2013 7:45 AM
Subject: Why HBase integration with Hive makes Hive slow
 

Hi,

I have a cluster (1 master + 3 slaves) on which Hive, HBase, and
Hadoop are installed.

In order to run a daily row-level update routine, we need to integrate
HBase with Hive, but the performance is not good.

E.g., there are 2 tables in Hive:
     hbase_table: an HBase table created via Hive
     hive_table: a native Hive table
Both hold the same data set.

When running:
     select count(*) from hbase_table; ===> takes 500 s
     select count(*) from hive_table; ===> takes 6 s

I have tried a lot of queries on the two tables. But hbase_table is
always very slow.

To be clear, I created hbase_table as below:

CREATE TABLE hbase_table (
idvisite string,
client_list Array<string>,
nb_client int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,clients:id_list,clients:nb")
TBLPROPERTIES("hbase.table.name" = "table_test")
;

My HBase is running in pseudo-distributed mode.

I guess that at the beginning of a Hive query execution, Hive loads the
data from HBase, and the SerDe step takes a long time.

Could someone tell me how to improve this poor performance?
Is this caused by a wrongly configured integration?
Is a fully distributed mode needed here?

Thank you in advance for your time.

Hao.
--
Hao Ren
ClaraVista
www.claravista.fr