HBase, mail # user - Loading data from Hive to HBase takes too long


Hao Ren 2013-08-19, 08:40
Hi,

I am running Hive and HBase on the same Amazon EC2 cluster, where HBase
is in pseudo-distributed mode.

After integrating HBase with Hive, I find that running an
"insert overwrite" query from Hive to load data into the related HBase
table takes a long time.

The data is only about 1.3 GB, so I don't think this is normal.
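
To put a number on it, here is a rough back-of-the-envelope estimate of the average load rate. It assumes decimal units (1 GB = 10^6 kB) and that the whole 1.3 GB was written during the roughly 8 hours noted next to the query below; the function name is only for illustration.

```python
def throughput_kb_per_s(size_gb: float, hours: float) -> float:
    """Average write rate in kB/s, assuming 1 GB = 1e6 kB (decimal units)."""
    return size_gb * 1e6 / (hours * 3600)


# Figures from this post: about 1.3 GB loaded in about 8 hours.
rate = throughput_kb_per_s(1.3, 8)
print(f"{rate:.1f} kB/s")
```

That works out to roughly 45 kB/s on average, which seems far too low for a bulk load.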

Maybe there is something wrong with my configuration.

Here are some queries:

CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");

INSERT OVERWRITE TABLE t_LIGNES_DERN_VENTES
SELECT * FROM test;  -- takes a long time (about 8 hours)

Here are some configuration files for my cluster:

# cat hive/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

     <property>
         <name>hbase.zookeeper.quorum</name>
         <value>ip-10-159-41-177.ec2.internal</value>
     </property>

     <property>
         <name>hive.aux.jars.path</name>
         <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
     </property>

     <property>
         <name>hbase.client.scanner.caching</name>
         <value>10000</value>
     </property>

</configuration>

# cat hbase-0.92.0/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

     <property>
         <name>hbase.rootdir</name>
         <value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
     </property>

     <property>
         <name>hbase.cluster.distributed</name>
         <value>true</value>
     </property>

     <property>
         <name>hbase.zookeeper.quorum</name>
         <value>ip-10-159-41-177.ec2.internal</value>
     </property>

     <property>
         <name>hbase.client.scanner.caching</name>
         <value>10000</value>
     </property>

</configuration>

Any help is highly appreciated!

Thank you.

Hao

--
Hao Ren
ClaraVista
www.claravista.fr