Hao Ren 2013-08-19, 08:40
Re: Loading data from Hive to HBase takes too long
Update:

I messed up some queries, here are the right ones:

CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");

insert OVERWRITE TABLE hbase_table
select * from test;  -- takes a long time (about 8 hours)

# bin/hadoop dfs -dus /user/hive/warehouse/test
hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/user/hive/warehouse/test
1318012108

The table 'test' is only about 1.3 GB.
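
For scale, the two figures above (the byte count from `hadoop dfs -dus` and the roughly 8-hour load time) imply a very low effective write rate. A minimal sketch of that arithmetic follows; the function name is only illustrative, and the 8 hours is the approximate duration quoted above, not a measured value.

```python
# Figures taken from the message above.
TABLE_SIZE_BYTES = 1_318_012_108  # size reported by `hadoop dfs -dus`
LOAD_TIME_HOURS = 8               # approximate duration of the INSERT OVERWRITE


def throughput_bytes_per_sec(size_bytes: int, hours: float) -> float:
    """Average bytes written per second over the whole load."""
    return size_bytes / (hours * 3600)


rate = throughput_bytes_per_sec(TABLE_SIZE_BYTES, LOAD_TIME_HOURS)
print(f"{rate / 1024:.1f} KiB/s")
```

That is on the order of 45 KB per second averaged over the whole job.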

On 19/08/2013 10:40, Hao Ren wrote:
> Hi,
>
> I am running Hive and HBase on the same Amazon EC2 cluster, where HBase
> is in pseudo-distributed mode.
>
> After integrating HBase with Hive, I find that it takes a long time to
> run an "insert overwrite" query from Hive in order to load data into
> a related HBase table.
>
> In fact, the size of the data is only about 1.3 GB. I don't think that is normal.
>
> Maybe there is something wrong with my configuration.
>
> Here are some queries:
>
> CREATE TABLE hbase_table (
> material_id int,
> new_id_client int,
> last_purchase_date int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,cf1:idclt,cf1:dt_last_purchase")
> TBLPROPERTIES("hbase.table.name" = "test");
>
> insert OVERWRITE TABLE t_LIGNES_DERN_VENTES
> select * from test;  -- takes a long time (about 8 hours)
>
>
> Here are some configuration files for my cluster:
>
> # cat hive/conf/hive-site.xml
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>ip-10-159-41-177.ec2.internal</value>
>     </property>
>
>     <property>
>         <name>hive.aux.jars.path</name>
>         <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
>
>     </property>
>
>     <property>
>         <name>hbase.client.scanner.caching</name>
>         <value>10000</value>
>     </property>
>
> </configuration>
>
> # cat hbase-0.92.0/conf/hbase-site.xml
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
>     <property>
>         <name>hbase.rootdir</name>
>         <value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
>     </property>
>
>     <property>
>         <name>hbase.cluster.distributed</name>
>         <value>true</value>
>     </property>
>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>ip-10-159-41-177.ec2.internal</value>
>     </property>
>
>     <property>
>         <name>hbase.client.scanner.caching</name>
>         <value>10000</value>
>     </property>
>
> </configuration>
>
> Any help is highly appreciated!
>
> Thank you.
>
> Hao
>
--
Hao Ren
ClaraVista
www.claravista.fr