HBase, mail # user - Loading data from Hive to HBase takes too long


Re: Loading data from Hive to HBase takes too long
Hao Ren 2013-08-19, 08:44
Update:

I mixed up some of the queries in my previous message; here are the correct ones:

CREATE TABLE hbase_table (
material_id int,
new_id_client int,
last_purchase_date int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf1:idclt,cf1:dt_last_purchase")
TBLPROPERTIES("hbase.table.name" = "test");

insert OVERWRITE TABLE hbase_table
select * from test;  -- takes a long time (about 8 hours)

# bin/hadoop dfs -dus /user/hive/warehouse/test
hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/user/hive/warehouse/test
1318012108

The table 'test' is only about 1.3 GB.
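
For scale, here is a rough throughput estimate from the two figures above. It is only a sketch: it assumes the "about 8 hours" is wall-clock time for the whole insert, and the variable names are just for illustration.

```shell
# Figures taken from this message: table size in bytes (from `hadoop dfs -dus`)
# and the reported load time of about 8 hours (assumed to be wall-clock time).
size_bytes=1318012108
elapsed_sec=$((8 * 3600))

# Integer division is precise enough for an order-of-magnitude check.
bytes_per_sec=$((size_bytes / elapsed_sec))
kib_per_sec=$((bytes_per_sec / 1024))

echo "approx. ${bytes_per_sec} bytes/s (${kib_per_sec} KiB/s)"
```

That works out to a few tens of kilobytes per second.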

On 19/08/2013 10:40, Hao Ren wrote:
> Hi,
>
> I am running Hive and HBase on the same Amazon EC2 cluster, where HBase
> is in pseudo-distributed mode.
>
> After integrating HBase with Hive, I find that running an "insert
> overwrite" query from Hive to load data into the related HBase table
> takes a long time.
>
> The data is only about 1.3 GB, so I don't think this is normal.
>
> Maybe there is something wrong with my configuration.
>
> Here are some queries:
>
> CREATE TABLE hbase_table (
> material_id int,
> new_id_client int,
> last_purchase_date int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,cf1:idclt,cf1:dt_last_purchase")
> TBLPROPERTIES("hbase.table.name" = "test");
>
> insert OVERWRITE TABLE t_LIGNES_DERN_VENTES
> select * from test;  -- takes a long time (about 8 hours)
>
>
> Here are some configuration files for my cluster:
>
> # cat hive/conf/hive-site.xml
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>ip-10-159-41-177.ec2.internal</value>
>     </property>
>
>     <property>
>         <name>hive.aux.jars.path</name>
>         <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
>     </property>
>
>     <property>
>         <name>hbase.client.scanner.caching</name>
>         <value>10000</value>
>     </property>
>
> </configuration>
>
> # cat hbase-0.92.0/conf/hbase-site.xml
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
>     <property>
>         <name>hbase.rootdir</name>
>         <value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
>     </property>
>
>     <property>
>         <name>hbase.cluster.distributed</name>
>         <value>true</value>
>     </property>
>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>ip-10-159-41-177.ec2.internal</value>
>     </property>
>
>     <property>
>         <name>hbase.client.scanner.caching</name>
>         <value>10000</value>
>     </property>
>
> </configuration>
>
> Any help is highly appreciated!
>
> Thank you.
>
> Hao
>
--
Hao Ren
ClaraVista
www.claravista.fr