

Re: Loading data from Hive to HBase takes too long
Update:

My cluster has 1 master and 3 slaves.
They are all m1.medium instances, with the following specifications:

Instance Family:          General purpose
Instance Type:            m1.medium
Processor Arch:           32-bit or 64-bit
vCPU:                     1
ECU:                      2
Memory (GiB):             3.75
Instance Storage (GB):    1 x 410
EBS-optimized Available:  -
Network Performance:      Moderate

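To put the figures quoted below in perspective, here is a rough back-of-the-envelope sketch of the average write rate they imply. It uses only the two numbers from the thread (the 1318012108 bytes reported by `hadoop dfs -dus` and the run time of "about 8 hours"); the function name is hypothetical and the result is only as precise as the 8-hour estimate.

```python
# Figures taken from the thread: output of `hadoop dfs -dus` and the reported run time.
TABLE_SIZE_BYTES = 1318012108
LOAD_DURATION_HOURS = 8  # "about 8 hours", so the result is only approximate


def throughput_bytes_per_sec(size_bytes: int, hours: float) -> float:
    """Average write rate for a load of size_bytes that ran for the given hours."""
    return size_bytes / (hours * 3600)


size_gb = TABLE_SIZE_BYTES / 1e9
rate = throughput_bytes_per_sec(TABLE_SIZE_BYTES, LOAD_DURATION_HOURS)

print(f"table size: {size_gb:.2f} GB")
print(f"average rate: {rate / 1e3:.1f} kB/s")
```

Under these assumptions the load averages only a few tens of kilobytes per second, which is why the run time looks abnormal for a table of this size.
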
On 19/08/2013 10:44, Hao Ren wrote:
> Update:
>
> I messed up some queries. Here are the right ones:
>
> CREATE TABLE hbase_table (
> material_id int,
> new_id_client int,
> last_purchase_date int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,cf1:idclt,cf1:dt_last_purchase")
> TBLPROPERTIES("hbase.table.name" = "test");
>
> insert OVERWRITE TABLE hbase_table
> select * from test;  -- takes a long time (about 8 hours)
>
> # bin/hadoop dfs -dus /user/hive/warehouse/test
> hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/user/hive/warehouse/test 1318012108
>
> The table 'test' is only about 1.3 GB.
>
>
>
> On 19/08/2013 10:40, Hao Ren wrote:
>> Hi,
>>
>> I am running Hive and HBase on the same Amazon EC2 cluster, where
>> HBase runs in pseudo-distributed mode.
>>
>> After integrating HBase with Hive, I find that an "insert overwrite"
>> query from Hive that loads data into the related HBase table takes a
>> long time.
>>
>> The data is only about 1.3 GB, so I don't think this is normal.
>>
>> Maybe there is something wrong with my configuration.
>>
>> Here are some queries:
>>
>> CREATE TABLE hbase_table (
>> material_id int,
>> new_id_client int,
>> last_purchase_date int)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
>> ":key,cf1:idclt,cf1:dt_last_purchase")
>> TBLPROPERTIES("hbase.table.name" = "test");
>>
>> insert OVERWRITE TABLE t_LIGNES_DERN_VENTES
>> select * from test;  -- takes a long time (about 8 hours)
>>
>>
>> Here are some configuration files for my cluster:
>>
>> # cat hive/conf/hive-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <configuration>
>>
>>     <property>
>>         <name>hbase.zookeeper.quorum</name>
>>         <value>ip-10-159-41-177.ec2.internal</value>
>>     </property>
>>
>>     <property>
>>         <name>hive.aux.jars.path</name>
>>         <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
>>     </property>
>>
>>     <property>
>>         <name>hbase.client.scanner.caching</name>
>>         <value>10000</value>
>>     </property>
>>
>> </configuration>
>>
>> # cat hbase-0.92.0/conf/hbase-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <configuration>
>>
>>     <property>
>>         <name>hbase.rootdir</name>
>>         <value>hdfs://ec2-54-234-17-36.compute-1.amazonaws.com:9010/hbase</value>
>>     </property>
>>
>>     <property>
>>         <name>hbase.cluster.distributed</name>
>>         <value>true</value>
>>     </property>
>>
>>     <property>
>>         <name>hbase.zookeeper.quorum</name>
>>         <value>ip-10-159-41-177.ec2.internal</value>
>>     </property>
>>
>>     <property>
>>         <name>hbase.client.scanner.caching</name>
>>         <value>10000</value>
>>     </property>
>>
>> </configuration>
>>
>> Any help is highly appreciated!
>>
>> Thank you.
>>
>> Hao
>>
>
>
--
Hao Ren
ClaraVista
www.claravista.fr
