HBase >> mail # user >> Hive + Hbase scanning performance


Re: Hive + Hbase scanning performance
I do not know much about Hive. Sorry.

It all depends on where Hive creates the ClientScanner object. Normally you would call HTable.getScanner(Scan) in order to get a scanner.
ClientScanner checks whether the scannerCaching on the passed Scan object is > 0. If so, it uses that value; otherwise it looks in the environment Configuration for hbase.client.scanner.caching and defaults to 1 if that is not set.

So it all depends on what Configuration Hive sees.
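
The lookup order described above can be modeled in a few lines of plain Java. This is only a sketch of the described behavior, not the actual ClientScanner source: the class name ScannerCachingSketch, the method resolveCaching, and the Map standing in for the Configuration are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the lookup order described above; NOT actual HBase code.
// The class, method, and Map-based "configuration" are hypothetical stand-ins.
final class ScannerCachingSketch {
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    static final int DEFAULT_CACHING = 1; // default stated in the message above

    // scanCaching: value carried by the Scan object (-1 when never set).
    // conf: stand-in for the Configuration visible to the client.
    static int resolveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            return scanCaching; // an explicit value on the Scan wins
        }
        String configured = conf.get(CACHING_KEY);
        return configured != null ? Integer.parseInt(configured.trim()) : DEFAULT_CACHING;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveCaching(-1, conf));  // prints 1 (nothing set anywhere)
        conf.put(CACHING_KEY, "500");
        System.out.println(resolveCaching(-1, conf));  // prints 500 (taken from the configuration)
        System.out.println(resolveCaching(100, conf)); // prints 100 (Scan value takes precedence)
    }
}
```

Under this model, a Scan created with the default caching of -1 picks up whatever the client-side configuration holds, which is why the Configuration that Hive sees matters.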
________________________________
 From: java8964 <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, February 10, 2014 2:33 PM
Subject: RE: Hive + Hbase scanning performance
 

Hi, Lars:
Is there any logging I can enable to verify this?
I am not questioning your knowledge, but in my performance testing I really didn't see any effect.
I read org.apache.hadoop.hbase.client.Scan in HBase 0.94.3 and didn't see any logging I could use to check whether the caching value is being set, or to what value.
The Hive code in org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat creates a Scan object with the default caching value (-1) and sets this Scan on its base class, which is org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.
I believe this Scan object will then be serialized to the server, and I didn't find any place where its caching value is reset based on the Configuration. Of course, I may have missed it, since I have just started reading the HBase codebase and don't know much about it.
Is there any log on the server side that can show the caching value if I change a log level? If so, how?
Also, can you comment on Hive Jira https://issues.apache.org/jira/browse/HIVE-3603?
In fact, I have the same question as the second-to-last comment in the Jira ticket, but no one ever answered it.
Quoted:
Swarnim Kulkarni added a comment - 26/Aug/13 19:28
Edward Capriolo Thanks! Also how is setting this property different than directly setting the "hbase.client.scanner.caching" property in hive-site.xml without this enhancement? Wouldn't they have the same effect?

Thanks
Yong

 