Re: Hive + Hbase scanning performance
lars hofhansl 2014-02-10, 23:04
I do not know much about Hive. Sorry.
It all depends on where Hive creates the ClientScanner object. Normally you would call HTable.getScanner(Scan) in order to get a scanner.
ClientScanner checks whether the scannerCaching on the passed Scan object is > 0. If so, it uses that value; otherwise it looks in the environment Configuration for hbase.client.scanner.caching, and defaults to 1 if that is not set.
So it all depends on what Configuration Hive sees.
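To make the lookup order above concrete, here is a rough sketch. It is not the actual ClientScanner source: the class ScannerCachingSketch, the method resolveCaching, and the plain Map standing in for the HBase Configuration are all made up for illustration. Only the property name and the default of 1 come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class ScannerCachingSketch {
    // Property name as mentioned in this thread.
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    // Fallback when neither the Scan nor the configuration sets a value.
    static final int DEFAULT_CACHING = 1;

    // Hypothetical helper: scanCaching is the value carried by the Scan
    // object (-1 when unset); conf stands in for the client Configuration.
    static int resolveCaching(int scanCaching, Map<String, String> conf) {
        if (scanCaching > 0) {
            return scanCaching; // an explicit value on the Scan wins
        }
        String configured = conf.get(CACHING_KEY);
        return configured == null ? DEFAULT_CACHING : Integer.parseInt(configured);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveCaching(-1, conf));   // nothing set anywhere
        conf.put(CACHING_KEY, "500");
        System.out.println(resolveCaching(-1, conf));   // taken from the configuration
        System.out.println(resolveCaching(100, conf));  // taken from the Scan
    }
}
```

If this matches what the client does, a Scan created with the default caching value would still pick up hbase.client.scanner.caching, provided the Configuration seen by the code creating the scanner actually contains it.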
From: java8964 <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Monday, February 10, 2014 2:33 PM
Subject: RE: Hive + Hbase scanning performance
Is there any logging I can enable to verify this?
I am not questioning your knowledge, but in my performance testing I really didn't see any difference.
I read org.apache.hadoop.hbase.client.Scan in HBase 0.94.3, and I didn't see any logging I could use to check whether the caching value is being set, or to what value.
From the Hive code in org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat, it creates a Scan object with the default caching value (-1) and sets this Scan on its base class, which is
I believe this Scan object is then serialized to the server, and I didn't find any place where its caching value is reset based on the Configuration. Of course, I may have missed it, since I have only just started reading the HBase codebase and don't know much about it.
Is there any log on the server side that can show the caching value if I change a log level? If so, how?
Also, can you comment on Hive Jira https://issues.apache.org/jira/browse/HIVE-3603?
In fact, I have the same question as the 2nd to last comment in the Jira ticket, but no one ever answered it.
Swarnim Kulkarni added a comment - 26/Aug/13 19:28
Edward Capriolo Thanks! Also how is setting this property different than directly setting the "hbase.client.scanner.caching" property in hive-site.xml without this enhancement? Wouldn't they have the same effect?