HBase >> mail # user >> Re: Scan vs Put vs Get


Jean-Marc Spaggiari 2012-06-28, 12:12
RE: Scan vs Put vs Get
>blockCacheHitRatio=69%
It seems you are getting blocks from the cache.
You could also try Bloom filters.

You can enable Bloom filters by setting the config param "io.storefile.bloom.enabled" to true. This enables Bloom filter usage globally.
You then need to set the Bloom type for your CF with
HColumnDescriptor#setBloomFilterType(). You can try the type BloomType.ROW.
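
For context on why a ROW Bloom filter can help random gets: it lets the region server skip a store file that definitely does not contain the requested row, without loading a block from it. The sketch below is a toy in-memory Bloom filter in plain Java that only illustrates the idea. It is not HBase's implementation; the class name ToyRowBloom, the bit-array size, and the hash functions are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy row-key Bloom filter (illustrative only, not HBase code). */
final class ToyRowBloom {
    private final BitSet bits;
    private final int size;    // number of bits in the filter
    private final int hashes;  // number of bit positions set per key

    ToyRowBloom(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // 32-bit FNV-1a, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] key, int i) {
        return Math.floorMod(Arrays.hashCode(key) + i * fnv1a(key), size);
    }

    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyRowBloom bloom = new ToyRowBloom(1 << 16, 3);
        bloom.add("row-00042");
        // An added key is always reported as possibly present (no false negatives).
        System.out.println(bloom.mightContain("row-00042"));
        // A key that was never added is usually rejected, but false positives are possible.
        System.out.println(bloom.mightContain("row-99999"));
    }
}
```

A "false" answer is what saves work: the file can be skipped entirely. A "true" answer still requires reading the block to confirm.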

-Anoop-

_____________________________________
From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
Sent: Thursday, June 28, 2012 5:42 PM
To: [EMAIL PROTECTED]
Subject: Re: Scan vs Put vs Get

Oh! I never looked at this part ;) Ok. I have it.

Here are the numbers for one server before the read:

blockCacheSizeMB=186.28
blockCacheFreeMB=55.4
blockCacheCount=2923
blockCacheHitCount=195999
blockCacheMissCount=89297
blockCacheEvictedCount=69858
blockCacheHitRatio=68%
blockCacheHitCachingRatio=72%

And here are the numbers after 100 iterations of 1000 gets for the same server:

blockCacheSizeMB=194.44
blockCacheFreeMB=47.25
blockCacheCount=3052
blockCacheHitCount=232034
blockCacheMissCount=103250
blockCacheEvictedCount=83682
blockCacheHitRatio=69%
blockCacheHitCachingRatio=72%
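
The ratios reported above are cumulative since the region server started. The hit ratio for the test interval alone can be derived from the difference between the two snapshots. A minimal sketch, assuming no other client was reading from this server during the test (the class and method names are hypothetical, not part of HBase):

```java
import java.util.Locale;

/** Illustrative helper for comparing two block cache counter snapshots. */
final class BlockCacheDelta {

    /** Fraction (0..1) of block requests served from cache between two snapshots. */
    static double intervalHitRatio(long hitsBefore, long missesBefore,
                                   long hitsAfter, long missesAfter) {
        long hits = hitsAfter - hitsBefore;
        long misses = missesAfter - missesBefore;
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // Counters copied from the two snapshots above.
        double ratio = intervalHitRatio(195999L, 89297L, 232034L, 103250L);
        // 36035 hits and 13953 misses during the test: prints 72.1%
        System.out.printf(Locale.ROOT, "hit ratio during the test: %.1f%%%n", ratio * 100);
    }
}
```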

Don't forget that there are between 40B and 50B lines in the table,
so I don't think the servers can store all of them in memory. And
since I'm accessing based on a random key, the odds of having the right
row in memory are small, I think.

JM

2012/6/28, Ramkrishna.S.Vasudevan <[EMAIL PROTECTED]>:
> In 0.94
>
> The UI of the RS has a metrics table. In it you can see
> blockCacheHitCount, blockCacheMissCount, etc. Maybe there is a variation
> when you do scan() and get() here.
>
> Regards
> Ram
>
>
>
>> -----Original Message-----
>> From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, June 28, 2012 4:44 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Scan vs Put vs Get
>>
>> Wow. First, thanks a lot all for jumping into this.
>>
>> Let me try to reply to everyone in a single post.
>>
>> > How many Gets you batch together in one call
>> I tried with multiple different values from 10 to 3000 with similar
>> results.
>> Time to read 10 lines : 181.0 mseconds (55 lines/seconds)
>> Time to read 100 lines : 484.0 mseconds (207 lines/seconds)
>> Time to read 1000 lines : 4739.0 mseconds (211 lines/seconds)
>> Time to read 3000 lines : 13582.0 mseconds (221 lines/seconds)
>>
>> > Is this equal to the Scan#setCaching() that you are using?
>> The scan call is done after the get test. So I can't set the cache for
>> the scan before I do the gets. Also, I tried to run them separately (one
>> time only the put, one time only the get, etc.) so I did not find a
>> way to set up the cache for the get.
>>
>> > If both are the same, you can be sure that the number of network calls
>> > is almost the same.
>> Here are the results for 10 000 gets and 10 000 scan.next(). Each time
>> I access the result to be sure they are sent to the client.
>> (gets) Time to read 10000 lines : 36620.0 mseconds (273 lines/seconds)
>> (scan) Time to read 10000 lines : 119.0 mseconds (84034 lines/seconds)
>>
>> > [Is block caching enabled?]
>> Good question. I don't know :( Is it enabled by default? How can I
>> verify or activate it?
>>
>> > Also have you tried using Bloom filters?
>> Not yet. They are on page 381 on Lars' book and I'm only on page 168 ;)
>>
>>
>> > What's the hbase version you're using?
>> I manually installed 0.94.0. I can try with another version.
>>
>> > Is it repeatable?
>> Yes. I tried many, many times, adding some options, closing some
>> processes on the server side, removing one datanode, adding one, etc. I
>> can see some small variations, but still in the same range. I was able
>> to move from 200 rows/second to 300 rows/second. But that's not
>> really a significant improvement. Also, here are the results for 7
>> iterations of the same code.
>>
>> Time to read 1000 lines : 4171.0 mseconds (240 lines/seconds)
>> Time to read 1000 lines : 3439.0 mseconds (291 lines/seconds)
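
The rates quoted in this thread follow directly from the row counts and elapsed times. As a rough sketch of that arithmetic, applied to the 10 000-row get and scan runs reported above (the class and method names are hypothetical and only illustrate the calculation):

```java
import java.util.Locale;

/** Illustrative throughput arithmetic for the timings reported in the thread. */
final class ReadRate {

    /** Rows read per second, given a row count and an elapsed time in milliseconds. */
    static double rowsPerSecond(long rows, double elapsedMillis) {
        return rows * 1000.0 / elapsedMillis;
    }

    /** Average time per row in milliseconds. */
    static double millisPerRow(long rows, double elapsedMillis) {
        return elapsedMillis / rows;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        double getMillis = 36620.0;  // 10 000 gets
        double scanMillis = 119.0;   // 10 000 scan.next() calls

        System.out.printf(Locale.ROOT, "gets: %d rows/s, %.3f ms/row%n",
                Math.round(rowsPerSecond(rows, getMillis)), millisPerRow(rows, getMillis));
        System.out.printf(Locale.ROOT, "scan: %d rows/s, %.4f ms/row%n",
                Math.round(rowsPerSecond(rows, scanMillis)), millisPerRow(rows, scanMillis));
        // How many times longer the gets took than the scan for the same row count.
        System.out.printf(Locale.ROOT, "gets took %.1fx longer%n", getMillis / scanMillis);
    }
}
```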