HBase user mailing list: Re: Scan vs Put vs Get


Jean-Marc Spaggiari 2012-06-28, 12:12
RE: Scan vs Put vs Get
> blockCacheHitRatio=69%
It seems the blocks you are getting are coming from the cache.
You can also try with Bloom filters.

You can enable the use of Bloom filters globally by setting the config param "io.storefile.bloom.enabled" to true.
Then you need to set the Bloom type for your CF with
HColumnDescriptor#setBloomFilterType(). You can try type BloomType.ROW.
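
A minimal sketch of both steps with the 0.94 Java client (the table and CF names below are just placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class RowBloomSetup {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // the global switch mentioned above, set explicitly here
        conf.setBoolean("io.storefile.bloom.enabled", true);

        // row-level bloom on the column family
        HColumnDescriptor cf = new HColumnDescriptor("cf");        // placeholder CF name
        cf.setBloomFilterType(StoreFile.BloomType.ROW);

        HTableDescriptor table = new HTableDescriptor("mytable");  // placeholder table name
        table.addFamily(cf);

        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(table);
        admin.close();
    }
}

For an existing table you would instead disable it, modify the CF with HBaseAdmin#modifyColumn(), and enable it again.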

-Anoop-

_____________________________________
From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
Sent: Thursday, June 28, 2012 5:42 PM
To: [EMAIL PROTECTED]
Subject: Re: Scan vs Put vs Get

Oh! I never looked at this part ;) Ok. I have it.

Here are the numbers for one server before the read:

blockCacheSizeMB=186.28
blockCacheFreeMB=55.4
blockCacheCount=2923
blockCacheHitCount=195999
blockCacheMissCount=89297
blockCacheEvictedCount=69858
blockCacheHitRatio=68%
blockCacheHitCachingRatio=72%

And here are the numbers after 100 iterations of 1000 gets for the same server:

blockCacheSizeMB=194.44
blockCacheFreeMB=47.25
blockCacheCount=3052
blockCacheHitCount=232034
blockCacheMissCount=103250
blockCacheEvictedCount=83682
blockCacheHitRatio=69%
blockCacheHitCachingRatio=72%
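
(If I read the counters right, the deltas between the two snapshots give the hit ratio for these gets alone:)

hits:   232034 - 195999 = 36035
misses: 103250 -  89297 = 13953
ratio:  36035 / (36035 + 13953) ≈ 72%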

Don't forget that there are between 40B and 50B lines in the table, so I
don't think the servers can store all of them in memory. And since I'm
accessing by random key, I think the odds of having the right row in
memory are small.

JM

2012/6/28, Ramkrishna.S.Vasudevan <[EMAIL PROTECTED]>:
> In 0.94
>
> The UI of the RS has a metrics table.  In that you can see
> blockCacheHitCount, blockCacheMissCount etc. Maybe there is a variation
> when you do scan() and get() here.
>
> Regards
> Ram
>
>
>
>> -----Original Message-----
>> From: Jean-Marc Spaggiari [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, June 28, 2012 4:44 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Scan vs Put vs Get
>>
>> Wow. First, thanks a lot all for jumping into this.
>>
>> Let me try to reply to everyone in a single post.
>>
>> > How many Gets do you batch together in one call?
>> I tried with multiple different values from 10 to 3000 with similar
>> results.
>> Time to read 10 lines : 181.0 mseconds (55 lines/seconds)
>> Time to read 100 lines : 484.0 mseconds (207 lines/seconds)
>> Time to read 1000 lines : 4739.0 mseconds (211 lines/seconds)
>> Time to read 3000 lines : 13582.0 mseconds (221 lines/seconds)
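>>
>> A sketch of what such a batched read looks like with the 0.94 client (not
>> the actual test code; table name and keys are placeholders):
>>
>> // needs org.apache.hadoop.hbase.client.*, org.apache.hadoop.hbase.util.Bytes, java.util.*
>> Configuration conf = HBaseConfiguration.create();
>> HTable table = new HTable(conf, "mytable");
>> List<Get> batch = new ArrayList<Get>();
>> for (int i = 0; i < 1000; i++) {
>>     // in the real test the keys are random; "row-" + i is just a placeholder
>>     batch.add(new Get(Bytes.toBytes("row-" + i)));
>> }
>> Result[] results = table.get(batch);   // one client call for the whole batch
>> for (Result r : results) {
>>     r.isEmpty();                        // touch the Result on the client side
>> }
>> table.close();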
>>
>> > Is this equal to the Scan#setCaching() that you are using?
>> The scan call is done after the get test, so I can't set the cache for
>> the scan before I do the gets. Also, I tried to run them separately (one
>> time only the put, one time only the get, etc.), but I did not find a
>> way to set up the cache for the gets.
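>>
>> For the scan side, the caching is set on the Scan before opening the
>> scanner; a minimal sketch with the 0.94 client (again just a sketch, not
>> the test code, reusing the table from above):
>>
>> Scan scan = new Scan();
>> scan.setCaching(1000);                  // rows shipped to the client per RPC
>> ResultScanner scanner = table.getScanner(scan);
>> try {
>>     for (Result r : scanner) {
>>         r.isEmpty();                    // touch the row
>>     }
>> } finally {
>>     scanner.close();
>> }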
>>
>> > If both are the same, you can be sure that the number of NW calls is
>> > almost the same.
>> Here are the results for 10 000 gets and 10 000 scan.next(). Each time
>> I access the result to be sure they are sent to the client.
>> (gets) Time to read 10000 lines : 36620.0 mseconds (273 lines/seconds)
>> (scan) Time to read 10000 lines : 119.0 mseconds (84034 lines/seconds)
>>
>> >[Block caching is enabled?]
>> Good question. I don't know :( Is it enabled by default? How can I
>> verify or activate it?
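>>
>> (A sketch of how to check it with the 0.94 client; the table/CF names are
>> placeholders:)
>>
>> HBaseAdmin admin = new HBaseAdmin(conf);
>> HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
>> HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
>> System.out.println("blockcache enabled: " + cf.isBlockCacheEnabled());
>> // HColumnDescriptor.DEFAULT_BLOCKCACHE is true, so it is on unless it was
>> // explicitly disabled; it can be toggled with setBlockCacheEnabled() plus
>> // disable table / HBaseAdmin#modifyColumn() / enable table.
>> admin.close();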
>>
>> > Also have you tried using Bloom filters?
>> Not yet. They are on page 381 of Lars' book and I'm only on page 168 ;)
>>
>>
>> > What's the hbase version you're using?
>> I manually installed 0.94.0. I can try with another version.
>>
>> > Is it repeatable?
>> Yes. I tried many, many times by adding some options, closing some
>> processes on the server side, removing one datanode, adding one, etc. I
>> can see some small variations, but still in the same range. I was able
>> to move from 200 rows/second to 300 rows/second, but that's not
>> really a significant improvement. Also, here are the results for 7
>> iterations of the same code.
>>
>> Time to read 1000 lines : 4171.0 mseconds (240 lines/seconds)
>> Time to read 1000 lines : 3439.0 mseconds (291 lines/seconds)
Other messages in this thread:

Jean-Marc Spaggiari 2012-06-28, 13:41
Jean-Marc Spaggiari 2012-06-28, 13:45
N Keywal 2012-06-28, 14:35
Jean-Marc Spaggiari 2012-06-28, 16:04
N Keywal 2012-06-28, 16:25
Jean-Marc Spaggiari 2012-06-28, 16:49
Jean-Marc Spaggiari 2012-06-28, 11:13
N Keywal 2012-06-28, 13:37
Ramkrishna.S.Vasudevan 2012-06-28, 11:59
Jean-Marc Spaggiari 2012-06-27, 23:34
Anoop Sam John 2012-06-28, 04:56
N Keywal 2012-06-28, 08:30
Ramkrishna.S.Vasudevan 2012-06-28, 08:44