HBase >> mail # user >> Using HBase serving to replace memcached


RE: Using HBase serving to replace memcached
>> I could be wrong. I think HFile index block (which is located at the end
>> of HFile) is a binary search tree containing all row-key values (of the
>> HFile) in the binary search tree. Searching a specific row-key in the
>> binary search tree could easily find whether a row-key exists (some node in
>> the tree has the same row-key value) or not. Why we need load every block
>> to find if the row exists?

I think there is some confusion here about Bloom filters and the block index. I will try to clarify.
Every HFile has a block index. Within an HFile the data is written as multiple blocks, and HBase reads data from the HDFS layer one block at a time. The block index holds information about the blocks in that HFile: the key range each block covers (recorded as the first key of each block), plus block details such as its offset and length. When a request comes in to get rowkey 'x', all the HFiles in that region need to be checked, because the KV can be present in any of them. To find out which block within an HFile could hold the row, HBase uses the block index, which is kept in memory. This lookup only tells us the one block in which the row could be present; it does not tell us whether the row actually exists. HBase then loads that block and reads through it to find the row we are interested in.
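A minimal sketch of the block index lookup described above, written in Python only for illustration. The index contents, the tuple layout, and the function name `candidate_block` are assumptions made for this example and are not HBase's actual classes or on-disk format.

```python
from bisect import bisect_right

# Hypothetical in-memory block index for one HFile: one entry per data block,
# sorted by the first row key stored in that block.
BLOCK_INDEX = [
    # (first_key, offset, length)
    ("apple", 0, 65536),
    ("mango", 65536, 65536),
    ("tiger", 131072, 40000),
]


def candidate_block(index, row_key):
    """Return the index entry of the only block that could hold row_key.

    Returns None when row_key sorts before the first key in the file.
    A non-None result means "the row may be in this block", not "the row exists".
    """
    first_keys = [entry[0] for entry in index]
    # Find the last block whose first key is <= row_key.
    pos = bisect_right(first_keys, row_key) - 1
    if pos < 0:
        return None
    return index[pos]
```

Note that `candidate_block(BLOCK_INDEX, "orange")` returns the "mango" block even if "orange" was never written: the index narrows the search to one block, and that block still has to be loaded and scanned to find out.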
A Bloom filter, on the other hand, has information about each and every row added to that HFile (the block index does not have an entry for every row). The Bloom information is also kept in memory. So when a read request for row 'x' reaches an HFile, the Bloom filter is checked first to see whether the row can be in this file at all. If the Bloom filter says it is not there, no block is fetched from that file. A Bloom filter can report a false positive but never a false negative, so a "not present" answer is safe to trust. If Bloom filters are not enabled, the block index may point to a block whose key range contains 'x', and HBase will load that block even if the row is not in it. Using Bloom filters avoids this IO. I hope this is clear now.
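The Bloom check described above can be sketched roughly as follows. This is a toy filter for illustration, not HBase's implementation: the class `TinyBloom`, its sizes, the SHA-256 based hashing, and the helper `files_to_read` are all assumptions of this example.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was definitely never added to this filter.
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(blooms_by_file, row_key):
    """Keep only the HFiles whose Bloom filter cannot rule out row_key."""
    return [name for name, bloom in blooms_by_file.items()
            if bloom.might_contain(row_key)]
```

Only the files returned by `files_to_read` need a block index lookup and a block load; every other file is skipped without any IO.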

-Anoop-
________________________________________
From: Lin Ma [[EMAIL PROTECTED]]
Sent: Wednesday, August 22, 2012 5:41 PM
To: J Mohamed Zahoor; [EMAIL PROTECTED]
Subject: Re: Using HBase serving to replace memcached

Thanks Zahoor,

I read through the document you referred to, but I am confused about what
leaf-level index, intermediate-level index and root-level index mean. I would
appreciate it if you could give more details on what they are, or point me to
the related documents.

BTW: the document you pointed me to is very good, but I am missing some basic
background on the three terms I mentioned above. :-)

regards,
Lin

On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor <[EMAIL PROTECTED]> wrote:

>> I could be wrong. I think HFile index block (which is located at the end
>> of HFile) is a binary search tree containing all row-key values (of the
>> HFile) in the binary search tree. Searching a specific row-key in the
>> binary search tree could easily find whether a row-key exists (some node in
>> the tree has the same row-key value) or not. Why we need load every block
>> to find if the row exists?
>>
>>
> Hmm...
> It is a multilevel index. Only the root indexes (data, meta, etc.) are
> loaded when a region is opened. The rest of the tree (intermediate and leaf
> indexes) is stored at the block level.
> I am assuming HFile v2 here for the discussion.
> Read this for more clarity http://hbase.apache.org/book/apes03.html
>
> Nice discussion. You made me read a lot of things. :-)
> Now I will dig into the code and check this out.
>
> ./Zahoor
>
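
To make the root-level and leaf-level terms from the quoted message concrete, here is a rough two-level sketch in Python. It is an illustration under assumptions, not the HFile v2 format: the names `ROOT_INDEX`, `LEAF_INDEX_BLOCKS`, `floor_entry` and `locate_data_block` are hypothetical, the intermediate level is left out, and a dictionary stands in for reading a leaf index block from the file.

```python
from bisect import bisect_right


def floor_entry(entries, key):
    """Return the value paired with the greatest first key <= key, or None."""
    keys = [first_key for first_key, _ in entries]
    pos = bisect_right(keys, key) - 1
    return entries[pos][1] if pos >= 0 else None


# Hypothetical leaf index blocks, stored in the file and loaded on demand.
# Each maps the first key of a data block to that data block's id.
LEAF_INDEX_BLOCKS = {
    "leaf-0": [("apple", "data-0"), ("grape", "data-1")],
    "leaf-1": [("mango", "data-2"), ("tiger", "data-3")],
}

# Hypothetical root index, loaded when the region opens.
# It maps the first key covered by each leaf index block to that block's id.
ROOT_INDEX = [("apple", "leaf-0"), ("mango", "leaf-1")]


def locate_data_block(row_key, load_leaf=LEAF_INDEX_BLOCKS.__getitem__):
    """Walk root index, then one leaf index block, to find the candidate data block."""
    leaf_id = floor_entry(ROOT_INDEX, row_key)
    if leaf_id is None:
        return None  # row_key sorts before everything in the file
    return floor_entry(load_leaf(leaf_id), row_key)
```

The point of the extra level is that only the small root index has to stay in memory; each lookup then touches a single leaf index block before reaching the data block.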