HBase >> mail # user >> how does hbase get the latest version with immutable hfiles?


how does hbase get the latest version with immutable hfiles?
(reference:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)

A row consists of a key and column families, and each stored value carries a timestamp.

So for example:

key = com.example.com/some/path

cf: outboundlinks {
     com.example.com/link1,
     com.example.com/link2,
     ..
}
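
To make the example concrete, here is a minimal sketch of the logical data model in Python. It is an illustration only, not an HBase API: the `Cell` tuple and the `qualifiers` helper are hypothetical, and storing each link as a column qualifier with an empty value is an assumption about how the example would be laid out.

```python
from collections import namedtuple

# Hypothetical illustration of the logical data model; not an HBase API.
# Each stored value (a "cell") is addressed by row key, column family,
# column qualifier, and timestamp.
Cell = namedtuple("Cell", ["row", "family", "qualifier", "timestamp", "value"])

row = "com.example.com/some/path"

# Assumption: each outbound link is stored as its own column qualifier.
cells = [
    Cell(row, "outboundlinks", "com.example.com/link1", 1, b""),
    Cell(row, "outboundlinks", "com.example.com/link2", 1, b""),
]

def qualifiers(cells, row, family):
    """Return the sorted column qualifiers stored for one row and family."""
    return sorted({c.qualifier for c in cells if c.row == row and c.family == family})
```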

Data is stored like this:

Region Server -> Store -> StoreFile -> HFile

Now when a client requests a particular key, as I understand it the HMaster
figures out which region server holds the data. This information is returned
to the client (which caches it locally), and the client then makes a request
to that region server.

Now since the actual data files are immutable, if you modify a particular
value in a CF, the old value is tombstoned (I'm not sure exactly how that
works, but I understand it at a high level).
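
A minimal sketch of how writes can work when files are immutable, under simplifying assumptions: changes are buffered in memory as new timestamped cells, a delete is recorded as a marker cell, and a flush writes the buffer out as a new file that is never modified afterwards. `ToyStore` and `TOMBSTONE` are hypothetical names for illustration, not HBase classes.

```python
# Toy model of one store (one column family of one region).
# All names here are hypothetical; this is not HBase code.
TOMBSTONE = object()  # marker meaning "deleted as of this timestamp"

class ToyStore:
    def __init__(self):
        self.memstore = {}     # (row, qualifier, timestamp) -> value; mutable, in memory
        self.store_files = []  # flushed snapshots; never modified afterwards

    def put(self, row, qualifier, timestamp, value):
        # A change is a new cell with a newer timestamp; nothing is rewritten.
        self.memstore[(row, qualifier, timestamp)] = value

    def delete(self, row, qualifier, timestamp):
        # A delete is also just a new cell: a tombstone marker.
        self.memstore[(row, qualifier, timestamp)] = TOMBSTONE

    def flush(self):
        # Write the buffered cells out as one sorted, immutable snapshot.
        if self.memstore:
            snapshot = tuple(sorted(self.memstore.items(), key=lambda kv: kv[0]))
            self.store_files.append(snapshot)
            self.memstore = {}
```

In this model, changing one value out of millions does not rewrite the existing file. The change lands in a small new file at the next flush, and the older file still holds the older version.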

So if I make a request for a given key (going with the example above, a
particular URL on the website example.com) and I want all the outbound links,
I reference the column family "outboundlinks", which can store millions of
URLs.

What process/service/class is in charge of assembling the various files to
get all the correct data?

Summary of my question:
What I am trying to understand is this: if a particular CF has millions of
values and a single value is mutated, a new file has to be created. So if I
query for that value, i.e. it is included in my result set, how does HBase
know where to look for the latest data?

So basically, from what I understand, a get request for a particular key and
CF will potentially have to look at more than one StoreFile (or HFile?),
correct?
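
The read side of that question can be sketched as a merge: consult the in-memory buffer and every store file, and for each column keep the cell with the newest timestamp, treating a delete marker as "no value". This is a simplified illustration under stated assumptions, not HBase's implementation. `get_latest`, `TOMBSTONE`, and the dict-based files are hypothetical, and the linear scan ignores the fact that real store files are sorted.

```python
# Toy read path; hypothetical names, not HBase's implementation.
TOMBSTONE = object()  # delete marker

def get_latest(sources, row, qualifier):
    """Return the newest live value for (row, qualifier) across all sources.

    `sources` is the in-memory buffer plus every store file, each given as
    a dict mapping (row, qualifier, timestamp) -> value or TOMBSTONE.
    """
    newest_ts, newest_value = None, None
    for cells in sources:
        for (r, q, ts), value in cells.items():
            if (r, q) == (row, qualifier) and (newest_ts is None or ts > newest_ts):
                newest_ts, newest_value = ts, value
    # If the newest cell is a delete marker, the value is reported as absent.
    return None if newest_value is TOMBSTONE else newest_value

# Example data: an older file, a newer file, and an empty in-memory buffer.
old_file = {("r1", "link1", 1): b"v1", ("r1", "link2", 1): b"x"}
new_file = {("r1", "link1", 2): b"v2", ("r1", "link2", 3): TOMBSTONE}
memstore = {}

latest = get_latest([memstore, new_file, old_file], "r1", "link1")
```

Here the lookup for `link1` has to consider both files, and the timestamp rather than the file order decides which version is returned.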