HBase >> mail # user >> how does hbase get the latest version with immutable hfiles?


Re: how does hbase get the latest version with immutable hfiles?

Hi there, I think you probably want to look at this...

HBase catalog metadata...

http://hbase.apache.org/book.html#arch.catalog

How data is stored internally...

http://hbase.apache.org/book.html#regions.arch

Lots of versioning description here...

http://hbase.apache.org/book.html#datamodel

Long story short: the client talks directly to the RegionServers, and HBase
looks at multiple StoreFiles (plus the in-memory MemStore) to answer a read.
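
To make the "looks at multiple StoreFiles" part concrete, here is a toy
sketch in Python. It is not HBase code: the names (get_latest, Store), the
dict-based files, and the "-v2" value are all made up for illustration, and
a delete marker is simplified to a None value. It only shows the idea that
every version carries a timestamp, so the read path can gather candidates
from all the files and keep the newest one.

```python
from typing import Dict, List, Optional, Tuple

# A cell is identified by (row key, "family:qualifier").
CellKey = Tuple[str, str]
# Each version is (timestamp, value); a value of None models a delete marker
# (a simplification of how real tombstones behave).
Version = Tuple[int, Optional[str]]
# Hypothetical stand-in for a MemStore or an immutable StoreFile.
Store = Dict[CellKey, List[Version]]


def get_latest(row: str, column: str,
               memstore: Store, store_files: List[Store]) -> Optional[str]:
    """Return the newest live value for (row, column), or None if absent or deleted."""
    candidates: List[Version] = []
    # Collect every version of the cell from memory and from each file.
    for source in [memstore, *store_files]:
        candidates.extend(source.get((row, column), []))
    if not candidates:
        return None
    # The highest timestamp wins, no matter which file holds it.
    _, value = max(candidates, key=lambda version: version[0])
    return value


# Usage: the same cell written twice ends up in two different immutable files.
ROW = "com.example.com/some/path"
COL = "outboundlinks:link1"
old_file: Store = {(ROW, COL): [(1, "com.example.com/link1")]}
new_file: Store = {(ROW, COL): [(2, "com.example.com/link1-v2")]}
print(get_latest(ROW, COL, {}, [old_file, new_file]))
```

Nothing in old_file is rewritten when the value changes; the newer version
simply sits in another file (or in memory) with a later timestamp, and the
read merges them.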

On 6/1/12 4:27 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:

>(reference:
>http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
>
>A row consists of a key, and column families, along with a timestamp.
>
>So for example:
>
>key = com.example.com/some/path
>
>cf: outboundlinks {
>      com.example.com/link1,
>     com.example.com/link2,
>     ..
>}
>
>Data is stored like this:
>
>Region Server -> Store -> StoreFile -> HFile
>
>Now when a client requests a particular key, the HMaster figures out which
>region server holds the data, this information is returned to the client
>(which saves it locally), and then it makes a request to the region
>server.
>
>Now since the actual data files are immutable, if you modify a particular
>value in a CF, it is tombstoned (not sure how that works, but I understand
>it at a high level).
>
>So if I make a request for a given key (going with the example above, a
>particular URL on the website example.com) and I want all the
>outboundlinks, I reference the column family "outboundlinks", which can
>store millions of URLs.
>
>What process/service/class is in charge of assembling the various files to
>get all the correct data?
>
>Summary of my question:
>What I am trying to understand is, if a particular CF has millions of
>values, and if a single value is mutated, a new file has to be created.
>So this means, if I query for that value, i.e. it is included in my
>result set, how does HBase know where to look for the latest data?
>
>So basically, from what I understand, a get request for a particular
>key and CF will potentially have to look at more than one StoreFile (or
>HFile?), correct?