HBase, mail # user - how does hbase get the latest version with immutable hfiles?


Re: how does hbase get the latest version with immutable hfiles?
Elliott Clark 2012-06-02, 18:18
If you want to get into the really nitty gritty I found Lars' presentation
really insightful.

http://www.hbasecon.com/sessions/learning-hbase-internals/
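
As a rough illustration of the question quoted below (how a read finds the latest version when the files are immutable), here is a toy Python model. It is not HBase's actual code: the names Cell, make_store_file and get_latest are made up for this sketch, a value of None stands in for a delete marker (tombstone), and a linear scan replaces the index-based seeks a real read would use. The idea it shows is that each file is a sorted, immutable run of cells, and a read merges the in-memory store with every file, so the cell with the newest timestamp comes out first.

```python
import heapq
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    # Hypothetical stand-in for one stored cell version.
    row: str
    column: str
    timestamp: int
    value: Optional[str]  # None marks a delete tombstone in this toy model


def sort_key(cell):
    # Order by row, then column, then newest timestamp first.
    return (cell.row, cell.column, -cell.timestamp)


def make_store_file(cells):
    """Return an immutable, sorted run of cells (stand-in for one StoreFile)."""
    return tuple(sorted(cells, key=sort_key))


def get_latest(row, column, memstore, store_files):
    """Merge the in-memory cells with every file and return the newest value.

    Returns None if the cell does not exist or its newest version is a
    tombstone. No file is ever modified; newer cells simply sort ahead of
    older ones during the merge.
    """
    sources = [make_store_file(memstore)] + list(store_files)
    for cell in heapq.merge(*sources, key=sort_key):
        if cell.row == row and cell.column == column:
            return cell.value
    return None


ROW = "com.example.com/some/path"

# First flush: both links written at timestamp 1.
older_file = make_store_file([
    Cell(ROW, "outboundlinks:link1", 1, "com.example.com/link1"),
    Cell(ROW, "outboundlinks:link2", 1, "com.example.com/link2"),
])

# Later flush: link1 overwritten at timestamp 2, link2 deleted at timestamp 3.
newer_file = make_store_file([
    Cell(ROW, "outboundlinks:link1", 2, "com.example.com/link1-v2"),
    Cell(ROW, "outboundlinks:link2", 3, None),
])

# Not yet flushed: a brand new link still held in memory.
memstore = [Cell(ROW, "outboundlinks:link3", 4, "com.example.com/link3")]

files = [older_file, newer_file]
latest_link1 = get_latest(ROW, "outboundlinks:link1", memstore, files)
latest_link2 = get_latest(ROW, "outboundlinks:link2", memstore, files)
latest_link3 = get_latest(ROW, "outboundlinks:link3", memstore, files)
```

In this sketch the old value of link1 is still sitting in older_file; it is just never returned because the newer cell sorts ahead of it. Physically dropping superseded and deleted cells is left to compactions, which rewrite several files into one.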

On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> Hi there, I think you probably want to look at this...
>
> HBase catalog metadata...
>
> http://hbase.apache.org/book.html#arch.catalog
>
> How data is stored internally...
>
> http://hbase.apache.org/book.html#regions.arch
>
> Lots of versioning description here...
>
> http://hbase.apache.org/book.html#datamodel
>
>
>
> Long story short: the client talks directly to the RegionServers, and HBase
> looks at multiple StoreFiles.
>
>
>
> On 6/1/12 4:27 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
>
> >(reference:
> >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> >
> >A row consists of a key, and column families, along with a timestamp.
> >
> >So for example:
> >
> >key = com.example.com/some/path
> >
> >cf: outboundlinks {
> >     com.example.com/link1,
> >     com.example.com/link2,
> >     ..
> >}
> >
> >Data is stored like this:
> >
> >Region Server -> Store -> StoreFile -> HFile
> >
> >Now when a client requests a particular key, the HMaster figures out which
> >region server holds the data. This information is returned to the client
> >(which saves it locally), and then the client makes a request to the region
> >server.
> >
> >Now since the actual data files are immutable, if you modify a particular
> >value in a CF, it is tombstoned (not sure how that works but understand
> >it at a high level).
> >
> >So if I make a request for a given key, going with the example above, a
> >particular URL on the website example.com, and I want all the
> >outboundlinks, I reference the column family "outboundlinks", which can
> >store millions of URLs.
> >
> >What process/service/class is in charge of assembling the various files to
> >get all the correct data?
> >
> >Summary of my question:
> >What I am trying to understand is, if a particular CF has millions of
> >values, and if a single value is mutated, a new file has to be created.
> >So this means, if I query for that value, i.e. it is included in my result
> >set, how does HBase know where to look for the latest data?
> >
> >So basically, from what I understand, a get request for a particular
> >key and CF will potentially have to look at more than one StoreFile (or
> >HFile?), correct?
>
>
>