HBase >> mail # user >> how does hbase get the latest version with immutable hfiles?


Re: how does hbase get the latest version with immutable hfiles?
Elliott,

Is there a video or slides? I guess I have to register to view it?

On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> If you want to get into the real nitty-gritty, I found Lars' presentation
> really insightful.
>
> http://www.hbasecon.com/sessions/learning-hbase-internals/
>
> On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> >
> > Hi there, I think you probably want to look at this...
> >
> > HBase catalog metadata...
> >
> > http://hbase.apache.org/book.html#arch.catalog
> >
> > How data is stored internally...
> >
> > http://hbase.apache.org/book.html#regions.arch
> >
> > Lots of versioning description here...
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> >
> >
> > Long story short: the client talks directly to the RegionServers, and on a
> > read HBase looks at multiple StoreFiles.
> >
> >
> >
> > On 6/1/12 4:27 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
> >
> > >(reference:
> > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > >
> > >A row consists of a key, and column families, along with a timestamp.
> > >
> > >So for example:
> > >
> > >key = com.example.com/some/path
> > >
> > >cf: outboundlinks {
> > >      com.example.com/link1,
> > >      com.example.com/link2,
> > >      ..
> > >}
> > >
> > >Data is stored like this:
> > >
> > >Region Server -> Store -> StoreFile -> HFile
> > >
> > >Now when a client requests a particular key, the HMaster figures out which
> > >region server holds the data; this information is returned to the client
> > >(which saves it locally), and then it makes a request to the region
> > >server.
> > >
> > >Now since the actual data files are immutable, if you modify a particular
> > >value in a CF, it is tombstoned (not sure how that works, but I understand
> > >it at a high level).
> > >
> > >So if I make a request for a given key, going with the example above, a
> > >particular URL on the website example.com, and I want all the
> > >outboundlinks, I reference the column family "outboundlinks", which can
> > >store millions of URLs.
> > >
> > >What process/service/class is in charge of assembling the various files to
> > >get all the correct data?
> > >
> > >Summary of my question:
> > >What I am trying to understand is, if a particular CF has millions of
> > >values, and if a single value is mutated, a new file has to be created.
> > >So this means, if I query for that value, i.e. it is included in my result
> > >set, how does HBase know where to look for the latest data?
> > >
> > >So basically, from what I understand, a get request for a particular
> > >key and CF will potentially have to look at more than one StoreFile (or
> > >HFile?), correct?
> >
> >
> >
>
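
For anyone who wants to see the idea behind "HBase looks at multiple StoreFiles", here is a rough Python sketch. It is not HBase code. It assumes each source (the in-memory store plus each immutable file) is already sorted by row, then column, then newest timestamp first, and it models a delete as a cell whose value is None. Cell, sort_key and read_latest are hypothetical names made up for this illustration. As far as I know, the RegionServer does the real version of this merge through its store scanners, with finer-grained delete markers than shown here.

```python
import heapq
from typing import Dict, Iterable, List, NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    column: str               # "family:qualifier"
    timestamp: int
    value: Optional[bytes]    # None stands in for a delete tombstone (simplification)


def sort_key(cell: Cell):
    # Row and column ascending, timestamp descending (newest version first).
    return (cell.row, cell.column, -cell.timestamp)


def read_latest(sources: List[Iterable[Cell]], row: str) -> Dict[str, bytes]:
    """Merge pre-sorted sources and keep the newest live value per column."""
    result: Dict[str, bytes] = {}
    seen = set()
    # heapq.merge walks all sources in sorted order without rewriting any of them.
    for cell in heapq.merge(*sources, key=sort_key):
        if cell.row != row or cell.column in seen:
            continue          # other row, or an older version of a column already decided
        seen.add(cell.column)
        if cell.value is not None:
            result[cell.column] = cell.value   # newest entry is a put
        # else: newest entry is a tombstone, so the column is hidden
    return result


ROW = "com.example.com/some/path"
memstore = [Cell(ROW, "outboundlinks:link1", 30, b"v3")]
newer_file = [
    Cell(ROW, "outboundlinks:link1", 20, b"v2"),
    Cell(ROW, "outboundlinks:link2", 20, None),      # delete written after the old put
]
older_file = [
    Cell(ROW, "outboundlinks:link1", 10, b"v1"),
    Cell(ROW, "outboundlinks:link2", 10, b"old"),
    Cell(ROW, "outboundlinks:link3", 10, b"keep"),
]

latest = read_latest([memstore, newer_file, older_file], ROW)
print(latest)
```

Nothing is rewritten in place: the mutation to link1 and the delete of link2 simply land in newer sources, and the read picks the winner by timestamp. The stale versions stay on disk until the files are rewritten by a compaction.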