HBase, mail # user - how does hbase get the latest version with immutable hfiles?


Re: how does hbase get the latest version with immutable hfiles?
S Ahmed 2012-06-04, 18:36
Once hbase has identified the file that contains the row key, what
algorithm is used?

I understand that keys are ordered lexically.
And are files ordered using quicksort?
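
For reference, the files are not sorted after the fact. Writes are buffered in a sorted in-memory structure (the MemStore) and flushed in key order, so each file is already sorted when it is written. A lookup inside one file can then binary-search an index of block start keys and scan only the matching block. Below is a minimal sketch of that idea, not HBase's actual code; the names `build_block_index` and `lookup`, and the list-of-blocks layout, are assumptions made for illustration.

```python
from bisect import bisect_right

def build_block_index(blocks):
    """Return the first key of every block (hypothetical, simplified index)."""
    return [block[0][0] for block in blocks]

def lookup(blocks, block_index, key):
    """Find `key` in a sorted, immutable file modelled as a list of blocks."""
    # Binary search for the last block whose first key is <= key.
    i = bisect_right(block_index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    # Scan only that one block; keys inside it are sorted.
    for k, value in blocks[i]:
        if k == key:
            return value
        if k > key:
            break  # passed the position where key would be
    return None

# Illustrative data: three blocks of (key, value) pairs in lexical order.
blocks = [
    [("a", 1), ("c", 2)],
    [("f", 3), ("h", 4)],
    [("m", 5), ("q", 6)],
]
block_index = build_block_index(blocks)
print(lookup(blocks, block_index, "h"))
```

Only one block is read per lookup, which is why keeping the keys sorted matters more than which sort algorithm produced the order.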

On Sun, Jun 3, 2012 at 9:37 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> There are slides.  I think you have to register with an email and first/last
> name to download the ppt.
>
> On Sun, Jun 3, 2012 at 12:21 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
> > Elliott,
> >
> > Is there a video or slides?  I guess I have to register to view it?
> >
> > On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
> >
> > > If you want to get into the really nitty gritty, I found Lars'
> > > presentation really insightful.
> > >
> > > http://www.hbasecon.com/sessions/learning-hbase-internals/
> > >
> > > On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > > Hi there, I think you probably want to look at this...
> > > > >
> > > > > Hbase catalog metadata...
> > > > >
> > > > > http://hbase.apache.org/book.html#arch.catalog
> > > > >
> > > > > How data is stored internally...
> > > > >
> > > > > http://hbase.apache.org/book.html#regions.arch
> > > > >
> > > > > Lots of versioning description here...
> > > >
> > > > http://hbase.apache.org/book.html#datamodel
> > > >
> > > >
> > > >
> > > > > Long story short, the client talks directly to RegionServers, and
> > > > > HBase looks at multiple StoreFiles.
> > > >
> > > >
> > > >
> > > > On 6/1/12 4:27 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >(reference:
> > > > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > > > >
> > > > >A row consists of a key, and column families, along with a timestamp.
> > > > >
> > > > >So for example:
> > > > >
> > > > >key = com.example.com/some/path
> > > > >
> > > > >cf: outboundlinks {
> > > > >     com.example.com/link1,
> > > > >     com.example.com/link2,
> > > > >     ..
> > > > >}
> > > > >
> > > > >Data is stored like this:
> > > > >
> > > > >Region Server -> Store -> StoreFile -> HFile
> > > > >
> > > > >Now when a client requests a particular key, the hmaster figures out
> > > > >which region server holds the data. This information is returned to
> > > > >the client (which saves it locally), and the client then makes a
> > > > >request to the region server.
> > > > >
> > > > >Now since the actual data files are immutable, if you modify a
> > > > >particular value in a CF, it is tombstoned (not sure how that works
> > > > >but understand it at a high level).
> > > > >
> > > > >So if I make a request for a given key, going with the example above,
> > > > >a particular url on the website example.com, and I want all the
> > > > >outboundlinks, I reference the column family "outboundlinks", which
> > > > >can store millions of urls.
> > > > >
> > > > >What process/service/class is in charge of assembling the various
> > > > >files to get all the correct data?
> > > > >
> > > > >Summary of my question:
> > > > >What I am trying to understand is, if a particular CF has millions of
> > > > >values, and if a single value is mutated, a new file has to be
> > > > >created. So this means, if I query for that value, i.e. it is included
> > > > >in my result set, how does hbase know where to look for the latest
> > > > >data?
> > > > >
> > > > >So basically from what I understand, making a get request for a
> > > > >particular key, cf will have to potentially look at more than one
> > > > >StoreFile (or HFile?), correct?
> > > >
> > > >
> > > >
> > >
> >
>
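
To make the "HBase looks at multiple StoreFiles" answer concrete: each file (and the in-memory MemStore) holds cells sorted by row, column, and descending timestamp, so a read can merge all of them and take the first cell it meets for each column as the current one. A delete marker that is newest for a column hides the older values. The sketch below is a simplified model of that merge, not HBase's implementation; the cell tuple layout, the name `read_row`, and the sample data are assumptions, and it keeps only one version per column.

```python
import heapq

def sort_key(cell):
    """Order cells by row, column, then newest timestamp first.

    At equal timestamps a delete marker sorts ahead of a put.
    """
    row, column, timestamp, kind, _value = cell
    return (row, column, -timestamp, 0 if kind == "delete" else 1)

def read_row(sources, row):
    """Merge already-sorted sources and return the latest value per column."""
    result = {}
    decided = set()  # columns whose newest cell has already been seen
    for r, column, _ts, kind, value in heapq.merge(*sources, key=sort_key):
        if r != row or column in decided:
            continue  # other row, or an older version of a decided column
        decided.add(column)
        if kind == "put":
            result[column] = value
        # kind == "delete": the tombstone masks every older version
    return result

# Illustrative data: (row, column, timestamp, kind, value), each list sorted.
memstore = [("row1", "cf:link2", 30, "delete", None)]
newer_file = [("row1", "cf:link1", 20, "put", "v2")]
older_file = [
    ("row1", "cf:link1", 10, "put", "v1"),
    ("row1", "cf:link2", 10, "put", "x"),
    ("row1", "cf:link3", 10, "put", "y"),
]
print(read_row([memstore, newer_file, older_file], "row1"))
```

No file is rewritten when a value changes: the newer cell simply wins the merge at read time, and compactions later fold the files together and drop the masked versions.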