HBase >> mail # user >> how does hbase get the latest version with immutable hfiles?


Re: how does hbase get the latest version with immutable hfiles?
Once hbase has identified the file that contains the row key, what
algorithm is used?

I understand that keys are ordered lexically.
And are files ordered using quicksort?
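
On the lookup question: because keys inside a file are already in sorted order, no sort is needed at read time, and as far as I understand it the data is kept sorted in memory before it is written out rather than quicksorted afterwards. A lookup inside one file can then be a binary search over a small index of block start keys, followed by a short scan of a single block. The sketch below is a simplified model under those assumptions; `build_index`, `lookup`, and the block layout are hypothetical names for illustration, not HBase's actual classes or file format.

```python
from bisect import bisect_right

# Hypothetical, simplified model of one immutable sorted file:
# a list of blocks, each a sorted list of (row_key, value) pairs,
# plus an index that holds the first key of every block.
def build_index(blocks):
    return [block[0][0] for block in blocks]

def lookup(blocks, index, row_key):
    """Return the value stored for row_key, or None if the file lacks it."""
    # Binary search: find the last block whose first key is <= row_key.
    pos = bisect_right(index, row_key) - 1
    if pos < 0:
        return None  # row_key sorts before everything in the file
    # Scan only that one block; keys inside it are in byte order.
    for key, value in blocks[pos]:
        if key == row_key:
            return value
        if key > row_key:
            break  # passed the spot where the key would have been
    return None

blocks = [
    [(b"com.example/a", 1), (b"com.example/b", 2)],
    [(b"com.example/c", 3), (b"com.example/d", 4)],
]
index = build_index(blocks)
```

Calling `lookup(blocks, index, b"com.example/c")` touches only the second block. The cost is a binary search over the index plus a scan bounded by the block size, regardless of how large the file is.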

On Sun, Jun 3, 2012 at 9:37 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> There are slides.  I think you have to register with an email and first/last
> name to download the ppt.
>
> On Sun, Jun 3, 2012 at 12:21 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
>
> > Elliott,
> >
> > Is there a video or slides?  I guess I have to register to view it?
> >
> > On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
> >
> > > If you want to get into the really nitty gritty, I found Lars'
> > > presentation really insightful.
> > >
> > > http://www.hbasecon.com/sessions/learning-hbase-internals/
> > >
> > > On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > Hi there, I think you probably want to look at this...
> > > >
> > > > HBase catalog metadata...
> > > >
> > > > http://hbase.apache.org/book.html#arch.catalog
> > > >
> > > > How data is stored internally...
> > > >
> > > > http://hbase.apache.org/book.html#regions.arch
> > > >
> > > > Lots of versioning description here...
> > > >
> > > > http://hbase.apache.org/book.html#datamodel
> > > >
> > > >
> > > >
> > > > Long story short: the client talks directly to RegionServers, and HBase
> > > > looks at multiple StoreFiles.
> > > >
> > > >
> > > >
> > > > On 6/1/12 4:27 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >(reference:
> > > > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > > > >
> > > > >A row consists of a key and column families, along with a timestamp.
> > > > >
> > > > >So for example:
> > > > >
> > > > >key = com.example.com/some/path
> > > > >
> > > > >cf: outboundlinks {
> > > > >     com.example.com/link1,
> > > > >     com.example.com/link2,
> > > > >     ..
> > > > >}
> > > > >
> > > > >Data is stored like this:
> > > > >
> > > > >Region Server -> Store -> StoreFile -> HFile
> > > > >
> > > > >Now when a client requests a particular key, the HMaster figures out
> > > > >which region server holds the data, this information is returned to the
> > > > >client (which saves it locally), and then it makes a request to the
> > > > >region server.
> > > > >
> > > > >Now since the actual data files are immutable, if you modify a
> > > > >particular value in a CF, it is tombstoned (not sure how that works but
> > > > >I understand it at a high level).
> > > > >
> > > > >So if I make a request for a given key, going with the example above,
> > > > >a particular url on the website example.com, and I want all the
> > > > >outboundlinks, I reference the column family "outboundlinks", which can
> > > > >store millions of urls.
> > > > >
> > > > >What process/service/class is in charge of assembling the various
> > > > >files to get all the correct data?
> > > > >
> > > > >Summary of my question:
> > > > >What I am trying to understand is, if a particular CF has millions of
> > > > >values, and if a single value is mutated, a new file has to be created.
> > > > >So this means, if I query for that value, i.e. it is included in my
> > > > >result set, how does hbase know where to look for the latest data?
> > > > >
> > > > >So basically from what I understand, a get request for a particular
> > > > >key and cf will potentially have to look at more than one StoreFile
> > > > >(or HFile?), correct?
> > > >
> > > >
> > > >
> > >
> >
>
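
To make the "HBase looks at multiple StoreFiles" answer concrete, here is a minimal sketch of a read that merges several immutable sorted sources and keeps the newest live version of each column. It assumes every cell carries a timestamp and that a delete is written as a marker cell (a tombstone) which hides versions at or below its timestamp. The names `sort_key`, `get_latest`, `PUT`, and `DELETE` are hypothetical and only illustrate the idea; they are not HBase's API.

```python
import heapq

PUT, DELETE = "put", "delete"

def sort_key(cell):
    row, column, ts, kind, _ = cell
    # Row and column ascending, newest timestamp first, and a delete
    # marker ahead of a put that carries the same timestamp.
    return (row, column, -ts, 0 if kind == DELETE else 1)

def get_latest(row, sources):
    """Return {column: value} with the newest live version per column."""
    result = {}
    deleted_up_to = {}  # column -> newest delete timestamp seen so far
    # Every source is already sorted, so a k-way merge gives one sorted
    # stream without rewriting any file. (A real store would seek to the
    # row instead of filtering the whole stream as done here.)
    for r, column, ts, kind, value in heapq.merge(*sources, key=sort_key):
        if r != row:
            continue
        if kind == DELETE:
            deleted_up_to.setdefault(column, ts)
        elif column not in result and ts > deleted_up_to.get(column, -1):
            result[column] = value
    return result

older_file = sorted([
    ("row1", "links:a", 1, PUT, "v1"),
    ("row1", "links:b", 1, PUT, "b1"),
    ("row1", "links:c", 1, PUT, "c1"),
], key=sort_key)

newer_file = sorted([
    ("row1", "links:a", 2, PUT, "v2"),        # overwrites links:a
    ("row1", "links:b", 3, DELETE, None),     # tombstone for links:b
], key=sort_key)

latest = get_latest("row1", [older_file, newer_file])
```

Here `latest` holds `links:a` from the newer file and `links:c` from the older one, while `links:b` is hidden by the tombstone. Neither file was modified: changing one value among millions only adds a small new cell, and the read path sorts out which version wins.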