HBase, mail # user - how client location a region/tablet?

Re: how client location a region/tablet?
Doug Meil 2012-08-23, 12:21

For further information about the catalog tables and region-regionserver
assignment, see this:

http://hbase.apache.org/book.html#arch.catalog
On 8/19/12 7:36 AM, "Lin Ma" <[EMAIL PROTECTED]> wrote:

>Thank you Stack, especially for the smart 6 round trip guess for the
>puzzle. :-)
>
>1. "Yeah, the client caches locations, not the data." -- does this mean
>each client caches the location information for the whole HBase cluster,
>i.e. which physical server owns which region? Supposing each region is
>128M bytes, then for a big cluster (petabyte scale) total data size / 128M
>is not a trivial number. Is there any overhead on the client?
>2. I am a bit confused by what you mean by "not the data". The location
>information the client caches should be the data in the METADATA table,
>which is the region-to-physical-server mapping. Why do you say it is not
>data (do you mean the real content of each region)?
>
>regards,
>Lin
>
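To put a number on the "total data size / 128M" concern above, here is a small worked calculation in Python. It is only a sketch: the 128M region size comes from the question, while the 1 KiB per cached location entry is a made-up figure for illustration, not a measured HBase value. It also assumes the worst case where a client holds an entry for every region in the cluster.

```python
# Worked arithmetic: how many region locations a client would hold if it
# cached one entry for every region in the cluster (worst case).

REGION_SIZE_BYTES = 128 * 2**20  # 128 MiB per region, the figure used in the question


def region_count(total_bytes, region_bytes=REGION_SIZE_BYTES):
    """Number of regions needed to hold total_bytes, rounded up."""
    return -(-total_bytes // region_bytes)  # ceiling division on integers


def cache_bytes(total_bytes, bytes_per_entry):
    """Memory for one location entry per region.

    bytes_per_entry is a hypothetical figure supplied by the caller.
    """
    return region_count(total_bytes) * bytes_per_entry


if __name__ == "__main__":
    one_pib = 2**50
    print(region_count(one_pib))                # 8388608 regions
    print(cache_bytes(one_pib, 1024) / 2**30)   # 8.0 (GiB), assuming 1 KiB per entry
```

Note that what gets multiplied is the size of a location entry, not the 128M region itself, which is the distinction between caching locations and caching data.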
>On Sun, Aug 19, 2012 at 12:40 PM, Stack <[EMAIL PROTECTED]> wrote:
>
>> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>> > Hello guys,
>> >
>> > I am reading the Bigtable paper on how a client locates a tablet.
>> > Section 5.1, Tablet location, says that the client will cache all
>> > tablet locations. I take that to mean the client caches the root tablet
>> > of the METADATA table and all the other METADATA tablets (i.e. the
>> > client caches the whole METADATA table?). My question is whether HBase
>> > is implemented in the same or a similar way. My concern is that,
>> > supposing each tablet or region file is 128M bytes, caching all the
>> > tablets or region files of the METADATA table would take a huge amount
>> > of memory on each client. Is that feasible in real HBase clusters?
>> > Thanks.
>> >
>> >
>>
>> Yeah, the client caches locations, not the data.
>>
>>
>> > BTW: another point of confusion for me is that section 5.1, Tablet
>> > location, of the Bigtable paper says: "If the client's cache is stale,
>> > the location algorithm could take up to six round-trips, because stale
>> > cache entries are only discovered upon misses (assuming that METADATA
>> > tablets do not move very frequently)." I do not know how the six round
>> > trips are counted. If anyone can solve this puzzle, that would be
>> > great. :-)
>> >
>>
>> I'm not sure what the 6 is about either.  Here is a guesstimate:
>>
>> 1. Go to the cached location of the server for a particular user region,
>> but the server says it does not have the region; the client's cached
>> location is stale.
>> 2. Go back to the client-cached meta region that holds the user region w/
>> the row we want, but its location is stale too.
>> 3. Go to the root location to find the new location of meta, but root
>> has moved... what the client has is stale.
>> 4. Find the new root location and look up the meta region location.
>> 5. Go to the meta region location to find the new user region location.
>> 6. Go to the server w/ the user region.
>>
>> St.Ack
>>
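
The six steps guessed at above can be traced with a toy model in Python. This is only a sketch of that guess, not HBase or Bigtable client code: the Cluster class, its method names, and the server names are all hypothetical, and finding the new root location is counted together with the meta lookup as a single trip, exactly as step 4 of the guess does.

```python
class NotServing(Exception):
    """Raised when a server is asked for something it no longer hosts."""


class Cluster:
    """Toy cluster: records which server hosts root, meta, and the user region."""

    def __init__(self, root_server, meta_server, user_server):
        self.root_server = root_server
        self.meta_server = meta_server
        self.user_server = user_server
        self.round_trips = 0

    def find_root(self):
        # Not counted separately: step 4 of the guess folds this into the
        # meta lookup that follows it.
        return self.root_server

    def ask_root_for_meta(self, server):
        self.round_trips += 1
        if server != self.root_server:
            raise NotServing("root")
        return self.meta_server

    def ask_meta_for_user(self, server):
        self.round_trips += 1
        if server != self.meta_server:
            raise NotServing("meta")
        return self.user_server

    def read_user_region(self, server):
        self.round_trips += 1
        if server != self.user_server:
            raise NotServing("user region")
        return "row data"


def read_with_cache(cluster, cache):
    """Try cached locations bottom-up; on misses, refresh them top-down."""
    try:
        return cluster.read_user_region(cache["user"])                # step 1
    except NotServing:
        pass
    try:
        cache["user"] = cluster.ask_meta_for_user(cache["meta"])      # step 2
    except NotServing:
        try:
            cache["meta"] = cluster.ask_root_for_meta(cache["root"])  # step 3
        except NotServing:
            cache["root"] = cluster.find_root()                       # step 4
            cache["meta"] = cluster.ask_root_for_meta(cache["root"])
        cache["user"] = cluster.ask_meta_for_user(cache["meta"])      # step 5
    return cluster.read_user_region(cache["user"])                    # step 6


if __name__ == "__main__":
    cluster = Cluster("rs1", "rs2", "rs3")
    stale = {"root": "old-root", "meta": "old-meta", "user": "old-user"}
    print(read_with_cache(cluster, stale), cluster.round_trips)  # row data 6
```

With a fully stale cache the model counts six trips; with a fresh cache it counts one, and with only the user region location stale it counts three. The staleness is discovered only when a request misses, which is the point the Bigtable quote makes.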