Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # user >> HBase Region/Table Hotspotting

Joarder KAMAL 2013-02-11, 02:17
lars hofhansl 2013-02-11, 05:14
Joarder KAMAL 2013-02-11, 05:56
Ted Yu 2013-02-11, 14:55
Kevin Odell 2013-02-11, 02:32
Joarder KAMAL 2013-02-11, 03:43
Copy link to this message
Re: HBase Region/Table Hotspotting
Matt Corgan summarized the pro and con of having large number of regions



On Sun, Feb 10, 2013 at 7:43 PM, Joarder KAMAL <[EMAIL PROTECTED]> wrote:

> Hi Kevin,
> Thanks a lot for your great answers.
> Regarding Q5. To clarify,
> lets say Facebook is using HBase for the integrated messaging/chat/email
> system in a very large-scale setup. And schema design of such system can
> change over the years (even over the months). Workload patterns may also
> change due to different usage characteristics (like the rate of messaging
> may be higher during a protest/specific event in a particular country). So,
> region/table hotspots have been created at random region servers within the
> cluster despite careful schema design and pre-planning.
> The facebook team rush to split the hotspotted regions manually and
> redistribute them over a new set of physical machines which are recently
> added to the system to increase scalability in the face of high user
> demand. Now hotspotted region data could be transferred into new physical
> machines gradually to handle the situations. Now if the shard (region) size
> is small enough then data transfer cost over the network could be minimum
> otherwise large volume of data needs to be transferred instantly.
> I have found in many places it is discouraged to have a large number of
> regions systems. However, would it be possible to have very large number of
> regions in a system thus minimizing data transfer cost in case hotspotting
> due to workload/design characteristics. Is there any drawbacks or known
> side-effects? I am rethinking other possibilities other pre-planned schema
> and row-key designs.
> Thanks again.
> Regards,
> Joarder Kamal
> On 11 February 2013 13:32, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> > Hi Joarder,
> >
> >   Welcome to the HBase world.  Let me take some time to address your
> > questions the best I can:
> >
> >  1. How often you are facing Region or Table Hotspotting in HBase
> >    production systems? <--- Hotspotting is not something that just
> happens.
> >  This is usually caused by bad key design and writing to one region more
> > than the others.  I would recommend watching some of Lar's YouTube videos
> > on Schema Design in HBase.
> >    2. If a hotspot is created, how quickly it is automatically cleared
> out
> >    (assuming sudden workload change)? <--- It will not be automatically
> > "cleared out" I think you may be misinformed here.  Basically, it is on
> you
> > to watch you table and your write distribution and determine that you
> have
> > a hotspot and take the necessary action.  Usually the only action is to
> > split the region.  If hotspots become a habitual problem you would most
> > likely want to go back and re-evaluate your current key.
> >    3. How often this kind of situation happens - A hotspot is detected
> and
> >    vanished out before taking an action? or hotspots stays longer period
> of
> >    time? <--- Please see above
> >    4. Or if the hotspot is stays, how it is handled (in general) in
> >    production system? <--- Some people have to hotspot on purpose early
> on,
> > because they only write to a subset of regions.  You will have to
> manually
> > watch for hotspots(which is much easier in later releases).
> >    5. How large data transfer cost is minimized or avoid for re-sharding
> >    regions within a cluster in a single data center or within WAN? <---
> Not
> > quite sure what you are saying here, so I will take a best guess at it.
> >  Sharding is handled in HBase by region splitting.  The best way to
> success
> > in HBase is to understand your data and you needs BEFORE you create you
> > table and start writing into HBase.  This way you can presplit your table
> > to handle the incoming data and you won't have to do a massive amounts of