HBase >> mail # user >> Questions about HBase load balancing and HFile


Re: Questions about HBase load balancing and HFile
If "hot" means many requests, then that is only handled in 0.96, right? 0.94
only balances capacity load, in terms of the number of regions per region
server of the same table.
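
To illustrate what balancing purely by region count means, here is a minimal sketch in Python. It is not HBase's balancer code; the function name plan_moves and the server and region names are hypothetical. It only evens out the number of regions per server and knows nothing about request load.

```python
def plan_moves(regions_by_server):
    """Plan region moves until server region counts differ by at most one.

    regions_by_server maps a (hypothetical) server name to the list of
    regions it hosts. Request load is deliberately ignored.
    """
    # Copy the input so the caller's mapping is left untouched.
    load = {server: list(regions) for server, regions in regions_by_server.items()}
    moves = []
    if not load:
        return moves
    while True:
        busiest = max(load, key=lambda s: len(load[s]))
        idlest = min(load, key=lambda s: len(load[s]))
        # Counts are considered balanced once the gap is one region or less.
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return moves
        region = load[busiest].pop()
        load[idlest].append(region)
        moves.append((region, busiest, idlest))


if __name__ == "__main__":
    for region, src, dst in plan_moves({"rs1": ["r1", "r2", "r3", "r4"], "rs2": []}):
        print(f"move {region}: {src} -> {dst}")
```

A server hosting two very busy regions and a server hosting two idle regions look identical to this kind of count-based planner, which is the limitation discussed above.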

On Monday, January 20, 2014, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. under heavy load by serving to hot regions
>
> Did you mean 'two hot regions' ?
> If so, the master will move one of them to another RS.
>
> Cheers
>
>
> On Mon, Jan 20, 2014 at 6:17 AM, Bill Q <[EMAIL PROTECTED]> wrote:
>
> > Hi Ted and Bharath,
> > Thanks a lot for the replies.
> >
> > For question #1, if an RS is under heavy load by serving to hot
> > regions, will the HMaster move one of the two regions to another RS, or
> > will it split both of them and move the newly created halves to other
> > RSs?
> >
> > For question #3, does this mean that a HFile has many 64k blocks, but
> > itself is around 64M (or 128M)?
> >
> >
> > Many thanks.
> >
> >
> > Bill
> >
> >
> > On Mon, Jan 20, 2014 at 1:49 AM, Bharath Vissapragada <[EMAIL PROTECTED]> wrote:
> >
> > > For question #3, the block size Lars talks about is the block size inside
> > > an HFile, which is different from the HDFS block size. Look at
> > > http://hbase.apache.org/book/apes03.html . An HFile is indexed as blocks to
> > > facilitate random access to data, so that we can skip unnecessary disk
> > > blocks during gets/scans. The smaller the HFile block size, the better the
> > > random read performance. You can see the detailed HFile layout in that link.
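
As a rough illustration of the two block sizes, the arithmetic can be sketched in Python. This is a simplification under stated assumptions: every HFile block is treated as exactly full, and the index, Bloom filter, and trailer sections are ignored. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def hfile_blocks(hfile_bytes, hfile_block_bytes=64 * KIB):
    """Approximate number of HFile data blocks in one HFile (ceiling division)."""
    return -(-hfile_bytes // hfile_block_bytes)


def hdfs_blocks(file_bytes, hdfs_block_bytes=128 * MIB):
    """Number of HDFS blocks needed to store a file of the given size."""
    return -(-file_bytes // hdfs_block_bytes)


if __name__ == "__main__":
    size = 128 * MIB
    print(hfile_blocks(size), "HFile blocks inside the file")
    print(hdfs_blocks(size), "HDFS block(s) holding the file")
```

Under these assumptions a 128 MiB HFile holds about 2048 blocks of 64 KiB and occupies a single 128 MiB HDFS block, so the small HFile block size is an indexing granularity inside the file, not the size of the file on HDFS.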
> > >
> > > For question #4, you are correct: since the data resides on HDFS, each
> > > region server has access to all the store files (they just use the HDFS
> > > API to read them). The reason they are still available after a
> > > (RS+datanode) crash is the replication in HDFS. The store files still
> > > have valid replicas, and the namenode tries to maintain the replication
> > > factor by re-replicating them eventually.
> > >
> > >
> > > On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > For question #1, there is a load balancer in HMaster which does the job
> > > > of balancing region load.
> > > >
> > > > For number 2, the daughter regions stay on the same server as the
> > > > parent after the split. Later, one or both of them may be moved to
> > > > other region servers.
> > > >
> > > > Cheers
> > > >
> > > > On Jan 19, 2014, at 10:27 PM, Bill Q <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > > I am trying to get more information about HBase. I would appreciate
> > > > > some answers to these few questions. Thanks a lot.
> > > > >
> > > > > 1. About load balancing: does HMaster monitor overloaded or lightly
> > > > > loaded HRegionServers, and move some regions from the hot
> > > > > HRegionServer to the lightly loaded ones (with or without adding new
> > > > > servers to the cluster, respectively)?
> > > > >
> > > > > 2. About region splitting: when splitting a region, will the newly
> > > > > created regions stay on the current HRegionServer, or will HMaster
> > > > > assign some new HRegionServers to take the two newly created regions?
> > > > >
> > > > > 3. About HFile size: Lars mentioned here
> > > > > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> > > > > that the HFile size defaults to 64k. How does this work while the
> > > > > default HDFS block is 64M/128M? Would the small HFile size waste lots
> > > > > of space on HDFS?
> > > > >
> > > > > 4. About data locality: if an HRegionServer fails, the HMaster would
> > > > > assign a new HRegionServer to take its place. But should this new
> > > > > HRegionServer have access to the storeFiles? I assumed that's how it
> > > > > works, by using HDFS's data replication. But after some reading, I
> > > > > got confused. It seems that the new HRegionServer can work without
> > > > > the storeFiles data a