yun peng 2013-06-19, 20:04
Asaf Mesika 2013-06-19, 20:58
yun peng 2013-06-19, 21:10
Asaf Mesika 2013-06-19, 21:26
yun peng 2013-06-19, 21:59
Asaf Mesika 2013-06-20, 13:32
Thanks Asaf, I have put my responses inline.
On Thu, Jun 20, 2013 at 9:32 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
> On Thu, Jun 20, 2013 at 12:59 AM, yun peng <[EMAIL PROTECTED]> wrote:
> > Thanks for the reply. The idea is interesting, but in practice we
> > don't know in advance how much data should be put on one RS. The data
> > is redirected to the next RS only when the current RS is initiating a
> > flush() and begins to block the stream.
>
> Can a single RS handle the load for the duration until HBase splits the
> region and load balancing kicks in and moves the region to another server?

Right, currently the time-series data (i.e., with sequential row keys) is
metadata in our system and is not that heavyweight; it can be handled by
a single RS.

> > The real problem is not about splitting existing region, but instead
> > adding a new region (or new key range).
> > In the original example, before node n3 overflows, the system is like
> > n1 [0,4],
> > n2 [5,9],
> > n3 [10,14]
> > then n3 starts to flush() (say Memstore.size = 5), which may block the
> > stream to n3. We want the subsequent write stream to redirect back to
> > n1, so now n1 is accepting 15, 16... for range [15,19].
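
A minimal sketch of the mapping described in this example, assuming each batch of Memstore.size sequential keys is handed to the next server in round-robin order. The names `MEMSTORE_SIZE`, `SERVERS`, `server_for_key` and `ranges_by_server` are hypothetical and are not HBase APIs; this only models the desired behaviour, not how HBase assigns regions today.

```python
# Toy model of the desired key-range-to-server mapping (not HBase code).
MEMSTORE_SIZE = 5                # assumed flush threshold, as in the example
SERVERS = ["n1", "n2", "n3"]     # region servers from the example


def server_for_key(key: int) -> str:
    """Return the server that would accept `key` under round-robin ranges."""
    batch = key // MEMSTORE_SIZE  # 0 -> [0,4], 1 -> [5,9], 2 -> [10,14], ...
    return SERVERS[batch % len(SERVERS)]


def ranges_by_server(max_key: int) -> dict:
    """List the closed key ranges each server owns for keys 0..max_key."""
    owned = {name: [] for name in SERVERS}
    for start in range(0, max_key + 1, MEMSTORE_SIZE):
        end = min(start + MEMSTORE_SIZE - 1, max_key)
        owned[server_for_key(start)].append((start, end))
    return owned
```

With keys up to 19, n1 ends up owning both [0,4] and [15,19], which is the redirect back to n1 that the example asks for.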
> Flush does not block HTable.put() or HTable.batch(), unless your system is
> not tuned and your flushes are slow.

If I understand right, flush() needs to sort the data, build an index and
write sequentially to disk, which I think should, if not block, at least
interfere a lot with the thread doing the in-memory write (plus WAL). A drop
in write throughput can be expected.

> > If I understand it right, the above behaviour would require changing
> > the way HBase manages the region-key mapping. We want to know how much
> > effort it would take to change HBase.
> Well, as I understand it - you write to n3, to a specific region (say
> 10,inf). Once you pass the max size, it splits into (10,14) and (15,inf).
> If now n3 RS has more than the average regions per RS, one region will move
> to another RS. It may be (10,14) or (15,inf).
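
The split-then-move sequence described above can be sketched as a toy model. It assumes half-open ranges, so [10, 15) holds keys 10..14, and a very simplified balancer that moves one region off a server holding more than the average count. `split_region` and `rebalance` are hypothetical helpers, not HBase's split policy or load balancer, and which daughter region moves is arbitrary here, as it is in the description.

```python
import math


def split_region(region, split_key):
    """Split the half-open range [start, end) at split_key into two daughters."""
    start, end = region
    assert start < split_key < end, "split key must fall inside the region"
    return (start, split_key), (split_key, end)


def rebalance(assignment):
    """Move one region off the first server holding more than the average."""
    total = sum(len(regions) for regions in assignment.values())
    average = total / len(assignment)
    for server, regions in assignment.items():
        if len(regions) > average:
            # Pick the least-loaded server as the target (ties: first one).
            target = min(assignment, key=lambda s: len(assignment[s]))
            assignment[target].append(regions.pop())
            break
    return assignment


# n3 holds [10, inf); it splits at 15 once it passes the max size.
assignment = {"n1": [(0, 5)], "n2": [(5, 10)], "n3": [(10, math.inf)]}
assignment["n3"] = list(split_region(assignment["n3"][0], 15))
rebalance(assignment)
```

In this run the (15, inf) daughter happens to move to n1, but only because the sketch pops the last region; nothing in the model guarantees that choice.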

For example, is it possible to set the "max size" for a split to be equal to
the Memstore size, so that flush and split (actually just updating the range
from [10,inf) to [10,14] in the .META table, without an actual data split)
can co-occur? If that is possible, can we also force the new interval
[15, inf) to be mapped to the next RS (i.e., not based on the number of
regions on RS n3)?

> > Besides, I found that Chapter 9, Advanced Usage, in the Definitive Guide talks a bit
> > about this issue. And they are based on the idea of adding prefix or
> > In their terminology, we need the "sequential key" approach, but with
> > managed region mapping.
> Why do you need the sequential key approach? Let's say you have a group of
> data that is correlated in some way but is scattered across 2-3 RSs. You
> can always write a coprocessor to run some logic close to the data, and
> then run it again on the merged data on the client side, right?
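
The pattern suggested above (run logic next to the data on each RS, then merge on the client) can be illustrated with a plain-Python sketch. It assumes the logic is a simple count/sum aggregation; `server_side_partial` stands in for coprocessor code and `client_side_merge` for the client step. Both names are hypothetical and no HBase coprocessor API is used.

```python
def server_side_partial(rows):
    """Stand-in for coprocessor logic: summarise the rows held by one server."""
    values = [value for _key, value in rows]
    return {"count": len(values), "total": sum(values)}


def client_side_merge(partials):
    """Combine the per-server summaries into one result on the client."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    mean = total / count if count else None
    return {"count": count, "total": total, "mean": mean}


# Correlated rows scattered over three servers, as (row key, value) pairs.
shards = [[(10, 1.0), (11, 3.0)], [(12, 5.0)], []]
merged = client_side_merge([server_side_partial(s) for s in shards])
```

This only works directly for logic that decomposes into mergeable partial results; anything that needs the whole batch in one place would still depend on where the rows live.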

I agree with you on this general idea. Let me think a bit...

> > Yun
> > On Wed, Jun 19, 2013 at 5:26 PM, Asaf Mesika <[EMAIL PROTECTED]>
> > wrote:
> > > You can use the prefix split policy. Put the same prefix on the data
> > > you need in the same region and thus achieve locality of this data and
> > > also a good load of your data and avoid split policy.
> > > I'm not sure you really need the requirement you described below,
> > > unless I didn't follow your business requirements very well.
> > >
> > > On Thursday, June 20, 2013, yun peng wrote:
> > >
> > > > It is our requirement that one batch of data writes (say, of
> > > > Memstore size) should be in one RS. A salting prefix, while it
> > > > evens out the load, may not have this property.
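
A small sketch of why salting may not keep a batch together, assuming the salt is the key modulo a fixed bucket count (one common scheme; the thread does not say which salting function is meant). `NUM_BUCKETS`, `salted_key` and `buckets_touched` are hypothetical names for illustration.

```python
NUM_BUCKETS = 3  # assumed number of salt buckets, one per region server


def salted_key(key: int) -> str:
    """Prefix the row key with a salt derived from the key itself."""
    salt = key % NUM_BUCKETS
    return f"{salt}-{key:08d}"


def buckets_touched(keys) -> set:
    """Return the set of salt prefixes that a batch of keys falls into."""
    return {salted_key(k).split("-")[0] for k in keys}


# One Memstore-sized batch of sequential keys, as in the [10,14] example.
batch = range(10, 15)
touched = buckets_touched(batch)
```

Here the five sequential keys spread over all three buckets, so the batch would not land on a single RS.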
> > > >
> > > > Our problem is really how to manipulate/customise the mapping of
> > > > row key (or row key range) to the region servers, so that after one
> > > > region overflows and starts to flush, the write
Asaf Mesika 2013-06-21, 05:26
yun peng 2013-06-21, 15:38
Anoop John 2013-06-21, 08:55
谢良 2013-06-20, 03:35
Bing Jiang 2013-06-20, 04:37
yun peng 2013-06-20, 17:45