HBase >> mail # user >> Possibility of using timestamp as row key in HBase


yun peng 2013-06-19, 20:04
Asaf Mesika 2013-06-19, 20:58
yun peng 2013-06-19, 21:10
Asaf Mesika 2013-06-19, 21:26
yun peng 2013-06-19, 21:59
Asaf Mesika 2013-06-20, 13:32
yun peng 2013-06-20, 18:42
Asaf Mesika 2013-06-21, 05:26
yun peng 2013-06-21, 15:38
Re: Possibility of using timestamp as row key in HBase
You can specify a max size to indicate when a region should be split, but
this size refers to the HFile. To be precise, it is the size of the biggest
HFile under that region. If you specify this size as 10G, the region will be
split in two once it has a file bigger than 10G. There were some proposals
and a JIRA to use the sum of all the HFile sizes to decide on the split, but
that is not yet done. This means the data in the memstore is not considered
for the split. I think your idea is to keep the data in memory and, when the
memstore limit is reached, split the region in two, giving two memstores?
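
To make the rule above concrete, here is a minimal sketch in Python. It is
a toy model of the behaviour described in this message, not HBase code:
`Region`, `should_split` and `max_file_size` are names made up for
illustration, and the threshold stands in for the configured max size.

```python
from dataclasses import dataclass, field
from typing import List

GB = 1024 ** 3


@dataclass
class Region:
    """Toy region: flushed HFiles on disk plus an in-memory memstore."""
    hfile_sizes: List[int] = field(default_factory=list)  # bytes per HFile
    memstore_size: int = 0  # bytes buffered in memory, not yet flushed


def should_split(region: Region, max_file_size: int) -> bool:
    """Split only when the biggest single HFile exceeds max_file_size.

    The memstore size and the sum of the HFile sizes are deliberately
    ignored, matching the rule described in the message above.
    """
    if not region.hfile_sizes:
        return False
    return max(region.hfile_sizes) > max_file_size


max_size = 10 * GB

# 12G sitting in the memstore, nothing flushed yet: no split.
print(should_split(Region(memstore_size=12 * GB), max_size))               # False
# Three 4G files sum to 12G, but no single file is over 10G: no split.
print(should_split(Region(hfile_sizes=[4 * GB, 4 * GB, 4 * GB]), max_size))  # False
# One 11G file: split.
print(should_split(Region(hfile_sizes=[11 * GB, 1 * GB]), max_size))       # True
```

Under this model, setting the max size equal to the memstore size would
not by itself tie a split to a flush, since only files already on disk
are looked at.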

-Anoop-

On Fri, Jun 21, 2013 at 10:56 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

>  On Thu, Jun 20, 2013 at 9:42 PM, yun peng <[EMAIL PROTECTED]> wrote:
>
> > Thanks Asaf, I made the response inline.
> >
> > On Thu, Jun 20, 2013 at 9:32 AM, Asaf Mesika <[EMAIL PROTECTED]>
> > wrote:
> >
> > > On Thu, Jun 20, 2013 at 12:59 AM, yun peng <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > Thanks for the reply. The idea is interesting, but in practice our
> > > > client doesn't know in advance how much data should be put on one RS.
> > > > Writes are redirected to the next RS only when the current RS is
> > > > initiating a flush() and begins to block the stream.
> > > >
> > > Can a single RS handle the load for the period until HBase splits the
> > > region, load balancing kicks in, and the region moves to another server?
> > >
> > Right, currently the time-series data (i.e., data with sequential row keys)
> > is metadata in our system and is not that heavyweight, so it can be
> > handled by a single RS.
> >
> >
> >
> > > > The real problem is not about splitting an existing region, but about
> > > > adding a new region (or new key range).
> > > > In the original example, before node n3 overflows, the system looks like
> > > > n1 [0,4],
> > > > n2 [5,9],
> > > > n3 [10,14]
> > > > Then n3 starts to flush() (say Memstore.size = 5), which may block the
> > > > write stream to n3. We want the subsequent write stream to be redirected
> > > > back to, say, n1, so that n1 now accepts 15, 16... for range [15,19].
> > > >
> > > Flush does not block HTable.put() or HTable.batch(), unless your system
> > > is not tuned and your flushes are slow.
> > >
> > If I understand correctly, flush() needs to sort data, build an index, and
> > sequentially write to disk, which I think should, if not block, at least
> > interfere a lot with the thread doing in-memory writes (plus WAL). A drop
> > in write throughput can be expected.
> >
> I think all those phases of sorting and index building are done on each
> insertion of a Put into the memstore, so the flush only dumps the bytes from
> memory to disk (network). It doesn't interfere with other writes happening
> at the same time, since they open a new memstore and direct the writes
> there, and asynchronously flush the old memstore to disk. They only block
> if the new memstore fills up very quickly, before the first one has
> finished flushing.
> Regarding the WAL, it happens before writing to the memstore. They first get
> an ack on writing to the WAL, then write to the memstore, and then ack back
> to the client. I don't see any blocking here.
>
>
>
> > >
> > > > If I understand it right, the above behaviour would change HBase's
> > > > normal way of managing the region-key mapping. We want to know how
> > > > much effort it would take to change HBase?
> > > >
> > > Well, as I understand it, you write to n3, to a specific region (say
> > > (10,inf)). Once you pass the max size, it splits into (10,14) and
> > > (15,inf). If the n3 RS now has more than the average number of regions
> > > per RS, one region will move to another RS. It may be (10,14) or (15,inf).
> > >
> > For example, is it possible to specify the "max size" for a split to be
> > equal to Memstore.size,
> > so that flush and split (actually just updating range from [10,inf) to
谢良 2013-06-20, 03:35
Bing Jiang 2013-06-20, 04:37
yun peng 2013-06-20, 17:45