HBase, mail # user - Possibility of using timestamp as row key in HBase


Re: Possibility of using timestamp as row key in HBase
Asaf Mesika 2013-06-19, 21:26
You can use the prefix split policy. Put the same prefix on the data you need
in the same region: that gives you locality for that data, still spreads your
load well, and avoids a custom split policy.
I'm not sure you really need the requirement you described below, unless I
didn't follow your business requirements very well.
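
To make the prefix idea concrete, here is a minimal sketch of a row key laid out as one prefix byte followed by the timestamp. Everything in it is an assumption for illustration: the class `PrefixedRowKey`, its method names, and the choice to derive the prefix from a batch number are hypothetical and not part of HBase. All rows of one batch share a prefix, and successive batches rotate over the prefixes round-robin (numbered from 0 here, rather than from 1).

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not an HBase class: row key = [1-byte prefix][8-byte timestamp]. */
final class PrefixedRowKey {
    private PrefixedRowKey() {}

    /** Every row of one batch gets the same prefix; batches rotate over 0..numBuckets-1. */
    static byte prefixForBatch(long batchNumber, int numBuckets) {
        if (numBuckets < 1 || numBuckets > 127) {
            throw new IllegalArgumentException("numBuckets must be in 1..127");
        }
        return (byte) Math.floorMod(batchNumber, (long) numBuckets);
    }

    /** Big-endian encoding keeps non-negative timestamps sorted within one prefix. */
    static byte[] rowKey(byte prefix, long timestamp) {
        return ByteBuffer.allocate(1 + Long.BYTES).put(prefix).putLong(timestamp).array();
    }

    /** Recovers the timestamp by skipping the prefix byte. */
    static long timestampOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, 1, Long.BYTES).getLong();
    }
}
```

With a table pre-split on the prefix byte, each prefix maps to its own key range, so a whole batch lands in one region while consecutive batches go to different ones. The cost is that a scan over a time range has to be issued once per prefix.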

On Thursday, June 20, 2013, yun peng wrote:

> It is our requirement that one batch of data writes (say, of Memstore size)
> should be in one RS. A salting prefix, while it evens out the load, may not
> have this property.
>
> Our problem is really how to manipulate/customise the mapping of row keys
> (or row key ranges) to the region servers,
> so that after one region overflows and starts to flush, the write stream
> is automatically redirected to the next region server,
> in a round-robin way.
>
> Is it possible to customize such a policy on the HMaster? Or is there a way
> similar to what coprocessors do on region servers?
>
>
> On Wed, Jun 19, 2013 at 4:58 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
> > The newly split region might be moved due to load balancing. Aren't you
> > experiencing classic hot spotting, with only 1 RS getting all the write
> > traffic? Just place a byte before the timestamp and round-robin each put
> > over the values 1 to the number of region servers.
> >
> > On Wednesday, June 19, 2013, yun peng wrote:
> >
> > > Hi, All,
> > > Our use case requires persisting a stream into a system like HBase. The
> > > stream data is in the format <timestamp, value>. In other words, the
> > > timestamp is used as the row key. We want to explore whether HBase is
> > > suitable for this kind of data.
> > >
> > > The problem is that the domain of the row key (the timestamp) grows
> > > constantly. For example, given 3 nodes, n1, n2, n3, they respectively host
> > > the row key partitions [0,4], [5,9], [10,12]. Currently it is the last node,
> > > n3, that is busy receiving incoming writes (of row keys 13 and 14). This
> > > continues until the region reaches its max size of 5 (that is, the
> > > partition grows to [10,14]) and potentially splits.
> > >
> > > I am not an expert on HBase splits, but I am wondering whether, after the
> > > split, new writes will still go to node n3 (for [10,14]) or whether the
> > > write stream can be intelligently redirected to another, less busy node,
> > > like n1.
> > >
> > > In case HBase can't do things like this, how easy is it to extend HBase
> > > with such functionality? Thanks...
> > > Yun
> > >
> >
>
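
The three-node example from the original question can be worked through with a toy model of range lookup. This is only a sketch under stated assumptions: `RegionLookup` and its methods are hypothetical names, and the sorted map of start keys stands in for the real region metadata rather than reproducing HBase code. It shows that an ever-growing key always resolves to the region with the highest start key, so where the writes go depends only on which node currently hosts that last region.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model, not HBase code: each region is identified by its start key. */
final class RegionLookup {
    // start key of a region -> node currently hosting that region
    private final TreeMap<Long, String> regions = new TreeMap<>();

    /** Registers a region, or reassigns the region that already has this start key. */
    void addRegion(long startKey, String node) {
        regions.put(startKey, node);
    }

    /** A row belongs to the region with the greatest start key <= rowKey. */
    String nodeFor(long rowKey) {
        Map.Entry<Long, String> entry = regions.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("no region covers key " + rowKey);
        }
        return entry.getValue();
    }
}
```

Registering regions at start keys 0, 5 and 10 on n1, n2 and n3 makes `nodeFor(13)` return "n3". If the last region then splits at 12 and the new daughter is moved to n1 by the balancer, `nodeFor(15)` returns "n1", while key 11 still resolves to n3. Either way, a single region receives all new writes at any moment, which is the hot spot described earlier in the thread.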