HBase >> mail # user >> Possibility of using timestamp as row key in HBase


Re: Possibility of using timestamp as row key in HBase
You can use a prefix split policy. Give the same prefix to the data you need
in the same region; that gives you locality for this data while still
spreading the load well, without having to customize a policy yourself.
I'm not sure you really need the requirement you described below, unless I
didn't follow your business requirements very well.
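
To illustrate the prefix idea in the reply above: HBase ships a KeyPrefixRegionSplitPolicy, which is presumably the kind of policy meant here. The plain-Python sketch below is not HBase code; it only models the underlying idea of truncating a candidate split key to a fixed-length prefix so that rows sharing a prefix end up on the same side of a split. The key layout, the 2-byte prefix length, and the function names are assumptions made for illustration.

```python
def align_split_point(candidate: bytes, prefix_length: int) -> bytes:
    """Truncate a candidate split key to its prefix.

    A prefix sorts before every longer key that starts with it, so no
    prefix group can straddle the resulting split point.
    """
    return candidate[:prefix_length]


def region_of(row_key: bytes, split_point: bytes) -> str:
    # Rows strictly below the split point go to the lower daughter region,
    # everything else goes to the upper one.
    return "lower" if row_key < split_point else "upper"


# Hypothetical keys: a 2-byte group prefix followed by a timestamp-like suffix.
keys = [b"aa0001", b"aa0002", b"ab0001", b"ab0002", b"ab0003", b"ac0001"]

# A naive midpoint that would cut the "ab" group in two.
candidate = b"ab0002"
split_point = align_split_point(candidate, prefix_length=2)

placement = {key: region_of(key, split_point) for key in keys}
```

With the aligned split point, every `ab` row lands in the same daughter region, while the `aa` rows stay together in the other one.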

On Thursday, June 20, 2013, yun peng wrote:

> It is our requirement that one batch of data writes (say, of Memstore size)
> should be in one RS. A salting prefix, while it evens out the load, may not
> have this property.
>
> Our problem is really how to manipulate/customise the mapping of row keys
> (or row key ranges) to region servers, so that after one region overflows
> and starts to flush, the write stream is automatically redirected to the
> next region server, in a round-robin way.
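
One client-side way to approximate this requirement, without changing how the master assigns regions, might be to derive the key prefix from a batch counter instead of from each individual put. The sketch below is an assumption-laden illustration, not something proposed verbatim in the thread: it assumes the table is pre-split on a one-byte prefix, that `batch_size` (counted in rows) stands in for "Memstore size", and that the class and parameter names are hypothetical.

```python
import struct


class BatchRoundRobinKeys:
    """Builds row keys whose one-byte prefix changes once per batch."""

    def __init__(self, num_regions: int, batch_size: int):
        if not 1 <= num_regions <= 256:
            raise ValueError("the prefix is a single byte")
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._num_regions = num_regions
        self._batch_size = batch_size
        self._written = 0

    def row_key(self, timestamp_ms: int) -> bytes:
        # All rows of one batch share a prefix; consecutive batches
        # cycle through the prefixes 0 .. num_regions - 1.
        batch_id = self._written // self._batch_size
        self._written += 1
        prefix = batch_id % self._num_regions
        # Big-endian timestamp keeps rows time-ordered within a prefix.
        return bytes([prefix]) + struct.pack(">Q", timestamp_ms)


writer = BatchRoundRobinKeys(num_regions=3, batch_size=2)
prefixes = [writer.row_key(1000 + i)[0] for i in range(8)]
```

Here `prefixes` comes out as `[0, 0, 1, 1, 2, 2, 0, 0]`: each batch of two rows stays under one prefix, and the next batch moves on to the next prefix.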
>
> Is it possible to customize such a policy on the HMaster? Or is there a way
> similar to what coprocessors do on region servers?
>
>
> On Wed, Jun 19, 2013 at 4:58 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
> > The newly split region might be moved due to load balancing. Aren't you
> > experiencing classic hot spotting, with only 1 RS getting all the write
> > traffic? Just place a byte before the timestamp and round-robin each put
> > over the values 1 to the number of region servers.
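
The salting suggestion above can be sketched as follows. This is a minimal illustration in plain Python rather than HBase client code; the class name, the 8-byte big-endian timestamp encoding, and the `scan_ranges` helper are assumptions added for the example.

```python
import itertools
import struct


class SaltedKeyWriter:
    """Prepends a round-robin salt byte (1..N) to each timestamp key."""

    def __init__(self, num_region_servers: int):
        if not 1 <= num_region_servers <= 255:
            raise ValueError("the salt must fit in one byte")
        self._salts = itertools.cycle(range(1, num_region_servers + 1))

    def row_key(self, timestamp_ms: int) -> bytes:
        # Each put takes the next salt value, so consecutive timestamps
        # land under different prefixes.
        salt = next(self._salts)
        return bytes([salt]) + struct.pack(">Q", timestamp_ms)


def scan_ranges(start_ms: int, stop_ms: int, num_region_servers: int):
    """One (start_key, stop_key) pair per salt for a time-range read."""
    return [
        (bytes([s]) + struct.pack(">Q", start_ms),
         bytes([s]) + struct.pack(">Q", stop_ms))
        for s in range(1, num_region_servers + 1)
    ]


writer = SaltedKeyWriter(num_region_servers=3)
keys = [writer.row_key(1000 + i) for i in range(6)]
salts = [key[0] for key in keys]
```

The trade-off is on the read side: a time-range read is no longer one contiguous scan, so a reader would issue one scan per salt value (the pairs returned by `scan_ranges`) and merge the results.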
> >
> > On Wednesday, June 19, 2013, yun peng wrote:
> >
> > > Hi, All,
> > > Our use case requires persisting a stream into a system like HBase. The
> > > stream data is in the format <timestamp, value>. In other words, the
> > > timestamp is used as the row key. We want to explore whether HBase is
> > > suitable for this kind of data.
> > >
> > > The problem is that the domain of the row key (the timestamp) grows
> > > constantly. For example, given 3 nodes n1, n2, n3, hosting row key
> > > partitions [0,4], [5,9], [10,12] respectively, it is currently the last
> > > node, n3, that is busy receiving incoming writes (row keys 13 and 14).
> > > This continues until the region reaches its max size of 5 (that is, the
> > > partition grows to [10,14]) and potentially splits.
> > >
> > > I am no expert on HBase splits, but I am wondering: after the split, will
> > > new writes still go to node n3 (for [10,14]), or can the write stream be
> > > intelligently redirected to another, less busy node, like n1?
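
The scenario in this example can be modelled with a toy range-partition lookup. The sketch below is not HBase code: it represents regions only by their sorted start keys, and the split key of 12 is an assumed midpoint of [10,14]. Which server ends up hosting each daughter region is decided by load balancing and is not modelled here.

```python
import bisect


def locate(region_starts, row_key):
    """Index of the region whose key range contains row_key.

    region_starts is the sorted list of region start keys; each region
    covers keys from its start up to the next region's start.
    """
    return bisect.bisect_right(region_starts, row_key) - 1


def split(region_starts, index, split_key):
    """Return a new start-key list with region `index` split at split_key."""
    new_starts = list(region_starts)
    new_starts.insert(index + 1, split_key)
    return new_starts


# n1 hosts [0,4], n2 hosts [5,9], n3 hosts [10, ...) as in the example.
region_starts = [0, 5, 10]
before = [locate(region_starts, key) for key in (13, 14)]

# Split the last region at an assumed midpoint of 12.
after_split = split(region_starts, 2, 12)
after = [locate(after_split, key) for key in (15, 16, 17)]
```

Both 13 and 14 resolve to the last region (index 2), and after the split every newer key resolves to the new last region (index 3). With a monotonically increasing row key, the write stream always follows the region holding the highest key range, so a split alone does not spread the writes.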
> > >
> > > In case HBase can't do things like this, how easy is it to extend HBase
> > > with such functionality? Thanks...
> > > Yun
> > >
> >
>