HBase >> mail # user >> How practical is it to add a timestamp oracle on Zookeeper

yun peng 2013-04-16, 12:14
Jean-Marc Spaggiari 2013-04-16, 12:19
yun peng 2013-04-16, 12:40
Jean-Marc Spaggiari 2013-04-16, 13:31
PG 2013-04-21, 15:10
kishore g 2013-04-21, 16:22
Jimmy Xiang 2013-04-21, 16:33
Bijieshan 2013-04-16, 12:23
Ted Yu 2013-04-16, 12:37
Michel Segel 2013-04-21, 16:36
Re: How practical is it to add a timestamp oracle on Zookeeper

I presume you have read the Percolator paper. The design there uses a
single TS oracle, with BigTable itself acting as the transaction manager.
Omid also has a TS oracle, but I do not know how scalable it is. Using
ZK as the TS oracle would not work: ZK can scale up to 40-50K
requests per second, but depending on the cluster size your timestamp
load will be much higher than that, especially since every client doing
reads and writes has to obtain a TS. Instead, what you want is a TS oracle
that can scale to millions of requests per second. This can be achieved
with the technique in the Percolator paper: pre-allocate a range of
timestamps by persisting a high watermark to disk, and serve requests over
an extremely lightweight RPC. I do not know whether Omid provides this.
There is a twitter project https://github.com/twitter/snowflake that you
might want to look at.
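Snowflake takes a different approach: instead of a central oracle, each node mints 64-bit IDs locally by packing a millisecond timestamp, a worker id, and a per-millisecond sequence, so IDs are unique and roughly time-ordered without coordination. A minimal sketch under that bit layout (the class name and field widths here are illustrative, and clock rollback is not handled):

```java
// Sketch of a snowflake-style ID generator: 64-bit IDs built from
// (timestamp << 22) | (workerId << 12) | sequence, assuming 10 bits
// for the worker id and 12 bits for the per-millisecond sequence.
public class SnowflakeSketch {
    private static final long WORKER_BITS = 10;
    private static final long SEQ_BITS = 12;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    public SnowflakeSketch(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & ((1L << SEQ_BITS) - 1);
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return (now << (WORKER_BITS + SEQ_BITS)) | (workerId << SEQ_BITS) | sequence;
    }
}
```

The trade-off versus a central oracle is that snowflake IDs are only roughly ordered (to within clock skew between nodes), which may or may not be acceptable for transaction timestamps.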

Hope this helps.

On Sun, Apr 21, 2013 at 9:36 AM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Time is relative.
> What does the timestamp mean?
> Sounds like a simple question, but it's not. Is it the time your
> application says they wrote to HBase? Is it the time HBase first gets the
> row? Or is it the time that the row was written to the memstore?
> Each RS has its own clock in addition to your app server.
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Apr 16, 2013, at 7:14 AM, yun peng <[EMAIL PROTECTED]> wrote:
> > Hi, All,
> > I'd like to add a global timestamp oracle on Zookeeper to assign a
> > globally unique timestamp to each Put/Get issued from the HBase cluster.
> > The reason I put it on Zookeeper is that each Put/Get needs to go
> > through it, and assigning unique timestamps needs some global,
> > centralised facility. But I am asking how practical this scheme is;
> > has anyone used it in practice?
> >
> > Also, how difficult is it to extend Zookeeper, or to inject code into
> > the code path that HBase takes inside Zookeeper? I know HBase has
> > Coprocessors on the region server to let programmers extend it without
> > recompiling HBase itself. Does Zk allow such extensibility? Thanks.
> >
> > Regards
> > Yun