HBase, mail # user - How practical is it to add a timestamp oracle on Zookeeper


Re: How practical is it to add a timestamp oracle on Zookeeper
Jimmy Xiang 2013-04-21, 16:33
I think Yun wants a global timestamp, not unique ids.

This is doable, technically. However, I am not sure what the performance
requirement is.

Thanks,
Jimmy
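
To make the chunk-based suggestion quoted below concrete, here is a minimal Python sketch, not a definitive implementation. It assumes a single central counter that can atomically reserve a range of values. `CentralCounter` and `ChunkedClient` are hypothetical names used only for illustration. They are not ZooKeeper or HBase APIs, and the in-process lock merely stands in for whatever atomic update the real oracle would provide.

```python
import threading


class CentralCounter:
    """Stand-in for the centralized oracle (hypothetical, in-process only)."""

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()
        self.round_trips = 0  # how many times clients had to contact the oracle

    def reserve(self, size):
        """Atomically reserve the range [start, start + size) and return start."""
        with self._lock:
            start = self._next
            self._next += size
            self.round_trips += 1
            return start


class ChunkedClient:
    """Hands out ids locally and contacts the oracle once per chunk."""

    def __init__(self, counter, chunk_size=1000):
        self._counter = counter
        self._chunk_size = chunk_size
        self._next = 0
        self._end = 0  # empty range forces a reservation on first use

    def next_id(self):
        if self._next >= self._end:
            # Current chunk is exhausted: one round trip to the oracle.
            self._next = self._counter.reserve(self._chunk_size)
            self._end = self._next + self._chunk_size
        value = self._next
        self._next += 1
        return value
```

With two clients and a chunk size of 3, client A receives 0, client B then receives 3, and A's next id is 1. The ids are unique, but they do not follow wall-clock order across clients, and unused ids in a chunk become holes if a client stops. Those are the caveats raised in the quoted message.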
On Sun, Apr 21, 2013 at 9:22 AM, kishore g <[EMAIL PROTECTED]> wrote:

> It's probably not practical to do this for every put. Instead, each client
> can get a chunk of ids and use it for every put. The chunks will be
> mutually exclusive and monotonically increasing. You need to know that
> there can be holes in the ids and that ids will not follow timestamp order
> within a small interval.
>
> If I remember correctly, tweet ids are generated like this. Take a look at
> snowflake https://github.com/twitter/snowflake
>
>
> thanks,
> Kishore G
>
>
> On Sun, Apr 21, 2013 at 8:10 AM, PG <[EMAIL PROTECTED]> wrote:
>
> > Hi Ted and JM, thanks for the nice introduction. I have read the Omid
> > paper, which appears to use a centralized party to do the coordination and
> > achieves 72K transactions per sec. It does much more work than just
> > assigning timestamps, and I think it implicitly justifies the usage of a
> > global timestamp oracle in practice... I appreciate the suggestion.
> > Regards,
> > Yun
> >
> > Sent from my iPad
> >
> > On Apr 16, 2013, at 9:31 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Yun,
> > >
> > > Attachments do not work on the mailing list. However, everyone
> > > using HBase should have the book on their desk, and I do ;)
> > >
> > > In figure 8-11, you can see that the client will contact ZK to find out
> > > where the root region is, then the root region to find the meta, and
> > > so on.
> > >
> > > BUT.... This will be done only once per client! If you do 10 gets from
> > > your client, once you know where the root region is, you don't need to
> > > query ZK anymore. It will be cached locally.
> > >
> > > For your use case, you might want to take a look at what Ted sent:
> > > https://github.com/yahoo/omid/wiki I looked at it quickly and it seems
> > > to be a good fit for you.
> > >
> > > JM
> > >
> > > 2013/4/16 yun peng <[EMAIL PROTECTED]>:
> > >> Hi, Jean and Jieshan,
> > >> Are you saying the client can contact region servers directly? Maybe I
> > >> overlooked something, but I think the client may need to look up regions
> > >> by first contacting ZK, as in figure 8-11 from the definitive book (as
> > >> attached)...
> > >>
> > >> Nevertheless, if that is the case, what are the practical solutions for
> > >> assigning a global timestamp in real production today, since it still
> > >> needs some centralised facility? Please enlighten me. Thanks.
> > >> Regards
> > >> Yun
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Apr 16, 2013 at 8:19 AM, Jean-Marc Spaggiari
> > >> <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>> Hi Yun,
> > >>>
> > >>> If I understand you correctly, that means that each time you are going
> > >>> to do a put or a get, you will need to call ZK first?
> > >>>
> > >>> Since ZK has only one active master, that means this ZK master will be
> > >>> called for each HBase get/put?
> > >>>
> > >>> You are going to create a bottleneck there. I don't know how many RS you
> > >>> have, but you will certainly hotspot your ZK server. I'm not sure it's a
> > >>> good idea.
> > >>>
> > >>> JM
> > >>>
> > >>> 2013/4/16 yun peng <[EMAIL PROTECTED]>
> > >>>
> > >>>> Hi, All,
> > >>>> I'd like to add a global timestamp oracle on ZooKeeper to assign a
> > >>>> globally unique timestamp to each Put/Get issued from an HBase cluster.
> > >>>> The reason I put it on ZooKeeper is that each Put/Get needs to go through
> > >>>> it, and a unique timestamp needs some global centralised facility. But I
> > >>>> am asking how practical this scheme is: has anyone used it in practice?
> > >>>>
> > >>>> Also, how difficult is it to extend ZooKeeper, or to inject code into the
> > >>>> code path of HBase inside ZooKeeper? I know HBase has Coprocessors on the
> > >>>> region server to let programmers extend it without recompiling HBase itself.