HBase user mailing list: How practical is it to add a timestamp oracle on Zookeeper


Re: How practical is it to add a timestamp oracle on Zookeeper
I think Yun wants some global timestamp, not unique ids.

This is doable, technically. However, I'm not sure what the performance
requirement is.

Thanks,
Jimmy
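
For illustration, a minimal sketch of such an oracle on plain ZooKeeper,
using the fact that a znode's data version increases monotonically on every
setData. The class name and the /oracle path are assumptions, and the znode
must be created beforehand:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkTimestampOracle {
    private static final String ORACLE_PATH = "/oracle"; // assumed, pre-created znode
    private final ZooKeeper zk;

    public ZkTimestampOracle(ZooKeeper zk) {
        this.zk = zk;
    }

    // One ZooKeeper round trip per timestamp. Version -1 means an
    // unconditional update; the returned Stat carries the znode's new data
    // version, which ZooKeeper bumps monotonically on every setData.
    // Note the version is an int, so it wraps after ~2.1 billion calls.
    public long nextTimestamp() throws Exception {
        Stat stat = zk.setData(ORACLE_PATH, new byte[0], -1);
        return stat.getVersion();
    }
}

Every Put/Get would pay a ZooKeeper round trip here, which is exactly the
bottleneck concern raised later in this thread.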
On Sun, Apr 21, 2013 at 9:22 AM, kishore g <[EMAIL PROTECTED]> wrote:

> It's probably not practical to do this for every put. Instead, each client
> can get a chunk of ids and use them for its puts. Each chunk of ids will
> be mutually exclusive and monotonically increasing. Be aware that there can
> be holes in the ids, and that ids will not be ordered by timestamp within a
> small interval.
>
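For illustration, a minimal sketch of this chunking scheme. The class name
is made up, and the AtomicLong stands in for the central oracle, which would
cost one round trip per chunk rather than per id:

import java.util.concurrent.atomic.AtomicLong;

public class ChunkedIdAllocator {
    private static final long CHUNK_SIZE = 10_000;   // ids handed out per oracle call
    private final AtomicLong centralCounter = new AtomicLong(); // stand-in for the oracle
    private long next = 0;   // next id to hand out locally
    private long limit = 0;  // exclusive end of the current chunk

    // One oracle call per CHUNK_SIZE ids. Unused ids in an abandoned chunk
    // become holes, and ids from different clients are not ordered by
    // wall-clock time within a chunk-sized interval, as noted above.
    public synchronized long nextId() {
        if (next == limit) {
            long chunk = centralCounter.getAndIncrement();
            next = chunk * CHUNK_SIZE;
            limit = next + CHUNK_SIZE;
        }
        return next++;
    }
}
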
> If I remember correctly, tweet ids are generated like this. Take a look at
> snowflake: https://github.com/twitter/snowflake (a rough sketch follows
> after this message).
>
>
> thanks,
> Kishore G
>
>
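
The rough snowflake-style sketch referenced above. The 41/10/12 bit layout
and the custom epoch are the commonly cited values, assumed here rather than
taken from the project source:

public final class SnowflakeSketch {
    private static final long EPOCH = 1288834974657L; // custom epoch, commonly cited
    private final long workerId;  // 10 bits, 0..1023, unique per process
    private long lastMillis = -1L;
    private long sequence = 0L;   // 12 bits, resets every millisecond

    public SnowflakeSketch(long workerId) {
        this.workerId = workerId & 0x3FF;
    }

    // Ids embed (timestamp, worker, sequence), so they are unique and
    // roughly time-ordered with no central round trip per id.
    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) { // sequence exhausted: spin to the next millisecond
                while ((now = System.currentTimeMillis()) <= lastMillis) { }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}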
> On Sun, Apr 21, 2013 at 8:10 AM, PG <[EMAIL PROTECTED]> wrote:
>
> > Hi, Ted and JM, thanks for the nice introduction. I have read the Omid
> > paper, which looks like it uses a centralized party to do the coordination
> > and achieves 72K transactions per second. It does much more work than just
> > assigning timestamps, and I think it implicitly justifies the usage of a
> > global timestamp oracle in practice. I appreciate the suggestion.
> > Regards,
> > Yun
> >
> > Sent from my iPad
> >
> > On Apr 16, 2013, at 9:31 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Yun,
> > >
> > > Attachments are not working on the mailing list. However, everyone
> > > using HBase should have the book on their desk, as I do ;)
> > >
> > > In figure 8-11, you can see that the client will contact ZK to find
> > > where the root region is, then the root region to find the meta, and
> > > so on.
> > >
> > > BUT.... This will be done only once per client! If you do 10 gets from
> > > your client, once you know where the root region is, you don't need to
> > > query ZK anymore. It will be cached locally.
> > >
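For illustration, with the 0.94-era client API this looks roughly like the
following (the table and row names are made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CachedLookupDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // assumed table name
        for (int i = 0; i < 10; i++) {
            // Only the first get walks ZK / -ROOT- / .META.; subsequent
            // gets reuse the client's cached region locations.
            table.get(new Get(Bytes.toBytes("row-" + i)));
        }
        table.close();
    }
}
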
> > > For your use case, you might want to take a look at what Ted sent:
> > > https://github.com/yahoo/omid/wiki. I looked at it quickly and it
> > > seems to be a good fit for you.
> > >
> > > JM
> > >
> > > 2013/4/16 yun peng <[EMAIL PROTECTED]>:
> > >> Hi, Jean and Jieshan,
> > >> Are you saying the client can directly contact region servers? Maybe I
> > >> overlooked something, but I think the client may need to look up regions
> > >> by first contacting ZK, as in figure 8-11 from the definitive book (as
> > >> attached)...
> > >>
> > >> Nevertheless, if that is the case, what are the practical solutions for
> > >> assigning a global timestamp in real production today, since it still
> > >> needs some centralized facility? Please enlighten me. Thanks.
> > >> Regards
> > >> Yun
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Apr 16, 2013 at 8:19 AM, Jean-Marc Spaggiari
> > >> <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>> Hi Yun,
> > >>>
> > >>> If I understand you correctly, that means that each time you are going
> > >>> to do a put or a get, you will need to call ZK first?
> > >>>
> > >>> Since ZK has only one active master, that means that this ZK master
> > >>> will be called for each HBase get/put?
> > >>>
> > >>> You are going to create a bottleneck there. I don't know how many RSs
> > >>> you have, but you will certainly hotspot your ZK server. I'm not sure
> > >>> it's a good idea.
> > >>>
> > >>> JM
> > >>>
> > >>> 2013/4/16 yun peng <[EMAIL PROTECTED]>
> > >>>
> > >>>> Hi, All,
> > >>>> I'd like to add a global timestamp oracle on Zookeeper to assign a
> > >>>> globally unique timestamp to each Put/Get issued from the HBase
> > >>>> cluster. The reason I put it on Zookeeper is that each Put/Get needs
> > >>>> to go through it, and a unique timestamp needs some global centralized
> > >>>> facility to assign it. But I am asking how practical this scheme is;
> > >>>> has anyone used it in practice?
> > >>>>
> > >>>> Also, how difficult is it to extend Zookeeper, or to inject code into
> > >>>> the code path of HBase inside Zookeeper? I know HBase has Coprocessors
> > >>>> on the region server to let programmers extend it without recompiling
> > >>>> HBase itself.
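
For illustration, a rough sketch of that coprocessor route, assuming the
0.94-era RegionObserver hook signature. The class name and the oracle
stand-in are made up; it stamps each Put's cells with the oracle's
timestamp instead of the region server's clock:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class OracleTimestampObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                       Put put, WALEdit edit, boolean writeToWAL)
            throws IOException {
        byte[] ts = Bytes.toBytes(fetchGlobalTimestamp());
        // Stamp every KeyValue that still carries LATEST_TIMESTAMP with
        // the oracle's value instead of the region server's clock.
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
            for (KeyValue kv : kvs) {
                kv.updateLatestStamp(ts);
            }
        }
    }

    // Hypothetical stand-in for the ZooKeeper oracle round trip.
    private long fetchGlobalTimestamp() {
        return System.currentTimeMillis();
    }
}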