Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase
On Mon, Sep 17, 2012 at 5:44 PM, Dai, Jason <[EMAIL PROTECTED]> wrote:
> This could work, though I think we need to figure out how to address several implications brought by the proposal, such as:
> (1) How do the users figure out what co-processor applications are stable, so that they can use in their production deployment?
As they would any other piece of software?
Or what are you thinking here Jason? That you need to deliver the
whole stack -- from Document App down through Coprocessor and on down
through HBase too -- to be able to say your document store is stable?
> (2) How do we ensure the co-processor applications continue to be compatible with the changes in the HBase project, and compatible with each other?
Testing would be the short answer. Taking on a new HBase version,
you'd run your tests to ensure core works as your Document
Regarding compatibility, the project is very careful with its public
APIs. They change rarely, and only for an extremely good reason. If
they do change, they are first deprecated for a release and only
removed in the subsequent release.
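As a rough illustration of that deprecate-then-remove cycle, here is a
minimal sketch in plain Java. The class and method names (ExampleClient,
fetchRow, getRow) are hypothetical and are not HBase APIs; they only show
how an old method can stay available for one release while forwarding to
its replacement.

```java
/** Hypothetical client class for illustration only; not part of HBase. */
class ExampleClient {

    /** Replacement method, introduced in (hypothetical) release N. */
    String getRow(String rowKey) {
        return "row:" + rowKey;
    }

    /**
     * Old method, kept through release N so callers have time to migrate.
     * Under the policy described above it would be deleted in release N+1.
     *
     * @deprecated use {@link #getRow(String)} instead
     */
    @Deprecated
    String fetchRow(String rowKey) {
        // Forward to the replacement so both paths behave identically.
        return getRow(rowKey);
    }
}
```

Callers compiled against release N get a deprecation warning from
fetchRow but keep working, which gives them one full release to move over.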
Regarding Coprocessors in particular, they are not yet part of our
public API. They are, by agreement, more developer-facing at the
moment. This makes sense for something we are still evolving -- e.g.
sounds like you found that we are missing CP hooks in filters -- and
for a tech that gives you enough rope to hang your cluster.
So, your CPs, given the caveat above, should remain relatively stable
across HBase versions. You may have to adjust some as you go across
major versions but even this requirement, post-0.96, should lessen as
everything moves onto protobufs.
Regarding intra-CP compatibility, that's beyond the concern of core.
> (3) How do the users get the co-processor applications?
Not sure. We should work on this. Should we make it so that you point
your cluster at a repository, select a CP, and it then downloads and
installs it like an Eclipse plugin, only hopefully the deploy does not
require a cluster restart -- or if it does, it's a rolling restart?
That'd be kinda sweet (we'd have to first figure out which CPs are
vetted and not going to kill your cluster and/or move CP execution out
of the regionserver process to run beside it so they don't bring down
the RS if they go rogue, etc.).
> They can no longer get these from the Apache HBase release, and may need to perform manual integrations - not something average business users will do, and the main reason that we put the full HBase source tree out (several of our users and customers want to get a prototype of DOT to try it out).
We don't intend to ship all CPs as part of core. It's untenable (I can
explain why that would not work but my guess is that you can figure it
out).
A DOT package that bundles HBase is fine for folks to try. But do
you intend to keep your own fork of HBase or is the intent to move
toward DOT running on a released HBase? If you'd like to do the
latter, we'd like to help.