Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase
Please see my replies below inline.
On Monday, September 17, 2012, Dai, Jason wrote:
> Hi Andrew,
> See my comments below (I have also replied at
> >>>> coprocessor based applications should begin as independent code
> contributions, perhaps hosted in a GitHub repository
> >>>> It would be helpful if only the changes on top of stock HBase code
> appear here.
> This could work, though I think we need to figure out how to address
> several implications brought by the proposal, such as:
> (1) How do the users figure out what co-processor applications are stable,
> so that they can use them in their production deployment?
This is exactly the motivation for starting all coprocessor based
applications/contributions as external projects. We will have no registry
of "approved" or "stable" coprocessor applications. I'd imagine users would
expect all such apps in the HBase distribution proper to be in such a
state. Beyond that, I don't think the project has the bandwidth to
track a number of ideas in development. We can't know in advance what
support, interest, or stability any given contribution would have, so
starting as an external project establishes this on its own merit. A
popular and well-cared-for contribution would eventually be a candidate for
inclusion into the HBase source distribution proper. This is my
characterization of what has been discussed and the consensus reached by
the PMC. If others feel this is in error, or if we should do something
differently here, please speak up.
> (2) How do we ensure the co-processor applications continue to be
> compatible with the changes in the HBase project, and compatible with each
We don't. The onus is on the contributor. If at some point the consensus of
the project is to bring a particular contribution into the ASF HBase
source distribution, then at that point we must ensure these things... But
only with what is in the source distribution.
> (3) How do the users get the co-processor applications? They can no longer
> get these from the Apache HBase release, and may need to perform manual
> integrations - not something average business users will do, and the main
> reason that we put the full HBase source tree out
HBase is a mavenized project and your DOT system is a coprocessor
application. Barring issues with the CP framework itself, I can see no
technical reason why you would have to include and maintain a full fork
of HBase. Simply depend on HBase project artifacts, and the complete DOT
application can be compiled as a jar to drop on the classpath of an HBase
installation. Where the CP framework may be insufficient, we can address
that. Or, like Stack says, if there is some other technical reason (like a
patch to core HBase), please list it so we can look at addressing it. We
would definitely like to support your DOT on stock ASF
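
For illustration only, here is a minimal sketch of what "depend on HBase
project artifacts" could look like in the pom.xml of a standalone
coprocessor project. The version shown is an assumption, not something
stated in this thread; pick the release that matches the target cluster.

```xml
<!-- Hypothetical pom.xml fragment for a standalone coprocessor project. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <!-- Assumed version; use the release the target cluster runs. -->
    <version>0.94.1</version>
    <!-- "provided": the HBase installation supplies these classes at
         runtime, so they are not bundled into the application jar. -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With a setup along these lines, the build produces a plain jar that can be
placed on the region server classpath, and the observer classes would
typically be registered through the hbase.coprocessor.region.classes
property in hbase-site.xml.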
> >>>> We would be delighted to work with you on the necessary coprocessor
> framework extensions. I'd recommend a separate JIRA specifically for this.
> Yes, we do plan to submit the proposal for observers for the filter
> operations as a separate JIRA (the original plan was to make it a sub task
> of this JIRA).
Sure, that would be great.
> -----Original Message-----
> Sent: Tuesday, September 18, 2012 3:23 AM
> Dai, Jason
> Subject: Re: Announcement of Project Panthera: Better Analytics with SQL,
> MapReduce and HBase
> Hi Jason,
> > I'd like to announce Project Panthera, our open source efforts that
> showcase better data analytics capabilities on Hadoop/HBase (through both
> SW and HW improvements), available at
Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)