Speaking from my recent experience with the Flume/Accumulo integration,
here are my 2 cents. It was suggested that the Accumulo sink be committed
into Flume, which is what we are doing now.
First, I was impressed by how willing the Flume community was to help us
conform to their standards to get it committed. This appears pretty
normal in my experience across projects. However, reaching out to someone
in the Flume community and playing by their rules makes it slightly harder
for me to contribute something like this.
So, regardless of whether it makes sense technically, maybe right now it's
mostly about getting the contributions in. Today, I think it's more likely
that an Accumulo developer wants to integrate with other things than the other
way around... so accumulo-contrib is the easiest path for someone like me
who wants to make these types of contributions. I think this is a corollary
to Chris's #2: Who is going to be responsible for maintaining it?
When other projects start doing Accumulo integration independently,
we'll know Accumulo "made it" ;)
On Mon, Oct 21, 2013 at 3:27 PM, Sean Busbey <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 21, 2013 at 2:05 PM, Christopher <[EMAIL PROTECTED]> wrote:
> > I think the answer to where things should go depends on two main factors:
> > 1) Which project(s) does it benefit the most? (does it benefit
> > Accumulo users more to have another way to access Accumulo, or does it
> > benefit Hive users more to have another database to query from?), and
> > 2) Who is going to be responsible for maintaining it?
> > The first question is probably a very subjective one, so I expect the
> > second to play a bigger role. Perhaps the discussion should involve
> > both communities to consolidate potentially multiple efforts?
> I think the second question also comes down to community-specific
> subjectivity. For some projects, being in core doesn't imply a different
> level of maintenance than being in contrib or being in an outside repo (see
> the discussion from this summer around Trevni in Hive) -- if no one uses
> something it doesn't get maintained. If that happens long enough, it gets
> cut. I don't think we should use that lack of maintenance assurance to mean
> that we keep things in Accumulo just because we care about them being
> I tend to favor Jon's reasoning, which mostly focuses on which API is
> more likely to change in a way that requires maintenance. In the case of
> things like Flume, Hive, or Pig, I think the level of familiarity needed to
> maintain an integration point requires more knowledge of the non-Accumulo
> If the answer is that it's all case-by-case, then I can just put wording to
> that end in the contrib document. I just want to make sure people have some
> idea of our reasoning as a project without reading our mail archive or