Hive, mail # dev - Tez branch and tez based patches


Re: Tez branch and tez based patches
Edward Capriolo 2013-07-16, 20:08
Alan,

I agree with all your statements, with the exception of one.

"Second, the way Apache works is that contributors scratch the itch that bothers them. So to argue "We shouldn't do X because we never finished Y" or "We shouldn't do X because we're doing Y" (where X and Y are independent) is not valid in Apache projects."

I disagree. Look at this:

https://issues.apache.org/jira/browse/HIVE-3585

A contribution was immediately met with a -1.

I personally have had issues closed as "WONT FIX" or "LATER" across a
variety of Apache projects because the committers decided the feature was
out of scope, or whatever.

Arguing that if one contributor wants to "scratch an itch" we should allow
it in the project is not practical, because we have to be able to maintain
Hive after the "itch scratcher" finds a new itch and moves on. Hive is not
project hosting for "every cool idea".

This was why I mentioned things like "Windows support". I do not think
there was ever a point where the committers/PMC agreed that "Windows
support" was something we all wanted to work towards. I cannot pin down
how the initiative started or why. Now whoever started that ball rolling
has moved on. I do not own a Windows computer, and we have no Apache
infrastructure to test Hive on Windows. Jira issues stay open, and those of
us in it for the long haul end up holding the ball and supporting things we
never explicitly wanted.

As this relates to Tez, Tez is in the incubator. Hive is release-quality
software. I am not convinced Tez is the direction we should go in. I am
scared of it going the path of "Windows support" or "Oracle support",
because someone is "scratching an itch" and we (the committers) do not have
enough information about the changes involved, the timeline, or what types
of use cases will benefit from this feature.

Tez refactorings are getting filed as 'MAJOR' 'BUGS' and committed to
trunk, when they are 'IMPROVEMENTS' of 'LOW' priority. I do not understand
why there is such a priority to merge code into trunk, when we can all see
this branch is going to be open for a long time and be rather involved.
Even then I would not mind, if it were not largely unfair to everyone else
who now needs to rebase.

On Tue, Jul 16, 2013 at 2:24 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Ed,
>
> I'm not sure I understand your argument, so I'm going to try to restate
> it.  Please tell me if I understand it correctly.
>
> I think you're saying we should not embark on big projects in Hive because:
> 1) There were big projects in the past that were abandoned or are not
> currently making progress (such as Oracle integration, Hive StorageHandler)
> 2) There are other big projects going on (ORC, Vectorization)
> 3) There are lots of outstanding patches that need to be dealt with.
>
> I would respond with two points to this.
>
> First, I agree that the large outstanding patch count is very bad.  It
> keeps people from getting involved in Hive.  It deprives Hive of fixes and
> improvements it would otherwise have.  Several of the committers are
> working to address this by checking in people's patches, but they are
> unable to keep up.  The best solution is to encourage other committers to
> check in patches as well and to find willing and able contributors and
> mentor them to committership as quickly as possible.
>
> Second, the way Apache works is that contributors scratch the itch that
> bothers them. So to argue "We shouldn't do X because we never finished Y"
> or "We shouldn't do X because we're doing Y" (where X and Y are
> independent) is not valid in Apache projects.  It's fine to argue that Tez
> hasn't been adequately explained (I think you hinted at this in previous
> emails) and ask for clarifications on what it is and what the planned
> changes are.  If after a full explanation you think it's a bad idea it's
> fine to argue Tez is the wrong direction for Hive and try to convince the
> rest of the community.  But assuming the community accepts that Tez is a