Hive >> mail # dev >> Tez branch and tez based patches


Edward Capriolo 2013-07-13, 16:48
Alan Gates 2013-07-16, 00:37
Edward Capriolo 2013-07-16, 01:51
Re: Tez branch and tez based patches
Ed,

I'm not sure I understand your argument, so I'm going to try to restate it.  Please tell me if I understand it correctly.

I think you're saying we should not embark on big projects in Hive because:
1) There were big projects in the past that were abandoned or are not currently making progress (such as Oracle integration, Hive StorageHandler)
2) There are other big projects going on (ORC, Vectorization)
3) There are lots of outstanding patches that need to be dealt with.

I would respond with two points to this.

First, I agree that the large outstanding patch count is very bad.  It keeps people from getting involved in Hive.  It deprives Hive of fixes and improvements it would otherwise have.  Several of the committers are working to address this by checking in people's patches, but they are unable to keep up.  The best solution is to encourage other committers to check in patches as well, and to find willing and able contributors and mentor them to committership as quickly as possible.

Second, the way Apache works is that contributors scratch the itch that bothers them.  So to argue "We shouldn't do X because we never finished Y" or "We shouldn't do X because we're doing Y" (where X and Y are independent) is not valid in Apache projects.  It's fine to argue that Tez hasn't been adequately explained (I think you hinted at this in previous emails) and to ask for clarifications on what it is and what the planned changes are.  If, after a full explanation, you think it's a bad idea, it's fine to argue that Tez is the wrong direction for Hive and try to convince the rest of the community.  But assuming the community accepts that Tez is a reasonable direction and there are volunteers who want to do the work, then you can't argue they should work on something else instead.

Alan.

On Jul 15, 2013, at 6:51 PM, Edward Capriolo wrote:

>>> The Hive bylaws, https://cwiki.apache.org/confluence/display/Hive/Bylaws, lay out what votes are needed for what.  I don't see anything there about
>>> needing 3 +1s for a branch.  Branching would seem to fall under code
>>> change, which requires one vote and a minimum length of 1 day.
>
> You could argue that all you need is one +1 to create a branch, but this is
> more than a branch. If you are talking about something that is:
> 1) going to cause major re-factoring of critical pieces of hive like
> ExecDriver and MapRedTask
> 2) going to be very disruptive to the efforts of other committers
> 3) something that may be a major architectural change
>
> Getting the project on board with the idea is a good idea.
>
> Now I want to point something out. Here are some recent initiatives in hive:
>
> 1) At one point there was a big initiative to "support oracle". After the
> initial work, there are patches in Jira and no one seems to care about
> oracle support.
> 2) Another such decision was the "support windows" one; there are
> probably 4 windows patches waiting for review.
> 3) I still have no clue what the official hadoop1, hadoop2, hadoop 0.23
> support perspective is, but every couple weeks we get another jira about
> something not working/testing on one of those versions, seems like several
> builds are broken.
> 4) Hive-storage handler, after the initial implementation no one cares to
> review any other storage handler implementation, 3 patches there or more,
> could not even find anyone willing to review the cassandra storage handler
> I spent months on.
> 5) ORC, Vectorization
> 6) Windowing: committed, numerous check-style violations.
>
> We have !!!160+!!! PATCH_AVAILABLE Jira issues. Few active committers. We
> are spread very thin, and embarking on another side project not involved
> with core hive seems like the wrong direction at the moment.
>
> On Mon, Jul 15, 2013 at 8:37 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>>
>> On Jul 13, 2013, at 9:48 AM, Edward Capriolo wrote:
>>
>>> I have started to see several refactoring patches around tez.
>>
Edward Capriolo 2013-07-16, 20:08
Edward Capriolo 2013-07-17, 05:20
Alan Gates 2013-07-17, 19:35
Edward Capriolo 2013-07-17, 20:41
Ashutosh Chauhan 2013-07-18, 00:43
Edward Capriolo 2013-07-20, 15:10
Gunther Hagleitner 2013-07-23, 00:08
Alan Gates 2013-07-17, 21:41
Edward Capriolo 2013-07-30, 04:02
Edward Capriolo 2013-07-30, 04:53
Alan Gates 2013-08-05, 17:54
Edward Capriolo 2013-08-16, 13:13
Edward Capriolo 2013-08-16, 14:54
Alan Gates 2013-08-05, 17:40
Brock Noland 2013-07-16, 15:56