Hive, mail # dev - Tez branch and tez based patches


Re: Tez branch and tez based patches
Edward Capriolo 2013-07-20, 15:10
I agree we are getting into a grey area with the term "disruptive". For
reference (I have not been doing this all the time, bad on me), we are
supposed to +1 and wait a day.

>> I am not familiar with these other engines, but the short answer is that
>> Tez is built to work on YARN, which works well for Hive since it is tied
>> to Hadoop

I understand what you are saying here: YARN support is a plus. However, the
rest of the answer is relevant to the discussion.

There are already frameworks like Spark that are semi-popular:
http://www.slideshare.net/jetlore/spark-and-shark-lightningfast-analytics-over-hadoop-and-hive-data
There are also other frameworks like S4 (http://incubator.apache.org/s4/) or
Storm.

A big part of making a design decision is doing a competitive analysis,
usually by asking yourself "What else for this is already out there?" or "Can
this be done other ways?"
I do want to be convinced we are not locking into Tez too early with tunnel
vision. Possibly we should be thinking about how to build Hive in such a way
that many different frameworks could plug in. In other words, I want to be
convinced that Tez is the best choice, since many people are claiming an
MRR-type solution.

I will watch the video you posted and study the material myself as well.

On Wed, Jul 17, 2013 at 8:43 PM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:

> On Wed, Jul 17, 2013 at 1:41 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
> >
> > "In my opinion we should limit the amount of tez related optimizations to
> > and trunk" Refactoring that cleans up code is good, but as you have
> > pointed out there won't be a Tez release until sometime this fall, and
> > this branch will be open for an extended period of time. Thus code
> > cleanups and other Tez-related refactoring do not need to be disruptive
> > to trunk.
>
>
> I agree Tez-specific changes need not go in trunk. But general
> refactoring and code cleanup need to happen on trunk as and when someone
> is willing to work on them. We have to continually improve our code
> quality. Code maintainability and readability are a priority. Without them,
> code quality suffers, and new contributors are discouraged from contributing
> because the code is unnecessarily complicated. SemanticAnalyzer is an
> 11K-line class. We need to simplify it. A patch like HIVE-4811, which
> tackled it, is a welcome change. The exec package is convoluted and mixes up
> runtime operators and drivers for the runtime. That's a welcome patch
> because it makes that piece of code much easier to read and reason about.
> HIVE-4825 is another example, which improves the modularity of the code.
> Contributors who are exposed to Hive for the first time will find the code
> easier to follow.
>
> Rather than being disruptive to trunk, they are constructive for trunk, and
> I am glad people are choosing to work on that. Tez or no Tez, Hive is better
> off with these patches.
>
> Thanks,
> Ashutosh
>
>
>
> > On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> > > Answers to some of your questions inlined.
> > >
> > > Alan.
> > >
> > > On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:
> > >
> > > > There are some points I want to bring up. First, I am on the PMC. Here
> > > > is something I find relevant:
> > > >
> > > > http://www.apache.org/foundation/how-it-works.html
> > > >
> > > > ------------------------------
> > > >
> > > > The role of the PMC from a Foundation perspective is oversight. The
> > > > main role of the PMC is not code and not coding - but to ensure that
> > > > all legal issues are addressed, that procedure is followed, and that
> > > > each and every release is the product of the community as a whole.
> > > > That is key to our litigation protection mechanisms.
> > > >
> > > > Secondly the role of the PMC is to further the long term development
> > > > and health of the community as a whole, and to ensure that balanced
> > > > and wide scale peer review and collaboration does happen. Within the ASF we