Hive, mail # dev - Tez branch and tez based patches


Re: Tez branch and tez based patches
Edward Capriolo 2013-07-17, 20:41
>> As all JIRA creations and updates are sent to dev@hive, creating a JIRA
>> is de facto posting to the list.

Agreed (although several ticket names are non-descriptive). Possibly more
out-of-band discussions need to be summarized on the list.

Yes. I will restate this:

"In my opinion we should not start any work on this tez-hive until these
questions are answered to the satisfaction of the hive developers."

"In my opinion we should limit the amount of tez related optimizations to
and trunk" Refactoring that cleans up code is good, but as you have pointed
out there won't be a tez release until sometime this fall, and this branch
will be open for an extended period of time. Thus code cleanups and other
tez related refactoring do not need to be disruptive to trunk.

I have another relevant question, which I already probably know the answer
to, but I will ask it anyway.

Because tez is a YARN application, does this mean that Tez will be the
first hive feature that will require YARN? (It seems like the answer is yes)

On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Answers to some of your questions inlined.
>
> Alan.
>
> On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:
>
> > There are some points I want to bring up. First, I am on the PMC. Here is
> > something I find relevant:
> >
> > http://www.apache.org/foundation/how-it-works.html
> >
> > ------------------------------
> >
> > The role of the PMC from a Foundation perspective is oversight. The main
> > role of the PMC is not code and not coding - but to ensure that all legal
> > issues are addressed, that procedure is followed, and that each and every
> > release is the product of the community as a whole. That is key to our
> > litigation protection mechanisms.
> >
> > Secondly the role of the PMC is to further the long term development and
> > health of the community as a whole, and to ensure that balanced and wide
> > scale peer review and collaboration does happen. Within the ASF we worry
> > about any community which centers around a few individuals who are
> > working virtually uncontested. We believe that this is detrimental to
> > quality, stability, and robustness of both code and long term social
> > structures.
> >
> > --------------------------------
> >
> >
> > https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
> >
> > -------------------------------------
> >
> > All other decisions happen on the dev list, discussions on the private
> > list are kept to a minimum.
> >
> > "If it didn't happen on the dev list, it didn't happen" - which leads to:
> >
> > a) Elections of committers and PMC members are published on the dev list
> > once finalized.
> >
> > b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
> > soon as they have impact on the project, code or community.
> > ---------------------------------
> >
> > https://issues.apache.org/jira/browse/HIVE-4660 ironically titled "Let
> > their be Tez" has not been +1'ed by any committer. It was never
> > discussed on the dev or the user list (as far as I can tell).
>
> As all JIRA creations and updates are sent to dev@hive, creating a JIRA
> is de facto posting to the list.
>
> >
> > As a PMC member I feel we need more discussion on Tez on the dev list
> > along with a wiki-fied design document. Topics of discussion should
> > include:
>
> I talked with Gunther and he's working on posting a design doc on the
> wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet
> on the wiki.
>
> >
> > 1) What is tez?
> In Hadoop 2.0, YARN opens up the ability to have multiple execution
> frameworks in Hadoop.  Hadoop apps are no longer tied to MapReduce as the
> only execution option.  Tez is an effort to build an execution engine that
> is optimized for relational data processing, such as Hive and Pig.
>
> The biggest change here is to move away from only Map and Reduce as
> processing options and to allow alternate combinations of processing, such