Re: Tez branch and tez based patches
Answers to some of your questions inlined.


On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:

> There are some points I want to bring up. First, I am on the PMC. Here is
> something I find relevant:
> http://www.apache.org/foundation/how-it-works.html
> ------------------------------
> The role of the PMC from a Foundation perspective is oversight. The main
> role of the PMC is not code and not coding - but to ensure that all legal
> issues are addressed, that procedure is followed, and that each and every
> release is the product of the community as a whole. That is key to our
> litigation protection mechanisms.
> Secondly the role of the PMC is to further the long term development and
> health of the community as a whole, and to ensure that balanced and wide
> scale peer review and collaboration does happen. Within the ASF we worry
> about any community which centers around a few individuals who are working
> virtually uncontested. We believe that this is detrimental to quality,
> stability, and robustness of both code and long term social structures.
> --------------------------------
> https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
> -------------------------------------
> All other decisions happen on the dev list, discussions on the private list
> are kept to a minimum.
> "If it didn't happen on the dev list, it didn't happen" - which leads to:
> a) Elections of committers and PMC members are published on the dev list
> once finalized.
> b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
> soon as they have impact on the project, code or community.
> ---------------------------------
> https://issues.apache.org/jira/browse/HIVE-4660 ironically titled "Let
> their be Tez" has not been +1'ed by any committer. It was never discussed on
> the dev or the user list (as far as I can tell).

As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list.  

> As a PMC member I feel we need more discussion on Tez on the dev list along
> with a wiki-fied design document. Topics of discussion should include:

I talked with Gunther and he's working on posting a design doc on the wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki.

> 1) What is tez?
In Hadoop 2.0, YARN opens up the ability to run multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as their only execution option. Tez is an effort to build an execution engine optimized for relational data processing, such as the processing Hive and Pig do.

The biggest change is moving away from Map and Reduce as the only processing options and allowing other combinations: for example map -> reduce -> reduce, tasks that take multiple inputs, or shuffles that skip sorting when it isn't needed.
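To make the difference in shape concrete, here is a toy sketch. It is hypothetical and is NOT the Tez API: it only models a plan as a DAG of stages (vertices) connected by data movements (edges) and orders the stages so each runs after its inputs. The stage names are made up for illustration. A map -> reduce -> reduce chain and a stage with two inputs are both just DAGs here, whereas with plain MapReduce the second reduce would need a separate job that re-reads the first reduce's output.

```python
from collections import deque

# Toy model of a processing DAG (hypothetical; this is NOT the Tez API).
# Vertices are processing stages; an edge (src, dst) means dst consumes src's output.


def topo_order(vertices, edges):
    """Order stages so each runs after all of its inputs (Kahn's algorithm)."""
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(vertices):
        raise ValueError("plan contains a cycle, so it is not a DAG")
    return order


# map -> reduce -> reduce as a single plan (hypothetical stage names).
MRR_STAGES = ["map", "reduce1", "reduce2"]
MRR_EDGES = [("map", "reduce1"), ("reduce1", "reduce2")]

# A stage with multiple inputs, e.g. a join fed by two scans (hypothetical names).
JOIN_STAGES = ["scan_orders", "scan_customers", "join", "aggregate"]
JOIN_EDGES = [("scan_orders", "join"), ("scan_customers", "join"), ("join", "aggregate")]

if __name__ == "__main__":
    print(topo_order(MRR_STAGES, MRR_EDGES))    # ['map', 'reduce1', 'reduce2']
    print(topo_order(JOIN_STAGES, JOIN_EDGES))  # ['scan_orders', 'scan_customers', 'join', 'aggregate']
```

This only captures the plan's shape; it says nothing about how shuffles, sorting, or YARN containers are actually handled.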

For a good intro to Tez, see Arun's presentation on it at the recent Hadoop Summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212)

> 2) How is tez different from oozie, http://code.google.com/p/hop/,
> http://cs.brown.edu/~backman/cmr.html , and other DAG and/or streaming map
> reduce tools/frameworks? Why should we use this and not those?

Oozie is a completely different thing. Oozie is a workflow engine and a scheduler. Its core competencies are coordinating workflows of disparate job types (MR, Pig, Hive, etc.) and scheduling them. It is not intended as an execution engine for apps such as Pig and Hive.

I am not familiar with these other engines, but the short answer is that Tez is built to work on YARN, which suits Hive well since Hive is tied to Hadoop.

> 3) When can we expect the first tez release?
I don't know, but I hope sometime this fall.

> 4) How much effort is involved in integrating hive and tez?
Covered in the design doc.

> 5) Who is ready to commit to this effort?
I'll let people speak for themselves on that one.

Unlikely.  Initial integration will be done in one release, but as Tez is a new project I expect it will be adding features in the future that Hive will want to take advantage of.
Can we change this to "not commit patches"?  We can't tell willing people not to work on it.