Hive >> mail # dev >> Tez branch and tez based patches

Re: Tez branch and tez based patches
At ~25:00

"There is a working prototype of hive which is using tez as the targeted
runtime"

Can I get a look at that code? Is it on GitHub?

Edward
On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Answers to some of your questions inlined.
>
> Alan.
>
> On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:
>
> > There are some points I want to bring up. First, I am on the PMC. Here is
> > something I find relevant:
> >
> > http://www.apache.org/foundation/how-it-works.html
> >
> > ------------------------------
> >
> > The role of the PMC from a Foundation perspective is oversight. The main
> > role of the PMC is not code and not coding - but to ensure that all legal
> > issues are addressed, that procedure is followed, and that each and every
> > release is the product of the community as a whole. That is key to our
> > litigation protection mechanisms.
> >
> > Secondly the role of the PMC is to further the long term development and
> > health of the community as a whole, and to ensure that balanced and wide
> > scale peer review and collaboration does happen. Within the ASF we worry
> > about any community which centers around a few individuals who are working
> > virtually uncontested. We believe that this is detrimental to quality,
> > stability, and robustness of both code and long term social structures.
> >
> > --------------------------------
> >
> > https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
> >
> > -------------------------------------
> >
> > All other decisions happen on the dev list, discussions on the private list
> > are kept to a minimum.
> >
> > "If it didn't happen on the dev list, it didn't happen" - which leads to:
> >
> > a) Elections of committers and PMC members are published on the dev list
> > once finalized.
> >
> > b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
> > soon as they have impact on the project, code or community.
> > ---------------------------------
> >
> > https://issues.apache.org/jira/browse/HIVE-4660, ironically titled "Let
> > their be Tez", has not been +1'ed by any committer. It was never discussed
> > on the dev or the user list (as far as I can tell).
>
> As all JIRA creations and updates are sent to dev@hive, creating a JIRA
> is de facto posting to the list.
>
> >
> > As a PMC member I feel we need more discussion on Tez on the dev list along
> > with a wiki-fied design document. Topics of discussion should include:
>
> I talked with Gunther and he's working on posting a design doc on the
> wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet
> on the wiki.
>
> >
> > 1) What is tez?
> In Hadoop 2.0, YARN opens up the ability to have multiple execution
> frameworks in Hadoop.  Hadoop apps are no longer tied to MapReduce as the
> only execution option.  Tez is an effort to build an execution engine that
> is optimized for relational data processing, such as Hive and Pig.
>
> The biggest change here is to move away from only Map and Reduce as
> processing options and to allow alternate combinations of processing, such
> as map -> reduce -> reduce or tasks that take multiple inputs or shuffles
> that avoid sorting when it isn't needed.
>
> For a good intro to Tez, see Arun's presentation on it at the recent
> Hadoop Summit (video: http://www.youtube.com/watch?v=9ZLLzlsz7h8, slides:
> http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212).
> >
> > 2) How is Tez different from Oozie, http://code.google.com/p/hop/,
> > http://cs.brown.edu/~backman/cmr.html, and other DAG and/or streaming
> > map reduce tools/frameworks? Why should we use this and not those?
>
> Oozie is a completely different thing.  Oozie is a workflow engine and a
> scheduler.  Its core competencies are the ability to coordinate workflows
> of disparate job types (MR, Pig, Hive, etc.) and to schedule them.  It is
> not intended as an execution engine for apps such as Pig and Hive.
>
> I am not familiar with these other engines, but the short answer is that