Hive >> mail # dev >> Tez branch and tez based patches


Re: Tez branch and tez based patches
Which talk are you referencing here?  AFAIK all the Hive code we've written is being pushed back into the Tez branch, so you should be able to see it there.

Alan.

On Jul 29, 2013, at 9:02 PM, Edward Capriolo wrote:

> At ~25:00
>
> "There is a working prototype of hive which is using tez as the targeted
> runtime"
>
> Can I get a look at that code? Is it on github?
>
> Edward
>
>
> On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> Answers to some of your questions inlined.
>>
>> Alan.
>>
>> On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:
>>
>>> There are some points I want to bring up. First, I am on the PMC. Here is
>>> something I find relevant:
>>>
>>> http://www.apache.org/foundation/how-it-works.html
>>>
>>> ------------------------------
>>>
>>> The role of the PMC from a Foundation perspective is oversight. The main
>>> role of the PMC is not code and not coding - but to ensure that all legal
>>> issues are addressed, that procedure is followed, and that each and every
>>> release is the product of the community as a whole. That is key to our
>>> litigation protection mechanisms.
>>>
>>> Secondly the role of the PMC is to further the long term development and
>>> health of the community as a whole, and to ensure that balanced and wide
>>> scale peer review and collaboration does happen. Within the ASF we worry
>>> about any community which centers around a few individuals who are working
>>> virtually uncontested. We believe that this is detrimental to quality,
>>> stability, and robustness of both code and long term social structures.
>>>
>>> --------------------------------
>>>
>>>
>>> https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
>>>
>>> -------------------------------------
>>>
>>> All other decisions happen on the dev list, discussions on the private list
>>> are kept to a minimum.
>>>
>>> "If it didn't happen on the dev list, it didn't happen" - which leads to:
>>>
>>> a) Elections of committers and PMC members are published on the dev list
>>> once finalized.
>>>
>>> b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
>>> soon as they have impact on the project, code or community.
>>> ---------------------------------
>>>
>>> https://issues.apache.org/jira/browse/HIVE-4660, ironically titled "Let
>>> there be Tez", has not been +1'ed by any committer. It was never discussed
>>> on the dev or the user list (as far as I can tell).
>>
>> As all JIRA creations and updates are sent to dev@hive, creating a JIRA
>> is de facto posting to the list.
>>
>>>
>>> As a PMC member I feel we need more discussion on Tez on the dev list along
>>> with a wiki-fied design document. Topics of discussion should include:
>>
>> I talked with Gunther and he's working on posting a design doc on the
>> wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet
>> on the wiki.
>>
>>>
>>> 1) What is tez?
>> In Hadoop 2.0, YARN opens up the ability to have multiple execution
>> frameworks in Hadoop.  Hadoop apps are no longer tied to MapReduce as the
>> only execution option.  Tez is an effort to build an execution engine that
>> is optimized for relational data processing, such as Hive and Pig.
>>
>> The biggest change here is to move away from only Map and Reduce as
>> processing options and to allow alternate combinations of processing, such
>> as map -> reduce -> reduce or tasks that take multiple inputs or shuffles
>> that avoid sorting when it isn't needed.
>>
>> For a good intro to Tez, see Arun's presentation on it at the recent
>> Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides
>> http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212)
>>>
>>> 2) How is Tez different from Oozie, http://code.google.com/p/hop/,
>>> http://cs.brown.edu/~backman/cmr.html, and other DAG and/or streaming
>>> MapReduce tools/frameworks? Why should we use this and not those?
>>