Hadoop, mail # general - avro in mapreduce


Re: avro in mapreduce
Doug Cutting 2010-01-27, 23:22
Since you raise it, my veto on the symlink patch applied to one aspect,  
for which implemented alternatives existed that no one vetoed. My  
understanding is that the reason this was delayed a year was primarily  
the absence of good tests. The issue was rapidly revived once Eli  
started addressing the tests. A compromise on the vetoed issue was  
rapidly found. So I don't see that my veto was the primary source of  
delay on that issue. Also, that veto concerned one point in a large,  
not-yet-committed patch whose design I had been actively involved  
in. Plus Dhruba, its primary author, while he did not veto,  
did agree with my view, so I was not alone in it.

[Sent from mobile]

On Jan 27, 2010, at 2:12 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> On Jan 26, 2010, at 10:15 AM, Doug Cutting wrote:
>
>> This is a key link in a series of issues involved in integrating  
>> Avro in Mapreduce.
>
> Getting Avro types passing through MapReduce is a good goal.
>
> I apologize for not seeing the issue before it was committed. I  
> accept some of the blame for that because I've been buried in Hadoop  
> emails. That said, it is important to realize that changes that  
> radically alter the user's interaction with the framework require a  
> lot of discussion. This jira, as you've admitted, had a very  
> unrepresentative subject and description, which means that very few  
> people were following it. Additionally, there has been no design  
> document on the change to the MapReduce framework's paradigm, so it  
> wasn't clear what you were doing until this patch was committed.  
> Such a large change should have been highlighted on the public dev  
> lists. In the future, I would strongly suggest that all developers  
> planning to make massive incompatible changes post a design  
> document on the public dev lists outside of Jira to ensure the  
> discussion happens before instead of after the patch has been  
> committed.
>
> In terms of reverting the patch, I had fundamental issues with the  
> changes and felt that we needed more time to discuss them. Allowing  
> the patch to stay in trunk would bake it further and further in and  
> make reverting it much harder.
>
> I've listed my issues on the jira, but at a high level my concerns  
> are:
>
> 1. Changing API compatibility is very expensive.
> 2. Changing the semantics is even more expensive.
> 3. We are discussing several alternatives on the jira.
>
> Unlike Python and Linux, Apache has a democratic process and we have  
> to work together to build consensus. The Apache rules are that a  
> single -1 from a committer blocks the change from being made.  
> Occasionally that has cost us a lot of time. For example, a single  
> -1 from a committer on an implementation detail of the symlink patch  
> blocked it for more than a year. We need to work together to find a  
> solution that everyone can live with.
>
> -- Owen