Drill >> mail # dev >> Thrift?



Point taken … +1 for protobuf - from my POV we can close ISSUE-1

> The question of an internal wire format, btw, does not constrain the project relative to external access.

Sounds sensible.

The one thing I really don't get is: why did you put Avro and JSON into the proposal [1] in the first place? Or is this the 'external access' from above?

Cheers,
  Michael

[1] http://wiki.apache.org/incubator/DrillProposal

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 14 Sep 2012, at 22:31, Ted Dunning wrote:

> I think that it is important to ask a few questions leading up to a decision
> here.
>
> The first is a (rhetorical) show of hands about how many people believe
> that there are no serious performance or expressivity killers when
> comparing alternative serialization frameworks.  As far as I know,
> performance differences are not massive (and protobufs is one of the
> leaders in any case) and the expressivity differences are essentially nil.
> If somebody feels that there is a serious show-stopper with any option,
> they should speak.
>
> The second is to ask the sense of the community whether they judge progress
> or perfection in this decision is more important to the project.  My guess
> is that almost everybody would prefer to see progress as long as the
> technical choice is not subject to some horrid missing bit.
>
> The final question is whether it is reasonable to go along with protobufs
> given that several very experienced engineers prefer it and would like to
> produce code based on it.  If the first two questions are answered to the
> effect that protobufs is about as good as we will find and that progress
> trumps small differences, then it seems that moving to follow this
> preference of Jason and Ryan for protobufs might be a reasonable thing to
> do.
>
> The question of an internal wire format, btw, does not constrain the
> project relative to external access.  I think it is important to support
> JDBC and ODBC and whatever is in common use for querying.  For external
> access the question is quite different.  Whereas for the internal format
> consensus around a single choice has large benefits, the external format
> choice is nearly the opposite.  For an external format, limiting ourselves
> to a single choice seems like a bad idea and increasing the audience seems
> like a better choice.
>
> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
>
>> Hi folks,
>>
>> I just commented on this first JIRA.  Here is my text:
>>
>> This issue has been hashed over a lot in the Hadoop projects. There
>> was work done to compare thrift vs avro vs protobuf. The conclusion
>> was to use protobuf.
>>
>> Prior to this move, there had been a lot of noise about pluggable RPC
>> transports, and whatnot. It held up adoption of a backwards compatible
>> serialization framework for a long time. The problem ended up being
>> the analysis-paralysis, rather than the specific implementation
>> problem. In other words, the problem was a LACK of implementation rather
>> than actual REAL problems.
>>
>> Based on this experience, I'd strongly suggest adopting protobuf and
>> moving on. Forget about pluggable RPC implementations, the complexity
>> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>> format for Hadoop and HBase, which allows Drill to draw on the broad
>> experience of those communities who need to implement high performance
>> backwards compatible RPC serialization.
>>
>> ===>>
>> Expanding a bit, I've looked into this issue a lot, and there are very
>> few significant concrete reasons to choose protobuf vs thrift.  Tiny
>> percent faster of this, and that, etc.  I'd strongly suggest protobuf
>> for the expanded community.  There is no particular Apache imperative
>> that Apache projects re-use libraries.  Use what makes sense for your
>> project.
>>
>> As regards Avro, it's a fine serialization format for long term