Hive >> mail # user >> Interesting claims that seem untrue


Re: Interesting claims that seem untrue
Whatever you count, you get more of :)
On Tue, Sep 17, 2013 at 1:57 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> Carter,
>
> what you are doing essentially contradicts the ASF policy of "community
> over code".
>
> Perhaps your intentions are good. However, LOC calculations or other silly
> contests essentially drive a wedge between developers who happen to draw
> their paychecks from different commercial entities. The Hadoop community
> has been through this already, and it caused nothing but despair and
> bitterness between people.
>
> Unlike some other popular contests, the number of lines contributed doesn't
> matter for most. Seriously.
>
> Regards,
>   Cos
>
> On Mon, Sep 16, 2013 at 01:58PM, Carter Shanklin wrote:
> > Ed,
> >
> > If nothing else I'm glad it was interesting enough to generate some
> > discussion. These sorts of stats are always subjects of a lot of
> > controversy. I have seen a lot of these sorts of charts float around in
> > confidential slide decks and I think it's good to have them out in the
> > open where anyone can critique and correct them.
> >
> > In this case, Ed, you've pointed out a legitimate flaw in my analysis.
> > Doing the analysis again, I found that previously, due to a bug in my
> > scripts, JIRAs that didn't have Hudson comments in them were not counted
> > (this was one way it was identifying SVN commit IDs, which I have since
> > removed due to flakiness). Brock's patch was the single largest victim of
> > this bug but not the only one; there were some from Cloudera, NexR,
> > Hortonworks, Facebook, and even 2 from you, Ed. Those interested can see a
> > full list of exclusions here:
> > https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0
> > I apologize to those under-represented; there wasn't any intent on my
> > part to minimize anyone's work. The impact on final totals is Cloudera
> > +5.4%, NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating
> > the blog later today with relevant corrections.
> >
> > There is going to be continued interest in seeing charts like these, for
> > example when Hive 12 is officially done. Sanjay suggested that LoC counts
> > may not be the best way to represent true contribution. I agree that not
> > all lines of code are created equal; for example, a few monster patches
> > recently went in re-arranging HCatalog namespaces and, I think, also
> > indentation style. This (hopefully) mechanical work is not on the same
> > footing as adding new query language features. Still, it is work and it
> > wouldn't be fair to pretend it didn't happen. If anyone has ideas on
> > better ways to fairly capture contribution, I'm open to suggestions.
> >
> >
> >
> > On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> >
> > > I was reading the Hortonworks blog and found an interesting article.
> > >
> > > http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753
> > >
> > > There is a very interesting graphic which attempts to demonstrate lines
> > > of code in the 12 release.
> > > http://hortonworks.com/wp-content/uploads/2013/09/hive4.png
> > >
> > > Although I do not know how the numbers are calculated, they are probably
> > > counting code generated by test output, but besides that they are wrong.
> > >
> > > One claim is that Cloudera contributed 4,244 lines of code.
> > >
> > > So to debunk that claim:
> > >
> > > In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from
> > > Cloudera created the ptest2 testing framework. He did all the work for
> > > ptest2 in Hive 12, and it is clearly more than 4,244 lines.
> > >
> > > This consists of 84 Java files
> > > [edward@desksandra ptest2]$ find . -name "*.java" | wc -l
> > > 84
> > > and by itself is 8001 lines of code.
> > > [edward@desksandra ptest2]$ find . -name "*.java" | xargs cat | wc -l
> > > 8001
> > >
> > > [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
> > > 7902 HIVE-4675.patch
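
The three counts quoted above measure slightly different things: `find ... | xargs cat | wc -l` counts every line in the Java files, while `wc -l` on the patch also counts diff headers, context lines, and removed lines. A minimal sketch of counting only added and removed lines follows, assuming the patch is in unified diff format. The function name `count_patch_lines` and the sample diff are hypothetical illustrations; this is not the script used for the blog post or by anyone in this thread.

```python
import re

# Unified-diff hunk header, e.g. "@@ -1,3 +1,4 @@". A missing length means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_patch_lines(patch_text):
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.split("\n"):
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next hunk header.
            m = HUNK_RE.match(line)
            if m:
                old_left = int(m.group(1)) if m.group(1) is not None else 1
                new_left = int(m.group(2)) if m.group(2) is not None else 1
            continue
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new file.
            old_left -= 1
            new_left -= 1
    return added, removed


# Hypothetical 10-line patch: wc -l would report 10, but only 2 lines are added.
SAMPLE = """\
Index: Foo.java
===================================================================
--- Foo.java\t(revision 1)
+++ Foo.java\t(working copy)
@@ -1,3 +1,4 @@
 class Foo {
-  int x;
+  int x = 0;
+  int y = 0;
 }
"""

print(count_patch_lines(SAMPLE))  # (2, 1)
```

Tracking the hunk lengths from each `@@` header, rather than just testing for a leading `+`, avoids miscounting `+++`/`---` file headers or removed lines whose content itself begins with dashes. It still treats mechanical changes and generated files the same as hand-written code, so it does not address the fairness concerns raised in the thread.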