Hive >> mail # user >> Interesting claims that seem untrue


Edward Capriolo 2013-09-12, 14:19
Sanjay Subramanian 2013-09-12, 17:52
Dean Wampler 2013-09-12, 20:42
Navis류승우 2013-09-13, 01:09
Carl Steinbach 2013-09-16, 05:34
Carter Shanklin 2013-09-16, 20:58
Konstantin Boudnik 2013-09-17, 17:57
Re: Interesting claims that seem untrue
Whatever you count, you get more of :)
On Tue, Sep 17, 2013 at 1:57 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> Carter,
>
> what you are doing essentially contradicts the ASF policy of "community over
> code".
>
> Perhaps your intentions are good. However, LOC calculations and other silly
> contests essentially drive a wedge between developers who happen to draw
> their paychecks from different commercial entities. The Hadoop community has
> been through this already, and it caused nothing but despair and bitterness
> between people.
>
> Unlike some other popular contests, the number of lines contributed doesn't
> matter for most. Seriously.
>
> Regards,
>   Cos
>
> On Mon, Sep 16, 2013 at 01:58PM, Carter Shanklin wrote:
> > Ed,
> >
> > If nothing else I'm glad it was interesting enough to generate some
> > discussion. These sorts of stats are always subjects of a lot of
> > controversy. I have seen a lot of these sorts of charts float around in
> > confidential slide decks and I think it's good to have them out in the
> > open where anyone can critique and correct them.
> >
> > In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing
> > the analysis again I found that previously, due to a bug in my scripts,
> > JIRAs that didn't have Hudson comments in them were not counted (this was
> > one way it was identifying SVN commit IDs, which I have since removed due to
> > flakiness). Brock's patch was the single largest victim of this bug but not
> > the only one; there were some from Cloudera, NexR, Hortonworks, Facebook,
> > and even 2 from you, Ed. The interested can see a full list of exclusions
> > here:
> >
> > https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0
> >
> > I apologize to those under-represented; there wasn't any intent on my part
> > to minimize anyone's work. The impact in final totals is Cloudera +5.4%,
> > NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog
> > later today with relevant corrections.
> >
> > There is going to be continued interest in seeing charts like these, for
> > example when Hive 12 is officially done. Sanjay suggested that LoC counts
> > may not be the best way to represent true contribution. I agree that not
> > all lines of code are created equal, for example a few monster patches
> > recently went in re-arranging HCatalog namespaces and I think also
> > indentation style. This (hopefully) mechanical work is not on the same
> > footing as adding new query language features. Still, it is work and it
> > wouldn't be fair to pretend it didn't happen. If anyone has ideas on better
> > ways to fairly capture contribution I'm open to suggestions.
> >
> >
> >
> > On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> >
> > > I was reading the Hortonworks blog and found an interesting article.
> > >
> > > http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753
> > >
> > > There is a very interesting graphic which attempts to demonstrate lines of
> > > code in the 12 release.
> > > http://hortonworks.com/wp-content/uploads/2013/09/hive4.png
> > >
> > > Although I do not know how they are calculated, they are probably counting
> > > code generated by test output, but besides that they are wrong.
> > >
> > > One claim is that Cloudera contributed 4,244 lines of code.
> > >
> > > So to debunk that claim:
> > >
> > > In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from
> > > Cloudera created the ptest2 testing framework. He did all the work for
> > > ptest2 in Hive 12, and it is clearly more than 4,244 lines.
> > >
> > > This consists of 84 Java files
> > > [edward@desksandra ptest2]$ find . -name "*.java" | wc -l
> > > 84
> > > and by itself is 8001 lines of code.
> > > [edward@desksandra ptest2]$ find . -name "*.java" | xargs cat | wc -l
> > > 8001
> > >
> > > [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
> > > 7902 HIVE-4675.patch
Lefty Leverenz 2013-09-17, 20:22