

Re: Update on hadoop-0.23
I am very glad that the development and testing of 0.23 is going so well.
I see a lot of commits and hundreds of changes going in literally every day.
It is great to see the new technology building!

On the criticism of the 0.22 release.
Arun has a top-down view and I agree a lot of progress has been
achieved with the framework.
My bottom-up view is that you first need a reliable storage layer. If
the file system loses blocks or, worse, messes up the image and/or
journals, the performance of the framework is your second problem. I
have said that before. Based on my experience it takes time to
stabilize a file system. Anybody seen one that has been stabilized in
less than 2 years?
I do not see the 0.22 release as a wasted effort. And if the progress
with it contributes to the 0.23 rush I am twice as happy.

Thanks,
--Konstantin

On Fri, Sep 30, 2011 at 3:00 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
> On Sep 30, 2011, at 1:13 PM, Todd Lipcon wrote:
>
>> On Fri, Sep 30, 2011 at 11:44 AM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
>>> I apologize if my level of institutional knowledge of these things is
>>> lacking, but do you have any
>>> benchmarking results between 0.22 and 0.20.2xx? The reason I'm asking
>>> is twofold -- I really
>>> would like to see objective numbers qualifying the viability of
>>> 0.22 from the performance stand point,
>>> but more importantly I would really like to include the benchmarking
>>> code into Bigtop.
>>
>> 0.22 currently suffers from MAPREDUCE-2266, which, last time I
>> benchmarked it, caused a significant slowdown. iirc a terasort ran
>> something like twice as slow on my test cluster due to this bug.
>> 0.23/MR2 doesn't suffer from this bug.
>>
>
> I don't really know where to start. CHANGES.txt in branch-0.20-security has the full list.
>
> If I remember right, long ago (late 2009)  we benchmarked .21 with gridmix and saw >30% prior to abandoning .21.
>
> Since then 0.20.2xx has had innumerable improvements to JobTracker, TaskTracker etc. etc.
> # JobTracker itself is almost thrice as fast as it used to be in 2009.
> # The scheduler is significantly better on locality (>2x) and throughput.
> # TaskTracker has had innumerable fixes for dist.cache, task launch, shutdown (MR-2266 and lots of other similar fixes).
> # The MR runtime has fixes for latency on innumerable fronts.
>
> Other regressions:
> # Security
> # Support for multi-tenant clusters.
> # Tonnes of operability fixes (jobhistory, task logs i.e. MR-1100) for running MR clusters.
>
> The one redeeming aspect for .22 is the shuffle based on the work we did for winning Terasort/Petasort in 2009 but 0.23 has even more work there with zero-copy with netty (yaay! no more jetty! Thanks to @cdouglas).
>
>> In terms of bugs -- same question. Is there any publicly available
>> list of, at least, the critical
>> ones that make 0.22 not viable from your point of view?
>
> We marked a lot of them as blockers on .22 and they were discarded by the release master(s). branch-0.20-security/CHANGES.txt is the full list. I really can't spend time enumerating over 4000 commits and > 2000 (?) jiras to that branch at this point.
>
> In my opinion, as someone who has helped develop/run/support very large installs and done this for over 5 1/2 years, a major release with regression on features (security, multi-tenancy) and scalability, performance etc. is distinctly _unviable_.
>
> ----
>
> Again, none of this is meant to say you should invest time on fixing them or releasing 0.22 as it stands - just, please, don't label it in a manner which helps build unreasonable expectations among users about its viability & usability.
>
> thanks,
> Arun
>
>