Rejuvenate Hadoop 0.22 effort
Konstantin Shvachko 2011-09-16, 09:21
Hi everybody,
I think there is no need to change anything drastically with the plans for Hadoop 0.22 release, so I'll continue along the lines previously rendered by Nigel, discussed, and agreed upon within the community.
1. First thing, we need to resurrect Hadoop-0.22 Jenkins builds ASAP. Does anybody want to help with that? Any help is greatly appreciated.
2. I will start sorting out jiras currently assigned to 0.22. There are 10 blockers (again) over the three projects. Details below. My plan is to get the release candidate out late October.
3. Let's start discussing what else people think needs to be included in 0.22. I will include issues based on the following priorities
- build fixes,
- test failures,
- bug fixes,
- documentation
- compatibility issues directed to making H-0.22 work with other projects (HBase, Pig, Hive, Oozie)
- minor improvements (irritating for users but simple)
- no new features in 0.22.0, but I'd like to have a list of things which people would've considered for 0.22.1
4. I will use the following filter to watch the jiras assigned to the release:
project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved AND fixVersion = "0.22.0" ORDER BY priority DESC
If you think an issue should be considered for inclusion please set fixVersion = "0.22.0". I will mark them as blockers based on the priorities above and my common sense. Note, if the jira is consciously assigned to a contributor it has a high chance of making it into the blockers.
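For anyone who would rather poll that filter from a script than from the Jira UI, here is a minimal sketch that only builds the search URL. It assumes the tracker exposes the standard Jira REST search endpoint (`/rest/api/2/search`) under `https://issues.apache.org/jira`; the helper name `build_search_url` and the chosen `fields` list are illustrative, not something from this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# JQL filter quoted from item 4 of the message.
JQL = ('project in (HADOOP, HDFS, MAPREDUCE) AND resolution = Unresolved '
       'AND fixVersion = "0.22.0" ORDER BY priority DESC')

# Assumption: the standard Jira REST search endpoint is available under
# this base URL. Adjust both values if the instance is set up differently.
BASE_URL = "https://issues.apache.org/jira"
SEARCH_PATH = "/rest/api/2/search"


def build_search_url(jql, max_results=50):
    """Return a GET URL that asks Jira to run the given JQL query."""
    query = urlencode({
        "jql": jql,
        # Only request the columns used in the blocker list.
        "fields": "key,assignee,summary,priority",
        "maxResults": max_results,
    })
    return f"{BASE_URL}{SEARCH_PATH}?{query}"


if __name__ == "__main__":
    # Print the URL; fetching it is left to the caller.
    print(build_search_url(JQL))
```

The sketch stops at URL construction on purpose, so it can be checked without network access; fetching and paging through results would go on top of it.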
== TESTING ==
5. I think Steve's idea of integrating 0.22 with Apache BigTop is great. Will be glad to see any steps in this direction.
6. Hadoop-0.22 has been in testing since January 2011. We conducted some internal testing lately. Testing is proceeding now on a dev cluster. If anybody plans to set up a cluster for testing and wants to coordinate the efforts, please ping me.
== 10 BLOCKERS ==
7. There are 10 official blockers.
Key Assignee Summary
MAPREDUCE-1991 Todd Lipcon taskcontroller allows stealing permissions on any local file
MAPREDUCE-2178 Devaraj Das Race condition in LinuxTaskController permissions handling
MAPREDUCE-2266 Unassigned JvmManager sleeps between SIGTERM and SIGKILL while holding many TT locks
I will unblock TaskController issues as per discussion related to MAPREDUCE-2767.
MAPREDUCE-1100 Vinod Kumar User's task-logs filling up local disks on the TaskTrackers
MAPREDUCE-1716 Vinod Kumar MAPREDUCE-1100 Truncate logs of finished tasks to prevent node thrash due to excessive logging
Don't see any activity from Vinod. Any volunteers to port this to 0.22?
HADOOP-7035 Tom White Document incompatible API changes between releases
Looks close to completion. Tom, are you still on it?
MAPREDUCE-2268 Todd Lipcon With JVM reuse, JvmManager doesn't delete last workdir properly
Todd, is it a blocker? Do you plan to fix it soon?
MAPREDUCE-1506 Unassigned Assertion failure in TestTaskTrackerMemoryManager
Will unblock, as no volunteers emerged.
HDFS-1967 Unassigned HDFS-1852 TestHDFSTrash failing on trunk and 22
HDFS-2012 Unassigned Recurring failure of TestBalancer on branch-0.22
Don't see failures anymore. Will follow up when Jenkins builds are restored
HDFS-2290 Benoy Antony Block with corrupt replica is not getting replicated
Close to completion.
Thanks, --Konstantin
Re: Rejuvenate Hadoop 0.22 effort
Owen O'Malley 2011-09-16, 16:10
Konst,
You should take a look and evaluate not just the things that were marked as blockers 6 months ago, but also the things that have gone out in the 0.20.2xx line that aren't in 0.22.
Areas of work that leap to mind:
1. fixes to the linux task controller.
2. rpm work
3. mr scheduler limits
4. capacity and fair share improvements
5. har improvements
-- Owen
Re: Rejuvenate Hadoop 0.22 effort
Konstantin Shvachko 2011-09-17, 06:25
Thanks Owen. Will definitely look at those. --Konstantin
On Fri, Sep 16, 2011 at 9:10 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> Konst,
> You should take a look and evaluate not just the things that
> were marked as blockers 6 months ago, but also the things that
> have gone out in the 0.20.2xx line that aren't in 0.22.
>
> Areas of work that leap to mind:
> 1. fixes to the linux task controller.
> 2. rpm work
> 3. mr scheduler limits
> 4. capacity and fair share improvements
> 5. har improvements
>
> -- Owen
Re: Rejuvenate Hadoop 0.22 effort
Roman Shaposhnik 2011-09-24, 00:34
On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> == TESTING ==
> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> great. Will be glad to see any steps in this direction.
The basic integration is done. We can produce fully functional RPM and DEB packages for Hadoop 0.22 release.
This is good news. The bad news is that very few downstream components can be compiled against .22. And I'm not talking changes to versions, pom.xml and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout all need to be modified to support .22. Before that's done -- there's little that can be done as far as stack validation is concerned.
Given that work needs to be done in downstream components, I've got 2 questions:
1. do we know if the API delta between .22 and .23 is as significant as between .22 and .20.2?
2. what's the common approach downstream to support multiple versions of Hadoop APIs? Or is this even something that can be asked of all the components?
Thanks, Roman.
Re: Rejuvenate Hadoop 0.22 effort
Konstantin Boudnik 2011-09-24, 02:40
I'd say let's take a look at how bad are the problems; what are discrepancies?
Do you have any build links or some such to point to?
Cos
On Fri, Sep 23, 2011 at 05:34PM, Roman Shaposhnik wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <[EMAIL PROTECTED]> wrote:
> > == TESTING ==
> > 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
> > great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's
> little that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
> 1. do we know if the API delta between .22 and .23 is as
> significant as between .22 and .20.2?
>
> 2. what's the common approach downstream to support multiple versions of
> Hadoop APIs? Or is this even something that can be asked of all the
> components?
>
> Thanks,
> Roman.
Re: Rejuvenate Hadoop 0.22 effort
Roman Shaposhnik 2011-09-26, 06:03
On Fri, Sep 23, 2011 at 7:40 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:
> I'd say let's take a look at how bad are the problems; what are discrepancies?
Excellent point! In fact, let me hook these jobs to Bigtop's Jenkins so that anybody who wants to can see this info. I'll take care of it tomorrow.
Thanks, Roman.
Re: Rejuvenate Hadoop 0.22 effort
Konstantin Shvachko 2011-09-26, 01:02
Good news Roman!
About connecting with other projects.
HBase is compiling with 0.22.
For Pig there is https://issues.apache.org/jira/browse/PIG-2277
For Hive created https://issues.apache.org/jira/browse/HIVE-2468
The direction with Hive and Pig is to create shim layers for different versions.
Don't know about the API delta between .22 and .23 yet. I assume it is less than 0.20 vs 0.22. But I may be wrong.
--Konstantin
On Fri, Sep 23, 2011 at 5:34 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 16, 2011 at 2:21 AM, Konstantin Shvachko
> <[EMAIL PROTECTED]> wrote:
>> == TESTING ==
>> 5. I think Steve's idea of integrating 0.22 with Apache BigTop is
>> great. Will be glad to see any steps in this direction.
>
> The basic integration is done. We can produce fully functional RPM
> and DEB packages for Hadoop 0.22 release.
>
> This is good news. The bad news is that very few downstream components
> can be compiled against .22. And I'm not talking changes to versions, pom.xml
> and build.xml files. I'm talking API incompatibilities. Pig, Hive, HBase, Mahout
> all need to be modified to support .22. Before that's done -- there's
> little that can be done as far as stack validation is concerned.
>
> Given that work needs to be done in downstream components, I've got 2 questions:
> 1. do we know if the API delta between .22 and .23 is as
> significant as between .22 and .20.2?
>
> 2. what's the common approach downstream to support multiple versions of
> Hadoop APIs? Or is this even something that can be asked of all the
> components?
>
> Thanks,
> Roman.
Re: Rejuvenate Hadoop 0.22 effort
Roman Shaposhnik 2011-09-26, 05:25
On Sun, Sep 25, 2011 at 6:02 PM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> Good news Roman!
>
> About connecting with other projects.
> HBase is compiling with 0.22.
That is trunk of HBase right? IOW, we don't really have a released version that is compatible with .22?
> For Pig there is
> https://issues.apache.org/jira/browse/PIG-2277
> For Hive created
> https://issues.apache.org/jira/browse/HIVE-2468
>
> The direction with Hive and Pig is to create shim layers for different versions.
IOW, a single build of Hive and Pig being able to communicate with different versions of Hadoop?
This is fine, but sounds more time consuming than what HBase is doing (providing profiles to build against different versions of Hadoop).
Regardless of how time consuming either approach is, I guess my fundamental question would be -- do we have any kind of commitment from the downstream guys to have a release compatible with .22?
I guess I'm just wondering how these timelines of downstream components will affect usability of any Hadoop release (be it .22 or .23). Any thoughts on that?
Thanks,
Roman.
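To make the "single build, several Hadoop versions" idea concrete, here is a rough sketch of the shim-layer pattern. It is written in Python purely for brevity (the real projects are Java), and every name in it (`HadoopShim`, `Shim020`, `Shim022`, `load_shim`) is hypothetical; it is not how Hive or Pig actually implement their shims, just the general shape of the approach.

```python
class HadoopShim:
    """Hypothetical interface the downstream project codes against."""

    def submit(self, job_name):
        raise NotImplementedError


class Shim020(HadoopShim):
    """Adapter that would wrap the 0.20-era API calls."""

    def submit(self, job_name):
        return f"submitted {job_name} via 0.20 API"


class Shim022(HadoopShim):
    """Adapter that would wrap the 0.22 API calls."""

    def submit(self, job_name):
        return f"submitted {job_name} via 0.22 API"


# Registry mapping a major.minor prefix to the adapter for that line.
SHIMS = {"0.20": Shim020, "0.22": Shim022}


def load_shim(hadoop_version):
    """Pick the adapter at runtime from the detected Hadoop version."""
    prefix = ".".join(hadoop_version.split(".")[:2])
    try:
        return SHIMS[prefix]()
    except KeyError:
        raise ValueError(f"no shim for Hadoop {hadoop_version}") from None
```

The contrast with build profiles is that the choice here happens at runtime inside one artifact, whereas a profile makes the choice at compile time and yields one artifact per Hadoop version. That is roughly why the shim route costs more up front: every API difference needs an adapter method.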
Re: Rejuvenate Hadoop 0.22 effort
Konstantin Shvachko 2011-09-26, 16:51
>> About connecting with other projects.
>> HBase is compiling with 0.22.
>
> That is trunk of HBase right? IOW, we don't really have a released version
> that is compatible with .22?
I mean HBase 0.92, which was branched recently.
>> For Pig there is
>> https://issues.apache.org/jira/browse/PIG-2277
>> For Hive created
>> https://issues.apache.org/jira/browse/HIVE-2468
>>
>> The direction with Hive and Pig is to create shim layers for different versions.
>
> IOW, a single build of Hive and Pig being able to communicate
> with different versions of Hadoop?
>
> This is fine, but sounds more time consuming than what HBase is
> doing (providing profiles to build against different versions of Hadoop).
>
> Regardless of how time consuming either approach is, I guess my
> fundamental question would be -- do we have any kind of commitment
> from the downstream guys to have a release compatible with .22?
>
> I guess I'm just wondering how these timelines of downstream
> components will affect usability of any Hadoop release (be it .22 or .23).
> Any thoughts on that?
>
> Thanks,
> Roman.