Re: symlink support in Hadoop 2 GA
A side note on the protobuf versions: a client and a server can use
different versions of protobuf, and that works well. What you cannot do is
compile with protoc version X and run against the JAR from version Y.
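
To illustrate why mixed versions can talk to each other, here is a minimal
sketch of the documented protobuf wire format for varint fields. It is
hand-rolled Python and does not use the protobuf library; the function names
are made up for this example. The point it shows is that what crosses the
network is a plain byte layout of (field number, wire type, value), and that a
reader can skip field numbers it does not know about. The compile/run mismatch
is a different problem: it concerns generated code and the runtime JAR inside
one process, which this sketch does not model.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields):
    """Encode {field_number: int} as varint-typed fields (wire type 0)."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        out += encode_varint((number << 3) | 0)  # tag = field number + wire type
        out += encode_varint(value)
    return bytes(out)


def decode_known(buf, known):
    """Decode varint fields, keeping only field numbers this reader knows."""
    pos = 0
    result = {}
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        if number in known:
            result[number] = value
    return result
```

A "newer" writer that sends fields 1 and 2 can be read by an "older" reader
that only knows field 1: `decode_known(encode_fields({1: 150, 2: 7}), {1})`
keeps field 1 and silently drops field 2.
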
On Thu, Sep 19, 2013 at 2:11 AM, J. Rottinghuis <[EMAIL PROTECTED]> wrote:

> However painful protobuf version changes are at build time for Hadoop
> developers, at runtime with multiple clusters and many Hadoop users this is
> a total nightmare.
> Even upgrading clusters from one protobuf version to the next is going to
> be very difficult. The same users will run jobs on, and/or read and write
> to, multiple clusters. Does that mean they will have to fork their code or
> run multiple instances? At the very least they will have to update their
> applications, all in sync with Hadoop cluster changes, and these updates
> are not doable in a rolling fashion.
> Will all Hadoop and HBase clusters upgrade at the same time, or will we
> have to have our users fork / roll multiple versions?
> My point is that these things are much harder than just fixing the
> (Jenkins) build and being done. These changes are massively disruptive.
> There is a similar situation with symlinks. Having an API that lets users
> create symlinks is very problematic. Some users create symlinks and, as Eli
> pointed out, somebody else (or an automated process) tries to copy to / from
> another (Hadoop 1.x?) cluster over hftp. What will happen?
> Having an API that people should not use is also a nightmare. We
> experienced this with append. For a while it was there, but users were "not
> allowed to use it" (or else there were large numbers of corrupt blocks). If
> there is an API to create a symlink, then some of our users are going to
> use it and others are going to trip over those symlinks. We already know
> that Pig does not work with symlinks yet, and as Steve pointed out, there
> are tons of other code paths out there that assume that !isDir() means
> isFile().
> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.
> Thanks,
> Joep
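
On the point quoted above that a lot of code assumes !isDir() means isFile():
a minimal sketch of how that two-way assumption goes wrong once a third kind
of entry exists. This uses Python's local filesystem calls as a stand-in, not
Hadoop's FileSystem API, and the function names are hypothetical. It only
shows the shape of the bug: a dangling symlink gets classified as a file by
the two-way check.

```python
import os
import stat
import tempfile


def naive_kind(path):
    """Two-way check: anything that is not a directory is treated as a file."""
    mode = os.lstat(path).st_mode  # lstat does not follow symlinks
    return "dir" if stat.S_ISDIR(mode) else "file"


def careful_kind(path):
    """Three-way check: symlinks are reported as their own kind."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    return "dir" if stat.S_ISDIR(mode) else "file"


def demo():
    """Classify a file, a directory and a dangling link in a scratch dir."""
    with tempfile.TemporaryDirectory() as root:
        file_path = os.path.join(root, "data.txt")
        dir_path = os.path.join(root, "subdir")
        link_path = os.path.join(root, "dangling")
        with open(file_path, "w") as handle:
            handle.write("x")
        os.mkdir(dir_path)
        # The link target is never created, so the link dangles.
        os.symlink(os.path.join(root, "missing"), link_path)
        paths = [file_path, dir_path, link_path]
        return {
            os.path.basename(p): (naive_kind(p), careful_kind(p))
            for p in paths
        }


if __name__ == "__main__":
    for name, kinds in demo().items():
        print(name, kinds)
```

Code written against the naive check would go on to open "dangling" as a
regular file and fail, which is the kind of breakage the thread is worried
about for existing user code.
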
> On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> >
> > On Sep 16, 2013, at 6:49 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > >
> > > I wanted to broadcast plans for putting the FileSystem symlinks work
> > > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
> > > it's pretty important we get it in since it's not a compatible change; if
> > > it misses the GA train, we're not going to have symlinks until the next
> > > major release.
> >
> > Just catching up, is this an incompatible change, or not? The above reads
> > 'not an incompatible change'.
> >
> > Arun
> >
> > >
> > > However, we're still dealing with ongoing issues revealed via testing.
> > > There's user-code out there that only handles files and directories and
> > > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
> > > for a nice example where globStatus returning symlinks broke Pig; some of
> > > us had a conference call to talk it through, and one definite conclusion
> > > was that this wasn't solvable in a generally compatible manner.
> > >
> > > There are also still some gaps in symlink support right now. For example,
> > > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
> > > resolution, and tooling like the FsShell and Distcp still need to be
> > > updated as well.
> > >
> > > So, there's definitely work to be done, but there are a lot of users
> > > interested in the feature, and symlinks really should be in GA. Would