Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-05, 17:27
We at Quantcast have released QFS 1.0 (Quantcast File System) to open source. It is based on KFS 0.5 (Kosmos Distributed File System), a C++ distributed filesystem implementation. KFS plugs into Apache Hadoop via the 'kfs' shim that is part of the Hadoop codebase.
QFS adds support for permissions, and provides fault tolerance through Reed-Solomon encoding as well as replication. There are also a number of performance and stability improvements, including a rewrite of the client library to allow parallel concurrent I/Os. Going forward, new releases of KFS will come from QFS.
The open source release of QFS is at http://quantcast.github.com/qfs
QFS plugs into Apache Hadoop the same way KFS does. Currently, one would apply the patches or JARs from the QFS source tree onto Apache Hadoop to make Hadoop use QFS. The patch for Apache Hadoop 1.0.X can be found at https://github.com/quantcast/qfs/blob/master/hadoop/hadoop-1.0.X.patch
In order to make the integration seamless, we would like to add a 'qfs' shim to Apache Hadoop so that the current active branches (1.0.X, 2.X.X, 0.23.X) of Apache Hadoop can use QFS.
Towards this, I've submitted an ASF JIRA feature ticket (HADOOP-8885) under the hadoop-common project, and sent a pull request with the QFS shim changes to https://github.com/apache/hadoop-common/tree/branch-1.0.2
I will subsequently submit pull requests to the other active Hadoop branches.
If you have any questions, I will be happy to answer them or provide more details on QFS.
- Thilee
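For readers wondering what "plugs into Apache Hadoop via a shim" means mechanically: Hadoop 1.x picks the FileSystem implementation for a URI by looking up the configuration key fs.<scheme>.impl (the in-tree kfs shim is registered as fs.kfs.impl). The Java sketch below mimics that lookup with a plain Map standing in for Hadoop's Configuration. The class SchemeLookup, the metaserver host and port, and the QFS class name com.quantcast.qfs.hadoop.QuantcastFileSystem are assumptions for illustration, not taken from this thread.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: mimics how Hadoop 1.x maps a URI scheme to a
// FileSystem class via the "fs.<scheme>.impl" key. A plain Map stands in
// for org.apache.hadoop.conf.Configuration; no Hadoop classes are loaded.
public class SchemeLookup {

    // Returns the implementation class name registered for the URI's scheme.
    public static String resolveImpl(Map<String, String> conf, URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            // Hadoop reports a similar "No FileSystem for scheme" error.
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Registration used by the in-tree kfs shim.
        conf.put("fs.kfs.impl", "org.apache.hadoop.fs.kfs.KosmosFileSystem");
        // Assumed class name for an externally shipped QFS shim.
        conf.put("fs.qfs.impl", "com.quantcast.qfs.hadoop.QuantcastFileSystem");

        // Hypothetical metaserver host and port.
        URI uri = URI.create("qfs://metaserver:20000/user/data");
        System.out.println(resolveImpl(conf, uri));
        // prints com.quantcast.qfs.hadoop.QuantcastFileSystem
    }
}
```

Because resolution goes through configuration rather than a hard-coded list, a shim jar released outside the Hadoop tree can, under these assumptions, be picked up with just a classpath entry and one config key.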
Re: Need to add fs shim to use QFS
Steve Loughran 2012-10-09, 19:22
On 5 October 2012 18:27, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
> We at Quantcast have released QFS 1.0 (Quantcast File System) to open
> source. This is based on the KFS 0.5 (Kosmos Distributed File System),
> a C++ distributed filesystem implementation. KFS plugs into Apache
> Hadoop via the 'kfs' shim that is part of Hadoop codebase.
>
> QFS has added support for permissions, and also, provides fault tolerance
> through Reed-Solomon encoding as well as replication. There are also a
> number of performance and stability improvements, including a rewrite of
> the client library to allow parallel concurrent I/Os. Going forward, new
> releases of KFS will come from QFS.
>
Does this mean the kfs plugin can go from the apache tree?
One problem that we've always had with KFS is that nobody ever tested the filesystem, and it was inevitably out of sync with what was in KFS.
Have you considered just pulling the kfs lib out and releasing the bridge classes yourself? It's what the other FS suppliers do, as it gives them more control over the libraries, including the ability to release more often.
-steve
Re: Need to add fs shim to use QFS
Harsh J 2012-10-10, 07:05
Hi Steve,
Check out https://issues.apache.org/jira/browse/HADOOP-8886 for the KFS removal. Seems relevant to your question here.
--
Harsh J
Re: Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-10, 15:03
Hi Steve,
Like Harsh said, HADOOP-8886 addresses removing KFS from apache tree.
But I interpret your suggestion as 'moving qfs.jar out of apache tree, and keeping the jar in possibly a maven repo externally. The new fs shim for QFS will pull this jar from the repo upon compilation etc.'. Please correct me if I am wrong.
Thanks,
- Thilee
Re: Need to add fs shim to use QFS
Steve Loughran 2012-10-10, 19:14
On 10 October 2012 16:03, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
> Hi Steve,
>
> Like Harsh said, HADOOP-8886 addresses removing KFS from apache tree.
>
> But I interpret your suggestion as 'moving qfs.jar out of apache tree, and
> keeping the jar in possibly a maven repo externally. The new fs shim for
> QFS will pull this jar from the repo upon compilation etc.'.
>
I think the main question is, if it's external, what actually needs to be done w.r.t. Hadoop's build itself. Is your goal to include the shim JAR in the normal Hadoop releases? That's what I'm not sure is needed -no other non-HDFS filesystem needs that to work with Hadoop. What matters more is to hook a local Jenkins server to do the nightly builds and tests of Hadoop branch-1 and trunk onto your filesystem, test out all the releases, and then file bug reports if there have been any regressions.
Re: Need to add fs shim to use QFS
Eli Collins 2012-10-10, 19:21
On Wed, Oct 10, 2012 at 12:14 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 10 October 2012 16:03, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
>
>> Hi Steve,
>>
>> Like Harsh said, HADOOP-8886 addresses removing KFS from apache tree.
>>
>> But I interpret your suggestion as 'moving qfs.jar out of apache tree, and
>> keeping the jar in possibly a maven repo externally. The new fs shim for
>> QFS will pull this jar from the repo upon compilation etc.'.
>>
>
> I think the main question is if its external, what actually needs to be
> done w.r.t. Hadoop's build itself. Is your goal to include the shim JAR in
> the normal Hadoop releases? As that's what I'm not sure is needed -no other
> non-HDFS filesystem needs that to work with Hadoop. What matters more is to
> hook a local Jenkins server to do the nightly builds and tests of hadoop
> branch-1 and trunk onto your filesystem, test out all the releases, and
> then file bug reports if there have been any regressions.
Good point Steve. This touches on the larger issue of whether it makes sense to host FS clients for other file systems in Hadoop itself. I agree with what I think you're getting at, which is that if we can handle the testing and integration via external dependencies, it would probably be better to have the Hadoop client code live and ship as part of the other projects, since it's more likely to be maintained there. Perhaps start a DISCUSS thread on common-dev, since this pertains to other file systems aside from QFS?
Thanks, Eli
Re: Need to add fs shim to use QFS
Steve Loughran 2012-10-11, 08:59
> Good point Steve. This touches on the larger issue of whether it
> makes sense to host FS clients for other file systems in Hadoop
> itself. I agree with what I think you're getting which is - if we can
> handle the testing and integration via external dependencies it would
> probably be better to have the Hadoop client code live and ship as
> part of the other projects since it's more likely to be maintained
> there. Perhaps start a DISCUSS thread on common-dev since this
> pertains to other file systems aside from QFS?
>
Seems reasonable -I'll let you start it.
We had this problem with Ant; I'm sure the Maven team hit it too: at first, having lots of libraries that bond to external apps makes sense, because nobody else will do them for you. As your application becomes more successful, those obscure tasks become a liability: nobody ever regression tests them, most people don't even have a setup to run them by hand, and you fear support issues related to them, as they will be non-reproducible, let alone fixable.
Looking at the Ant task list, <netrexxc> and <wljspc> spring to mind -the latter has only ever been tested on WinNT4 and Solaris 5.x, meaning nobody has actually run it since 2001 and Windows XP hitting the market.
http://ant.apache.org/manual/Tasks/wljspc.html
The fact that there are no open JIRAs related to KFS is probably a metric of its use -again an argument for pushing the work out to the KFS team -though they will need to work with Bigtop to ensure that an RPM can install the kfs support into /usr/lib/hadoop/lib
Re: Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-10, 23:34
On 10/10/12 12:14 PM, "Steve Loughran" <[EMAIL PROTECTED]> wrote:
>On 10 October 2012 16:03, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
>
>> Hi Steve,
>>
>> Like Harsh said, HADOOP-8886 addresses removing KFS from apache tree.
>>
>> But I interpret your suggestion as 'moving qfs.jar out of apache tree, and
>> keeping the jar in possibly a maven repo externally. The new fs shim for
>> QFS will pull this jar from the repo upon compilation etc.'.
>>
>
>I think the main question is if its external, what actually needs to be
>done w.r.t. Hadoop's build itself. Is your goal to include the shim JAR in
>the normal Hadoop releases? As that's what I'm not sure is needed -no other
>non-HDFS filesystem needs that to work with Hadoop. What matters more is to
>hook a local Jenkins server to do the nightly builds and tests of hadoop
>branch-1 and trunk onto your filesystem, test out all the releases, and
>then file bug reports if there have been any regressions.
My initial goal was to make Hadoop use QFS the same way it used KFS. Since Hadoop branch-1 had lib/kfs.xx.jar, I was expecting to include a qfs.x.x.jar in the Hadoop release; my first patch used such a jar. But now I see that Hadoop trunk links to external Maven repos.
It may be reasonable to link qfs.jar from an external source (I haven't yet figured out how to serve the maven repo from github for qfs.jar for Hadoop - any help on this will be appreciated). This way your nightly builds will work and the tests can catch any qfs related regressions.
Re: Need to add fs shim to use QFS
Steve Loughran 2012-10-11, 15:01
On 11 October 2012 00:34, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
> My initial goal was to make Hadoop use QFS the same way it used KFS. Since
> Hadoop branch-1 had lib/kfs.xx.jar, I was expecting to include a
> qfs.x.x.jar in the Hadoop release; my first patch was to use such jar. But
> now I see that Hadoop trunk links to external maven repos.
>
> It may be reasonable to link qfs.jar from an external source (I haven't
> yet figured out how to serve the maven repo from github for qfs.jar for
> Hadoop - any help on this will be appreciated).
The issue is not so much where qfs.jar comes from as where the implementation of Hadoop FileSystem goes -the big question being: should it live in your (OSS) codebase rather than Hadoop's? That would keep it 100% in sync with your back end, give you a release schedule to suit you, and ensure that bug reports end up in your bug reporting tools.
> This way your nightly
> builds will work and the tests can catch any qfs related regressions.
>
That regression testing is going to have to be on QFS -which means your infrastructure, real or virtual.
Re: Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-26, 00:24
On 10/11/12 8:01 AM, "Steve Loughran" <[EMAIL PROTECTED]> wrote:
>On 11 October 2012 00:34, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
>
>> My initial goal was to make Hadoop use QFS the same way it used KFS. Since
>> Hadoop branch-1 had lib/kfs.xx.jar, I was expecting to include a
>> qfs.x.x.jar in the Hadoop release; my first patch was to use such jar. But
>> now I see that Hadoop trunk links to external maven repos.
>>
>> It may be reasonable to link qfs.jar from an external source (I haven't
>> yet figured out how to serve the maven repo from github for qfs.jar for
>> Hadoop - any help on this will be appreciated).
>
>The issue is not so much where qfs.jar comes from as where the
>implementation of Hadoop FileSystem goes -the big question being: should it
>live in your (OSS) codebase rather than Hadoop's? That would keep it 100%
>in sync with your back end, give you the release schedule to suit you, and
>ensure that bugreps end up in your bug reporting tools.
>
>> This way your nightly
>> builds will work and the tests can catch any qfs related regressions.
>>
>
>That regression testing is going to have be on QFS -which means your
>infrastructure, real or virtual.
We have made the changes recommended here, and made available a 'Hadoop QFS jar' with QFS. This plugin and the QFS libraries will be maintained & released by the QFS open-source project.
Please see the download and usage instructions at https://github.com/quantcast/qfs/wiki/Migration-Guide
The QFS tarball contains a hadoop-qfs jar for each of Hadoop 0.23.4, 1.0.2, 1.0.4, 1.1.0, and 2.0.2-alpha. Since the interfaces seem similar, I am not sure whether this is overkill: one each for trunk and branch-1 may suffice. Could you comment on this?
Also, is there documentation on the Apache Hadoop website that describes available alternatives to HDFS (or how to add an alternative file system to Hadoop)? Please let us know.
The JIRA issue HADOOP-8885 (https://issues.apache.org/jira/browse/HADOOP-8885) can be closed now. Please advise whether I should close it or whether one of the admins will.
Thanks,
-Thilee
Re: Need to add fs shim to use QFS
Steve Loughran 2012-10-26, 17:32
On 26 October 2012 01:24, Thilee Subramaniam <[EMAIL PROTECTED]> wrote:
>
> We have made the changes recommended here, and made available a 'Hadoop
> QFS jar' with QFS. This plugin and the QFS libraries will be maintained &
> released by the QFS open-source project.
>
> Please see the download and usage instructions at
> https://github.com/quantcast/qfs/wiki/Migration-Guide
>
> The QFS tarball contains a hadoop-qfs jar each for Hadoop 0.23.4, 1.0.2,
> 1.0.4, 1.1.0, and 2.0.2-alpha. Since the interfaces seem similar, I am not
> sure if this is an overkill: one each for trunk and branch1 may suffice.
> Could you comment on this?
>
Java is a lot more forgiving than C/C++; one built against 1.0.4 should suffice for all; if you are being overcautious, branch-1 and trunk should be enough.
> Also, is there documentation on Apache Hadoop website that describe
> available alternatives to HDFS (or how to add an alternative file system
> to Hadoop)? Please let us know.
>
If there isn't something on wiki.apache.org/hadoop, there should be: create a login there, then email back your username and you can have the editor rights to put something up -I'd suggest a page on "Alternate Filesystems".
-steve
Re: Need to add fs shim to use QFS
Eli Collins 2012-10-05, 17:32
Hey Thilee,
Thanks for contributing. We don't process pull requests on the git mirrors; please upload a patch against trunk and branch-1 if you'd like this included in Hadoop 1.x and 2.x releases. More info here: http://wiki.apache.org/hadoop/HowToContribute
Thanks,
Eli
Re: Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-05, 23:27
Thank you Eli. I have just uploaded the patch for branch-1 to JIRA. I'll do the same for trunk soon. If you have any comments or questions on the patch, please let me know.
Thanks,
- Thilee
Re: Need to add fs shim to use QFS
Thilee Subramaniam 2012-10-09, 17:24
Hi Eli,
I have attached two patches and submitted them as well. Will the integration engine automatically process the 'branch-1' patch, or should I do anything specific for released versions?
I set the following for the "Fix versions" field upon patch submission: 1.0.2, 1.0.3, 0.23.3, 2.0.0-alpha, 2.0.1-alpha and 3.0.0. Please let me know if it's all right.
Thanks,
- Thilee