Eli Collins
2012-03-21, 00:37
Dave Shine
2012-03-21, 12:36
Milind.Bhandarkar@...
2012-03-21, 17:17
Eli Collins
2012-03-21, 17:32
Eli Collins
2012-03-21, 17:33
Milind.Bhandarkar@...
2012-03-21, 17:47
Eli Collins
2012-03-21, 18:27
Tim Broberg
2012-03-21, 18:31
Eli Collins
2012-03-21, 18:52
Dave Shine
2012-03-21, 19:07
Milind.Bhandarkar@...
2012-03-21, 19:30
Milind.Bhandarkar@...
2012-03-21, 19:48
Milind.Bhandarkar@...
2012-03-21, 20:24
Tsz Wo Sze
2012-03-21, 20:31
Sanjay Radia
2012-03-21, 20:57
Eli Collins
2012-03-21, 20:58
Eli Collins
2012-03-21, 21:08
Eli Collins
2012-03-21, 21:09
Eli Collins
2012-03-21, 21:14
Milind.Bhandarkar@...
2012-03-21, 22:06
Eli Collins
2012-03-21, 22:16
Milind.Bhandarkar@...
2012-03-21, 22:48
Eli Collins
2012-03-21, 23:30
Konstantin Shvachko
2012-03-22, 08:11
Konstantin Shvachko
2012-03-22, 08:26
Daryn Sharp
2012-03-22, 17:15
Eli Collins
2012-03-22, 17:25
Eli Collins
2012-03-22, 17:47
Milind.Bhandarkar@...
2012-03-22, 23:27
Tsz Wo Sze
2012-03-23, 00:03
Eli Collins
2012-03-23, 00:41
Eli Collins
2012-03-23, 00:49
Dhruba Borthakur
2012-03-23, 01:18
CHANG Lei
2012-03-23, 06:22
Daryn Sharp
2012-03-23, 17:03
Scott Carey
2012-03-24, 02:26
Scott Carey
2012-03-24, 02:44
Scott Carey
2012-03-24, 02:46
Colin McCabe
2012-03-26, 19:53
Colin McCabe
2012-03-26, 20:02
Scott Carey
2012-03-26, 20:53
Tsz Wo Sze
2012-03-26, 20:55
Colin McCabe
2012-03-26, 21:31
Tsz Wo Sze
2012-03-27, 02:46
-
[DISCUSS] Remove append? Eli Collins 2012-03-21, 00:37
Hey gang,

I'd like to get people's thoughts on the following proposal. I think we should consider removing append from HDFS.

Where we are today.. append was added in the 0.17-19 releases (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality issues. It and sync were re-designed, re-implemented, and shipped in 21.0 (HDFS-265). To my knowledge, there has been no real production use. Anecdotally, people who worked on branch-20-append have told me they think the new trunk code is substantially less well-tested than the branch-20-append code (at least for sync; append was never well tested). It has certainly gotten way less pounding from HBase users. The design, however, is much improved, and people think we can get hsync (and append) stabilized in trunk (mostly testing and bug fixing).

Rationale follows..

Append does not seem to be an important requirement; hflush was. There has not been much demand for append, from users or downstream projects. Because Hadoop 1.x does not have a working append implementation (see HDFS-3120; the branch-20-append work was focused on sync, not on getting append working), because what is there is not enabled by default, and because downstream projects will want to support Hadoop 1.x releases for years, most will not introduce dependencies on append anyway. This is not to say demand does not exist, just that if it does, it's been much smaller than for security, sync, HA, backwards compatible RPC, etc. This probably explains why, over 5 years after the original implementation started, we don't have a stable release with append.

Append introduces non-trivial design and code complexity, which is not worth the cost if we don't have real users. Removing append means we have the property that HDFS blocks, when finalized, are immutable. This significantly simplifies the design and code, which significantly simplifies the implementation of other features like snapshots, HDFS-level caching, dedupe, etc.

The vast majority of the HDFS-265 effort is still leveraged w/o append. The new data durability and read consistency behavior was the key part.

GFS, which HDFS' design is based on, has append (and atomic record append), so obviously a workable design does not preclude append. However, we also should not ape the GFS feature set simply because it exists. I've had conversations with people who worked on GFS who regret adding record append (see also http://queue.acm.org/detail.cfm?id=1594206).

In short, unless append is a real priority for our users I think we should focus our energy elsewhere.

Thanks,
Eli
-
RE: [DISCUSS] Remove append? Dave Shine 2012-03-21, 12:36
I am not a contributor to this project, so I don't know how much weight my opinion carries. But I have been hoping to see append become stable soon. We are constantly dealing with the "small file problem", and I have written M/R jobs to periodically roll up lots of small files into a few small ones. Having append would prevent me from needing to use up cluster resources performing these tasks.

Therefore, all things being equal I +1 making append work. However, if the level of complexity is as bad as Eli implies below, then I can understand that perhaps it is not worth the effort. If it will cause too much technical debt, then removing it makes sense. But don't just remove it because you don't believe there is a need for it.

Thanks,
Dave Shine
-
Re: [DISCUSS] Remove append? Milind.Bhandarkar@... 2012-03-21, 17:17
As someone who has worked with hdfs-compatible distributed file systems that support append, I can vouch for its extensive usage.

I have seen how simple it becomes to create tar archives, and later append files to them, without writing special inefficient code to do so.

I have seen it used in archiving cold data, reducing MR task launch overhead without having to use a different input format, so that the same code can be used for both hot and cold data.

In addition, the small-files problem in HDFS forces people to write MR code, and causes rewrite of large datasets even if a small amount of data is added to it.

So, there is clearly a need for it, AFAIK.

+1 on fixing it. Please let me know if you need help.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)
-
Re: [DISCUSS] Remove append? Eli Collins 2012-03-21, 17:32
Thanks for the feedback Milind, questions inline.

On Wed, Mar 21, 2012 at 10:17 AM, <[EMAIL PROTECTED]> wrote:
> As someone who has worked with hdfs-compatible distributed file systems
> that support append, I can vouch for its extensive usage.
>
> I have seen how simple it becomes to create tar archives, and later append
> files to them, without writing special inefficient code to do so.

Why not just write new files and use Har files? Because Har files are a pita?

> I have seen it used in archiving cold data, reducing MR task launch
> overhead without having to use a different input format, so that the same
> code can be used for both hot and cold data.

Can you elaborate on the 1st one, how it's especially helpful for archival?

I assume the 2nd one refers to not having to use Multi*InputFormat. And the 3rd refers to appending to an old file instead of creating a new one.

> In addition, the small-files problem in HDFS forces people to write MR
> code, and causes rewrite of large datasets even if a small amount of data
> is added to it.

Do people rewrite large datasets today just to add 1mb? I haven't heard of that from big users (Yahoo!, FB, Twitter, eBay..) or my customer base. If so I would have expected people to put energy into getting append working in 1.x which know was has put energy into (I know some people feel the 20-based design is unworkable, I don't know it well enough to comment there).

Thanks,
Eli
-
Re: [DISCUSS] Remove append? Eli Collins 2012-03-21, 17:33
On Wed, Mar 21, 2012 at 10:32 AM, Eli Collins <[EMAIL PROTECTED]> wrote:
> Do people rewrite large datasets today just to add 1mb? I haven't
> heard of that from big users (Yahoo!, FB, Twitter, eBay..) or my
> customer base. If so I'd would have expected people to put energy
> into getting append working in 1.x which know was has put energy into

Arg, that should read "no one has put energy into". </drinks coffee>
-
Re: [DISCUSS] Remove append? Milind.Bhandarkar@... 2012-03-21, 17:47
Answers inline.

On 3/21/12 10:32 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>Why not just write new files and use Har files, because Har files are a
>pita?

Yes, and har creation is an MR job, which is totally I/O bound, and yet takes up slots/containers, reducing cluster utilization.

>Can you elaborate on the 1st one, how it's especially helpful for
>archival?

Say you have daily log files (consider many small job history files). Instead of keeping them as separate files, one appends them to a monthly file (this in itself is a complete rewrite), but appending monthly files to year-to-date files should not require a rewrite (because after March, it becomes very inefficient).

Reducing the number of files this way also makes it easy to copy, take snapshots, etc. without having to write special parallel code to do it.

>I assume the 2nd one refers to not having to Multi*InputFormat. And
>the 3rd refers to appending to an old file instead of creating a new
>one.

Yes.

>> In addition, the small-files problem in HDFS forces people to write MR
>> code, and causes rewrite of large datasets even if a small amount of
>> data is added to it.
>
>Do people rewrite large datasets today just to add 1mb? I haven't
>heard of that from big users (Yahoo!, FB, Twitter, eBay..) or my
>customer base. If so I would have expected people to put energy
>into getting append working in 1.x which know was has put energy into
>(I know some people feel the 20-based design is unworkable, I don't
>know it well enough to comment there).

With HDFS, they do not rewrite large datasets just to add a small amount of data. Instead they create new files, and use a separate metadata-service (or just file numbering conventions) to make the added data part of the large dataset. But with other file systems, they just ">>".

Thanks,

- milind
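
To make the contrast above concrete, here is a minimal sketch of the two approaches, run against a local directory rather than HDFS. It is an illustration only, not code from this thread: the function names (`add_as_new_part`, `read_dataset`, `add_by_append`, `demo`) and the `part-NNNNN` naming are assumptions chosen for the example.

```python
import os
import tempfile


def add_as_new_part(dataset_dir: str, data: bytes) -> str:
    """Add data without append: write a new numbered part file.

    Existing files are never modified; the (hypothetical) numbering
    convention is what makes the new file part of the dataset.
    """
    os.makedirs(dataset_dir, exist_ok=True)
    parts = [n for n in os.listdir(dataset_dir) if n.startswith("part-")]
    path = os.path.join(dataset_dir, f"part-{len(parts):05d}")
    with open(path, "xb") as f:  # "x" fails if the file already exists
        f.write(data)
    return path


def read_dataset(dataset_dir: str) -> bytes:
    """Read the logical dataset by concatenating part files in name order."""
    names = sorted(n for n in os.listdir(dataset_dir) if n.startswith("part-"))
    chunks = []
    for name in names:
        with open(os.path.join(dataset_dir, name), "rb") as f:
            chunks.append(f.read())
    return b"".join(chunks)


def add_by_append(path: str, data: bytes) -> None:
    """Add data with append (the shell's ">>"): one file that keeps growing."""
    with open(path, "ab") as f:
        f.write(data)


def demo() -> tuple:
    """Add the same three records both ways and return both results."""
    records = [b"2012-01 logs\n", b"2012-02 logs\n", b"2012-03 logs\n"]
    with tempfile.TemporaryDirectory() as root:
        parts_dir = os.path.join(root, "dataset")
        single = os.path.join(root, "dataset.log")
        for rec in records:
            add_as_new_part(parts_dir, rec)
            add_by_append(single, rec)
        with open(single, "rb") as f:
            appended = f.read()
        return read_dataset(parts_dir), appended, len(os.listdir(parts_dir))
```

Both approaches end up with the same bytes. The difference is that the first leaves one file per addition, which on HDFS would mean more files for the NameNode to track (the small-files concern raised in this thread), while the second leaves a single growing file.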
-
Re: [DISCUSS] Remove append? Eli Collins 2012-03-21, 18:27
On Wed, Mar 21, 2012 at 10:47 AM, <[EMAIL PROTECTED]> wrote:
> Say you have daily log files (consider many small job history files).
> Instead of keeping them as separate files, one appends them to a monthly
> files (this in itself is a complete rewrite), but appending monthly files
> to year-to-date files should not require rewrite (because after March, it
> becomes very inefficient.)

Why not just keep the original daily files instead of continually either rewriting (yuck) or duplicating (yuck) the data by aggregating them into rollups? I can think of two reasons:

1. If the daily files are smaller than 1 block (seems unlikely)
2. The small files problem (a typical NN can store 100-200M files, so a problem for big users)

In which case maybe better to focus on #2 rather than work around it?

Thanks,
Eli
-
RE: [DISCUSS] Remove append? Tim Broberg 2012-03-21, 18:31
No specific advice on this particular issue, but in general, I learned the hard way to stop asking the question, "Feature X is hard to support, is anybody really going to use this?" *Every time* I have asked this question, I get the answer I want to hear. *Every time*, they come back and ask for the feature back later and it's more work than it would have been if I had just planned for it from the beginning.

YMMV, and I'm always asking marketing guys whereas you're asking developers.

Ok, there's one piece of specific advice: Go find the people that will tell you what you don't want to hear. Ask hdfs-user's whether they need the feature rather than hdfs-dev's.

We all have too much empathy for your position here to make you suffer.

- Tim.
-
Re: [DISCUSS] Remove append? Eli Collins 2012-03-21, 18:52
Good point. I thought I'd start with devs first. If you can't get it past devs there's no reason to go further. Also, users will tell you they want everything. I'd like to root cause this, eg if they want append to solve the small files problem I'd like to know if solving the latter means we don't have to do the former.

ps - fwiw the cdh-user@ mailing list has 800 people on it and it's rarely requested. Ditto in customer conversations. However the user base continues to grow rapidly and change in makeup so the past isn't necessarily a good predictor.

Thanks,
Eli
-
RE: [DISCUSS] Remove append? Dave Shine 2012-03-21, 19:07
I never brought it up on the CDH list because I was told during my CDH training (Dec 2010) that it was already there. When I later learned it was usable only for HBase, I just assumed it would be coming, eventually.

Dave
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-21, 19:30
>1. If the daily files are smaller than 1 block (seems unlikely)

Even at a large hdfs installation, the avg file size was < 1.5 blocks. Bucketing causes the file sizes to drop.

>2. The small files problem (a typical NN can store 100-200M files, so
>a problem for big users)

Big users probably have enough people to write their own roll-up code to avoid the small-files problem. It's the rest that are used to storage systems handling billions of files.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)

>In which case maybe better to focus on #2 rather than work around it?
>
>Thanks,
>Eli
>
>> Reducing number of files this way also makes it easy to copy, take
>> snapshots etc without having to write special parallel code to do it.
>>
>>>I assume the 2nd one refers to not having to Multi*InputFormat. And
>>>the 3rd refers to appending to an old file instead of creating a new
>>>one.
>>
>> Yes.
>>
>>>> In addition, the small-files problem in HDFS forces people to write MR
>>>> code, and causes rewrite of large datasets even if a small amount of
>>>> data is added to it.
>>>
>>>Do people rewrite large datasets today just to add 1mb? I haven't
>>>heard of that from big users (Yahoo!, FB, Twitter, eBay..) or my
>>>customer base. If so I'd have expected people to put energy
>>>into getting append working in 1.x which no one has put energy into
>>>(I know some people feel the 20-based design is unworkable, I don't
>>>know it well enough to comment there).
>>
>> With HDFS, they do not rewrite large datasets just to add a small amount
>> of data. Instead they create new files, and use a separate
>> metadata-service (or just file numbering conventions) to make the added
>> data part of the large dataset. But with other file systems, they just
>> ">>".
>>
>> Thanks,
>>
>> - milind
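The two approaches contrasted above can be sketched roughly as follows. This is a minimal illustration on a local file system, not HDFS code: the `part-NNNNN` naming, the helper names, and the directory layout are all assumptions made for the example. It shows new data being added as a separate numbered file (the workaround used when append is unavailable) next to a plain in-place append (the `>>` case).

```python
import os
import tempfile


def add_part(dataset_dir, payload):
    """Workaround without append: add the data as a new numbered part file."""
    existing = [n for n in os.listdir(dataset_dir) if n.startswith("part-")]
    name = "part-%05d" % len(existing)  # hypothetical numbering convention
    with open(os.path.join(dataset_dir, name), "wb") as out:
        out.write(payload)
    return name


def read_dataset(dataset_dir):
    """Readers must stitch the parts together in name order."""
    parts = sorted(n for n in os.listdir(dataset_dir) if n.startswith("part-"))
    chunks = []
    for name in parts:
        with open(os.path.join(dataset_dir, name), "rb") as f:
            chunks.append(f.read())
    return b"".join(chunks)


def append_in_place(path, payload):
    """With append: the equivalent of a shell '>>' on a single file."""
    with open(path, "ab") as out:
        out.write(payload)


workdir = tempfile.mkdtemp()
dataset = os.path.join(workdir, "dataset")
os.mkdir(dataset)

# Two small additions become two files under the numbering convention...
add_part(dataset, b"day-1\n")
add_part(dataset, b"day-2\n")

# ...but stay one file when the file system supports append.
single = os.path.join(workdir, "single.log")
append_in_place(single, b"day-1\n")
append_in_place(single, b"day-2\n")

with open(single, "rb") as f:
    single_bytes = f.read()
```

Both layouts hold the same bytes; the difference is the number of files (and therefore the amount of namespace metadata) needed to hold them, which is the small-files concern raised in this thread.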
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-21, 19:48
Eli,
To clarify a little bit, I think HDFS-3120 is the right thing to do, to disable appends, while still enabling hsync in branch-1.

But, going forward, (say 0.23+) having appends working correctly will definitely add value, and make HDFS more palatable for lots of other workloads.

Of course, I have a vested interest in this, because our team is working on a project that requires append and truncate, and we will be testing it thoroughly at scale in Q2 this year. Would it be okay to wait for the results of this testing?

Thanks,

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-21, 20:24
I would also like to point to work being done on PLFS-HDFS:
http://institute.lanl.gov/isti/irhpit/presentations/PLFS-HDFS.pdf

This would be made much simpler by allowing appends. Checkpointing in MPI is a very common use-case, and after Hamster, PLFS-HDFS becomes an attractive way to do this.

(Section 2 of the 2009 HotCloud paper by PDL: http://www.cs.cmu.edu/~svp/2009hotcloud-tablefs.pdf discusses the reasons for seeking commonalities between HPC and DISC file systems.)

- Milind
-
Re: [DISCUSS] Remove append?Tsz Wo Sze 2012-03-21, 20:31
Some of the information in the email is not correct. Let me clarify them.

> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) . . .

We never have append/sync in 0.17. Sync was added to 0.18 but not append. Append was added to 0.19. By append/sync above, I mean the implementation by HADOOP-1700. We also have HDFS-265, the new append/hflush. Below are the details.

Versions        Features
<= 0.17         no sync/append
0.18            1700 sync
0.19.0          1700 append
0.19.1, 0.20    1700 append disabled
0.20-append     append branch used by facebook
0.20.205.0      merged 1700 append to 0.20
>= 0.21         265 append/hflush

> . . . To my knowledge, there has been no real production use. . .

The reason of no production use today is simply that append is not yet in a stable release. Besides, it does not mean append is not useful.

> . . . The design however, is much improved, and people think we can get
> hsync (and append) stabilized in trunk (mostly testing and bug fixing).

hsync is not yet implemented. I think you may mean hflush.

> . . . This probably explains why, over 5 years after the original implementation
> started, we don't have a stable release with append.

HADOOP-1700 was committed on July 25, 2008. I don’t know how it could be “over 5 years”. It is well known that append from 0.20.x releases is not stable and hence probably not used. It is not the case that we don’t have a stable release because append is not stable.

> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users. . . .

I don’t agree. The non-trivial design and code complexity come from hflush but not append. Once we have hflush, append is straightforward. Roughly speaking, the append work is about 10% of the entire append/hflush work.

Moreover, there are real users/use cases as mentioned by Dave and Milind.

The jira that you have created to split the flag into hflush supported and append supported is a good idea.
Folks who do not need append, but need hflush, can still disable append.

Regards,
Nicholas
-
Re: [DISCUSS] Remove append?Sanjay Radia 2012-03-21, 20:57
On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around hflush, concurrent readers, the pipeline etc. The code and complexity for appending to a previously closed file was not that large.

> Removing append means we
> have the property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly
> simplifies the implementation of other features like snapshots,
> HDFS-level caching, dedupe, etc.

While snapshots are challenging with append, the problem is solvable: the snapshot needs to remember the length of the file. (We have a working prototype; we will be posting the design and the code soon.)

I agree that the notion of an immutable file is useful since it lets the system and tools optimize certain things. A Xerox PARC file system in the 80s had this feature, which the system exploited. I would support adding the notion of an immutable file to Hadoop.

sanjay
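The "snapshot remembers the length" idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the prototype mentioned above and not HDFS code: `AppendOnlyFile` and `Snapshot` are hypothetical in-memory classes, and the model assumes existing bytes are never rewritten or truncated.

```python
class AppendOnlyFile:
    """Toy in-memory file that only supports append (hypothetical, not HDFS)."""

    def __init__(self):
        self._data = bytearray()

    def append(self, payload):
        # Existing bytes are never modified; new bytes go at the end.
        self._data.extend(payload)

    def read(self, length=None):
        # A reader can be limited to a prefix of the file.
        end = len(self._data) if length is None else length
        return bytes(self._data[:end])

    def __len__(self):
        return len(self._data)


class Snapshot:
    """Records only the file length at snapshot time; no data is copied."""

    def __init__(self, source):
        self._source = source
        self._length = len(source)

    def read(self):
        # The snapshot view is the prefix that existed when it was taken.
        return self._source.read(self._length)


log = AppendOnlyFile()
log.append(b"day-1 records\n")

snap = Snapshot(log)            # remembers length 14, copies nothing
log.append(b"day-2 records\n")  # later appends do not disturb the snapshot

snapshot_view = snap.read()
live_view = log.read()
```

In this toy model the recorded length is enough because appends leave the earlier prefix untouched. An operation that shortened or rewrote the file, such as truncate, would break that assumption and would need additional bookkeeping.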
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 20:58
On Wed, Mar 21, 2012 at 1:31 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> Some of the information in the email is not correct. Let me clarify them.
>
>> Where we are today.. append was added in the 0.17-19 releases
>> (HADOOP-1700) . . .
>
> We never have append/sync in 0.17. Sync was added to 0.18 but not append. Append was added to 0.19. By append/sync above, I mean the implementation by HADOOP-1700. We also have HDFS-265, the new append/hflush. Below are the details.
>
> Versions        Features
> <= 0.17         no sync/append
> 0.18            1700 sync
> 0.19.0          1700 append
> 0.19.1, 0.20    1700 append disabled
> 0.20-append     append branch used by facebook
> 0.20.205.0      merged 1700 append to 0.20
> >= 0.21         265 append/hflush

Thanks for fleshing out the specifics, I put "17-19" to indicate that parts went in over a series of releases.

>> . . . To my knowledge, there has been no real production use. . .
>
> The reason of no production use today is simply that append is not yet in a stable release. Besides, it does not mean append is not useful.

Agree, not saying it isn't useful. "Usefulness" is necessary but not sufficient. There are plenty of useful things we may not want to put in HDFS.

>> . . . The design however, is much improved, and people think we can get
>> hsync (and append) stabilized in trunk (mostly testing and bug fixing).
>
> hsync is not yet implemented. I think you may mean hflush.

Yup, good catch, I meant hflush. (For those following along hsync is implemented, just not according to the design since today it just calls hflush).

>> . . . This probably explains why, over 5 years after the original implementation
>> started, we don't have a stable release with append.
>
> HADOOP-1700 was committed on July 25, 2008. I don’t know how it could be “over 5 years”. It is well known that append from 0.20.x releases is not stable and hence probably not used. It is not the case that we don’t have a stable release because append is not stable.
>
>> Append introduces non-trivial design and code complexity, which is not
>> worth the cost if we don't have real users. . . .
>
> I don’t agree. The non-trivial design and code complexity come from hflush but not append. Once we have hflush, append is straightforward. Roughly speaking, the append work is about 10% of the entire append/hflush work.

Do you think having the invariant that blocks are not mutated would significantly simplify the design?

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 21:08
On Wed, Mar 21, 2012 at 1:57 PM, Sanjay Radia <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
>
>> Append introduces non-trivial design and code complexity, which is not
>> worth the cost if we don't have real users.
>
> The bulk of the complexity of HDFS-265 ("the new Append") was around
> Hflush, concurrent readers, the pipeline etc. The code and complexity for
> appending to previously closed file was not that large.

And we'd still leverage that work. Which is not to say that append isn't complicated. There were a fair number of append bugs that were found in branch-20-append that we think are present in the new append implementation (not sure if there are jiras for all of them). Also, append + truncate removes the current invariant that we maintain eg around visible length. So append opens the doors for lots of additional complexity. We could decide to keep append but not add truncate but I suspect that will be hard because once you open up the doors to a lot of new use cases it's hard to close them. The larger issue is how simple we'd like to keep HDFS, how many use cases we'd like to grow it to.

>> Removing append means we
>> have the property that HDFS blocks, when finalized, are immutable.
>> This significantly simplifies the design and code, which significantly
>> simplifies the implementation of other features like snapshots,
>> HDFS-level caching, dedupe, etc.
>
> While Snapshots are challenging with Append, it is solvable - the snapshot
> needs to remember the length of the file. (We have a working prototype - we
> will be posting the design and the code soon).

Will check it out. When I read "Snapshots in Hadoop Distributed File System" it looked like the bulk of the complexity was due to the protocol for append: http://www.cs.berkeley.edu/~sameerag/hdfs-snapshots.pdf

> I agree that the notion of an immutable file is useful since it lets the
> system and tools optimize certain things. A xerox-parc file system in the
> 80s had this feature that the system exploited. I would support adding the
> notion of an immutable file to Hadoop.

Good point, we could leverage this property on a per-file, rather than per-filesystem basis.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 21:09
On Wed, Mar 21, 2012 at 12:48 PM, <[EMAIL PROTECTED]> wrote:
> Eli,
>
> To clarify a little bit, I think HDFS-3120 is the right thing to do, to
> disable appends, while still enabling hsync in branch-1.
>
> But, going forward, (say 0.23+) having appends working correctly will
> definitely add value, and make HDFS more palatable for lots of other
> workloads.
>
> Of course, I have a vested interest in this, because our team is working
> on a project that requires append and truncate, and we will be testing it
> thoroughly at scale in Q2 this year. Would it be okay to wait for the
> results of this testing ?

Absolutely, I'd like to learn more about what append/truncate buys us.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 21:14
On Wed, Mar 21, 2012 at 12:30 PM, <[EMAIL PROTECTED]> wrote:
>>1. If the daily files are smaller than 1 block (seems unlikely)
>
> Even at a large hdfs installation, the avg file size was < 1.5 blocks.
> Bucketing causes the file sizes to drop.
>
>>2. The small files problem (a typical NN can store 100-200M files, so
>>a problem for big users)
>
> Big users probably have enough people to write their own roll-up code to
> avoid small-files problem. Its the rest that are used to storage systems
> handling billions of files.

HDFS does as well, you can federate NNs to support billions of files. There's no fundamental max # files limitation in the design or latest implementation. I suspect we could support another 2x # files and # blocks per NN if we wanted by being more clever in how we store MD. One of the reasons HDFS scales better (and is less buggy) than these other systems is because its design is simpler, eg maintaining all MD in memory vs paging it. We don't want to lose these properties in the bargain.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-21, 22:06
>Absolutely, I'd like to learn more about what append/truncate buys us.

Indeed. Let's postpone this discussion to Q2 then.

Thanks,

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 22:16
On Wed, Mar 21, 2012 at 3:06 PM, <[EMAIL PROTECTED]> wrote:
>>Absolutely, I'd like to learn more about what append/truncate buys us.
>
> Indeed. Lets postpone this discussion to Q2 then.

I'd still like to hear what other people think if they haven't chimed in. Even if we decide to remove it, I don't think we need to do so next week, eg can wait to hear more about what you're working on.

One of the reasons I raised this topic now is that in the not too distant future 0.23 will become the stable release, and we'll effectively lose the ability to remove append once we're stable. Not that I expect people will stabilize append before this happens, it doesn't seem to be a priority for anyone, though perhaps you'll end up doing that work for your project.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-21, 22:48
Eli,
If HDFS-3120 is committed to both 1.x and trunk/0.23.x, then one will be able to disable appends (while keeping hflush) using different config variables. By default (i.e. in hdfs-default.xml), we should set dfs.support.append to false, and dfs.support.hsync to true.

That way, we get enough time to fix append, and if we decide to remove it, then we can do that without causing major distress in 0.24.

Thoughts?

- Milind
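As a rough illustration of the proposed defaults, the sketch below builds the corresponding property entries in the standard Hadoop configuration layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`). The two key names are taken from the proposal above; `dfs.support.hsync` in particular is only the name suggested in this thread, and whether HDFS-3120 ends up using it is an assumption. The helper `build_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_config(props):
    """Build a Hadoop-style <configuration> element from a dict of settings."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Proposed defaults from this thread: append off, hflush/hsync support on.
# "dfs.support.hsync" is the name suggested above, not a confirmed key.
proposed_defaults = {
    "dfs.support.append": "false",
    "dfs.support.hsync": "true",
}

config = build_config(proposed_defaults)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

A site that needs append could then override only the first property in its own site configuration, leaving the hflush setting untouched.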
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-21, 23:30
On Wed, Mar 21, 2012 at 3:48 PM, <[EMAIL PROTECTED]> wrote:
> Eli,
>
> If HDFS-3120 is committed to both 1.x and trunk/0.23.x, then one will be
> able to disable appends (while keeping hflush) using different config
> variables. By default (i.e. in hdfs-default.xml), we should set
> dfs.support.append to false, and dfs.support.hsync to true.

Agree, thanks for the thoughts.

> That way, we get enough time to fix append, and if we decide to remove it,
> then we can do that without causing major distress in 0.24.
>
> Thoughts ?

Sounds good.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Konstantin Shvachko 2012-03-22, 08:11
Hi Dave,
Your opinion is very much appreciated.

Thanks,
--Konstantin
-
Re: [DISCUSS] Remove append?Konstantin Shvachko 2012-03-22, 08:26
Eli,
I went over the entire discussion on the topic, and did not get it. Is there a problem with append? We know it does not work in hadoop-1, only flush() does. Is there anything wrong with the new append (HDFS-265)? If so please file a bug. I tested it in the Hadoop-0.22 branch and it works fine.

I agree with people who were involved with the implementation of the new append that the complexity is mainly in
1. pipeline recovery
2. consistent client reading while writing, and
3. hflush()
Once that is done, the append itself, which is the reopening of previously closed files for adding data, is not complex.

You mentioned it and I agree you indeed should be more involved with your customer base. As for eBay, append was one of the motivations to work on stabilizing the 0.22 branch. And there are a lot of use cases which require append for our customers. Some of them were mentioned in this discussion.

Thanks,
--Konstantin
-
Re: [DISCUSS] Remove append?Daryn Sharp 2012-03-22, 17:15
On Mar 20, 2012, at 7:37 PM, Eli Collins wrote:
> Hey gang,
>
> I'd like to get people's thoughts on the following proposal. I think
> we should consider removing append from HDFS.
>
> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality
> issues. It and sync were re-designed, re-implemented, and shipped in
> 21.0 (HDFS-265). To my knowledge, there has been no real production
> use. Anecdotally people who worked on branch-20-append have told me
> they think the new trunk code is substantially less well-tested than
> the branch-20-append code (at least for sync, append was never well
> tested). It has certainly gotten way less pounding from HBase users.
> The design however, is much improved, and people think we can get
> hsync (and append) stabilized in trunk (mostly testing and bug
> fixing).

Up front: I think append is a needed feature.

Politely speaking, I think the premise of the question is a bit dubious due to its circular nature, i.e. "it's not used in production, so is it worth it?" The stigma/perception that append has been unstable and is not well-tested is a compelling reason not to be in production at major installations. The situation is going to be akin to "You go first. No, you go first! No way, you go first!".

Downstream projects also aren't going to use something until it's stable, so they either work around the limitation, or... they choose something other than hdfs. There's also the unanswerable question of how many potential users have been silently lost. We are unlikely to have heard the user demand from those that chose another solution. Generally for every complaint/request, a large N-many people didn't even bother.

I envision a day where hdfs is a performant posix filesystem. Dropping append sets us back from that goal.

Admittedly, I don't know all the intricacies of how append was implemented and why it is/was difficult. Is the complexity maybe due to "bolting" append onto code that wasn't designed with mutability in mind? (That's truly a question, not a statement) If so, perhaps a refactoring would simplify the code?

Dropping append also might be used as a cudgel against hdfs. Cynically speaking, do we want to risk marketeers from certain competitors saying or implying: Trust your data with us because we're so brilliant that we have a feature hdfs has repeatedly tried and failed to implement!

Daryn
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-22, 17:25
On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> Eli,
>
> I went over the entire discussion on the topic, and did not get it. Is
> there a problem with append? We know it does not work in hadoop-1,
> only flush() does. Is there anything wrong with the new append
> (HDFS-265)? If so please file a bug.
> I tested it in Hadoop-0.22 branch it works fine.
>
> I agree with people who were involved with the implementation of the
> new append that the complexity is mainly in
> 1. pipeline recovery
> 2. consistent client reading while writing, and
> 3. hflush()
> Once it is done the append itself, which is reopening of previously
> closed files for adding data, is not complex.
>

I agree that much of the complexity is in #1-3 above, which is why HDFS-265 is leveraged. The primary simplicity of not having append (and truncate) comes from leveraging the invariant that finalized blocks are immutable, that blocks once written won't eg shrink in size (which we assume today).

> You mentioned it and I agree you indeed should be more involved with
> your customer base. As for eBay, append was of the motivations to work
> on stabilizing 0.22 branch. And there is a lot of use cases which
> require append for our customers.
> Some of them were mentioned in this discussion.
>

From what I've seen 0.22 isn't ready for production use. Aside from not supporting critical features like security, it doesn't have a sizable user base behind it testing and fixing bugs, etc. All things I'd imagine an org like eBay would want. I've never gotten a request to support 0.22 from a customer.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-22, 17:47
On Thu, Mar 22, 2012 at 10:15 AM, Daryn Sharp <[EMAIL PROTECTED]> wrote:
> On Mar 20, 2012, at 7:37 PM, Eli Collins wrote: >> Hey gang, >> >> I'd like to get people's thoughts on the following proposal. I think >> we should consider removing append from HDFS. >> >> Where we are today.. append was added in the 0.17-19 releases >> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality >> issues. It and sync were re-designed, re-implemented, and shipped in >> 21.0 (HDFS-265). To my knowledge, there has been no real production >> use. Anecdotally people who worked on branch-20-append have told me >> they think the new trunk code is substantially less well-tested than >> the branch-20-append code (at least for sync, append was never well >> tested). It has certainly gotten way less pounding from HBase users. >> The design however, is much improved, and people think we can get >> hsync (and append) stabilized in trunk (mostly testing and bug >> fixing). > > Up front: I think append is a needed feature. > Can you elaborate.. eg are there particular use cases at Yahoo! that have been running for years that are itching to start using append when 0.23 is deployed? Are you guys testing the new append implementation extensively because you have an app that's ready to use it when 0.23 is deployed? So far Milind has been the only one to chime in saying "we really need append, here's why". Which is great. > Politely speaking, I think the premise of the question is a bit dubious due to circular nature. Ie. It's not used in production so is it worth it? No, I'm saying we've absorbed a lot of complexity for it, but I don't see downstream projects using it any time soon. Similarly there hasn't been a big push to get it working, eg there was a push for security and hbase support on 20, but not append (the append rewrite was invasive but so was security). It will have been several years from the time the rewrite was started until it gets deployed in production, which makes me think it's less of a priority. 
So much so that I wondered whether it was a big priority at all.

> The stigma/perception that append has been unstable and is not well-tested is a compelling reason to not be in production at major installations. The situation is going to be akin to "You go first. No, you go first! No way, you go first!".
>
> Downstream projects also aren't going to use something until it's stable, so they either work around the limitation, or... they chose something other hdfs. There's also the unanswerable question of how potential users have been silently lost. We are unlikely to have heard the user demand from those that chose another solution. Generally for every complaint/request, a large N-many people didn't even bother.
>
> I envision a day where hdfs is a performant posix filesystem.

I think that's unlikely. Posix compliance is a non-goal for HDFS. We are intentionally non-compliant in many cases to achieve scale and performance. Check out the Ceph file system paper (they manage the tradeoff between Posix and scale/performance explicitly). The primary motivation for Posix compliance is compatibility with existing Unix-like software. That's not HDFS' raison d'etre (which is the ecosystem of projects that run atop HDFS: MR, HBase, Pig, Hive, Flume, Sqoop, etc etc). HDFS' focus on its core use case and simplicity is one of the reasons it's been as successful as it has been.

That's not to say we don't need to do a lot more work to better integrate HDFS with existing software and tools. Fuse-DFS is just a start. We need to support a standard interface like NFS etc. These efforts do not require HDFS become fully Posix compliant.

There's always a trade off between adding more features (increasing the size of the addressable market) and focusing on your core use cases, quality, etc. In my mind append is on the boundary. I'm happy to be convinced that append is in HDFS' wheelhouse.

Thanks,
Eli
-
Re: [DISCUSS] Remove append?Milind.Bhandarkar@... 2012-03-22, 23:27
Eli,
I think by "current definition of visible length", you mean that once a client opens a file and gets block list, it will always be able to read up to the length at open. However, correct me if I am wrong, but this definition is already violated, if file is deleted after open. So, truncate does add some complexity, but not a whole lot. If client gets an EOF before length at open, it must retry to see if the new visible length is different (rather than to see if the file does not exist anymore). Right ? - milind --- Milind Bhandarkar Greenplum Labs, EMC (Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.) On 3/22/12 4:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >On Thu, Mar 22, 2012 at 3:57 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>> Do you think having the invariant that blocks are not mutated would >>> significantly simply the design? >> >> No. As mentioned in my previous email and others, the complexity is in >>hflush. Once we have hflush, append is straightforward. > >I understand that append is a small delta once you have hflush, what >I'm saying is that the overall design of the file system is >significantly simplified if you can assume blocks are not mutated. Eg >see the way truncate is going to interact with the current definition >of visible length (it violates it). Resolving issues like that are >non-trivial. > >Thanks, >Eli >
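A rough sketch of the client-side retry Milind describes above, for readers following the truncate argument. Everything here is hypothetical: `FakeNameNode`, `Reader`, `on_early_eof`, and `FileDeleted` are illustrative names, not HDFS classes, and the real client would talk to the NameNode over RPC rather than read a dict.

```python
class FileDeleted(Exception):
    """Raised when the path no longer exists (hypothetical error type)."""


class FakeNameNode:
    """Stand-in for NameNode metadata: path -> current visible length."""

    def __init__(self):
        self.lengths = {}

    def visible_length(self, path):
        if path not in self.lengths:
            raise FileDeleted(path)
        return self.lengths[path]


class Reader:
    """Client that remembers the file length it saw at open time."""

    def __init__(self, namenode, path):
        self.namenode = namenode
        self.path = path
        self.length_at_open = namenode.visible_length(path)

    def on_early_eof(self):
        """Called when a read hits EOF before length_at_open.

        Re-checks the visible length instead of only checking whether
        the file still exists. A deleted file surfaces as FileDeleted;
        a truncated file just shrinks the readable range.
        """
        current = self.namenode.visible_length(self.path)
        if current < self.length_at_open:
            self.length_at_open = current  # file was truncated after open
        return current
```

The only difference from the delete-only case is the extra comparison against the length seen at open, which is the "some complexity, but not a whole lot" being argued.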
-
Re: [DISCUSS] Remove append?Tsz Wo Sze 2012-03-23, 00:03
@Eli, Removing a feature would simplify the design and code. I think this is a generally true statement but not specific to Append. The question is whether Append is useless and should be removed. I think it is clear from this email thread that the answer is no.
@Milind, I agree with you. BTW, we are proposing truncate on closed files, so it has nothing to do with visible length.

Regards,
Nicholas

----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc:
Sent: Thursday, March 22, 2012 4:27 PM
Subject: Re: [DISCUSS] Remove append?

Eli,

I think by "current definition of visible length", you mean that once a client opens a file and gets block list, it will always be able to read up to the length at open.

However, correct me if I am wrong, but this definition is already violated, if file is deleted after open.

So, truncate does add some complexity, but not a whole lot. If client gets an EOF before length at open, it must retry to see if the new visible length is different (rather than to see if the file does not exist anymore).

Right ?

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.)

On 3/22/12 4:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:

>On Thu, Mar 22, 2012 at 3:57 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
>>> Do you think having the invariant that blocks are not mutated would
>>> significantly simply the design?
>>
>> No. As mentioned in my previous email and others, the complexity is in
>>hflush. Once we have hflush, append is straightforward.
>
>I understand that append is a small delta once you have hflush, what
>I'm saying is that the overall design of the file system is
>significantly simplified if you can assume blocks are not mutated. Eg
>see the way truncate is going to interact with the current definition
>of visible length (it violates it). Resolving issues like that are
>non-trivial.
>
>Thanks,
>Eli
>
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-23, 00:41
On Thu, Mar 22, 2012 at 4:27 PM, <[EMAIL PROTECTED]> wrote:
> Eli,
>
> I think by "current definition of visible length", you mean that once a
> client opens a file and gets block list, it will always be able to read up
> to the length at open.
>

I was thinking of the definition from the design doc. See my last comment on HDFS-2288; part of the confusion is that we're using the same name for two different things.

> However, correct me if I am wrong, but this definition is already
> violated, if file is deleted after open.

I think you're right.

> So, truncate does add some complexity, but not a whole lot. If client gets
> an EOF before length at open, it must retry to see if the new visible
> length is different (rather than to see if the file does not exist
> anymore).
>
> Right ?
>

Makes sense. I was thinking you were talking about truncate on open files, which would be harder. You can already truncate a file on open, you just can't choose the offset you want to truncate at (the NN implements this by deleting the file).

Thanks,
Eli

>
> ---
> Milind Bhandarkar
> Greenplum Labs, EMC
> (Disclaimer: Opinions expressed in this email are those of the author, and
> do not necessarily represent the views of any organization, past or
> present, the author might be affiliated with.)
>
>
>
> On 3/22/12 4:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>
>>On Thu, Mar 22, 2012 at 3:57 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
>>>> Do you think having the invariant that blocks are not mutated would
>>>> significantly simply the design?
>>>
>>> No. As mentioned in my previous email and others, the complexity is in
>>>hflush. Once we have hflush, append is straightforward.
>>
>>I understand that append is a small delta once you have hflush, what
>>I'm saying is that the overall design of the file system is
>>significantly simplified if you can assume blocks are not mutated. Eg
>>see the way truncate is going to interact with the current definition
>>of visible length (it violates it). Resolving issues like that are
>>non-trivial.
>>
>>Thanks,
>>Eli
>>
>
-
Re: [DISCUSS] Remove append?Eli Collins 2012-03-23, 00:49
On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> @Eli, Removing a feature would simplify the design and code. I think this is a generally true statement but not specific to Append. The question is whether Append is useless and it should be removed? I think it is clear from this email thread that the answer is no. @Nicholas, no one is saying append is "useless and should be removed." The discussion is perhaps a little more subtle than you've understood it to be. If there are a lot of good use cases I'm all for it, I just don't see downstream projects using it any time soon (which is not to say they don't want it, just that they can't depend on something not in 1.x), and I haven't seen much demand. I wanted to hear from others if they had. When I brought it up with a room of hdfs developers from 3 different companies no one felt strongly. And so far only a handful of people have chimed in, I actually thought more would. Thanks, Eli
-
Re: [DISCUSS] Remove append?Dhruba Borthakur 2012-03-23, 01:18
I think "append" would be useful. But not precisely sure which applications
would use it. I would vote to keep the code though and not remove it. -dhruba On Thu, Mar 22, 2012 at 5:49 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: > > @Eli, Removing a feature would simplify the design and code. I think > this is a generally true statement but not specific to Append. The > question is whether Append is useless and it should be removed? I think it > is clear from this email thread that the answer is no. > > @Nicholas, no one is saying append is "useless and should be removed." > The discussion is perhaps a little more subtle than you've understood > it to be. > > If there are a lot of good use cases I'm all for it, I just don't see > downstream projects using it any time soon (which is not to say they > don't want it, just that they can't depend on something not in 1.x), > and I haven't seen much demand. I wanted to hear from others if they > had. When I brought it up with a room of hdfs developers from 3 > different companies no one felt strongly. And so far only a handful of > people have chimed in, I actually thought more would. > > Thanks, > Eli > -- Subscribe to my posts at http://www.facebook.com/dhruba
-
Re: [DISCUSS] Remove append?CHANG Lei 2012-03-23, 06:22
Append is already useful for our current project. It makes it possible
for us not to implement extra tricky logic to compact a large number of small files regularly. Thanks Lei On Thu, Mar 22, 2012 at 6:18 PM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote: > I think "append" would be useful. But not precisely sure which applications > would use it. I would vote to keep the code though and not remove it. > > -dhruba > > On Thu, Mar 22, 2012 at 5:49 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > >> On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> > @Eli, Removing a feature would simplify the design and code. I think >> this is a generally true statement but not specific to Append. The >> question is whether Append is useless and it should be removed? I think it >> is clear from this email thread that the answer is no. >> >> @Nicholas, no one is saying append is "useless and should be removed." >> The discussion is perhaps a little more subtle than you've understood >> it to be. >> >> If there are a lot of good use cases I'm all for it, I just don't see >> downstream projects using it any time soon (which is not to say they >> don't want it, just that they can't depend on something not in 1.x), >> and I haven't seen much demand. I wanted to hear from others if they >> had. When I brought it up with a room of hdfs developers from 3 >> different companies no one felt strongly. And so far only a handful of >> people have chimed in, I actually thought more would. >> >> Thanks, >> Eli >> > > > > -- > Subscribe to my posts at http://www.facebook.com/dhruba
-
Re: [DISCUSS] Remove append?Daryn Sharp 2012-03-23, 17:03
I think Yarn/MR might be able to benefit from the ability to append to logs in hdfs. It might reduce some of the after-the-fact copying of logs into hdfs.
Daryn On Mar 22, 2012, at 8:18 PM, Dhruba Borthakur wrote: > I think "append" would be useful. But not precisely sure which applications > would use it. I would vote to keep the code though and not remove it. > > -dhruba > > On Thu, Mar 22, 2012 at 5:49 PM, Eli Collins <[EMAIL PROTECTED]> wrote: > >> On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>> @Eli, Removing a feature would simplify the design and code. I think >> this is a generally true statement but not specific to Append. The >> question is whether Append is useless and it should be removed? I think it >> is clear from this email thread that the answer is no. >> >> @Nicholas, no one is saying append is "useless and should be removed." >> The discussion is perhaps a little more subtle than you've understood >> it to be. >> >> If there are a lot of good use cases I'm all for it, I just don't see >> downstream projects using it any time soon (which is not to say they >> don't want it, just that they can't depend on something not in 1.x), >> and I haven't seen much demand. I wanted to hear from others if they >> had. When I brought it up with a room of hdfs developers from 3 >> different companies no one felt strongly. And so far only a handful of >> people have chimed in, I actually thought more would. >> >> Thanks, >> Eli >> > > > > -- > Subscribe to my posts at http://www.facebook.com/dhruba
-
Re: [DISCUSS] Remove append?Scott Carey 2012-03-24, 02:26
On 3/20/12 5:37 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>Append introduces non-trivial design and code complexity, which is not
>worth the cost if we don't have real users. Removing append means we
>have the property that HDFS blocks, when finalized, are immutable.
>This significantly simplifies the design and code, which significantly
>simplifies the implementation of other features like snapshots,
>HDFS-level caching, dedupe, etc.

The above is related to the critical design flaw in HDFS that makes it more complicated than necessary. Immutable files on a node can be combined with append with copy-on-write semantics if the blocks are small enough. But small blocks are not going to work with this flaw.

This flaw is the definition of a block. It is conflated, being two things at once:
# An immutable segment of data that the file system tracks.
# The segment of data that is contiguous on an individual data node.

The first in any sane file system is a constant length. The second need not be. File systems like Ext4 and XFS use extents to map ranges of blocks to contiguous regions on disk. Then, they need only track these extents rather than all the fine grained detail of each block. The equivalent of a block report is then an extent report.

HDFS does not have extents, and this causes extreme pressure to have large blocks for two well known reasons: reduction in filesystem state data, and larger data batches for Mappers. With extents, both of these pressures apply to extent sizes instead of block sizes. Blocks can be small, extents larger. Blocks can be immutable with copy-on-write for appends, truncate, and even random write.

Others have already implemented the above in other distributed file systems. But when mentioned here in the past it seemed to be ignored or misunderstood:
http://mail-archives.apache.org/mod_mbox/hadoop-general/201110.mbox/%3C1318437111.16477.228.camel@thinkpad%3E

The response to that was disappointing -- the extent concept did not seem to be comprehended, and none of the good ideas from the links provided got discussed.

I personally NEED append for some of my work and had been planning on using it in 0.23. However I recognize that even more than that I can't risk losing data for my append use case. If append is too hard and complicated to bolt on to HDFS, perhaps a bigger re-think is required so that such features are not so complicated and a better natural fit to the design.
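To make the block/extent distinction above concrete, here is a minimal sketch, assuming small fixed-size logical blocks grouped into extents that each live contiguously on one data node. `Extent`, `ExtentMap`, `BLOCK_SIZE`, and the node names are illustrative assumptions, not anything that exists in HDFS.

```python
from bisect import bisect_right
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # assumed small, constant logical block size


@dataclass(frozen=True)
class Extent:
    first_block: int  # index of the first logical block in this extent
    num_blocks: int   # how many consecutive blocks the extent covers
    datanode: str     # hypothetical id of the node storing the extent


class ExtentMap:
    """Tracks a file as a short list of extents instead of every block."""

    def __init__(self):
        self._starts = []   # first_block of each extent, in file order
        self._extents = []

    def add(self, extent):
        # Assumes extents are added in file order without gaps.
        self._starts.append(extent.first_block)
        self._extents.append(extent)

    def locate(self, byte_offset):
        """Return (extent, block index within extent) for a byte offset."""
        block = byte_offset // BLOCK_SIZE
        i = bisect_right(self._starts, block) - 1
        if i < 0:
            raise KeyError(byte_offset)
        extent = self._extents[i]
        if block >= extent.first_block + extent.num_blocks:
            raise KeyError(byte_offset)
        return extent, block - extent.first_block

    def tracked_entries(self):
        # Metadata size grows with the number of extents, not blocks.
        return len(self._extents)
```

With two extents covering 1536 small blocks, the map tracks two entries rather than 1536, which is the "extent report instead of block report" argument in miniature.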
-
Re: [DISCUSS] Remove append?Scott Carey 2012-03-24, 02:44
On 3/22/12 10:25 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko ><[EMAIL PROTECTED]> wrote: >> Eli, >> >> I went over the entire discussion on the topic, and did not get it. Is >> there a problem with append? We know it does not work in hadoop-1, >> only flush() does. Is there anything wrong with the new append >> (HDFS-265)? If so please file a bug. >> I tested it in Hadoop-0.22 branch it works fine. >> >> I agree with people who were involved with the implementation of the >> new append that the complexity is mainly in >> 1. pipeline recovery >> 2. consistent client reading while writing, and >> 3. hflush() >> Once it is done the append itself, which is reopening of previously >> closed files for adding data, is not complex. >> > >I agree that much of the complexity is in #1-3 above, which is why >HDFS-265 is leveraged. >The primary simplicity of not having append (and truncate) comes from >not leveraging the invariant that finalized blocks are immutable, that >blocks once written won't eg shrink in size (which we assume today). That invariant can co-exist with append via copy-on-write. The new state and old state would co-exist until the old state was not needed, a file's block map would have to use a persistent data structure. Copy on write semantics with blocks in file systems is all the rage these days. Free snapshots, atomic transactions for operations on multiple blocks, etc. > >> You mentioned it and I agree you indeed should be more involved with >> your customer base. As for eBay, append was of the motivations to work >> on stabilizing 0.22 branch. And there is a lot of use cases which >> require append for our customers. >> Some of them were mentioned in this discussion. >> > >From what I've seen 0.22 isn't ready for production use. Aside from >not supporting critical features like security, it doesn't have a >size-able user-base behind it testing and fixing bugs, etc. 
All things >I'd imagine an org like eBay would want. I've never gotten a request >to support 0.22 from a customer. > >Thanks, >Eli
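Scott's copy-on-write point in this message can be illustrated with a small sketch, assuming write-once blocks and an immutable per-version block list. `BlockStore`, `FileVersion`, `append`, and `read` are hypothetical names for illustration only; the tuple here stands in for the persistent data structure he mentions and does not share structure the way a real one would.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


class BlockStore:
    """Write-once block storage: a block id is never rewritten."""

    def __init__(self):
        self._blocks: Dict[int, bytes] = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]


@dataclass(frozen=True)
class FileVersion:
    """Immutable block list; each append produces a new version."""
    blocks: Tuple[int, ...] = ()


def append(store: BlockStore, version: FileVersion, data: bytes) -> FileVersion:
    blocks = list(version.blocks)
    # A partial last block is copied into a new block, never mutated,
    # so readers holding the old version still see the old contents.
    if blocks and len(store.get(blocks[-1])) < BLOCK_SIZE:
        data = store.get(blocks.pop()) + data
    for i in range(0, len(data), BLOCK_SIZE):
        blocks.append(store.put(data[i:i + BLOCK_SIZE]))
    return FileVersion(tuple(blocks))


def read(store: BlockStore, version: FileVersion) -> bytes:
    return b"".join(store.get(b) for b in version.blocks)
```

Full blocks are shared between versions and only the partial tail is copied, so the "finalized blocks are immutable" invariant holds even though the file grows. Deciding when an old version's blocks can be reclaimed is left out of the sketch.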
-
Re: [DISCUSS] Remove append?Scott Carey 2012-03-24, 02:46
On 3/22/12 5:41 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >On Thu, Mar 22, 2012 at 4:27 PM, <[EMAIL PROTECTED]> wrote: >> Eli, >> >> I think by "current definition of visible length", you mean that once a >> client opens a file and gets block list, it will always be able to read >>up >> to the length at open. >> > >I was thinking of the definition from the design doc. See my last >comment on HDFS-2288, part of the confusion is that we're using the >same name for two different things. > >> However, correct me if I am wrong, but this definition is already >> violated, if file is deleted after open. > >I think you're right. Another thing that could be fixed with COW blocks and MVCC principles. If a file was opened, then deleted the blocks on the opened file would still be visible to that client, but no new ones. > >> So, truncate does add some complexity, but not a whole lot. If client >>gets >> an EOF before length at open, it must retry to see if the new visible >> length is different (rather than to see if the file does not exist >> anymore). >> >> Right ? >> > >Makes sense. I was thinking you were talking about truncate on open >files, which be harder. You can already truncate a file on open, you >just can't choose the offset you want to truncate at (the NN >implements this by deleting the file). > >Thanks, >Eli > >> >> --- >> Milind Bhandarkar >> Greenplum Labs, EMC >> (Disclaimer: Opinions expressed in this email are those of the author, >>and >> do not necessarily represent the views of any organization, past or >> present, the author might be affiliated with.) >> >> >> >> On 3/22/12 4:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >> >>>On Thu, Mar 22, 2012 at 3:57 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >>>>> Do you think having the invariant that blocks are not mutated would >>>>> significantly simply the design? >>>> >>>> No. As mentioned in my previous email and others, the complexity is >>>>in >>>>hflush. 
Once we have hflush, append is straightforward. >>> >>>I understand that append is a small delta once you have hflush, what >>>I'm saying is that the overall design of the file system is >>>significantly simplified if you can assume blocks are not mutated. Eg >>>see the way truncate is going to interact with the current definition >>>of visible length (it violates it). Resolving issues like that are >>>non-trivial. >>> >>>Thanks, >>>Eli >>> >>
-
Re: [DISCUSS] Remove append?Colin McCabe 2012-03-26, 19:53
On Fri, Mar 23, 2012 at 7:44 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
> > > On 3/22/12 10:25 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: > >>On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko >><[EMAIL PROTECTED]> wrote: >>> Eli, >>> >>> I went over the entire discussion on the topic, and did not get it. Is >>> there a problem with append? We know it does not work in hadoop-1, >>> only flush() does. Is there anything wrong with the new append >>> (HDFS-265)? If so please file a bug. >>> I tested it in Hadoop-0.22 branch it works fine. >>> >>> I agree with people who were involved with the implementation of the >>> new append that the complexity is mainly in >>> 1. pipeline recovery >>> 2. consistent client reading while writing, and >>> 3. hflush() >>> Once it is done the append itself, which is reopening of previously >>> closed files for adding data, is not complex. >>> >> >>I agree that much of the complexity is in #1-3 above, which is why >>HDFS-265 is leveraged. >>The primary simplicity of not having append (and truncate) comes from >>not leveraging the invariant that finalized blocks are immutable, that >>blocks once written won't eg shrink in size (which we assume today). > > That invariant can co-exist with append via copy-on-write. The new state > and old state would co-exist until the old state was not needed, a file's > block map would have to use a persistent data structure. Copy on write > semantics with blocks in file systems is all the rage these days. Free > snapshots, atomic transactions for operations on multiple blocks, etc. Hi Scott, If a client accesses a file, and then the client becomes unresponsive, how long should you wait before declaring the blocks he was looking at unused? No matter how long or how short a period you choose, someone will argue with it. And having to track this kind of state in the NameNode introduces a huge amount of complexity, not to mention extra memory consumption. Basically, we would have to track the ID of every block that any client looked at, at all times. 
Colin > >> >>> You mentioned it and I agree you indeed should be more involved with >>> your customer base. As for eBay, append was of the motivations to work >>> on stabilizing 0.22 branch. And there is a lot of use cases which >>> require append for our customers. >>> Some of them were mentioned in this discussion. >>> >> > >From what I've seen 0.22 isn't ready for production use. Aside from >>not supporting critical features like security, it doesn't have a >>size-able user-base behind it testing and fixing bugs, etc. All things >>I'd imagine an org like eBay would want. I've never gotten a request >>to support 0.22 from a customer. >> >>Thanks, >>Eli >
-
Re: [DISCUSS] Remove append?Colin McCabe 2012-03-26, 20:02
On Thu, Mar 22, 2012 at 5:49 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 22, 2012 at 5:03 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote: >> @Eli, Removing a feature would simplify the design and code. I think this is a generally true statement but not specific to Append. The question is whether Append is useless and it should be removed? I think it is clear from this email thread that the answer is no. > > @Nicholas, no one is saying append is "useless and should be removed." > The discussion is perhaps a little more subtle than you've understood > it to be. > > If there are a lot of good use cases I'm all for it, I just don't see > downstream projects using it any time soon (which is not to say they > don't want it, just that they can't depend on something not in 1.x), > and I haven't seen much demand. I wanted to hear from others if they > had. When I brought it up with a room of hdfs developers from 3 > different companies no one felt strongly. And so far only a handful of > people have chimed in, I actually thought more would. Just one comment: If we do decide to keep append in, we should get it to be actually stable and usable. In my opinion, this should definitely happen before adding any new operations. Colin
-
Re: [DISCUSS] Remove append?Scott Carey 2012-03-26, 20:53
On 3/26/12 12:53 PM, "Colin McCabe" <[EMAIL PROTECTED]> wrote:

>On Fri, Mar 23, 2012 at 7:44 PM, Scott Carey <[EMAIL PROTECTED]>
>wrote:
>>
>>
>> On 3/22/12 10:25 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>>
>>>On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko
>>><[EMAIL PROTECTED]> wrote:
>>>> Eli,
>>>>
>>>> I went over the entire discussion on the topic, and did not get it. Is
>>>> there a problem with append? We know it does not work in hadoop-1,
>>>> only flush() does. Is there anything wrong with the new append
>>>> (HDFS-265)? If so please file a bug.
>>>> I tested it in Hadoop-0.22 branch it works fine.
>>>>
>>>> I agree with people who were involved with the implementation of the
>>>> new append that the complexity is mainly in
>>>> 1. pipeline recovery
>>>> 2. consistent client reading while writing, and
>>>> 3. hflush()
>>>> Once it is done the append itself, which is reopening of previously
>>>> closed files for adding data, is not complex.
>>>>
>>>
>>>I agree that much of the complexity is in #1-3 above, which is why
>>>HDFS-265 is leveraged.
>>>The primary simplicity of not having append (and truncate) comes from
>>>not leveraging the invariant that finalized blocks are immutable, that
>>>blocks once written won't eg shrink in size (which we assume today).
>>
>> That invariant can co-exist with append via copy-on-write. The new state
>> and old state would co-exist until the old state was not needed, a file's
>> block map would have to use a persistent data structure. Copy on write
>> semantics with blocks in file systems is all the rage these days. Free
>> snapshots, atomic transactions for operations on multiple blocks, etc.
>
>Hi Scott,
>
>If a client accesses a file, and then the client becomes unresponsive,
>how long should you wait before declaring the blocks he was looking at
>unused?
>No matter how long or how short a period you choose, someone
>will argue with it.

How long does the NN wait now? What happens today if a client is reading a file, becomes unresponsive, and then another client deletes the file? At some point the NN has to unlock the file and allow the delete. If you choose locking, you have the question of when to expire a lock. With MVCC, you have the question of when to retire a reference. It is exactly the same problem.

>And having to track this kind of state in the
>NameNode introduces a huge amount of complexity, not to mention extra
>memory consumption. Basically, we would have to track the ID of every
>block that any client looked at, at all times.

There are simple, almost trivial solutions. java.lang.ref.WeakReference makes it trivial to track when an object (block reference) is no longer referenced by client objects, so that it can be logged as dead. Persistent data structures make it truly trivial to reference only exactly what is visible to open transactions. I strongly feel that the result would be far fewer lines of code and much less complexity.

The sort of data structures required have been worked out by others over the last 35 years, mostly for functional languages, and there is still plenty of innovation: the Immutable Bitmapped Vector Trie is a powerful and fascinating example. The following presentation is excellent, and covers the sort of data structures that solve the problems you list above without the complexity that would be required if the NN block map were an ephemeral data structure:

http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

In addition to allowing for atomic transaction batches and lockless file access, file system snapshots become trivial as well: they are equivalent to a permanently open transaction. The space needed for such a snapshot is proportional to the delta between the snapshot and the current state.

>
>Colin
>
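To make the copy-on-write idea in this message concrete, here is a minimal Java sketch. It is not HDFS code: the names `BlockNode`, `FileVersion`, `appendBlock`, and `rewriteLastBlock` are hypothetical and exist only for illustration. Each file version is an immutable linked list of block IDs, and an append produces a new version that shares the older blocks, so a snapshot is simply a retained reference to an old version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One immutable link in a file's block list (hypothetical, not an HDFS class). */
final class BlockNode {
    final long blockId;
    final BlockNode older; // shared between versions, never mutated

    BlockNode(long blockId, BlockNode older) {
        this.blockId = blockId;
        this.older = older;
    }
}

/** An immutable view of a file's blocks; every change returns a new version. */
final class FileVersion {
    static final FileVersion EMPTY = new FileVersion(null);

    private final BlockNode newest; // most recently added block, or null

    private FileVersion(BlockNode newest) {
        this.newest = newest;
    }

    /** Adds a new block at the end; all existing nodes are shared. */
    FileVersion appendBlock(long blockId) {
        return new FileVersion(new BlockNode(blockId, newest));
    }

    /** Copy-on-write append into a partial last block: swap in a new block ID. */
    FileVersion rewriteLastBlock(long newBlockId) {
        if (newest == null) {
            throw new IllegalStateException("file has no blocks");
        }
        return new FileVersion(new BlockNode(newBlockId, newest.older));
    }

    /** Block IDs in file order (oldest first). */
    List<Long> blockIds() {
        List<Long> ids = new ArrayList<>();
        for (BlockNode n = newest; n != null; n = n.older) {
            ids.add(n.blockId);
        }
        Collections.reverse(ids);
        return ids;
    }
}

class PersistentBlockMapDemo {
    public static void main(String[] args) {
        FileVersion v1 = FileVersion.EMPTY.appendBlock(1L).appendBlock(2L);
        FileVersion snapshot = v1; // a snapshot is just a kept reference
        FileVersion v2 = v1.rewriteLastBlock(3L).appendBlock(4L);

        System.out.println("snapshot: " + snapshot.blockIds()); // snapshot: [1, 2]
        System.out.println("current:  " + v2.blockIds());       // current:  [1, 3, 4]
    }
}
```

Under these assumptions the old version is never modified, which is the property that would let finalized blocks stay immutable even when a file grows. Deciding when an old version can be retired is the open question raised earlier in the thread, and this sketch does not address it.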
-
Re: [DISCUSS] Remove append?Tsz Wo Sze 2012-03-26, 20:55
> Just one comment: If we do decide to keep append in, we should get it
> to be actually stable and usable. In my opinion, this should
> definitely happen before adding any new operations.

@Colin, append is currently stable and, of course, usable. Many people in different organizations have tested it at small and large scale. However, it is not yet in a stable release and so it is not yet heavily used.

> I agree that the notion of an immutable file is useful since it lets the
> system and tools optimize certain things. A xerox-parc file system in the
> 80s had this feature that the system exploited. I would support adding the
> notion of an immutable file to Hadoop.

@Sanjay, I filed HDFS-3154.

@Eli and others, it turns out that the discussion is very useful! Thanks.

Nicholas
-
Re: [DISCUSS] Remove append?Colin McCabe 2012-03-26, 21:31
On Mon, Mar 26, 2012 at 1:55 PM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
>> Just one comment: If we do decide to keep append in, we should get it
>> to be actually stable and usable. In my opinion, this should
>> definitely happen before adding any new operations.
>
> @Colin, append is currently stable and, of course, usable. Many people in different organizations have tested it
> in small and large scale. However, it is not yet in a stable release and so it is not yet heavy used.

The append unit test failed on me recently on Jenkins. It's possible that this was due to a Jenkins timeout or something similar, but at the time I assumed it was due to instability. If it happens again, I'll be sure to check the backtrace and file a JIRA if needed.

>> I agree that the notion of an immutable file is useful since it lets the
>> system and tools optimize certain things. A xerox-parc file system in the
>> 80s had this feature that the system exploited. I would support adding the
>> notion of an immutable file to Hadoop.

I think Eli was hoping that making files immutable would make the system simpler and, hopefully, less buggy. You won't get that benefit if only certain files are immutable. In fact, quite the contrary: you'll just be adding more complexity.

I'd also like to see what the "certain things" are that having certain files, but not others, be immutable would allow you to optimize. The thread you linked to from the JIRA has no information on this.

I am aware of at least two "filesystems" (in the loose sense of the word) that have immutable files. One is Venti from Plan 9, and the other is git, by Linus Torvalds. Both of them are significantly simpler because of their invariant that files cannot change. However, both of them are append-only, meaning that files can never be deleted. This seems unsuitable for the HDFS use case, and in fact, I see no reason to believe that having some, but not all, files be immutable would provide any benefit. Feel free to prove me wrong if you think of something, though!

cheers,
Colin

>
> @Sanjay, I filed HDFS-3154.
>
> @Eli and others, it turns out that the discussion is very useful! Thanks.
>
> Nicholas
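One reason systems such as Venti and git can treat stored data as immutable is that they address content by its hash. The Java sketch below illustrates that idea only. The `ContentAddressedStore` class is hypothetical, and SHA-256 is an assumption made for this example (Venti and git have historically used SHA-1). Because the key is derived from the bytes, a stored object can never change under its key, and writing the same bytes twice is a no-op.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory store keyed by content hash (hypothetical, for illustration). */
final class ContentAddressedStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    /** Stores the bytes if absent and returns their hash-derived key. */
    String put(byte[] content) {
        String key = sha256Hex(content);
        objects.putIfAbsent(key, content.clone()); // existing entries are never overwritten
        return key;
    }

    /** Returns a copy of the stored bytes, or null if the key is unknown. */
    byte[] get(String key) {
        byte[] stored = objects.get(key);
        return stored == null ? null : stored.clone();
    }

    int size() {
        return objects.size();
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}

class ContentStoreDemo {
    public static void main(String[] args) {
        ContentAddressedStore store = new ContentAddressedStore();
        byte[] block = "block-a".getBytes(StandardCharsets.UTF_8);
        String first = store.put(block);
        String second = store.put(block.clone());
        System.out.println(first.equals(second)); // true
        System.out.println(store.size());         // 1
    }
}
```

Note that this toy store has no delete operation, which matches the append-only behaviour described above and is part of why it can stay this simple.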
-
Re: [DISCUSS] Remove append?Tsz Wo Sze 2012-03-27, 02:46
Hi Colin,
Please feel free to file JIRAs if you see unit test failures.

Let's continue the immutable file discussion on HDFS-3154.

Nicholas