-
profiling hdfs write path
Radim Kolar 2012-11-25, 21:41
anybody tried to profile why HDFS write path is so much CPU intensive?
-
Re: profiling hdfs write path
Todd Lipcon 2012-11-26, 01:35
Hi Radim,
Currently it's CPU-intensive for several reasons:
1) It doesn't yet use the native CRC code
2) It makes several unnecessary copies and byte buffer allocations, both in the client and in the DataNode
There are open JIRAs for these, and I have a preliminary patch which helped a lot, but it hasn't been high priority. On most clusters, writing becomes network-bound before becoming CPU-bound. On the other hand, as 10GbE is becoming fairly common, this will probably be more important soon. Hoping to find time to get back to finishing the patches in the next few months.
-Todd
On Sun, Nov 25, 2012 at 1:41 PM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> anybody tried to profile why HDFS write path is so much CPU intensive?
-- Todd Lipcon Software Engineer, Cloudera
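To illustrate the second reason given above (unnecessary copies and byte buffer allocations), here is a minimal Java sketch. It is not taken from HDFS or from the patch mentioned in this message. It checksums a buffer in fixed-size chunks two ways: once copying every chunk into a fresh byte[], and once reading in place with a single reused CRC32 object. The class name, method names, and the 512-byte chunk size are assumptions made for the example, and java.util.zip.CRC32 merely stands in for whatever checksum implementation the real write path uses. CRC32.update(ByteBuffer) requires Java 8 or later.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    // Assumed chunk size, chosen only for illustration.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Copying variant: allocates a new byte[] and a new CRC32 for every chunk. */
    static long[] checksumsWithCopies(ByteBuffer data) {
        ByteBuffer view = data.duplicate();   // leave the caller's position untouched
        int chunks = (view.remaining() + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int len = Math.min(BYTES_PER_CHECKSUM, view.remaining());
            byte[] copy = new byte[len];      // extra allocation per chunk
            view.get(copy);                   // extra copy per chunk
            CRC32 crc = new CRC32();
            crc.update(copy, 0, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** In-place variant: one reused CRC32, chunks read directly from the buffer. */
    static long[] checksumsInPlace(ByteBuffer data) {
        ByteBuffer view = data.duplicate();
        int chunks = (view.remaining() + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        int end = view.limit();
        for (int i = 0; i < chunks; i++) {
            int len = Math.min(BYTES_PER_CHECKSUM, end - view.position());
            view.limit(view.position() + len); // narrow the window to one chunk
            crc.reset();
            crc.update(view);                  // advances position to the chunk end
            sums[i] = crc.getValue();
            view.limit(end);                   // restore the full window
        }
        return sums;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1300);
        for (int i = 0; i < 1300; i++) {
            buf.put((byte) i);
        }
        buf.flip();
        long[] a = checksumsWithCopies(buf);
        long[] b = checksumsInPlace(buf);
        System.out.println("chunks: " + a.length
                + ", results match: " + java.util.Arrays.equals(a, b));
    }
}
```

Both variants produce the same checksums; the difference is only in how much garbage and memory traffic each chunk generates, which is the kind of overhead a profiler would surface on a hot write path.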
-
Re: profiling hdfs write path
Radim Kolar 2012-11-26, 03:07
> Currently it's CPU-intensive for several reasons:
> 1) It doesn't yet use the native CRC code
> 2) It makes several unnecessary copies and byte buffer allocations, both in
> the client and in the DataNode
>
> There are open JIRAs for these, and I have a preliminary patch which helped
> a lot, but it hasn't been high priority.
Can you attach the CRC patch there? https://issues.apache.org/jira/browse/HDFS-3528
I will finish it.
-
Re: profiling hdfs write path
Radim Kolar 2012-11-29, 15:17
> Hoping to find time to get back to finishing the patches in the next few months.
Todd, just attach these patches to the JIRA; they do not even need to apply cleanly to trunk. I will get them finished within a day. I do not have months to spare waiting for the work to be done by you. If you do not want to share these patches, that's still fine with me; we can do this work alone as well. I just need a word from you.
-
Re: profiling hdfs write path
Todd Lipcon 2012-11-29, 18:25
Hi Radim,
My work-in-progress branch is online here: https://github.com/toddlipcon/hadoop-common/commits/trunk-write-pipeline-fast
It is definitely buggy, it might not actually be faster, and it probably isn't well commented. But feel free to have a go at it.
-Todd
On Thu, Nov 29, 2012 at 7:17 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>> Hoping to find time to get back to finishing the patches in the next few
>> months.
> Todd,
> just attach these pathes to jira, they do not even needs to apply cleanly
> to trunk. I will get them finished within day. I do not have months which i
> can spare on waiting for work be done by you. If you do not want to share
> these patches, its still fine with me we can do this work alone as well. I
> need just word from you.
-- Todd Lipcon Software Engineer, Cloudera
-
Re: profiling hdfs write path
Radim Kolar 2012-12-04, 17:07
> It is definitely buggy, it might not actually be faster, and it
> probably isn't well commented. But feel free to have a go at it.
Thank you for your code; I got it merged with trunk. HDFS is crap code: private methods are not documented at all, and the unit tests are a joke. I made some random code changes and some were not detected by the unit tests. What methods are you using for testing?
-
Re: profiling hdfs write path
Todd Lipcon 2012-12-04, 17:27
On Tue, Dec 4, 2012 at 9:07 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>> It is definitely buggy, it might not actually be faster, and it
>> probably isn't well commented. But feel free to have a go at it.
>
> thank you for your code, i got it merged with trunk. HDFS is crap code,
> private methods not documented at all, and unit tests are joke. I did some
> random code changes and some were not detected by unit tests. What methods
> are you using for testing?
If you're just going to insult us, please stay away. We don't need your help unless you're going to be constructive.
Todd
-- Todd Lipcon Software Engineer, Cloudera
-
Re: profiling hdfs write path
Radim Kolar 2012-12-04, 17:39
> If you're just going to insult us, please stay away. We don't need
> your help unless you're going to be constructive.
Good unit tests will catch code modifications like:
from: long getLastByteOffsetBlock() { return lastByteOffsetInBlock; }
to: long getLastByteOffsetBlock() { return lastByteOffsetInBlock - 1; }
I made 10 such changes and about 60% went undetected.
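What is described here is essentially manual mutation testing: change the code slightly and see whether any test fails. A minimal Java sketch of the idea follows. It is hypothetical and not based on the actual HDFS test suite; the class name, the two checks, and the value 1024 are all assumptions made for illustration. The original getter and the off-by-one variant are modeled as functions of the field value, and two tests of different strength are run against each.

```java
import java.util.function.LongUnaryOperator;

public class MutationCheckSketch {
    // The original getter body and the off-by-one mutant, modeled as
    // functions of the lastByteOffsetInBlock field value.
    static final LongUnaryOperator ORIGINAL = last -> last;
    static final LongUnaryOperator MUTANT = last -> last - 1;

    /** Weak check: only asserts that the reported offset is non-negative. */
    static boolean weakTestPasses(LongUnaryOperator getter) {
        return getter.applyAsLong(1024L) >= 0;
    }

    /** Strict check: pins the exact expected offset. */
    static boolean strictTestPasses(LongUnaryOperator getter) {
        return getter.applyAsLong(1024L) == 1024L;
    }

    public static void main(String[] args) {
        // A mutant "survives" a test when the test still passes against it.
        System.out.println("weak test passes on original:   " + weakTestPasses(ORIGINAL));
        System.out.println("weak test passes on mutant:     " + weakTestPasses(MUTANT));
        System.out.println("strict test passes on original: " + strictTestPasses(ORIGINAL));
        System.out.println("strict test passes on mutant:   " + strictTestPasses(MUTANT));
    }
}
```

The weak check passes against both versions, so the mutation goes undetected; the strict check fails against the mutant. A high share of surviving mutations suggests that tests execute the code without asserting much about its results.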
-
Re: profiling hdfs write path
Eli Collins 2012-12-04, 17:44
On Tue, Dec 4, 2012 at 9:39 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>> If you're just going to insult us, please stay away. We don't need
>> your help unless you're going to be constructive.
>
> Good units tests will catch code modifications like:
Agree. Want to write some? Would love to see patches like this. Here's a recent example: HDFS-4156 (Seeking to a negative position should throw an IOE)
Thanks, Eli
-
Re: profiling hdfs write path
Suresh Srinivas 2012-12-04, 17:49
Thank you Todd! I have been seeing a similar attitude in many jiras, which I have tried hard to ignore, and was wondering how to respond to this email. I could not have said it better.
On Tue, Dec 4, 2012 at 9:27 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> If you're just going to insult us, please stay away. We don't need
> your help unless you're going to be constructive.
-- http://hortonworks.com/download/
-
Re: profiling hdfs write path
Radim Kolar 2012-12-05, 02:00
> Agree. Want to write some?
It's not about writing patches, it's about getting them committed. In my experience, getting something committed takes months, even for a simple patch. I have about 10 patches floating around; none of them has been committed in the last 4 weeks. They are really simple stuff. I haven't tried a more elaborate patch because the Bible says: if you fail at an easy thing, you will fail at a hard thing too.
Day by day I am more convinced that I really need to fork Hadoop; otherwise there is no way to move it forward to where I need it to be.
-
Re: profiling hdfs write path
Steve Loughran 2012-12-05, 08:57
On 5 December 2012 02:00, Radim Kolar <[EMAIL PROTECTED]> wrote:
>> Agree. Want to write some?
>
> Its not about writing patches, its about to get them committed. I have
> experience that getting something committed takes months even on simple
> patch. I have about 10 patches floating around none of them was committed
> in last 4 weeks. They are really simple stuff. I haven't tried to go with
> some more elaborated patch because Bible says: if you fail easy thing, you
> will fail hard thing too.
There is inertia; nobody is happy with it, but that's the price of having something that's designed to keep PB of data safe.
> I am thinking day by day that i really need to fork hadoop otherwise there
> is no way to move it forward where i need it to be.
A lot of the early hadoop projects chose this path. Once you get out of sync with the apache code you have two problems:
- keeping your branch up to date with all fixes and features you want.
- testing
-
Re: profiling hdfs write path
Andy Isaacson 2012-12-05, 22:21
On Tue, Dec 4, 2012 at 6:00 PM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> Its not about writing patches, its about to get them committed. I have
> experience that getting something committed takes months even on simple
> patch. I have about 10 patches floating around none of them was committed in
> last 4 weeks.
Could you share a list of Jiras you're concerned about? I've seen a few patches you provided that got committed just fine, and I've seen a few patches that I thought didn't have a strong justification that didn't get committed, and I think I've seen a few Jiras that I thought were a good idea that haven't been committed yet due to outstanding review feedback or lack of a committer who can volunteer to do the work.
I'm not saying that the Hadoop process is perfect, far from it, but from where I sit (like you I'm a contributor but not yet a committer) it seems to be working OK so far for both you and me. Some things could be better, but the current fairly-conservative process has the benefit of keeping trunk in a really sane, safe state.
> They are really simple stuff. I haven't tried to go with some
> more elaborated patch because Bible says: if you fail easy thing, you will
> fail hard thing too.
>
> I am thinking day by day that i really need to fork hadoop otherwise there
> is no way to move it forward where i need it to be.
Forking is tempting, but working with the community is really powerful. You've got plenty of successful jiras under your belt, let's just keep on truckin' and build a better Hadoop.
-andy
-
Re: profiling hdfs write path
Radim Kolar 2012-12-06, 02:02
-
Re: profiling hdfs write path
Andy Isaacson 2012-12-06, 23:06
On Wed, Dec 5, 2012 at 6:02 PM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> YARN-223 <https://issues.apache.org/jira/browse/YARN-223>
> YARN-211 <https://issues.apache.org/jira/browse/YARN-211>
> YARN-210 <https://issues.apache.org/jira/browse/YARN-210>
> MAPREDUCE-4839 <https://issues.apache.org/jira/browse/MAPREDUCE-4839>
> MAPREDUCE-4827 <https://issues.apache.org/jira/browse/MAPREDUCE-4827>
> MAPREDUCE-4594 <https://issues.apache.org/jira/browse/MAPREDUCE-4594>
> MAPREDUCE-3968 <https://issues.apache.org/jira/browse/MAPREDUCE-3968>
I don't really know the YARN or MAPREDUCE code bases so I'm going to pass on those ones...
> HADOOP-9088 <https://issues.apache.org/jira/browse/HADOOP-9088>
Todd asked a pretty reasonable question that I don't see an answer to -- where will murmur3 actually be used? We generally don't add code, even if it's good code that we're sure to need someday, until there's an actual user for it.
> HADOOP-9041 <https://issues.apache.org/jira/browse/HADOOP-9041>
There needs to be a complete, up-to-date patch uploaded. This one seems to have two patches that need to be applied to get a working commit -- HADOOP-9041.patch and fsinit-unit.txt. Also the latter has a misspelled classname, Initialization is spelled with a "t" rather than a "c". It would be really good to develop a JUnit test that fails reliably both under mvn and Eclipse that shows the problem to avoid regressions in the future... even if the unit test has to do moderately unclean things to force the failure. (But that's not a hard requirement, if it's really impossible to do the current situation is OK.)
> HADOOP-8698 <https://issues.apache.org/jira/browse/HADOOP-8698>
I don't understand this patch at all. Since it makes the constructor vacuous, why not just delete the constructor entirely? If avoiding the possible "could be null" makes other code simpler, go ahead and include the simplification in this patch. (see below for more on including stuff in a single jira.)
Generally if Jenkins posts a -1 on a patch, you should follow up with a comment explaining why it's OK for this patch to fail the given test. For example I had a change recently that fixed an intermittent test failure, so I didn't need to add a test. Jenkins said "-1 no tests included" and I commented "fixes TestFoo intermittent failures".
One of the ways the community has compensated for the heavyweight JIRA process is to allow a single JIRA to include more change than I would normally put into a git commit. I do my development locally in a per-jira branch "hdfs1337" with normal small git-style commits, and then when I'm ready to post a patch I "git diff upstream/trunk..hdfs1337 > hdfs1337.txt" to squash all the sane git commits into a single large diff to upload.
Thanks,
-andy
-
Re: profiling hdfs write path
Radim Kolar 2012-12-08, 04:39
> I'm not saying that the Hadoop process is perfect, far from it, but
> from where I sit (like you I'm a contributor but not yet a committer)
> it seems to be working OK so far for both you and I.
It does not work OK for me. It's way too slow. I have just 2k LOC across committed patches and patches still floating around. That is the real and sad result of half a year of cooperation. I know that contributor patches are low priority in every project, but this is too low a priority for me.
> Some things could be better, but the current fairly-conservative process has the benefit
> of keeping trunk in a really sane, safe state.
If you want to keep code in a safe state you need:
1. good unit tests
2. high unit test coverage
3. clean code
4. documented code
5. good javadoc
> You've got plenty of successful jiras under your belt, let's just keep on truckin' and build a better Hadoop.
The only successful work was the rework of Todd's patch, because it made HBase about 30% faster.
-
Re: profiling hdfs write path
Steve Loughran 2012-12-08, 12:38
On 8 December 2012 04:39, Radim Kolar <[EMAIL PROTECTED]> wrote:
> if you want to keep code in safe state you need:
> 1. good unit test
> 2. high unit test coverage
> 3. clean code
> 4. documented code
> 5. good javadoc
+ good functional tests, which explore the deployment state of the world, especially different networks. Once you get into HA you also need the ability to trigger server failures and network partitions as part of a test run.