As most of you know, I've been hacking on HADOOP-9902 off and on for a year now. [For those that don't, this is an almost complete rewrite of most of the major shell code that we ship with Hadoop. The stuff that was missed I'll pick it up after this gets committed.] As part of this, I recently reformatted the last patch to fit that 80 character requirement as best I could. The result is... not good. Not good at all. In many ways, it actually hurt readability even beyond the lack of indentation that Bash Beautifier doesn't support for line continuation. (That case statement in bin/hadoop is painful to look at and makes me cry.)
Barring any more feedback, it's pretty much ready to commit. But before that happens, do we want to specify that bash has different line length requirements? Say 120 chars? Most of the problems stem from our usage of REALLY LONG env var names that can't really be changed at this point without *massively* screwing up backward compatibility. (Hello, YARN_RESOURCEMANAGER_OPTS... I'm talking about you!)
Bouncing the idea around a few folks, they all seem to agree that 80 is just too hard for bash given our general use case, but I think it'd be good to have something official.
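To put numbers on the 80 vs. 120 question, something like the rough sketch below could be run against a script. The function name long_lines is made up for illustration and is not part of the HADOOP-9902 patch; it only prints the line number and length of every line longer than the given limit.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the patch): list every line in a file that is
# longer than a given limit, printed as "line_number:length".
long_lines() {
  local limit="$1"
  local file="$2"
  awk -v limit="${limit}" \
    'length($0) > limit { print FNR ":" length($0) }' "${file}"
}
```

Running it as `long_lines 80 bin/hadoop` and again with 120 would give a feel for how much of the file each limit actually touches.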
I am not a shell scripting expert, but I have written a few scripts and used/seen many, including ones from the top 3 enterprise software giants. I don't think everyone sticks to the 80 char guideline; maybe it is a remnant of the old 80 char terminals. I prefer long descriptive names for env vars (or vars in general) as they make the program more readable. I'm not sure what the technical ramifications of having lines longer than 80 chars are, if any.
On Sat, Jul 26, 2014 at 3:20 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
I'm not sure how up to date that checklist is, but imposing such a small size on cmd arguments seems incredibly short sighted. I'm pretty sure that it is not a generally accepted limit. I've seen MANY Hadoop processes require lengthy CLASS_PATHS that were easily over 240 chars.
BTW, the BASH limit is something around 24k, well beyond 120 chars.
IMHO, if you want to avoid Bill Gates syndrome (no one will ever need more than 256k RAM), you might want to set an upper limit around 64K. And then I'd use a configuration file so I could be stupid and request a max of 256M, because I'm doing something way abnormal.
$0.02. YMMV. ;)
On Sat, Jul 26, 2014 at 10:02 PM, Paresh Yadav <[EMAIL PROTECTED]> wrote:
I looked at the last patch and personally I don't see any major issue. Perhaps JVM flags, URLs and configuration keys would be better defined as environment variables if you want to reduce the length of the lines, but I think legibility is more important.
cheers, esteban. Cloudera, Inc.
On Sat, Jul 26, 2014 at 7:29 PM, Chris Embree <[EMAIL PROTECTED]> wrote:
Sun's Java code conventions (published in '97) suggest 80 columns per line for old-style terminals. That sounds pretty old. However, I've seen some developers (not me :)) who like to open multiple terminals on one screen for coding/debugging, so 80 columns just fits. Google's Java style guide ( https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.4-column-limit) shows some flexibility here, with 80 or 100 columns (and some exception cases). As Chris mentioned earlier, I think 80 columns should be a general guideline rather than a strict limit: we can break it if it hurts the legibility of the code. BTW, some research found that CPL (characters per line) had only small effects on readability for news, including speed and comprehension ( http://psychology.wichita.edu/surl/usabilitynews/72/LineLength.asp). Not sure if reading code is the same (assuming lines are broken properly).
2014-07-29 15:24 GMT+08:00 Andrew Purtell <[EMAIL PROTECTED]>:
I think a lot of these studies don't really translate very well to code. (A lot of them are college students seeing how quickly they can read a news article.) Code with extremely long line lengths tends to have super-deep nesting, which makes it hard to keep track of what is going on (the so-called "arrow anti-pattern"). This is especially true when there are break and continue statements involved. Super-long lines make diffs very difficult to do. And it's just unpleasant to read, forcing users to choose between horizontal scrolling or tiny text...
Maybe it makes sense to extend the bash line length, though, if it's tough to fit in 80 chars. Bash is whitespace sensitive and doing the line continuation thing is a pain. Another option might be renaming some variables, or using temp variables with shorter names...
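As a rough illustration of the temp-variable idea, here is a sketch that builds up an option string one piece per line instead of using backslash continuations. The values, the -Dexample.* flags, and the function name build_rm_opts are all made up for this example; only the variable names HADOOP_OPTS and YARN_RESOURCEMANAGER_OPTS are real, and this is not how the patch does it.

```shell
#!/usr/bin/env bash

# Illustrative values only; the real defaults live in the Hadoop scripts.
HADOOP_OPTS="-Dexample.a=1"
YARN_RESOURCEMANAGER_OPTS="-Xmx1g"

# Hypothetical function: assemble the options with one short statement per
# line. Each statement is complete on its own, so no trailing backslashes
# are needed.
build_rm_opts() {
  local opts="${HADOOP_OPTS}"
  local rm="${YARN_RESOURCEMANAGER_OPTS}"  # short alias for the long name
  opts+=" ${rm}"
  opts+=" -Dexample.b=2"
  printf '%s\n' "${opts}"
}

build_rm_opts
```

The trade-off is an extra local variable per long name, which keeps lines short but adds one more thing for the reader to track.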
one argument in favour of 80 is that it's easier to side-by-side diff
even so, I find it restrictive in Java code; once you go for long env vars in bash-land then you are in trouble. As for python, you have to indent according to your code flow.
were we to have a special getout of 120 chars in .sh, .py, and other scripts, I'd be happy.
On 29 July 2014 18:59, Colin McCabe <[EMAIL PROTECTED]> wrote: