Keith Wiley
2013-02-14, 22:20
Harsh J
2013-02-14, 23:02
Keith Wiley
2013-02-14, 23:35
Harsh J
2013-02-14, 23:39
Keith Wiley
2013-02-14, 23:46
Marcos Ortiz Valmaseda
2013-02-15, 03:09
-
.deflate trouble
Keith Wiley 2013-02-14, 22:20
I just got hadoop running on EC2 (0.19 just because that's the AMI the scripts seemed to go for). The PI example worked and I believe the wordcount example worked too. However, the output file is in .deflate format. "hadoop fs -text" fails to decompress the file -- it produces the same binary output as "hadoop fs -cat", which I find counterintuitive; isn't -text specifically supposed to handle this situation?
I copied the file to local and tried manually decompressing it with gunzip and lzop (by appending appropriate suffixes), but both tools failed to recognize the file. To add to the confusion, I see this in the default configuration offered by the EC2 scripts:

  <name>mapred.output.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?
  </description>

...so I don't understand why the output was compressed in the first place.

At this point, I'm kind of stuck. The output shouldn't be compressed to begin with, and all attempts to decompress it have failed.

Any ideas?

Thanks.

________________________________________________________________________________
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com

"And what if we picked the wrong religion? Every week, we're just making God madder and madder!"
-- Homer Simpson
________________________________________________________________________________
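A note for readers hitting the same wall: to the best of my knowledge, Hadoop's default codec (org.apache.hadoop.io.compress.DefaultCodec) writes a zlib-format stream under the .deflate suffix, not a gzip container, which would explain why gunzip and lzop do not recognize the file. The following is a minimal Python sketch under that assumption, for a file that has already been copied to local disk. The function name `inflate_file` and the file names in the usage note are hypothetical.

```python
import zlib


def inflate_file(src_path, dst_path, chunk_size=64 * 1024):
    """Decompress a zlib-format (.deflate) file into dst_path, chunk by chunk.

    Assumption: the input carries a zlib header, as a file written by
    Hadoop's default codec is expected to. A gzip or raw-deflate file
    would need a different wbits argument to decompressobj().
    """
    decomp = zlib.decompressobj()  # default wbits expects a zlib header
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(decomp.decompress(chunk))
        # Emit whatever the decompressor is still buffering.
        dst.write(decomp.flush())
```

It could be called as `inflate_file("part-00000.deflate", "part-00000.txt")` after fetching the output with "hadoop fs -get". Streaming in chunks keeps memory use flat for large outputs; for small files, `zlib.decompress(data)` on the whole byte string should do the same job.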
-
Re: .deflate trouble
Harsh J 2013-02-14, 23:02
0.19 is really old, and that's probably why the Text utility (fs -text) doesn't support automatic decompression based on extensions (or specifically, of .deflate).

Did the job.xml of the job that produced this output also carry mapred.output.compress=false in it? The file should be viewable on the JT UI page for the job. Unless explicitly turned on, even 0.19 wouldn't have enabled compression on its own.

-- Harsh J
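If the JT UI is not reachable, the same check can be done on a copy of the job's XML file. Below is a rough Python sketch that assumes the usual Hadoop configuration layout (a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children). The helper name `read_job_conf` and the sample XML are made up for illustration; they are not taken from the job discussed in this thread.

```python
import xml.etree.ElementTree as ET


def read_job_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


# Made-up sample standing in for the contents of a real job XML file.
SAMPLE = """<configuration>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>"""

conf = read_job_conf(SAMPLE)
print(conf.get("mapred.output.compress", "(not set)"))
```

The job file records the settings the job actually ran with, so a value of true there takes precedence over whatever the default configuration files suggest.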
-
Re: .deflate trouble
Keith Wiley 2013-02-14, 23:35
I'll look into the job.xml issue, thanks for the suggestion. In the meantime, who is in charge of maintaining the "official" AWS Hadoop AMIs? The following are the contents of the hadoop-images/ S3 bucket. As you can see, it tops out at 0.19:

IMAGE ami-65987c0c hadoop-images/hadoop-0.17.1-i386.manifest.xml 914733919441 available public i386 machine aki-a71cf9ce ari-a51cf9cc instance-store paravirtual xen
IMAGE ami-4b987c22 hadoop-images/hadoop-0.17.1-x86_64.manifest.xml 914733919441 available public x86_64 machine aki-b51cf9dc ari-b31cf9da instance-store paravirtual xen
IMAGE ami-b0fe1ad9 hadoop-images/hadoop-0.18.0-i386.manifest.xml 914733919441 available public i386 machine aki-a71cf9ce ari-a51cf9cc instance-store paravirtual xen
IMAGE ami-90fe1af9 hadoop-images/hadoop-0.18.0-x86_64.manifest.xml 914733919441 available public x86_64 machine aki-b51cf9dc ari-b31cf9da instance-store paravirtual xen
IMAGE ami-ea36d283 hadoop-images/hadoop-0.18.1-i386.manifest.xml 914733919441 available public i386 machine aki-a71cf9ce ari-a51cf9cc instance-store paravirtual xen
IMAGE ami-fe37d397 hadoop-images/hadoop-0.18.1-x86_64.manifest.xml 914733919441 available public x86_64 machine aki-b51cf9dc ari-b31cf9da instance-store paravirtual xen
IMAGE ami-fa6a8e93 hadoop-images/hadoop-0.19.0-i386.manifest.xml 914733919441 available public i386 machine aki-a71cf9ce ari-a51cf9cc instance-store paravirtual xen
IMAGE ami-cd6a8ea4 hadoop-images/hadoop-0.19.0-x86_64.manifest.xml 914733919441 available public x86_64 machine aki-b51cf9dc ari-b31cf9da instance-store paravirtual xen
IMAGE ami-15e80f7c hadoop-images/hadoop-base-20090210-i386.manifest.xml 914733919441 available public i386 machine aki-a71cf9ce ari-a51cf9cc instance-store paravirtual xen
IMAGE ami-1ee80f77 hadoop-images/hadoop-base-20090210-x86_64.manifest.xml 914733919441 available public x86_64 machine aki-b51cf9dc ari-b31cf9da instance-store paravirtual xen
-
Re: .deflate trouble
Harsh J 2013-02-14, 23:39
I'm pretty sure we do not maintain AMIs here at Apache Hadoop, at least. Perhaps those are from some earlier effort by AMZN itself? This page may interest you, though: http://wiki.apache.org/hadoop/AmazonEC2.

Sorry for the short responses - I'm not well versed in these things. Someone else can probably assist you further.

-- Harsh J
-
Re: .deflate trouble
Keith Wiley 2013-02-14, 23:46
Good call. We can't use the conventional web-based JT due to corporate access issues, but I looked at the job_XXX.xml file directly, and sure enough, it set mapred.output.compress to true. Now I just need to remember how that occurs. I simply ran the wordcount example straight off the command line; I didn't specify any overridden conf settings for the job.

Ultimately, the solution (or part of it) is to get away from 0.19 to a more up-to-date version of Hadoop. I would prefer 2.0 over 1.0, in fact, but due to a remarkable lack of concise EC2/Hadoop documentation (and the fact that what docs I did find were very old and therefore conformed to 0.19-style Hadoop), I have fallen back on old versions of Hadoop for my initial tests. In the long run, I will need to get a more modern version of Hadoop to successfully deploy on EC2.

Thanks.
-
Re: .deflate trouble
Marcos Ortiz Valmaseda 2013-02-15, 03:09
Regards, Keith. For EMR issues and the like, you can directly contact Jeff Barr (Chief Evangelist for AWS) or Saurabh Baji (Product Manager for AWS EMR).

Best wishes.

-- Marcos Ortiz Valmaseda, Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186