
RE: enable snappy on hadoop 1.1.1
I read through the Hadoop 1.1.1 source code for this, and the behavior still looks very strange to me.
From the error, my guess is that the JVM cannot find the native method org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I at runtime. According to the log, however, both native libraries (native-hadoop and native Snappy) are loaded, as shown in the failed task log:
2013-10-04 16:33:21,635 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-10-04 16:33:22,006 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-10-04 16:33:22,020 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@28252825
2013-10-04 16:33:22,111 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-10-04 16:33:22,116 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 256
2013-10-04 16:33:22,168 INFO org.apache.hadoop.mapred.MapTask: data buffer = 204010960/255013696
2013-10-04 16:33:22,168 INFO org.apache.hadoop.mapred.MapTask: record buffer = 671088/838860
2013-10-04 16:33:22,342 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2013-10-04 16:33:22,342 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2013-10-04 16:33:44,054 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-10-04 16:33:44,872 WARN org.apache.hadoop.io.compress.snappy.SnappyCompressor: java.lang.UnsatisfiedLinkError: org/apache/hadoop/io/compress/snappy/SnappyCompressor.initIDs()V
2013-10-04 16:33:44,872 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2013-10-04 16:33:44,928 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-04 16:33:44,951 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-10-04 16:33:44,951 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName yzhang for UID 1000 from the native implementation
2013-10-04 16:33:44,952 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.UnsatisfiedLinkError: org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:229)
        at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:141)
        at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:135)
        at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:135)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1450)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(AccessController.java:310)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
So is there any way I can check whether $HADOOP_HOME/lib/native/Linux-amd64-64/libhadoop.so actually contains the native method named above? I did not compile this hadoop 1.1.0 myself; it comes from IBM BigInsights 2.1, which we are evaluating. I will open a ticket with them, but isn't this strange? The log shows everything being loaded, yet the job later complains about a missing native method. What could cause this?
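A rough way to answer the libhadoop.so question is sketched below. As far as I understand, in Hadoop 1.x the JNI functions behind SnappyCompressor are expected to live in libhadoop.so rather than in libsnappy.so, so libsnappy.so loading cleanly says nothing about whether they exist. The sketch derives the symbol name the JVM looks up, using the standard JNI name-mangling rules, and then scans the library file for that name. The class name JniSymbolCheck and its helper methods are made up for illustration, and the byte scan is only a heuristic: a hit means the name appears somewhere in the file, not that it is correctly exported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop.
final class JniSymbolCheck {

    // Escapes a class or method name following the JNI name-mangling rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                // Any other character becomes _0xxxx (lowercase hex).
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Short-form JNI symbol name (no signature suffix) for a native method.
    static String jniSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    // Naive byte search; fine for a one-off check on a library file.
    static boolean containsBytes(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // True if the symbol name appears anywhere in the file at 'path'.
    static boolean libraryMentions(String path, String symbol) throws IOException {
        byte[] lib = Files.readAllBytes(Paths.get(path));
        return containsBytes(lib, symbol.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        String symbol = jniSymbol(
                "org.apache.hadoop.io.compress.snappy.SnappyCompressor",
                "compressBytesDirect");
        System.out.println("Expected symbol: " + symbol);
        if (args.length > 0) {
            System.out.println("Found in " + args[0] + ": "
                    + libraryMentions(args[0], symbol));
        }
    }
}
```

Run it with the path to libhadoop.so as the only argument (Java 7 or later is needed for java.nio.file). If the name is not found, the library was most likely built without Snappy support, which would be consistent with the UnsatisfiedLinkError on initIDs() in the log above.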
Subject: enable snappy on hadoop 1.1.1
Date: Fri, 4 Oct 2013 15:44:34 -0400
I am using Hadoop 1.1.1. I want to test Snappy compression with Hadoop, but I am having problems getting it to work in my Linux environment.
I am using openSUSE 12.3 x86_64.
First, I tried to enable Snappy in Hadoop 1.1.1 with:
            conf.setBoolean("mapred.compress.map.output", true);
            conf.set("mapred.output.compression.type", "RECORD");
            conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");
I got the following error in my test MR job:
Exception in thread "main" java.lang.RuntimeException: native snappy library not available
So I downloaded Snappy 1.1.0 from https://code.google.com/p/snappy/, compiled it, and installed it successfully under /opt/snappy-1.1.0. I then linked /opt/snappy-1.1.0/lib64/libsnappy.so to /usr/lib64/libsnappy.so.
After I restarted Hadoop and ran my test MR job again, the original error was gone, but I got a new one:
Error: java.lang.UnsatisfiedLinkError: org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I
        at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:229)
        at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:141)
        at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:135)
        at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:135)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1450)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1343)
I wrote a test program that loads the library the same way Hadoop does:
It works fine in my test program.
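The test program itself is not included in this message. A minimal sketch of what such a check might look like is given here, assuming it calls System.loadLibrary("snappy") the way Hadoop 1.x's LoadSnappy class appears to. The class name SnappyLoadProbe and the method canLoad are hypothetical.

```java
// Hypothetical stand-in for the missing test program, not the original code.
final class SnappyLoadProbe {

    // Returns true if the JVM can resolve and load the named native library
    // from java.library.path (for "snappy" on Linux that means libsnappy.so).
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "snappy" is the name Hadoop 1.x passes to loadLibrary.
        String name = args.length > 0 ? args[0] : "snappy";
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        System.out.println(name + " loadable: " + canLoad(name));
    }
}
```

A check like this only shows that libsnappy.so can be found and loaded. It does not exercise the JNI methods of SnappyCompressor, so it can succeed even when the job still fails with the UnsatisfiedLinkError shown above.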
I don't know why SnappyCompressor.compressBytesDirect() fails with that error at runtime. From the source code, it looks like it is a native C function from here
