lzo compressed data gets corrupted
Hi,
We use Flume to write LZO compressed logs into HDFS and Hive to analyze
these logs. I find that Hive SQL queries sometimes fail due to the following
error. We can tolerate a few corrupted log lines, but we cannot tolerate
SQL execution failure.

2013-11-27 20:48:24,060 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned: -6
    at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
    at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:316)
    at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
    at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:247)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
    at com.hadoop.mapred.DeprecatedLzoLineRecordReader.next(DeprecatedLzoLineRecordReader.java:85)
    at com.hadoop.mapred.DeprecatedLzoLineRecordReader.next(DeprecatedLzoLineRecordReader.java:35)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
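
As far as I can tell from the LZO library's lzoconf.h, the -6 returned above
is LZO_E_LOOKBEHIND_OVERRUN: the decompressor found a back-reference pointing
outside the data it has decoded so far, a typical symptom of a corrupted
stream. The relevant codes from lzoconf.h:

    #define LZO_E_OK                    0
    #define LZO_E_ERROR               (-1)
    #define LZO_E_INPUT_OVERRUN       (-4)  /* ran past the end of the input  */
    #define LZO_E_OUTPUT_OVERRUN      (-5)  /* output exceeds the dst buffer  */
    #define LZO_E_LOOKBEHIND_OVERRUN  (-6)  /* match offset points before the
                                               start of the decoded output    */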

I can prevent Hive SQL execution failure by changing a few lines of
hadoop-lzo code; see
https://github.com/MTDATA/hadoop-lzo/commit/c63cf2984b89175c22b95ccdf202c59193fe0840
But I'm not sure it's appropriate to change the decompression strategy from
lzo1x_decompress_safe to lzo1x_decompress. See the following explanation of
the two functions from
http://www.oberhumer.com/opensource/lzo/lzofaq.php
(a short calling sketch follows the quote):

- lzo1x_decompress
    The 'standard' decompressor. Pretty fast - use this whenever possible.

    This decompressor expects valid compressed data.
    If the compressed data gets corrupted somehow (e.g. transmission
    via an erroneous channel, disk errors, ...) it will probably crash
    your application because absolutely no additional checks are done.

- lzo1x_decompress_safe
    The 'safe' decompressor. Somewhat slower.

    This decompressor will catch all compressed data violations and
    return an error code in this case - it will never crash.
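
To make the difference concrete, here is a minimal sketch of how the two
entry points are called in the LZO C API; the names decompress_block, in,
and out are mine, not from hadoop-lzo. The two functions share one
signature, so one is a drop-in replacement for the other:

    #include <lzo/lzo1x.h>

    /* Decompress one LZO block into out[]; returns 0 on success or a
       negative LZO error code. Assumes lzo_init() was called at startup. */
    int decompress_block(const lzo_bytep in, lzo_uint in_len,
                         lzo_bytep out, lzo_uint out_capacity)
    {
        lzo_uint out_len = out_capacity;

        /* Safe variant: validates the stream and returns an error code,
           e.g. LZO_E_LOOKBEHIND_OVERRUN (-6), on corrupted input. */
        int r = lzo1x_decompress_safe(in, in_len, out, &out_len, /*wrkmem*/ NULL);

        /* Fast variant: identical call, but performs no bounds checks, so
           corrupted input causes undefined behaviour instead of an error:
           r = lzo1x_decompress(in, in_len, out, &out_len, NULL);           */

        return (r == LZO_E_OK) ? 0 : r;
    }

Presumably that is why the patch avoids the task failure: on a damaged block
the unchecked variant produces garbage lines instead of returning -6, which
matches our "tolerate a few bad lines" requirement, but at the price of the
crash risk the FAQ describes.
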
I'm curious: under what circumstances can Flume write corrupted LZO
compressed data? The attachment is the corrupted LZO compressed file.
Decompressing it with lzop fails with the error "lzop:
lc_datalog04.1383609600707.lzo: Compressed data violation".

Thanks,

chenchun