Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # user >> How to read LZO compressed files?

Copy link to this message
Re: How to read LZO compressed files?

The first solution is my final plan. There are so many lzo files, that
manual decompression would take quite a while

As you suggested, I have used LzoTextInputFormat but I get the following

2012-01-02 16:15:16,668 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-01-02 16:15:16,765 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId2012-01-02 16:15:16,858 INFO
com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl
2012-01-02 16:15:16,860 INFO com.hadoop.compression.lzo.LzoCodec:
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
2012-01-02 16:15:16,906 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-01-02 16:15:16,908 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Codec for file
not found, cannot run
at com.hadoop.mapreduce.LzoLineRecordReader.initialize(LzoLineRecordReader.java:97)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-01-02 16:15:16,910 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task

which I don't understand, because I do have LZO codec.
Could you tell me what I am doing wrong here?


2012/1/2 Shi Yu <[EMAIL PROTECTED]>

> You could decompress the LZO file manually into plain text then
> using TextInputFormat.
> Alternatively, you don't need to index the LZO compressed file,
> just using LZOInputFormat on non-indexed files, then the LZO
> file will not be split anymore.