Hadoop, mail # user - How to read LZO compressed files?


Re: How to read LZO compressed files?
edward choi 2012-01-02, 07:22
Hi,

The first solution is my last resort. There are so many LZO files that
manual decompression would take quite a while.

As you suggested, I used LzoTextInputFormat, but I get the following
error:

2012-01-02 16:15:16,668 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-01-02 16:15:16,765 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2012-01-02 16:15:16,858 INFO
com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl
library
2012-01-02 16:15:16,860 INFO com.hadoop.compression.lzo.LzoCodec:
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
8aa060526bc6778c971775b832751d2894c73b5f]
2012-01-02 16:15:16,906 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-01-02 16:15:16,908 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Codec for file
hdfs://lp182:54310/user/hadoop/blog_result/20111106_20111112/part-m-00000.lzo
not found, cannot run
at com.hadoop.mapreduce.LzoLineRecordReader.initialize(LzoLineRecordReader.java:97)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-01-02 16:15:16,910 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task

I don't understand this, because the LZO codec is installed and the log
above shows the native LZO library loading successfully.
Could you tell me what I am doing wrong?

Regards,
Ed

2012/1/2 Shi Yu <[EMAIL PROTECTED]>

> You could decompress the LZO files manually into plain text and then
> use TextInputFormat.
>
> Alternatively, you don't need to index the LZO-compressed files:
> just use LZOInputFormat on the non-indexed files, and the LZO
> files will simply not be split.
>
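
A note on the error above, offered as a possible cause and not a confirmed diagnosis: the log shows the native LZO library loading, but "Codec for file ... not found" is raised when no registered codec claims the file's suffix. Hadoop reads the list of registered codec classes from the io.compression.codecs property, and a ".lzo" file is claimed by com.hadoop.compression.lzo.LzopCodec (LzoCodec uses ".lzo_deflate"). The sketch below is a stdlib-only imitation of that suffix lookup, not Hadoop's actual code; the class name CodecLookupSketch, the method findCodec, and the two example codec lists are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stdlib-only imitation of suffix-based codec lookup; NOT Hadoop's real implementation. */
final class CodecLookupSketch {

    // Default file extensions reported by each codec class.
    static final Map<String, String> DEFAULT_EXTENSIONS = new LinkedHashMap<>();
    static {
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.DefaultCodec", ".deflate");
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzoCodec", ".lzo_deflate");
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzopCodec", ".lzo");
    }

    /**
     * Returns the first class in the comma-separated codec list whose default
     * extension matches the end of fileName, or null if none matches.
     */
    static String findCodec(String codecList, String fileName) {
        for (String raw : codecList.split(",")) {
            String name = raw.trim();
            String ext = DEFAULT_EXTENSIONS.get(name);
            if (ext != null && fileName.endsWith(ext)) {
                return name;
            }
        }
        return null; // the situation in which the record reader gives up
    }

    public static void main(String[] args) {
        String file = "part-m-00000.lzo";

        // Hypothetical job setting that does not list the LZO codecs.
        String withoutLzop = "org.apache.hadoop.io.compress.DefaultCodec,"
                + "org.apache.hadoop.io.compress.GzipCodec";

        // Same list with both hadoop-lzo codec classes appended.
        String withLzop = withoutLzop
                + ",com.hadoop.compression.lzo.LzoCodec"
                + ",com.hadoop.compression.lzo.LzopCodec";

        System.out.println(findCodec(withoutLzop, file)); // prints "null"
        System.out.println(findCodec(withLzop, file));    // prints "com.hadoop.compression.lzo.LzopCodec"
    }
}
```

If this is what is happening here, it would be worth checking that the io.compression.codecs value seen by the job (not only the one in the cluster's config files) includes com.hadoop.compression.lzo.LzopCodec.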