Hadoop >> mail # user >> How to read LZO compressed files?


+
edward choi 2012-01-02, 05:34
+
Shi Yu 2012-01-02, 06:54
Re: How to read LZO compressed files?
Hi,

The first solution is my last resort. There are so many LZO files that
manual decompression would take quite a while.

As you suggested, I used LzoTextInputFormat, but I get the following
error:

2012-01-02 16:15:16,668 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-01-02 16:15:16,765 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2012-01-02 16:15:16,858 INFO
com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl
library
2012-01-02 16:15:16,860 INFO com.hadoop.compression.lzo.LzoCodec:
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
8aa060526bc6778c971775b832751d2894c73b5f]
2012-01-02 16:15:16,906 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-01-02 16:15:16,908 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Codec for file
hdfs://lp182:54310/user/hadoop/blog_result/20111106_20111112/part-m-00000.lzo
not found, cannot run
at com.hadoop.mapreduce.LzoLineRecordReader.initialize(LzoLineRecordReader.java:97)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-01-02 16:15:16,910 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task

I don't understand this, because I do have the LZO codec.
Could you tell me what I am doing wrong here?

Regards,
Ed
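
One possible reading of the error above, offered as a sketch rather than a diagnosis: as far as I know, Hadoop picks a codec for an input file by matching the file name's suffix against the codecs listed in the io.compression.codecs property, and the record reader raises "Codec for file ... not found" when nothing matches. The log shows LzoCodec loading, but to my knowledge LzoCodec is associated with ".lzo_deflate", while ".lzo" belongs to LzopCodec. The class below is a simplified, stdlib-only model of that lookup. CodecLookupSketch, findCodec and the extension table are hypothetical names and assumptions made for illustration; this is not Hadoop's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class CodecLookupSketch {

    // Assumed default extensions per codec class; treat this table as an
    // assumption to verify against your Hadoop and hadoop-lzo versions.
    static final Map<String, String> DEFAULT_EXTENSION = Map.of(
        "com.hadoop.compression.lzo.LzoCodec", ".lzo_deflate",
        "com.hadoop.compression.lzo.LzopCodec", ".lzo",
        "org.apache.hadoop.io.compress.GzipCodec", ".gz",
        "org.apache.hadoop.io.compress.DefaultCodec", ".deflate");

    // Hypothetical helper: returns the first configured codec whose assumed
    // extension matches the end of the file name, or empty if none matches.
    static Optional<String> findCodec(List<String> configuredCodecs, String fileName) {
        for (String codec : configuredCodecs) {
            String ext = DEFAULT_EXTENSION.get(codec);
            if (ext != null && fileName.endsWith(ext)) {
                return Optional.of(codec);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String file = "part-m-00000.lzo";

        // Codec list without LzopCodec: no entry claims the ".lzo" suffix.
        List<String> withoutLzop = List.of(
            "org.apache.hadoop.io.compress.DefaultCodec",
            "org.apache.hadoop.io.compress.GzipCodec",
            "com.hadoop.compression.lzo.LzoCodec");

        // Same list with LzopCodec added.
        List<String> withLzop = List.of(
            "org.apache.hadoop.io.compress.DefaultCodec",
            "org.apache.hadoop.io.compress.GzipCodec",
            "com.hadoop.compression.lzo.LzoCodec",
            "com.hadoop.compression.lzo.LzopCodec");

        System.out.println("without LzopCodec: " + findCodec(withoutLzop, file).orElse("no codec found"));
        System.out.println("with LzopCodec:    " + findCodec(withLzop, file).orElse("no codec found"));
    }
}
```

If this model matches what the job is doing, the thing to check would be whether com.hadoop.compression.lzo.LzopCodec appears in io.compression.codecs in the configuration the job actually runs with.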

2012/1/2 Shi Yu <[EMAIL PROTECTED]>

> You could decompress the LZO files manually into plain text and then
> use TextInputFormat.
>
> Alternatively, you don't need to index the LZO compressed files at all.
> Just use LZOInputFormat on the non-indexed files; the LZO
> files will then not be split.
>
+
Harsh J 2012-01-02, 07:22
+
edward choi 2012-01-02, 08:01