Hadoop >> mail # user >> How to read LZO compressed files?


Re: How to read LZO compressed files?
Harsh, your comment just saved me from several wasteful hours of aimless
labor.
I had added LzoCodec to io.compression.codecs in core-site.xml, but I forgot
to add LzopCodec. Now everything works. Thanks for the reply!

Regards,
Ed
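
Editor's note: the symptom in this thread (no error, but mappers receiving compressed bytes) is consistent with how codecs are matched to input files by filename suffix. The sketch below is a simplified Python model of that lookup, not Hadoop's actual implementation. The function name find_codec and the file name part-00000.lzo are hypothetical; the two class names and their default extensions (".lzo_deflate" for LzoCodec, ".lzo" for LzopCodec) are those declared by the hadoop-lzo codecs.

```python
# Simplified model of suffix-based codec lookup. This is an illustration,
# not Hadoop's CompressionCodecFactory code.

# Default file extensions declared by the two hadoop-lzo codecs.
DEFAULT_EXTENSIONS = {
    "com.hadoop.compression.lzo.LzoCodec": ".lzo_deflate",
    "com.hadoop.compression.lzo.LzopCodec": ".lzo",
}


def find_codec(filename, registered_codecs):
    """Return the first registered codec whose extension ends the filename.

    Returns None when nothing matches, which models the case where the
    input is read as plain, undecoded bytes.
    """
    for codec in registered_codecs:
        extension = DEFAULT_EXTENSIONS.get(codec)
        if extension and filename.endswith(extension):
            return codec
    return None


only_lzo = ["com.hadoop.compression.lzo.LzoCodec"]
both = only_lzo + ["com.hadoop.compression.lzo.LzopCodec"]

print(find_codec("part-00000.lzo", only_lzo))  # None
print(find_codec("part-00000.lzo", both))      # com.hadoop.compression.lzo.LzopCodec
```

With only LzoCodec registered, a ".lzo" file matches nothing, so no decompression is applied and no error is raised.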

2012/1/2 Harsh J <[EMAIL PROTECTED]>

> Hello Edward,
>
> On Mon, Jan 2, 2012 at 11:04 AM, edward choi <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm having trouble trying to handle lzo compressed files.
> > The input files are compressed with the LzopCodec provided by the
> > hadoop-lzo package, and I am using Cloudera's Hadoop distribution,
> > version 3 update 2.
> >
> > I don't need to split the input file, so there is no need to tell me to
> > index the input file and use LzoTextInputFormat, unless that is the only
> > way to handle lzo-compressed files.
>
> It's OK to use LZO without splitting. There are no issues in doing that.
>
> > I thought all I needed to do was set the job input format as
> > "TextInputFormat" and hadoop will take care of the rest.
> > When I do that, I don't get any error messages but log files tell me that
> > input files are not decompressed at all. Input files are being handled as
> > raw text files.
>
> By 'Input files are being handled as raw text files.' I assume you
> mean that your mappers are receiving garbage (compressed) input,
> without being decoded?
>
> Have you ensured that your io.compression.codecs property in
> core-site.xml carries LzoCodec and LzopCodec canonical classnames, and
> that your MR cluster was restarted with this change added?
>
> > Is there a specific way to read files with lzo extension?
>
> The above config registers the ".lzo" extension, so LZO files are
> detected automatically and you shouldn't need an explicit way.
>
> --
> Harsh J
>
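
Editor's note: the check Harsh describes (io.compression.codecs carrying both LzoCodec and LzopCodec) can be scripted. The following is a minimal sketch, assuming the standard Hadoop configuration layout of property elements with name and value children. The function name missing_lzo_codecs and the SAMPLE document are hypothetical; the sample is an illustrative configuration, not Ed's actual file.

```python
import xml.etree.ElementTree as ET

# Canonical class names of the two hadoop-lzo codecs discussed in the thread.
REQUIRED = (
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
)


def missing_lzo_codecs(core_site_xml):
    """Return the required codec class names absent from io.compression.codecs."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "")
            configured = {c.strip() for c in value.split(",") if c.strip()}
            return [c for c in REQUIRED if c not in configured]
    # Property not set at all: neither LZO codec has been registered here.
    return list(REQUIRED)


# Hypothetical core-site.xml content that lists LzoCodec but not LzopCodec.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>"""

print(missing_lzo_codecs(SAMPLE))  # ['com.hadoop.compression.lzo.LzopCodec']
```

An empty result means both codecs are listed. As noted in the reply above, the cluster still has to be restarted for a change to take effect.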