Re: How to read LZO compressed files?
Harsh, your comment just saved me from several wasteful hours of aimless
labor.
I had added LzoCodec to core-site.xml, but I forgot to add LzopCodec.
Now it all works. Thanks for the reply!!!

Regards,
Ed
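Ed's fix comes down to listing both LZO codecs in the io.compression.codecs property. The following is a minimal plain-Java sketch (it does not use the Hadoop API) that parses a comma-separated value of that property, as it might appear in core-site.xml, and reports which of the two hadoop-lzo codec class names are missing. It assumes the class names com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec; the class LzoCodecCheck and its helper missingLzoCodecs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LzoCodecCheck {
    // Assumed fully qualified class names of the hadoop-lzo codecs.
    static final String LZO_CODEC = "com.hadoop.compression.lzo.LzoCodec";
    static final String LZOP_CODEC = "com.hadoop.compression.lzo.LzopCodec";

    // Hypothetical helper: returns the LZO codec class names that are absent
    // from a comma-separated io.compression.codecs value.
    static List<String> missingLzoCodecs(String codecsProperty) {
        Set<String> registered = new LinkedHashSet<>();
        for (String name : codecsProperty.split(",")) {
            // trim() also strips newlines that often appear in XML values.
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                registered.add(trimmed);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String required : Arrays.asList(LZO_CODEC, LZOP_CODEC)) {
            if (!registered.contains(required)) {
                missing.add(required);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Setting with LzoCodec only (the situation described above).
        String before = "org.apache.hadoop.io.compress.DefaultCodec,"
                + "org.apache.hadoop.io.compress.GzipCodec,"
                + LZO_CODEC;
        // Setting with LzopCodec added as well.
        String after = before + "," + LZOP_CODEC;
        System.out.println("before: missing " + missingLzoCodecs(before));
        System.out.println("after: missing " + missingLzoCodecs(after));
    }
}
```

Remember that, as Harsh notes below, the cluster still has to be restarted for a core-site.xml change to take effect.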

2012/1/2 Harsh J <[EMAIL PROTECTED]>

> Hello Edward,
>
> On Mon, Jan 2, 2012 at 11:04 AM, edward choi <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm having trouble handling LZO-compressed files.
> > The input files are compressed with the LzopCodec provided by the
> > hadoop-lzo package.
> > And I am using Cloudera's Hadoop distribution, CDH3 Update 2.
> >
> > I don't need to split the input file, so there is no need to tell me to
> > index the input file and use LzoTextInputFormat, unless that is the
> > only way to handle LZO-compressed files.
>
> It's fine to use LZO without splitting. Nothing goes wrong if you do that.
>
> > I thought all I needed to do was set the job input format to
> > TextInputFormat and Hadoop would take care of the rest.
> > When I do that, I don't get any error messages, but the log files tell
> > me that the input files are not decompressed at all. The input files are
> > being handled as raw text files.
>
> By 'The input files are being handled as raw text files.' I assume you
> mean that your mappers are receiving garbage (still compressed) input
> that has not been decoded?
>
> Have you made sure that the io.compression.codecs property in
> core-site.xml lists the fully qualified class names of both LzoCodec and
> LzopCodec (com.hadoop.compression.lzo.LzoCodec and
> com.hadoop.compression.lzo.LzopCodec), and that your MR cluster was
> restarted after the change?
>
> > Is there a specific way to read files with the .lzo extension?
>
> The above config registers the ".lzo" extension (through LzopCodec) and
> enables auto-detection of LZO files, so you shouldn't need to do anything
> explicit.
>
> --
> Harsh J
>
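Harsh's point about ".lzo" auto-detection rests on extension matching: roughly, Hadoop's CompressionCodecFactory picks the registered codec whose default extension is a suffix of the file name, and TextInputFormat reads the file uncompressed when no codec matches. The sketch below imitates that lookup in plain Java, without Hadoop, as a simplified illustration rather than the real factory code. It assumes hadoop-lzo's LzoCodec uses the default extension ".lzo_deflate" and LzopCodec uses ".lzo"; the class SuffixCodecLookup and its method codecFor are hypothetical. With only LzoCodec registered, a file named part-00000.lzo finds no codec, which matches the symptom Ed saw.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixCodecLookup {
    // Assumed default file extensions for a few codec classes.
    static final Map<String, String> DEFAULT_EXTENSIONS = new LinkedHashMap<>();
    static {
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzoCodec", ".lzo_deflate");
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzopCodec", ".lzo");
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
    }

    // Hypothetical stand-in for a codec-factory lookup: returns the registered
    // codec whose default extension is the longest suffix of fileName, or null
    // when none matches (the file would then be read as plain text).
    static String codecFor(String fileName, String codecsProperty) {
        String best = null;
        int bestLength = -1;
        for (String name : codecsProperty.split(",")) {
            String codec = name.trim();
            String ext = DEFAULT_EXTENSIONS.get(codec);
            if (ext != null && fileName.endsWith(ext) && ext.length() > bestLength) {
                best = codec;
                bestLength = ext.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String lzoOnly = "com.hadoop.compression.lzo.LzoCodec";
        String both = lzoOnly + ",com.hadoop.compression.lzo.LzopCodec";
        // No registered codec claims ".lzo", so this prints "null".
        System.out.println(codecFor("part-00000.lzo", lzoOnly));
        // With LzopCodec registered, the ".lzo" file resolves to it.
        System.out.println(codecFor("part-00000.lzo", both));
    }
}
```

This is also why registering LzoCodec alone is not enough for files written by lzop or LzopCodec: under the assumed extensions, LzoCodec only claims ".lzo_deflate" files.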