edward choi 2012-01-02, 05:34
Shi Yu 2012-01-02, 06:54
edward choi 2012-01-02, 07:22
Harsh J 2012-01-02, 07:22
Harsh, your comment just saved me from several wasteful hours of aimless
I had added LzoCodec to core-site.xml but forgot to add LzopCodec.
After adding it, everything works. Thanks for the reply!
2012/1/2 Harsh J <[EMAIL PROTECTED]>
> Hello Edward,
> On Mon, Jan 2, 2012 at 11:04 AM, edward choi <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I'm having trouble trying to handle lzo compressed files.
> > The input files are compressed by LzopCodec provided by hadoop-lzo
> > I am using Cloudera's CDH3 update 2 distribution of Hadoop.
> > I don't need to split the input file, so there is no need to tell me to
> > index the input file and use LzoTextInputFormat, unless that is the only
> > way to handle lzo-compressed files.
> It's OK to use LZO without splitting. There are no issues in doing that.
> > I thought all I needed to do was set the job input format to
> > "TextInputFormat" and Hadoop would take care of the rest.
> > When I do that, I don't get any error messages but log files tell me that
> > input files are not decompressed at all. Input files are being handled as
> > raw text files.
> By 'Input files are being handled as raw text files.' I assume you
> mean that your mappers are receiving garbage (compressed) input,
> without being decoded?
> Have you ensured that your io.compression.codecs property in
> core-site.xml carries LzoCodec and LzopCodec canonical classnames, and
> that your MR cluster was restarted with this change added?
> > Is there a specific way to read files with lzo extension?
> The above config registers the ".lzo" extension for codec lookup and
> enables auto-detection of LZO files, so you shouldn't need an explicit way.
> Harsh J
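
For anyone who hits the same problem, here is a rough sketch of a check for
the misconfiguration described in this thread: io.compression.codecs listing
LzoCodec but not LzopCodec. It is only an illustration. The helper name
missing_codecs and the sample configuration string are made up for this
example, and the two class names are assumed to be the ones shipped by
hadoop-lzo (com.hadoop.compression.lzo.*). On a real cluster you would read
your own core-site.xml instead of the inline string.

```python
import xml.etree.ElementTree as ET

# Assumed canonical class names from the hadoop-lzo project.
REQUIRED = {
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
}

# Illustrative core-site.xml content (not taken from the thread).
CORE_SITE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,
           org.apache.hadoop.io.compress.GzipCodec,
           com.hadoop.compression.lzo.LzoCodec,
           com.hadoop.compression.lzo.LzopCodec</value>
  </property>
</configuration>
"""


def missing_codecs(core_site_xml):
    """Return the required LZO codec class names absent from the config."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "")
            configured = {c.strip() for c in value.split(",") if c.strip()}
            return REQUIRED - configured
    # Property not set at all: both codecs are missing.
    return set(REQUIRED)


if __name__ == "__main__":
    missing = missing_codecs(CORE_SITE)
    if missing:
        print("Missing codecs:", ", ".join(sorted(missing)))
    else:
        print("Both LZO codecs are registered.")
```

As Harsh notes above, the MR cluster has to be restarted after core-site.xml
is changed; this check only looks at the file contents.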