Hadoop >> mail # user >> LZO with sequenceFile


Re: LZO with sequenceFile
On Sun, Feb 26, 2012 at 9:09 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> If you want to just quickly package the hadoop-lzo items instead of
> building/managing-deployment on your own, you can reuse Todd Lipcon's
> script at https://github.com/toddlipcon/hadoop-lzo-packager - Creates
> both RPMs and DEBs.
>

Thanks! I have a few questions:
1. Would it work with sequence files? I am using
SequenceFileAsTextInputFormat.
2. If I use SequenceFile.CompressionType.RECORD or BLOCK, would the files
still be split?
3. I am also using CDH's 0.20.2 version of Hadoop.
>
> On Sun, Feb 26, 2012 at 9:55 PM, Ioan Eugen Stan <[EMAIL PROTECTED]>
> wrote:
> > 2012/2/26 Mohit Anchlia <[EMAIL PROTECTED]>:
> >> Thanks. Does it mean LZO is not installed by default? How can I install LZO?
> >
> > The LZO library is released under GPL and I believe it can't be
> > included in most distributions of Hadoop because of this (can't mix
> > GPL with non GPL stuff). It should be easily available though.
> >
> >> On Sat, Feb 25, 2012 at 6:27 PM, Shi Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> Yes, it is supported by Hadoop sequence file. It is splittable
> >>> by default. If you have installed and specified LZO correctly,
> >>> use these:
> >>>
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setCompressOutput(job, true);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressorClass(job, com.hadoop.compression.lzo.LzoCodec.class);
> >>>
> >>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
> >>>
> >>> job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
> >>>
> >>>
> >>> Shi
> >>>
> >
> >
> >
> > --
> > Ioan Eugen Stan
> > http://ieugen.blogspot.com/
>
>
>
> --
> Harsh J
>
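
On question 2 in this thread (whether RECORD or BLOCK compression still leaves the files splittable): a sequence file embeds sync markers in the stream, and a reader that starts in the middle of the file skips ahead to the next marker before it decodes anything. The toy below is only a sketch of that idea and not the real SequenceFile layout. The class name, the fixed 16-byte marker, and the block header are all made up for illustration, and java.util.zip's Deflater stands in for LZO so that it runs without hadoop-lzo installed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class SyncSplitSketch {
    // Hypothetical fixed marker for this toy; assumed never to occur inside block data.
    static final byte[] SYNC = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    static final int HEADER = SYNC.length + 8; // marker + compressed length + raw length

    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] file, int off, int len, int rawLen) {
        Inflater inf = new Inflater();
        inf.setInput(file, off, len);
        byte[] raw = new byte[rawLen];
        try {
            int n = 0;
            while (n < rawLen) {
                int got = inf.inflate(raw, n, rawLen - n);
                if (got == 0) break; // no more output available
                n += got;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return raw;
    }

    static void writeInt(ByteArrayOutputStream out, int v) {
        out.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);
    }

    // Toy layout, repeated per block: SYNC, compressed length, raw length, compressed bytes.
    static byte[] writeFile(List<List<String>> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (List<String> block : blocks) {
            byte[] raw = String.join("\n", block).getBytes(StandardCharsets.UTF_8);
            byte[] comp = deflate(raw);
            out.write(SYNC, 0, SYNC.length);
            writeInt(out, comp.length);
            writeInt(out, raw.length);
            out.write(comp, 0, comp.length);
        }
        return out.toByteArray();
    }

    // Offset of the first sync marker at or after 'from', or -1 if there is none.
    static int nextSync(byte[] file, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + SYNC.length), SYNC)) return i;
        }
        return -1;
    }

    // Toy rule: a split owns every block whose sync marker starts in [start, end).
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = nextSync(file, start);
        while (pos >= 0 && pos < end) {
            int compLen = ByteBuffer.wrap(file, pos + SYNC.length, 4).getInt();
            int rawLen = ByteBuffer.wrap(file, pos + SYNC.length + 4, 4).getInt();
            byte[] raw = inflate(file, pos + HEADER, compLen, rawLen);
            records.addAll(Arrays.asList(new String(raw, StandardCharsets.UTF_8).split("\n")));
            pos = nextSync(file, pos + HEADER + compLen);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = writeFile(Arrays.asList(
                Arrays.asList("k1\tv1", "k2\tv2"),
                Arrays.asList("k3\tv3", "k4\tv4"),
                Arrays.asList("k5\tv5")));
        int mid = file.length / 2; // arbitrary byte offset, not aligned to a block
        System.out.println("split 1: " + readSplit(file, 0, mid));
        System.out.println("split 2: " + readSplit(file, mid, file.length));
    }
}
```

Whatever byte offset is chosen for the cut, the two splits together return every record exactly once, because each block is claimed by the split in which its marker begins. The compression codec plays no part in where a reader can start, which is why a different codec could be swapped in for this demonstration.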