Pig user mailing list: Snappy compression with pig


Mohit Anchlia 2012-04-26, 19:32
Mohit Anchlia 2012-04-26, 19:40
Prashant Kommireddi 2012-04-29, 18:57
Mohit Anchlia 2012-04-29, 20:06
Prashant Kommireddi 2012-04-29, 20:12
Mohit Anchlia 2012-04-29, 20:41
Prashant Kommireddi 2012-04-30, 02:33
Mohit Anchlia 2012-04-30, 23:15
Re: Snappy compression with pig

On Mon, Apr 30, 2012 at 4:15 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Thanks! It worked just fine. But now my question is: when compressing a
> text file, is it compressed line by line, or is the entire file
> compressed as one?
>
> On Sun, Apr 29, 2012 at 7:33 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
> > By blocks, do you mean you would be using Snappy to write a
> > SequenceFile? Yes, you can do that by setting compression at BLOCK
> > level for the sequence file.
> >
> > On Sun, Apr 29, 2012 at 1:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks! Is this compressing every line or in blocks? Is it possible
> > > to set it to compress per block?
> > >
> > > On Sun, Apr 29, 2012 at 1:12 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > >
> > > > The ones you mentioned are for map output compression, not job
> > > > output.
> > > >
> > > > On Apr 29, 2012, at 1:07 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I tried these and they didn't work with STORE. Is this different
> > > > > from the one you mentioned?
> > > > >
> > > > > SET mapred.compress.map.output true;
> > > > >
> > > > > SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
> > > > >
> > > > > On Sun, Apr 29, 2012 at 11:57 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Have you tried setting output compression to Snappy for STORE?
> > > > > >
> > > > > > grunt> set output.compression.enabled true;
> > > > > > grunt> set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
> > > > > >
> > > > > > You should be able to read and write Snappy-compressed files
> > > > > > with PigStorage, which uses Hadoop TextInputFormat internally.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant
> > > > > >
> > > > > > On Thu, Apr 26, 2012 at 12:40 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > I think I need to write both store and load functions. It
> > > > > > > appears that only intermediate output stored in a temp
> > > > > > > location can be compressed, using:
> > > > > > >
> > > > > > > SET mapred.compress.map.output true;
> > > > > > >
> > > > > > > SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
> > > > > > >
> > > > > > > Any pointers as to how I can store and load using Snappy
> > > > > > > would be helpful.
> > > > > > >
> > > > > > > On Thu, Apr 26, 2012 at 12:32 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > I am able to write with Snappy compression. But I don't
> > > > > > > > think Pig provides anything to read such records. Can
> > > > > > > > someone suggest or point me to relevant code that might
> > > > > > > > help me write a LoadFunc for it?
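Putting Prashant's answer together in one place: a minimal Pig script sketch that writes Snappy-compressed text output through PigStorage, per the settings quoted above. The input path, output path, schema, and delimiter are hypothetical placeholders, not from the thread.

set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- hypothetical input path and schema
raw = LOAD 'input/events.txt' USING PigStorage('\t')
      AS (id:long, ts:chararray, payload:chararray);

-- PigStorage writes Snappy-compressed part files; a later LOAD on
-- 'output/events_snappy' with PigStorage reads them back transparently
STORE raw INTO 'output/events_snappy' USING PigStorage('\t');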
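On the property mix-up in the middle of the thread: the two SET statements Mohit tried control intermediate (map output) compression, which is why his STORE output stayed uncompressed. A side-by-side sketch of the two groups, assuming the pre-YARN property names in use at the time:

-- intermediate only: compresses map output between the map and reduce
-- phases, never the files STORE produces
set mapred.compress.map.output true;
set mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- final output: these are the settings STORE/PigStorage honors
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;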
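And on the SequenceFile point: BLOCK-level compression batches many records into each compressed chunk rather than compressing record by record. A sketch of the relevant knob, with the caveat that PigStorage writes plain text, so actually emitting a SequenceFile from Pig would need a separate StoreFunc (not shown here, and not part of core Pig):

-- BLOCK compresses batches of records; RECORD compresses one at a time
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
set mapred.output.compression.type BLOCK;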