Pig >> mail # user >> Snappy compression with pig


Re: Snappy compression with pig
Thanks! It worked just fine. But now my question is: when compressing a text
file, is it compressed line by line, or is the entire file compressed as one?

On Sun, Apr 29, 2012 at 7:33 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> By blocks, do you mean you would be using Snappy to write a SequenceFile?
> Yes, you can do that by setting compression at BLOCK level for the
> sequence file.
>
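For reference, the BLOCK-level SequenceFile compression mentioned above can be requested from the grunt shell with a couple of extra properties. A minimal sketch, using the old-style `mapred.*` property names that appear elsewhere in this thread (it assumes a SequenceFile-writing storage func, e.g. from Piggybank, is available; nothing here is specific to one):

```pig
-- Sketch: Snappy-compressed SequenceFile output with BLOCK-level compression.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type BLOCK;  -- one of NONE | RECORD | BLOCK
```

With RECORD, each value is compressed individually; with BLOCK, batches of records are compressed together, which generally compresses better.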
> On Sun, Apr 29, 2012 at 1:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > Thanks! Is this compressing every line or in blocks? Is it possible to
> > set it to compress per block?
> >
> > On Sun, Apr 29, 2012 at 1:12 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> >
> > > The ones you mentioned are for map output compression, not job output.
> > >
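To make the map-output vs. job-output distinction concrete, here is a sketch of the two property groups side by side (old-style `mapred.*` names, as used in this thread):

```pig
-- Intermediate (map) output compression: affects shuffle data only,
-- not what STORE writes out.
SET mapred.compress.map.output true;
SET mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Final (job) output compression: affects the files STORE produces.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;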
> > > On Apr 29, 2012, at 1:07 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > >
> > > > I tried these, but they didn't work with STORE. Are these different
> > > > from the ones you mentioned?
> > > >
> > > > SET mapred.compress.map.output true;
> > > >
> > > > SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
> > > >
> > > >
> > > > On Sun, Apr 29, 2012 at 11:57 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Have you tried setting output compression to Snappy for Store?
> > > >>
> > > >> grunt> set output.compression.enabled true;
> > > >> grunt> set output.compression.codec
> > > >> org.apache.hadoop.io.compress.SnappyCodec;
> > > >>
> > > >> You should be able to read and write Snappy-compressed files with
> > > >> PigStorage, which uses Hadoop's TextInputFormat internally.
> > > >>
> > > >> Thanks,
> > > >> Prashant
> > > >>
> > > >>
> > > >> On Thu, Apr 26, 2012 at 12:40 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> I think I need to write both store and load functions. It appears
> > > >>> that only intermediate output stored in a temp location can be
> > > >>> compressed using:
> > > >>>
> > > >>> SET mapred.compress.map.output true;
> > > >>>
> > > >>> SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
> > > >>>
> > > >>>
> > > >>>
> > > >>> Any pointers as to how I can store and load using Snappy would be
> > > >>> helpful.
> > > >>> On Thu, Apr 26, 2012 at 12:32 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > >>>
> > > >>>> I am able to write with Snappy compression, but I don't think Pig
> > > >>>> provides anything to read such records. Can someone suggest or
> > > >>>> point me to relevant code that might help me write a LoadFunc for it?
> > > >>>
> > > >>
> > >
> >
>
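Putting the thread's suggestions together, a minimal end-to-end sketch with plain PigStorage and Snappy-compressed text output might look like the following. The paths, delimiter, and schema are placeholders, and it assumes the Snappy native libraries are installed on the cluster:

```pig
-- Enable Snappy for the final job output (what STORE writes).
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- PigStorage reads and writes the compressed text transparently,
-- since it uses Hadoop's TextInputFormat internally.
raw = LOAD '/data/input' USING PigStorage('\t') AS (id:int, value:chararray);
STORE raw INTO '/data/output_snappy' USING PigStorage('\t');
```

Note that a whole-file-compressed Snappy text file is not splittable, so a single large output file will be processed by a single mapper on the next read; SequenceFiles with BLOCK compression avoid that limitation.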