Pig, mail # user - Snappy compression with pig


Re: Snappy compression with pig
Prashant Kommireddi 2012-05-01, 00:38

On Mon, Apr 30, 2012 at 4:15 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Thanks! It worked just fine. But now my question is: when compressing a
> text file, is it compressed line by line, or is the entire file compressed
> as one?
>
> On Sun, Apr 29, 2012 at 7:33 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> By blocks do you mean you would be using Snappy to write a SequenceFile?
>> Yes, you can do that by setting compression at BLOCK level for the
>> sequence file.
>>
>> On Sun, Apr 29, 2012 at 1:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks! Is this compressing every line or in blocks? Is it possible to
>>> set it to compress per block?
>>>
>>> On Sun, Apr 29, 2012 at 1:12 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>
>>>> The ones you mentioned are for map output compression, not job output.
>>>>
>>>> On Apr 29, 2012, at 1:07 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I tried these and they didn't work with STORE. Is this different from
>>>>> the one you mentioned?
>>>>>
>>>>> SET mapred.compress.map.output true;
>>>>>
>>>>> SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
>>>>>
>>>>> On Sun, Apr 29, 2012 at 11:57 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Have you tried setting output compression to Snappy for STORE?
>>>>>>
>>>>>> grunt> set output.compression.enabled true;
>>>>>> grunt> set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
>>>>>>
>>>>>> You should be able to read and write Snappy-compressed files with
>>>>>> PigStorage, which uses Hadoop's TextInputFormat internally.
>>>>>>
>>>>>> Thanks,
>>>>>> Prashant
>>>>>>
>>>>>> On Thu, Apr 26, 2012 at 12:40 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> I think I need to write both store and load functions. It appears that
>>>>>>> only the intermediate output stored in the temp location can be
>>>>>>> compressed, using:
>>>>>>>
>>>>>>> SET mapred.compress.map.output true;
>>>>>>>
>>>>>>> SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
>>>>>>>
>>>>>>> Any pointers as to how I can store and load using Snappy would be
>>>>>>> helpful.
>>>>>>>
>>>>>>> On Thu, Apr 26, 2012 at 12:32 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> I am able to write with Snappy compression. But I don't think Pig
>>>>>>>> provides anything to read such records. Can someone suggest or point
>>>>>>>> me to relevant code that might help me write a LoadFunc for it?
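
Pulling together the settings Prashant suggests above, a minimal Pig script using Snappy for job output might look like the sketch below. The input/output paths, field names, and tab delimiter are hypothetical; it also assumes the cluster has the native Snappy libraries installed, since SnappyCodec will fail without them.

```pig
-- Compress the final job output (what STORE writes) with Snappy,
-- per the grunt> settings quoted in the thread above.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Optionally also compress intermediate map output (the settings Mohit
-- originally tried, which affect map output rather than job output).
set mapred.compress.map.output true;
set mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Hypothetical paths and schema, for illustration only.
raw      = LOAD '/data/input/logs' USING PigStorage('\t')
           AS (id:chararray, msg:chararray);
filtered = FILTER raw BY id IS NOT NULL;
STORE filtered INTO '/data/output/logs-snappy' USING PigStorage('\t');
```

For the block-compression question discussed in the thread: when the output is a SequenceFile rather than plain text, block-level compression can be requested with the Hadoop property `mapred.output.compression.type` set to `BLOCK`, which compresses batches of records together instead of each record individually.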