-
Snappy compression with pig
Mohit Anchlia 2012-04-26, 19:32
I am able to write with Snappy compression, but I don't think Pig provides anything to read such records. Can someone suggest or point me to relevant code that might help me write a LoadFunc for it?
-
Re: Snappy compression with pig
Mohit Anchlia 2012-04-26, 19:40
I think I need to write both store and load functions. It appears that only intermediate output stored in a temp location can be compressed, using:
SET mapred.compress.map.output true;
SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
Any pointers as to how I can store and load using Snappy would be helpful.
-
Re: Snappy compression with pig
Prashant Kommireddi 2012-04-29, 18:57
Have you tried setting output compression to Snappy for Store?
grunt> set output.compression.enabled true;
grunt> set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
You should be able to read and write Snappy-compressed files with PigStorage, which uses Hadoop's TextInputFormat internally.
Thanks,
Prashant
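As an illustration of the suggestion above, a minimal Pig sketch of storing and then re-loading Snappy-compressed text with PigStorage might look like the following. The paths, delimiter, and schema are hypothetical, and the sketch assumes the Snappy native libraries are installed and that SnappyCodec is registered in the cluster's io.compression.codecs setting.

```pig
-- Enable compression of the final job output written by STORE.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Hypothetical input path and schema, for illustration only.
A = LOAD 'input/data.txt' USING PigStorage('\t') AS (id:int, name:chararray);

-- Writes Snappy-compressed part files under the (hypothetical) output directory.
STORE A INTO 'output/snappy_data' USING PigStorage('\t');

-- Read the compressed files back with the same loader; no custom LoadFunc needed.
B = LOAD 'output/snappy_data' USING PigStorage('\t') AS (id:int, name:chararray);
DUMP B;
```

The stored part files should carry a .snappy extension, which is how Hadoop's text input format chooses the codec when reading them back.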
-
Re: Snappy compression with pig
Mohit Anchlia 2012-04-29, 20:06
I tried these and they didn't work with STORE. Are these different from the ones you mentioned?
SET mapred.compress.map.output true;
SET mapred.output.compression org.apache.hadoop.io.compress.SnappyCodec;
-
Re: Snappy compression with pig
Prashant Kommireddi 2012-04-29, 20:12
The ones you mentioned are for map output compression, not job output.
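To make the distinction concrete, here is a hedged side-by-side sketch of the two groups of settings. The map-output codec property name (mapred.map.output.compression.codec) is the Hadoop 1.x name and is an assumption, since it is not spelled out in this thread; the job-output properties are the ones suggested earlier.

```pig
-- Intermediate (map) output: affects only data shuffled between map and reduce.
-- Files written by STORE are not affected by these two settings.
SET mapred.compress.map.output true;
SET mapred.map.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Final job output: affects what STORE writes through PigStorage.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
```

The two groups are independent, so either one can be enabled without the other.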
-
Re: Snappy compression with pig
Mohit Anchlia 2012-04-29, 20:41
Thanks! Is this compressing every line, or in blocks? Is it possible to set it to compress per block?
-
Re: Snappy compression with pig
Prashant Kommireddi 2012-04-30, 02:33
By blocks, do you mean you would be using Snappy to write a SequenceFile? Yes, you can do that by setting compression at the BLOCK level for the sequence file.
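For reference, a rough sketch of the block-level settings mentioned above, using the Hadoop 1.x property names (an assumption, since the thread does not name them). These are honored by Hadoop's SequenceFileOutputFormat, so they only take effect with a store function that writes sequence files; PigStorage writes plain text and does not use the compression type.

```pig
-- Compress the job output (Hadoop 1.x property names, assumed here).
SET mapred.output.compress true;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

-- Compression type for sequence files: NONE, RECORD, or BLOCK.
-- BLOCK compresses batches of records together rather than one record at a time.
SET mapred.output.compression.type BLOCK;
```

BLOCK usually gives a better compression ratio than RECORD because the codec sees many records at once.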
-
Re: Snappy compression with pig
Mohit Anchlia 2012-04-30, 23:15
Thanks! It worked just fine. But now my question is: when a text file is compressed, is it compressed line by line, or is the entire file compressed as one?
-
Re: Snappy compression with pig
Prashant Kommireddi 2012-05-01, 00:38
Line