Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases


Copy link to this message
-
Re: Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases


> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 259
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line259>
> >
> >     We should throw an exception if the trimmed lowercased  string is something we don't expect.
> >     otherwise typos (seqFile, sequencefile, ...) will lead to unexpected behavior.
> >    
> >     the call becomes then:
> >     ...getProperty(...PIG_TEMP_FILE_COMPRESSION_STORAGE, "tfile");
> >    
> >     example message:
> >     PigConfiguration.PIG_TEMP_FILE_COMPRESSION_STORAGE +": expected storage seqfile or tfile (default tfile)"
> >    
> >

We need seqfile as an option. tfile can be used by default without any extra properties flag.
> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 329
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line329>
> >
> >     else throw exception?
> >     we have a default if it's not set but we don't do anything if it's an invalid compression codec

Idea is to let user provide a codec through "mapred.output.compression.codec" property. In case SequenceFile (Hadoop) starts supporting more than gz, snappy, lzo and bzip2 in newer version of hadoop. Note, when using seqfile, user can completely avoid setting pig.tmpfilecompression.codec and use mapred.output.compression.codec if wanted.

We need this because TFileStorage uses pig's implementations of gz and lzo instead of hadoop. Also, gzip codec is called gz by pig and gzip by hadoop (making hard to be consistent and backwards compatible).
> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java, line 159
> > <https://reviews.apache.org/r/14552/diff/3/?file=362603#file362603line159>
> >
> >     can we fix indentation and whitespace at the same time?
> >     - curly at end of line
> >     - spaces around operators
> >     - whitespace at end of line

I tried :), missed a few. Will do. I guess there is an automated way for this.
> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 272
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line272>
> >
> >     duplicate logic?

Will fix it with a new Enum thingy.
- Aniket
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14552/#review26894
-----------------------------------------------------------
On Oct. 9, 2013, 6 p.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14552/
> -----------------------------------------------------------
>
> (Updated Oct. 9, 2013, 6 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Dmitriy Ryaboy, Julien Le Dem, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3480
>     https://issues.apache.org/jira/browse/PIG-3480
>
>
> Repository: pig
>
>
> Description
> -------
>
> - Added a new parameter to make SequenceFileInterStorage optional.
> - Added tests
> - Refactored apis
>
>
> Diffs
> -----
>
>   trunk/conf/pig.properties 1530468
>   trunk/src/org/apache/pig/PigConfiguration.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartitionRearrange.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 1530468
>   trunk/src/org/apache/pig/impl/io/InterStorage.java 1530468
>   trunk/src/org/apache/pig/impl/io/SequenceFileInterStorage.java PRE-CREATION
>   trunk/src/org/apache/pig/impl/io/TFileStorage.java 1530468