Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases


+
Aniket Mokashi 2013-10-09, 00:43
+
Aniket Mokashi 2013-10-09, 01:02
+
Aniket Mokashi 2013-10-09, 01:32
+
Aniket Mokashi 2013-10-09, 18:00
+
Julien Le Dem 2013-10-10, 20:50
+
Aniket Mokashi 2013-10-10, 22:06
+
Julien Le Dem 2013-10-11, 04:57
+
Aniket Mokashi 2013-10-11, 04:31
Copy link to this message
-
Re: Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases
Julien Le Dem 2013-10-11, 03:51


> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 259
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line259>
> >
> >     We should throw an exception if the trimmed lowercased  string is something we don't expect.
> >     otherwise typos (seqFile, sequencefile, ...) will lead to unexpected behavior.
> >    
> >     the call becomes then:
> >     ...getProperty(...PIG_TEMP_FILE_COMPRESSION_STORAGE, "tfile");
> >    
> >     example message:
> >     PigConfiguration.PIG_TEMP_FILE_COMPRESSION_STORAGE +": expected storage seqfile or tfile (default tfile)"
> >    
> >
>
> Aniket Mokashi wrote:
>     We need seqfile as an option. tfile can be used by default without any extra properties flag.

my point was that we should use the default when the property is not defined not when it is set to a value we don't understand. In that case it should fail.
> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 329
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line329>
> >
> >     else throw exception?
> >     we have a default if it's not set but we don't do anything if it's an invalid compression codec
>
> Aniket Mokashi wrote:
>     Idea is to let user provide a codec through "mapred.output.compression.codec" property. In case SequenceFile (Hadoop) starts supporting more than gz, snappy, lzo and bzip2 in newer version of hadoop. Note, when using seqfile, user can completely avoid setting pig.tmpfilecompression.codec and use mapred.output.compression.codec if wanted.
>    
>     We need this because TFileStorage uses pig's implementations of gz and lzo instead of hadoop. Also, gzip codec is called gz by pig and gzip by hadoop (making hard to be consistent and backwards compatible).

sure. In particular my goal was to avoid doing nothing silently when there is a typo in the setting. Those are hard to debug. In the current impl not setting pig.tmpfilecompression.codec will force the DefaultCodec and setting to something we don't understand will leave the codec unchanged. hardly intuitive.
In general setting a property to something unknown should not fall back to a default but fail with an explicit message.
- Julien
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14552/#review26894
-----------------------------------------------------------
On Oct. 9, 2013, 6 p.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14552/
> -----------------------------------------------------------
>
> (Updated Oct. 9, 2013, 6 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Dmitriy Ryaboy, Julien Le Dem, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3480
>     https://issues.apache.org/jira/browse/PIG-3480
>
>
> Repository: pig
>
>
> Description
> -------
>
> - Added a new parameter to make SequenceFileInterStorage optional.
> - Added tests
> - Refactored apis
>
>
> Diffs
> -----
>
>   trunk/conf/pig.properties 1530468
>   trunk/src/org/apache/pig/PigConfiguration.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartitionRearrange.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 1530468
>   trunk/src/org/apache/pig/impl/io/InterStorage.java 1530468
>   trunk/src/org/apache/pig/impl/io/SequenceFileInterStorage.java PRE-CREATION
>   trunk/src/org/apache/pig/impl/io/TFileStorage.java 1530468
>   trunk/src/org/apache/pig/impl/util/Utils.java 1530468
+
Aniket Mokashi 2013-10-11, 20:43
+
Dmitriy Ryaboy 2013-10-11, 23:40
+
Aniket Mokashi 2013-10-12, 00:28
+
Aniket Mokashi 2013-10-12, 00:44
+
Julien Le Dem 2013-10-13, 11:40
+
Aniket Mokashi 2013-10-14, 03:21
+
Aniket Mokashi 2013-10-09, 01:10