Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # dev >> Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases


+
Aniket Mokashi 2013-10-09, 00:43
+
Aniket Mokashi 2013-10-09, 01:02
+
Aniket Mokashi 2013-10-09, 01:32
+
Aniket Mokashi 2013-10-09, 18:00
+
Julien Le Dem 2013-10-10, 20:50
+
Aniket Mokashi 2013-10-10, 22:06
Copy link to this message
-
Re: Review Request 14552: PIG-3480 TFile-based tmpfile compression crashes in some cases


> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 259
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line259>
> >
> >     We should throw an exception if the trimmed lowercased  string is something we don't expect.
> >     otherwise typos (seqFile, sequencefile, ...) will lead to unexpected behavior.
> >    
> >     the call becomes then:
> >     ...getProperty(...PIG_TEMP_FILE_COMPRESSION_STORAGE, "tfile");
> >    
> >     example message:
> >     PigConfiguration.PIG_TEMP_FILE_COMPRESSION_STORAGE +": expected storage seqfile or tfile (default tfile)"
> >    
> >
>
> Aniket Mokashi wrote:
>     We need seqfile as an option. tfile can be used by default without any extra properties flag.
>
> Julien Le Dem wrote:
>     my point was that we should use the default when the property is not defined not when it is set to a value we don't understand. In that case it should fail.
>
> Aniket Mokashi wrote:
>     got it. So, either property should not exist or property should be set to tfile, (Instead of checking for "" string), correct?
>     Totally makes sense.

yep :)
> On Oct. 10, 2013, 8:50 p.m., Julien Le Dem wrote:
> > trunk/src/org/apache/pig/impl/util/Utils.java, line 329
> > <https://reviews.apache.org/r/14552/diff/3/?file=362607#file362607line329>
> >
> >     else throw exception?
> >     we have a default if it's not set but we don't do anything if it's an invalid compression codec
>
> Aniket Mokashi wrote:
>     Idea is to let user provide a codec through "mapred.output.compression.codec" property. In case SequenceFile (Hadoop) starts supporting more than gz, snappy, lzo and bzip2 in newer version of hadoop. Note, when using seqfile, user can completely avoid setting pig.tmpfilecompression.codec and use mapred.output.compression.codec if wanted.
>    
>     We need this because TFileStorage uses pig's implementations of gz and lzo instead of hadoop. Also, gzip codec is called gz by pig and gzip by hadoop (making hard to be consistent and backwards compatible).
>
> Julien Le Dem wrote:
>     sure. In particular my goal was to avoid doing nothing silently when there is a typo in the setting. Those are hard to debug. In the current impl not setting pig.tmpfilecompression.codec will force the DefaultCodec and setting to something we don't understand will leave the codec unchanged. hardly intuitive.
>     In general setting a property to something unknown should not fall back to a default but fail with an explicit message.
>
> Aniket Mokashi wrote:
>     Actually, thinking about it now, I think your comments make sense. I will modify the code accordingly. Thanks a lot! :)
>    
>     So, new behavior:
>     codec.equals("") -> user mean to set it through the mapred property.
>     else -> misspelling etc.
>     This is definitely more intuitive.

thanks to you!
- Julien
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14552/#review26894
-----------------------------------------------------------
On Oct. 9, 2013, 6 p.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14552/
> -----------------------------------------------------------
>
> (Updated Oct. 9, 2013, 6 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Dmitriy Ryaboy, Julien Le Dem, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3480
>     https://issues.apache.org/jira/browse/PIG-3480
>
>
> Repository: pig
>
>
> Description
> -------
>
> - Added a new parameter to make SequenceFileInterStorage optional.
> - Added tests
> - Refactored apis
>
>
> Diffs
> -----
>
>   trunk/conf/pig.properties 1530468
>   trunk/src/org/apache/pig/PigConfiguration.java 1530468
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1530468
+
Aniket Mokashi 2013-10-11, 04:31
+
Julien Le Dem 2013-10-11, 03:51
+
Aniket Mokashi 2013-10-11, 20:43
+
Dmitriy Ryaboy 2013-10-11, 23:40
+
Aniket Mokashi 2013-10-12, 00:28
+
Aniket Mokashi 2013-10-12, 00:44
+
Julien Le Dem 2013-10-13, 11:40
+
Aniket Mokashi 2013-10-14, 03:21
+
Aniket Mokashi 2013-10-09, 01:10