Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------
This is great work. Thank you so much!

I have two comments:

1) It doesn't seem to work for a map-only job. For example, I ran a load and dump in grunt as follows:

x = load '/user/cheolsoop/foo';
dump x;

This job doesn't get converted to local mode because the number of reducers is reported as 21, which doesn't make sense for a map-only job. See the log output below:

2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Size of input: 8 bytes.
2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - No of reducers: 21
2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2) The changes in PigStats and PigStatsUtil might break backward compatibility. Perhaps we could avoid them if they're not necessary. Thoughts?
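
As a rough illustration of comment 1, the eligibility check could treat a map-only job as having zero reducers, so that a leftover reducer estimate does not block the conversion. This is only a sketch under stated assumptions, not the code in the patch: the class AutoLocalCheck, the method isEligible, and the 1,000,000-byte threshold are hypothetical names and values made up for this example. Only the two property names come from the review description.

```java
// Hypothetical sketch; not the code under review.
final class AutoLocalCheck {
    private AutoLocalCheck() {}

    /**
     * Decides whether a job is small enough to run in local mode.
     *
     * @param enabled    value of pig.auto.local.enabled
     * @param inputBytes total input size of the job in bytes
     * @param mapOnly    true when the job has no reduce phase
     * @param reducers   configured or estimated reducer count
     * @param maxBytes   value of pig.auto.local.input.maxbytes
     */
    static boolean isEligible(boolean enabled, long inputBytes,
                              boolean mapOnly, int reducers, long maxBytes) {
        if (!enabled) {
            return false;
        }
        // The description says "input size less than" the limit.
        if (inputBytes >= maxBytes) {
            return false;
        }
        // Assumption: a map-only job has no reducers, so any reducer
        // count attached to it is ignored.
        int effectiveReducers = mapOnly ? 0 : reducers;
        return effectiveReducers <= 1;
    }

    public static void main(String[] args) {
        long maxBytes = 1_000_000L; // arbitrary threshold for illustration

        // Map-only job with 8 bytes of input and a stale reducer count of 21.
        System.out.println(isEligible(true, 8L, true, 21, maxBytes));  // prints true

        // Same input, but a real reduce phase with 21 reducers.
        System.out.println(isEligible(true, 8L, false, 21, maxBytes)); // prints false
    }
}
```

With a check along these lines, the load and dump case above would qualify, while a job that really needs 21 reducers would still run on the cluster.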

trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
<https://reviews.apache.org/r/16928/#comment61021>

    Do you mind replacing these with static variables too?

trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/16928/#comment61022>

    I think pseudo-distributed mode means a single node running multiple processes. But you mean local mode (multi-threaded) here, don't you?

trunk/src/org/apache/pig/tools/pigstats/PigStats.java
<https://reviews.apache.org/r/16928/#comment61027>

    I like removing this from PigStats.
    
    But I am a bit worried that this might break backward compatibility with downstream applications since it is public.

trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61023>

    Update the comment to reflect the change.

trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61024>

    Update the comment to reflect the change.

- Cheolsoo Park

On Jan. 16, 2014, 10:04 p.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
>
> (Updated Jan. 16, 2014, 10:04 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
>
>
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
>
>
> Repository: pig
>
>
> Description
> -------
>
> If pig.auto.local.enabled is set, JCC (JobControlCompiler) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, so that it is forced to run in local mode. The output of the local run is still written to HDFS.
>
>
> Diffs
> -----
>
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572