Pig >> mail # dev >> Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs


Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs


> On Jan. 20, 2014, 10:25 a.m., Cheolsoo Park wrote:
> > This is great work. Thank you so much!
> >
> > I have two comments:
> >
> > 1) It doesn't seem to work for a map-only job. For example, I tried running load and dump in grunt as follows:
> >
> > x = load '/user/cheolsoop/foo';
> > dump x;
> >
> > This job doesn't get converted to local mode because the number of reducers is 21, which doesn't make sense. See the log output below:
> >
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Size of input: 8 bytes.
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - No of reducers: 21
> > 2014-01-20 10:05:30,578 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
> >
> > 2) The changes in PigStats and PigStatsUtil might break backward compatibility. Perhaps we could avoid them if they're not necessary. Thoughts?
> >

1) I tested load-dump on my side, and it was automatically converted to local mode. Digging deeper, I found that reducer estimation happens before the okToRunLocal call, but for a map-only job we do not set the number of reducers to zero until later. So I moved that code up, which should take care of map-only jobs.

2) Makes sense. Reverted.
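The ordering problem in point 1) can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the patch's code: `JobPlan`, `decideBuggy` and `decideFixed` are illustrative names, `okToRunLocal` only mirrors the name mentioned in the reply, and the 21 estimated reducers come from the log quoted above. The point is that a map-only job must have its reducer count set to zero before the local-mode check runs.

```java
public class AutoLocalOrdering {

    /** Hypothetical, simplified view of one compiled MapReduce job. */
    static final class JobPlan {
        final long inputBytes;
        final boolean mapOnly;
        int numReducers;

        JobPlan(long inputBytes, boolean mapOnly) {
            this.inputBytes = inputBytes;
            this.mapOnly = mapOnly;
        }
    }

    /** Hypothetical rule: small input and at most one reducer may run in-process. */
    static boolean okToRunLocal(JobPlan job, long maxInputBytes) {
        return job.inputBytes <= maxInputBytes && job.numReducers <= 1;
    }

    /** Original order: the check sees the estimate before map-only jobs are zeroed. */
    static boolean decideBuggy(JobPlan job, int estimatedReducers, long maxInputBytes) {
        job.numReducers = estimatedReducers;
        boolean local = okToRunLocal(job, maxInputBytes);
        if (job.mapOnly) {
            job.numReducers = 0; // too late: the local-mode check already ran
        }
        return local;
    }

    /** Reordered: settle the map-only case first, then run the local-mode check. */
    static boolean decideFixed(JobPlan job, int estimatedReducers, long maxInputBytes) {
        job.numReducers = job.mapOnly ? 0 : estimatedReducers;
        return okToRunLocal(job, maxInputBytes);
    }

    public static void main(String[] args) {
        long maxBytes = 100L * 1000 * 1000; // hypothetical threshold for this sketch
        // 8-byte map-only load/dump job with 21 estimated reducers, as in the log
        System.out.println("buggy: " + decideBuggy(new JobPlan(8, true), 21, maxBytes));
        System.out.println("fixed: " + decideFixed(new JobPlan(8, true), 21, maxBytes));
    }
}
```

Running `main` prints `buggy: false` followed by `fixed: true`. Jobs that really do need several reducers are unaffected by the reordering, because their estimate is still checked.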
- Aniket
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------
On Jan. 21, 2014, 2:24 a.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
>
> (Updated Jan. 21, 2014, 2:24 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
>
>
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
>
>
> Repository: pig
>
>
> Description
> -------
>
> If pig.auto.local.enabled is set, JobControlCompiler (JCC) modifies the Configuration of every job that has one reducer and an input size less than pig.auto.local.input.maxbytes, forcing it to run in local mode. The output of a local run is still written to HDFS.
>
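The decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the diff. `java.util.Properties` stands in for Hadoop's `Configuration`. The keys `mapreduce.framework.name=local` (Hadoop 2) and `mapred.job.tracker=local` (Hadoop 1) are the standard Hadoop switches for the local job runner, but whether the patch sets exactly these is an assumption. `maybeForceLocal` and the fallback threshold are hypothetical. The reducer condition is written as "at most one" so that map-only jobs qualify, as the thread above expects.

```java
import java.util.Properties;

public class AutoLocalConfig {

    static final String AUTO_LOCAL_ENABLED = "pig.auto.local.enabled";
    static final String AUTO_LOCAL_MAXBYTES = "pig.auto.local.input.maxbytes";
    // Hypothetical fallback threshold used only by this sketch
    static final long DEFAULT_MAXBYTES = 100L * 1000 * 1000;

    /**
     * Hypothetical helper: if auto-local is enabled, the job has at most one
     * reducer and its input is below the threshold, rewrite jobConf so the job
     * runs in-process and return true. Otherwise leave jobConf untouched.
     */
    static boolean maybeForceLocal(Properties pigProps, Properties jobConf,
                                   long inputBytes, int numReducers) {
        boolean enabled = Boolean.parseBoolean(
                pigProps.getProperty(AUTO_LOCAL_ENABLED, "false"));
        long maxBytes = Long.parseLong(pigProps.getProperty(
                AUTO_LOCAL_MAXBYTES, Long.toString(DEFAULT_MAXBYTES)));
        if (!enabled || numReducers > 1 || inputBytes >= maxBytes) {
            return false;
        }
        // Assumed keys: select Hadoop's local job runner (Hadoop 2 and Hadoop 1)
        jobConf.setProperty("mapreduce.framework.name", "local");
        jobConf.setProperty("mapred.job.tracker", "local");
        // The default file system is left alone, so output still goes to HDFS
        return true;
    }

    public static void main(String[] args) {
        Properties pigProps = new Properties();
        pigProps.setProperty(AUTO_LOCAL_ENABLED, "true");
        pigProps.setProperty(AUTO_LOCAL_MAXBYTES, "1000000");

        Properties small = new Properties();
        Properties large = new Properties();
        System.out.println(maybeForceLocal(pigProps, small, 8, 1));
        System.out.println(maybeForceLocal(pigProps, large, 5_000_000L, 1));
        System.out.println(small.getProperty("mapreduce.framework.name"));
    }
}
```

With these settings, `main` prints `true`, `false` and `local`. Because the check is made per job, a multi-job script can mix modes, which matches the replicated-join and order-by scenarios under Testing.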
>
> Diffs
> -----
>
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/16928/diff/
>
>
> Testing
> -------
>
> Tried a few scenarios with the patch:
> Load small data, group all, count - works in local mode.
> Load small data, another small data set and replicated join - works in local mode.
> Load small data and order by key - all 3 jobs run in local mode.
> Load small data and large data for replicated join - the first job runs in local mode, the second runs in MR mode.
> Load large data and order by key - the first stages run in local mode and the last stage runs in MR mode.
>
>
> Thanks,
>
> Aniket Mokashi
>
>