Pig, mail # dev - Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs


Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs
Daniel Dai 2014-01-21, 05:05

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------
Looks good. We also need to document the new settings as commented-out entries in conf/pig.properties (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000) so users know about them.

This also reminds me that we should be able to read/write HDFS files in local mode, but that's a separate issue.

- Daniel Dai
On Jan. 21, 2014, 2:52 a.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
>
> (Updated Jan. 21, 2014, 2:52 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
>
>
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
>
>
> Repository: pig
>
>
> Description
> -------
>
> If pig.auto.local.enabled is set, the JobControlCompiler (JCC) modifies the Configuration of every job that has one reducer and an input size below pig.auto.local.input.maxbytes, so that the job is forced to run in local mode. The output of a local run is also written to HDFS.
>
>
> Diffs
> -----
>
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/16928/diff/
>
>
> Testing
> -------
>
> Tried a few scenarios with the patch:
> Load small data, group all, count - runs in local mode.
> Load small data, another small data, and replicated join - runs in local mode.
> Load small data and order by key - all 3 jobs run in local mode.
> Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
> Load large data and order by key - the first stages run in local mode and the last stage runs in MR mode.
>
>
> Thanks,
>
> Aniket Mokashi
>
>
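
The rule described in the review request (auto-local enabled, one reducer, input below the size threshold) can be illustrated with a small standalone sketch. This is not the code from the patch: the class AutoLocalSketch and the helper shouldRunLocally are hypothetical names, the fallback threshold is simply the value shown in the suggested pig.properties comment, and "one reducer" is taken literally because the thread does not say how map-only jobs are treated. Only the two property names come from the thread.

```java
import java.util.Properties;

public class AutoLocalSketch {
    // Property names as given in the review thread.
    static final String ENABLED = "pig.auto.local.enabled";
    static final String MAX_BYTES = "pig.auto.local.input.maxbytes";
    // Assumption: value from the suggested pig.properties comment, used as a fallback.
    static final long ASSUMED_MAX_BYTES = 100000000L;

    /**
     * Hypothetical helper: true when a job would qualify for local mode
     * under the rule stated in the review description.
     */
    static boolean shouldRunLocally(Properties props, int numReducers, long inputBytes) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED, "false"));
        long maxBytes = Long.parseLong(
                props.getProperty(MAX_BYTES, Long.toString(ASSUMED_MAX_BYTES)));
        // "One reducer" is read literally; treatment of map-only jobs is not stated.
        return enabled && numReducers == 1 && inputBytes < maxBytes;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ENABLED, "true");
        props.setProperty(MAX_BYTES, "100000000");

        // Small input with a single reducer: qualifies for local mode.
        System.out.println(shouldRunLocally(props, 1, 5_000_000L));
        // Input above the threshold: stays in MR mode.
        System.out.println(shouldRunLocally(props, 1, 500_000_000L));
    }
}
```

Because the check is per job, a multi-job script can mix modes, which matches the test scenarios above where early stages run locally and a later stage runs in MR mode.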