Pig >> mail # dev >> Review Request 14964: PIG-3047 Check the size of a relation before adding it to distributed cache in Replicated join


Re: Review Request 14964: PIG-3047 Check the size of a relation before adding it to distributed cache in Replicated join

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14964/#review27628
-----------------------------------------------------------

Ship it!
Looks good, and TestFRJoin2 passes.

Aniket, do you mind opening a documentation JIRA for this? Or you can update the documentation when committing the patch. I think we should change the following section:

Conditions
Fragment replicate joins are experimental; we don't have a strong sense of how small the small relation must be to fit into memory. In our tests with a simple query that involves just a JOIN, a relation of up to 100 M can be used if the process overall gets 1 GB of memory. Please share your observations and experience with us.

- Cheolsoo Park
On Oct. 28, 2013, 6:45 a.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14964/
> -----------------------------------------------------------
>
> (Updated Oct. 28, 2013, 6:45 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
>
>
> Bugs: PIG-3047
>     https://issues.apache.org/jira/browse/PIG-3047
>
>
> Repository: pig
>
>
> Description
> -------
>
> Check the size of a relation before adding it to the distributed cache in a replicated join. The limit is 1 GB by default.
>
>
> Diffs
> -----
>
>   trunk/conf/pig.properties 1536246
>   trunk/src/org/apache/pig/PigConfiguration.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputSizeReducerEstimator.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 1536246
>   trunk/src/org/apache/pig/impl/util/Utils.java 1536246
>   trunk/test/org/apache/pig/test/PigStorageWithStatistics.java 1536246
>   trunk/test/org/apache/pig/test/TestFRJoin2.java 1536246
>
> Diff: https://reviews.apache.org/r/14964/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Aniket Mokashi
>
>
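
The check described in the review request (reject a replicated relation that exceeds a size limit before it goes to the distributed cache) can be sketched roughly as follows. This is a minimal illustration, not the code from the PIG-3047 patch: the class name, the method names, and the reading of "1G" as 2^30 bytes are all assumptions, and the file sizes are passed in as plain numbers rather than read from HDFS.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names here are hypothetical and do not come from the patch.
class ReplicatedJoinSizeCheck {

    // Assumed default limit: 1 GB, interpreted here as 2^30 bytes.
    static final long DEFAULT_MAX_BYTES = 1024L * 1024L * 1024L;

    // Sums the sizes (in bytes) of the files that make up the replicated relation.
    static long totalBytes(List<Long> fileSizes) {
        long total = 0L;
        for (long size : fileSizes) {
            total = Math.addExact(total, size); // throws ArithmeticException on overflow
        }
        return total;
    }

    // Returns true when the relation is at or below the limit and may be cached.
    static boolean fitsInCache(List<Long> fileSizes, long maxBytes) {
        return totalBytes(fileSizes) <= maxBytes;
    }

    public static void main(String[] args) {
        // Two part files of 64 MB and 32 MB: well under the limit.
        List<Long> small = Arrays.asList(64L * 1024 * 1024, 32L * 1024 * 1024);
        // One byte over the limit: should be rejected.
        List<Long> large = Arrays.asList(DEFAULT_MAX_BYTES, 1L);

        System.out.println("small fits: " + fitsInCache(small, DEFAULT_MAX_BYTES));
        System.out.println("large fits: " + fitsInCache(large, DEFAULT_MAX_BYTES));
    }
}
```

In a real job the sizes would come from the file system or from loader statistics, and a relation that fails the check would presumably cause the job to fail early instead of running out of memory in the map tasks.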