Pig, mail # dev - Review Request 14964: PIG-3047 Check the size of a relation before adding it to distributed cache in Replicated join


Re: Review Request 14964: PIG-3047 Check the size of a relation before adding it to distributed cache in Replicated join
Aniket Mokashi 2013-10-28, 22:11


> On Oct. 28, 2013, 7:20 p.m., Cheolsoo Park wrote:
> > Looks good, and TestFRJoin2 passes.
> >
> > Aniket, do you mind opening a documentation JIRA for this? Or you can update the documentation when committing the patch. I think we should change the following section:
> >
> > Conditions
> > Fragment replicate joins are experimental; we don't have a strong sense of how small the small relation must be to fit into memory. In our tests with a simple query that involves just a JOIN, a relation of up to 100 M can be used if the process overall gets 1 GB of memory. Please share your observations and experience with us.

Thanks, Cheolsoo. Let me submit an RB update that incorporates the documentation for this.
- Aniket
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14964/#review27628
-----------------------------------------------------------
On Oct. 28, 2013, 6:45 a.m., Aniket Mokashi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14964/
> -----------------------------------------------------------
>
> (Updated Oct. 28, 2013, 6:45 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
>
>
> Bugs: PIG-3047
>     https://issues.apache.org/jira/browse/PIG-3047
>
>
> Repository: pig
>
>
> Description
> -------
>
> Check the size of a relation before adding it to the distributed cache in a replicated join. The limit is 1 GB by default.
>
>
> Diffs
> -----
>
>   trunk/conf/pig.properties 1536246
>   trunk/src/org/apache/pig/PigConfiguration.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputSizeReducerEstimator.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1536246
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 1536246
>   trunk/src/org/apache/pig/impl/util/Utils.java 1536246
>   trunk/test/org/apache/pig/test/PigStorageWithStatistics.java 1536246
>   trunk/test/org/apache/pig/test/TestFRJoin2.java 1536246
>
> Diff: https://reviews.apache.org/r/14964/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Aniket Mokashi
>
>
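
For readers who do not have the diff at hand, the kind of check described in the review request (compare the size of the replicated relation against a limit before shipping it to the distributed cache) might look roughly like the sketch below. This is not the code from the patch: the class name, method names, and error message are hypothetical, and the only detail taken from the thread is the 1 GB default.

```java
import java.io.IOException;

// Illustrative sketch only; names here are hypothetical and are not Pig APIs.
public class ReplicatedJoinSizeCheck {

    // 1 GB, matching the default mentioned in the review description.
    static final long DEFAULT_MAX_BYTES = 1024L * 1024L * 1024L;

    /** Returns true when the relation's size does not exceed the limit. */
    static boolean fitsInCache(long relationBytes, long maxBytes) {
        return relationBytes <= maxBytes;
    }

    /**
     * Fails fast instead of copying an oversized relation to every task.
     * The alias is only used to make the error message readable.
     */
    static void checkBeforeCaching(String alias, long relationBytes, long maxBytes)
            throws IOException {
        if (!fitsInCache(relationBytes, maxBytes)) {
            throw new IOException("Replicated input '" + alias + "' is "
                    + relationBytes + " bytes, which exceeds the limit of "
                    + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) throws IOException {
        long hundredMegabytes = 100L * 1024L * 1024L;
        // A 100 MB relation is under the 1 GB default, so this call returns normally.
        checkBeforeCaching("small_relation", hundredMegabytes, DEFAULT_MAX_BYTES);
        System.out.println("small_relation is within the limit");
    }
}
```

Failing before the file is added to the cache keeps an accidentally large "small" relation from being copied to every task and exhausting memory there. How the real patch obtains the relation size and reports the error is in the diff linked above.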
