Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez


+
Alex Bain 2013-10-24, 01:42
Copy link to this message
-
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Cheolsoo Park 2013-10-24, 05:31

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27443
-----------------------------------------------------------
Looks great! Thanks a lot!

In terms of tests, can you please add a test case to TestTezCompiler? Please look at existing test cases and TEZCx.gld files. With that, I think we can commit this patch. Later we should add more tests using either mini cluster or e2e.

I also made few minor comments below.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53329>

    Please remove all the trailing white spaces in the patch.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53333>

    Can you mark this comment as TODO?

src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53335>

    This is not your fault but mine. I realized that blocking(op) has no need to take an argument. Can you please change it to blocking()? It just confuses me when reading the code.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53334>

    In the same line.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53330>

    Do you mind using Lists.newArrayList() instead of new ArrayList<PhysicalPlan>()? Just for cleaner code.
    
    There are several places.

src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53331>

    Why is this method public? Can't we keep it private?
    
    Also please put the method signature and curly brace in a single line.
- Cheolsoo Park
On Oct. 24, 2013, 1:42 a.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 1:42 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
>     https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r--   1 abain supergroup          0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r--   1 abain supergroup         17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
>
> Thanks,
>
> Alex Bain
>
>

+
Daniel Dai 2013-10-24, 18:16
+
Alex Bain 2013-10-24, 23:45
+
Alex Bain 2013-10-24, 23:47
+
Daniel Dai 2013-10-25, 00:39
+
Daniel Dai 2013-10-24, 23:54