Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27470
-----------------------------------------------------------
Thanks Alex. The patch looks good in general. Some optimizations to consider in the future:
1. If the previous stage uses only 1 reducer, there is no need to add one more vertex.
2. If the limitplan is null (i.e., not the "limited order by" case), we might not need a shuffle edge; a pass-through edge should be enough if possible.
3. Similar to PIG-1270, we can push the limit to InputHandler.
4. We also need to think through the "limited order by" case once "order by" is implemented.
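
To make points 1 and 2 concrete, here is a minimal sketch in plain Python (not Pig or Tez code) of the two-stage idea behind a parallel LIMIT. It assumes each upstream task trims its own output to n records and a single downstream task trims the merged stream again. The function names are hypothetical and nothing here is taken from the patch.

```python
from itertools import islice


def limit_per_task(partitions, n):
    """Each upstream task keeps at most n records from its own partition."""
    return [list(islice(p, n)) for p in partitions]


def final_limit(task_outputs, n):
    """A single downstream task merges all upstream outputs and keeps n records."""
    merged = (rec for out in task_outputs for rec in out)
    return list(islice(merged, n))


def limit(partitions, n):
    """Hypothetical driver: skip the extra stage when only one task ran upstream."""
    local = limit_per_task(partitions, n)
    if len(local) == 1:
        # Point 1: a single upstream task already enforces the global limit,
        # so no additional stage (vertex) is required.
        return local[0]
    # Otherwise up to n * len(partitions) records survive the first stage and
    # a single final task must trim them. No ordering is needed here, which is
    # why a plain pass-through of records would be sufficient (point 2).
    return final_limit(local, n)
```

With one upstream task the second stage is skipped entirely; with several, the final task sees at most n records per upstream task.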

Once Cheolsoo's comments are addressed, we are ready to go. I will add an e2e test case for it.

- Daniel Dai

On Oct. 24, 2013, 1:42 a.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 1:42 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
>     https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with "pig -x tez -f test3.pig" and got the following (correct) results:
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r--   1 abain supergroup          0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r--   1 abain supergroup         17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
>
> Thanks,
>
> Alex Bain
>
>