sonia gehlot 2012-07-05, 19:21
Re: Join with greater/less than condition
Pig can only do equi-joins.  Theta joins are hard in MapReduce.  So the way to do this is to do the equi-join and then filter afterwards.  This will not add significant cost, since the join results are filtered before being materialized to disk.

C = JOIN table_a BY (user_id, title_id), table_b BY (user_id, title_id);
D = FILTER C BY table_a::timestamp > table_b::timestamp;
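
For context, here is a minimal end-to-end sketch of the same join-then-filter pattern. The input paths, the tab delimiter, and the field types are assumptions for illustration only and are not from the original question; adjust them to however the tables are actually stored. The final FOREACH is optional and simply renames the disambiguated columns so later statements do not need the `::` prefix.

```pig
-- Hypothetical inputs: tab-separated files with three columns each.
table_a = LOAD 'table_a.tsv' USING PigStorage('\t')
          AS (user_id:long, title_id:long, timestamp:long);
table_b = LOAD 'table_b.tsv' USING PigStorage('\t')
          AS (user_id:long, title_id:long, timestamp:long);

-- Equi-join on the two key columns.
C = JOIN table_a BY (user_id, title_id), table_b BY (user_id, title_id);

-- Apply the inequality as a filter on the joined rows.
D = FILTER C BY table_a::timestamp > table_b::timestamp;

-- Optional: project and rename so the output has unambiguous column names.
E = FOREACH D GENERATE
        table_a::user_id   AS user_id,
        table_a::title_id  AS title_id,
        table_a::timestamp AS a_timestamp,
        table_b::timestamp AS b_timestamp;

DUMP E;
```

After the join, both inputs contribute a field with the same name, which is why the `table_a::` and `table_b::` prefixes are needed in the FILTER and FOREACH statements.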

Alan.

On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote:

> Hi Guys,
>
> I want to join 2 tables in Hive on a couple of columns, and one of the
> conditions is that the timestamp in one table is greater than in the other.
> In SQL I could have written it this way:
>
> table_a a Join table_b b
> on a.user_id = b.user_id
> and a.title_id = b.title_id
> and a.timestamp > b.timestamp
>
> How do I write the last condition in Pig? *a.timestamp > b.timestamp*
>
> Thanks,
> Sonia
sonia gehlot 2012-07-05, 21:28
Dmitriy Ryaboy 2012-07-06, 15:09