Pig >> mail # user >> Join with greater/less then condition


sonia gehlot 2012-07-05, 19:21
Re: Join with greater/less then condition
Pig can only do equi-joins. Theta joins are hard in MapReduce. So the way to do this is to do the equi-join on the equality columns and then filter on the inequality afterwards. This will not add significant cost, since the join results are filtered before being materialized to disk.

C = JOIN table_a BY (user_id, title_id), table_b BY (user_id, title_id);
D = FILTER C BY table_a::timestamp > table_b::timestamp;
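
To illustrate why the equi-join followed by a filter gives the same rows as the SQL join with the extra inequality, here is a minimal in-memory sketch in Python. It is not Pig and not how Pig executes the join; the function names and the sample rows are made up for illustration, and only the column names (user_id, title_id, timestamp) come from the question.

```python
from collections import defaultdict


def equi_join_then_filter(table_a, table_b):
    """Hash join on (user_id, title_id), then keep pairs where a.timestamp > b.timestamp."""
    # Build phase: index table_b rows by the equality key.
    index = defaultdict(list)
    for b in table_b:
        index[(b["user_id"], b["title_id"])].append(b)

    # Probe phase: pair each table_a row with its matching table_b rows (the equi-join).
    joined = (
        (a, b)
        for a in table_a
        for b in index.get((a["user_id"], a["title_id"]), [])
    )

    # Filter phase: apply the inequality to the joined pairs.
    return [(a, b) for a, b in joined if a["timestamp"] > b["timestamp"]]


def theta_join(table_a, table_b):
    """Reference: nested-loop evaluation of the full SQL ON clause."""
    return [
        (a, b)
        for a in table_a
        for b in table_b
        if a["user_id"] == b["user_id"]
        and a["title_id"] == b["title_id"]
        and a["timestamp"] > b["timestamp"]
    ]


# Hypothetical sample rows, for illustration only.
table_a = [
    {"user_id": 1, "title_id": 10, "timestamp": 200},
    {"user_id": 1, "title_id": 10, "timestamp": 50},
    {"user_id": 2, "title_id": 20, "timestamp": 300},
]
table_b = [
    {"user_id": 1, "title_id": 10, "timestamp": 100},
    {"user_id": 2, "title_id": 20, "timestamp": 400},
    {"user_id": 3, "title_id": 30, "timestamp": 1},
]

result = equi_join_then_filter(table_a, table_b)
```

On this sample both functions return the same single pair. The hash-join version only compares timestamps for rows that already agree on the equality key, which is the reason the filter step adds little work.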

Alan.

On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote:

> Hi Guys,
>
> I want to join 2 tables in Hive on a couple of columns, and one of the
> conditions is that the timestamp in one table is greater than in the other.
> In SQL I could have written it this way:
>
> table_a a Join table_b b
> on a.user_id = b.user_id
> and a.title_id = b.title_id
> and a.timestamp > b.timestamp
>
> How do I write the last condition in Pig? *a.timestamp > b.timestamp*
>
> Thanks,
> Sonia
sonia gehlot 2012-07-05, 21:28
Dmitriy Ryaboy 2012-07-06, 15:09