Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # user - Join with greater/less then condition


+
sonia gehlot 2012-07-05, 19:21
+
Alan Gates 2012-07-05, 19:43
+
sonia gehlot 2012-07-05, 21:28
Copy link to this message
-
Re: Join with greater/less then condition
Dmitriy Ryaboy 2012-07-06, 15:09
Replace the filter with a foreach / generate (replace the right side
with nulls of > condition is not satisfied)

On Thu, Jul 5, 2012 at 2:28 PM, sonia gehlot <[EMAIL PROTECTED]> wrote:
> Actually I wanted to do left outer join, so not sure if filter will work in
> this case.
>
>
> On Thu, Jul 5, 2012 at 12:43 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> Pig can only do equi-joins.  Theta joins are hard in MapReduce.  So the
>> way to do this is do the equi-join and then filter afterwards.  This will
>> not create significant additional cost since the join results will be
>> filtered before being materialized to disk.
>>
>> C = Join table_a on user_id, title_id, table_b on user_id, title_id;
>> D = filter C by table_a::timestamp > table_b::timestamp;
>>
>> Alan.
>>
>> On Jul 5, 2012, at 12:21 PM, sonia gehlot wrote:
>>
>> > Hi Guys,
>> >
>> > I want to join 2 tables in hive on couple of columns and out them one
>> > condition is timestamp of one column is greater then the other one. In
>> SQL
>> > I could have written in this way:
>> >
>> > table_a a Join table_b b
>> > on a.user_id = b.user_id
>> > and a.title_id = b.title_id
>> > and a.timestamp > b.timestamp
>> >
>> > How to write last condition in Pig? *a.timestamp > b.timestamp*
>> >
>> > Thanks,
>> > Sonia
>>
>>