Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> HIVE left semi join multiple tables with columns from multiple table


+
Souvik Banerjee 2012-12-27, 23:16
+
Mark Grover 2012-12-28, 03:06
+
Souvik Banerjee 2012-12-28, 05:26
+
Mark Grover 2012-12-28, 13:40
Copy link to this message
-
Re: HIVE left semi join multiple tables with columns from multiple table
Thanks Mark for writing back to me.
I did the same thing and still not getting the desired result.
I was looking for the cause. I managed to get the desired result. Sharing
my thought process and how I did it. Let me know your thoughts on this,
that would be really nice to have. Also please share any better idea to
achieve this result and feel free to point out if you think that my
explanation is wrong.
I think you noticed the fact that I want the value column for both the ID_1
and ID_2 and there values in these columns are not UNIQUE.
It happened that I used to get more rows returned than the original no of
rows in the second table due to the fact that there are duplicates in ID_1
and ID_2 column.
So first of all to get the Value column for both ID_1 and ID_2 I have to
join twice and that too I can not have join on condition which have
duplicates.
So I created a new table out of table 2 with addition of a column having
unique value for each row using HIVE UDF.
Then I created 2 views each for ID_1 and ID_2 which contains the
description by joining on the condition a.ID_1 = b.ID.
Now I have two views and then I joined these two views on the condition
that view1.uniqueid = view2.uniqueid.
That gives me desired output.

Looking forward to have your views.

Thanks and regards,
Souvik.

On Fri, Dec 28, 2012 at 7:40 AM, Mark Grover <[EMAIL PROTECTED]>wrote:

> Souvik,
> In your new example, you need a RIGHT OUTER JOIN between table1 and table2
> (order matters - table1 on left, table2 on right) on the ID1 column.
> Something like this (untested by me):
> SELECT
>    table1.*,
>    table2.*
> FROM
>    table1
>    RIGHT OUTER JOIN table2
>    ON (table1.id=table2.id_1);
>
> Mark
>
>
> On Thu, Dec 27, 2012 at 9:26 PM, Souvik Banerjee <[EMAIL PROTECTED]
> > wrote:
>
>> Thanks a lot Mark for your attention.
>> But I think I cannot go for INNER join, the reason behind the fact being
>> that I want all rows of Table3 irrespective of there is any row
>> corresponding to that ID in Table 1 or Table 2.
>> Probably I would have taken care of that while providing the example.
>> The more refined example would be like below.
>>
>> Look forward for your help.
>>
>> Thanks and regards,
>> Souvik.
>>
>> P.S. Dropped one table from the earlier example and Now we have table 1
>> and table 2, we are looking for table 4. (Hope so it's more simple and it's
>> exactly we need)
>>
>>
>>
>> *Table 1*
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V11
>>
>> 2
>>
>> V12
>>
>> 3
>>
>> V13
>>
>>
>>
>> *Table 2*
>>
>> ID_1
>>
>> ID_2
>>
>> Other_Column1
>>
>> 1
>>
>> 3
>>
>> C1
>>
>> 2
>>
>> 1
>>
>> C2
>>
>> 3
>>
>> 2
>>
>> C3
>>
>> 4
>>
>> 9
>>
>> C4
>>
>> 12
>>
>> 3
>>
>> C5
>>
>> 1
>>
>> 8
>>
>> C6
>>
>>
>>
>> I want to formulate a table which would look like (*Table 4*)
>>
>> ID_1
>>
>> ID_1_Value
>>
>> ID_2
>>
>> ID_2_value
>>
>> Other_Column1
>>
>> 1
>>
>> V11
>>
>> 3
>>
>> V13
>>
>> C1
>>
>> 2
>>
>> V12
>>
>> 1
>>
>> V11
>>
>> C2
>>
>> 3
>>
>> V13
>>
>> 2
>>
>> V12
>>
>> C3
>>
>> 4
>>
>> NULL / Empty
>>
>> 9
>>
>> NULL / Empty
>>
>> C4
>>
>> 12
>>
>> NULL / Empty
>>
>> 3
>>
>> V13
>>
>> C5
>>
>> 1
>>
>> V11
>>
>> 8
>>
>> NULL / Empty
>>
>> C6
>>
>>
>> On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <[EMAIL PROTECTED]
>> > wrote:
>>
>>> Souvik,
>>> Let me begin by saying that simplifying the problem goes a long way in
>>> helping us answer your question. You did it really nicely here, so thank
>>> you for doing that.
>>>
>>> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
>>> the same query as INNER JOIN with no restrictions on what can be included
>>> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
>>> efficiently. If you want columns from the right table in your select list,
>>> just do the regular (aka inefficient way) inner join.
>>>
>>> Of course, you can optimize inner joins as map joins, sorted merge join
>>> or sorted merge bucketed joins depending on your use case.
>>>
>>> Mark
>>>
+
Mark Grover 2012-12-30, 02:55
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB