Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - HIVE left semi join multiple tables with columns from multiple table


Copy link to this message
-
Re: HIVE left semi join multiple tables with columns from multiple table
Souvik Banerjee 2012-12-28, 05:26
Thanks a lot Mark for your attention.
But I think I cannot go for INNER join, the reason behind the fact being
that I want all rows of Table3 irrespective of there is any row
corresponding to that ID in Table 1 or Table 2.
Probably I would have taken care of that while providing the example.
The more refined example would be like below.

Look forward for your help.

Thanks and regards,
Souvik.

P.S. Dropped one table from the earlier example and Now we have table 1 and
table 2, we are looking for table 4. (Hope so it's more simple and it's
exactly we need)

*Table 1*

ID

Value

1

V11

2

V12

3

V13

*Table 2*

ID_1

ID_2

Other_Column1

1

3

C1

2

1

C2

3

2

C3

4

9

C4

12

3

C5

1

8

C6

I want to formulate a table which would look like (*Table 4*)

ID_1

ID_1_Value

ID_2

ID_2_value

Other_Column1

1

V11

3

V13

C1

2

V12

1

V11

C2

3

V13

2

V12

C3

4

NULL / Empty

9

NULL / Empty

C4

12

NULL / Empty

3

V13

C5

1

V11

8

NULL / Empty

C6
On Thu, Dec 27, 2012 at 9:06 PM, Mark Grover <[EMAIL PROTECTED]>wrote:

> Souvik,
> Let me begin by saying that simplifying the problem goes a long way in
> helping us answer your question. You did it really nicely here, so thank
> you for doing that.
>
> Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express
> the same query as INNER JOIN with no restrictions on what can be included
> in the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
> efficiently. If you want columns from the right table in your select list,
> just do the regular (aka inefficient way) inner join.
>
> Of course, you can optimize inner joins as map joins, sorted merge join or
> sorted merge bucketed joins depending on your use case.
>
> Mark
>
> On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee <[EMAIL PROTECTED]
> > wrote:
>
>> Hi,
>>
>> I am struggling with a problem described below.
>> Any help how to resolve this problem is highly appreciated.
>>
>> I have got few tables the structure is over simplified for the sake of
>> describing the nature of the problem that I am facing.
>>
>> *Table 1*
>> * *
>>
>>
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V11
>>
>> 2
>>
>> V12
>>
>> 3
>>
>> V13
>>
>>
>>
>> *Table 2*
>> * *
>>
>>
>>
>> ID
>>
>> Value
>>
>> 1
>>
>> V21
>>
>> 2
>>
>> V22
>>
>> 3
>>
>> V23
>>
>>
>>
>> *Table 3*
>> * *
>>
>>
>>
>> ID_1
>>
>> ID_2
>>
>> Other_Column1
>>
>> 1
>>
>> 3
>>
>> C1
>>
>> 2
>>
>> 1
>>
>> C2
>>
>> 3
>>
>> 2
>>
>> C3
>>
>>
>>
>> I want to formulate a table which would look like
>>
>> ID_1
>>
>> ID_1_Value
>>
>> ID_2
>>
>> ID_2_value
>>
>> Other_Column1
>>
>> 1
>>
>> V11
>>
>> 3
>>
>> V23
>>
>> C1
>>
>> 2
>>
>> V12
>>
>> 1
>>
>> V21
>>
>> C2
>>
>> 3
>>
>> V13
>>
>> 2
>>
>> V22
>>
>> C3
>>
>>
>>
>> I am facing problem with this.
>> I tried to LEFT SEMI JOIN in Hive.
>>
>> I tried to do it in two steps (For 3 tables)
>> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
>> But the problem is that in LEFT SEMI JOIN you can not have columns from
>> the right table in the select clause. So after join my new table simply
>> looks like TABLE 3.
>>
>> Can you help me how I can achieve this is HIVE.
>>
>> Thanks and regards,
>> Souvik.
>>
>
>