|
|
-
Joining two tables using HiveQLRaihan Jamal 2012-07-07, 22:20
Hi Everyone,
CREATE EXTERNAL TABLE IF NOT EXISTS Table1 (This is the MAIN table through which comparisons need to be made) ( BUYER_ID BIGINT, ITEM_ID BIGINT, CREATED_TIME STRING ) And this is the data in the above first table * **BUYER_ID** | **ITEM_ID** | **CREATED_TIME** * * ----------------------+---------------------------+----------------------------------------- * * 1015826235 220003038067 2012-06-21 07:57:39 * * 1015826235 300003861266 2012-06-21 21:11:12 * * 1015826235 140002997245 2012-06-14 20:10:16 * * 1015826235 200002448035 2012-06-08 22:02:17 * * 1015826235 *260003553381* *2002-01-30 23:12:18* * This is Second table in Hive- It also contains information about the items we are purchasing. CREATE EXTERNAL TABLE IF NOT EXISTS Table2 ( USER_ID BIGINT, PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>> ) And this is the data in the above second table (Table2)- ***USER_ID** | **PURCHASED_ITEM*** * -------------------------------+--------------------------------------------- * * 1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"}, {"product_id":300003861266,"timestamps":"1340271857000"}, * * {"product_id":140002997245,"timestamps":"1339694926000"}, {"product_id":200002448035,"timestamps":"1339172659000"}]* **Compare Table2 with Table1 on USER_ID and BUYER_ID.** Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and `CREATED_TIME` which are missing from `Table2` after comparing from `Table1` on `BUYER_ID`. So If you look `Table2` data this(last) `ITEM_ID(PRODUCT_ID) 260003553381` and `CREATED_TIME(TIMESTAMPS) 2002-01-30 23:12:18` is missing from Table2. So I want to show the below result using the HiveQL query. * **BUYER_ID or USER_ID** | **ITEM_ID** | **CREATED_TIME** | **TIMESTAMP_OF_CREATED_TIME*** * -----------------------------------------------------+------------------------------------+----------------------------------------------+---------------------------------------------------------------- * * 1015826235 260003553381 2002-01-30 23:12:18 1012457538* Can anyone help me with this. As I am new to HiveQL so having lot of Problem. I wrote the below HiveQL, but it is not working the way I wanted to. Can anyone suggest me what wrong I am doing wrong? **Updated:-** I have written this query, but it doesn't working the way I wanted to. SELECT Table1.buyer_id, Table1.item_id, Table1.created_time, UNIX_TIMESTAMP(Table1.created_time) FROM ( SELECT user_id,prodID FROM Table2 test2 LATERAL VIEW explode( test2.purchased_item.product_id ) prodTable AS prodID ) prodTable RIGHT OUTER JOIN Table1 ON prodTable.user_id = Table1.buyer_id; *Raihan Jamal* |