|
|
-
HiveQL Joins- HadoopRaihan Jamal 2012-07-05, 23:26
Can anyone help me with the below query?
* * *CREATE EXTERNAL TABLE IF NOT EXISTS Table1 (This is the MAIN table through which comparisons need to be made)* * (* * BUYER_ID BIGINT,* * ITEM_ID BIGINT,* * CREATED_TIME STRING* * )* And this is the data in the above first table * * * **BUYER_ID** **ITEM_ID** **CREATED_TIME** * * 1015826235 220003038067 2012-06-21 07:57:39 * * 1015826235 300003861266 2012-06-21 21:11:12 * * 1015826235 140002997245 2012-06-14 20:10:16 * * 1015826235 200002448035 2012-06-08 22:02:17 * * 1015826235 *260003553381* *2002-01-30 23:12:18* * This is Second table in Hive- It also contains information about the items we are purchasing. *CREATE EXTERNAL TABLE IF NOT EXISTS Table2* * (* * USER_ID BIGINT,* * PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>* * )* And this is the data in the above second table- ***USER_ID** **PURCHASED_ITEM*** * 1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"}, {"product_id":300003861266,"timestamps":"1340271857000"}, * * {"product_id":140002997245,"timestamps":"1339694926000"},** {"product_id":200002448035,"timestamps":"1339172659000"}]* *Compare Table2 with Table1 on USER_ID and BUYER_ID.* *Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and `CREATED_TIME` which are missing from `Table2` after comparing from `Table1` on `BUYER_ID`.* So If you look `Table2` data this `ITEM_ID(PRODUCT_ID) 260003553381` and `CREATED_TIME(TIMESTAMPS) 2002-01-30 23:12:18` is missing from Table2. So I want to show the below result using the HiveQL query. ***BUYER_ID or USER_ID** **ITEM_ID** **CREATED_TIME** **TIMESTAMP_OF_CREATED_TIME*** * 1015826235 260003553381 2002-01-30 23:12:18 1012457538* Below is the query I wrote, I am not sure whether it is right or not as HiveQL is totally new to me. Can anyone help with this HiveQL query? select BUYER_ID, ITEM_ID, CREATED_TIME, UNIX_TIMESTAMP(CREATED_TIME) from table1 where table1.user_id = table2.buyer_id and (table1.item_id <> table2.product_id or UNIX_TIMESTAMP(table1.created_time) <> table2.timestamps) *Raihan Jamal* |