Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # dev >> HiveQL Joins- Hadoop


Copy link to this message
-
HiveQL Joins- Hadoop
Can anyone help me with the below query?
*
*

*CREATE EXTERNAL TABLE IF NOT EXISTS Table1   (This is the MAIN table
through which comparisons need to be made)*

*    (*

*    BUYER_ID BIGINT,*

*    ITEM_ID BIGINT,*

*    CREATED_TIME STRING*

*    )*

And this is the data in the above first table

* *

*    **BUYER_ID**        **ITEM_ID**            **CREATED_TIME**   *

*     1015826235              220003038067                  2012-06-21
07:57:39 *

*     1015826235              300003861266                  2012-06-21
21:11:12 *

*     1015826235             140002997245                  2012-06-14
20:10:16 *

*     1015826235             200002448035                 2012-06-08
22:02:17 *

*     1015826235         *260003553381*           *2002-01-30 23:12:18* *

This is Second table in Hive- It also contains information about the items
we are purchasing.

    *CREATE EXTERNAL TABLE IF NOT EXISTS Table2*

*    (*

*    USER_ID BIGINT,*

*    PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>*

*    )*

And this is the data in the above second table-

    ***USER_ID**    **PURCHASED_ITEM***

*    1015826235
 [{"product_id":220003038067,"timestamps":"1340321132000"},
{"product_id":300003861266,"timestamps":"1340271857000"},     *

*
 {"product_id":140002997245,"timestamps":"1339694926000"},**
{"product_id":200002448035,"timestamps":"1339172659000"}]*

*Compare Table2 with Table1 on USER_ID and BUYER_ID.*

*Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and
`CREATED_TIME` which are missing from `Table2` after comparing from
`Table1` on `BUYER_ID`.*

So If you look `Table2` data this `ITEM_ID(PRODUCT_ID) 260003553381` and
`CREATED_TIME(TIMESTAMPS)  2002-01-30 23:12:18` is missing from Table2. So
I want to show the below result using the HiveQL query.

***BUYER_ID or USER_ID**                   **ITEM_ID**
   **CREATED_TIME**                    **TIMESTAMP_OF_CREATED_TIME***

*      1015826235                                 260003553381
                  2002-01-30 23:12:18
   1012457538*
Below is the query I wrote, I am not sure whether it is right or not as
HiveQL is totally new to me. Can anyone help with this HiveQL query?

select BUYER_ID, ITEM_ID, CREATED_TIME, UNIX_TIMESTAMP(CREATED_TIME) from
table1 where table1.user_id = table2.buyer_id and (table1.item_id <>
table2.product_id or UNIX_TIMESTAMP(table1.created_time) <>
table2.timestamps)
*Raihan Jamal*
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB