Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Joining two tables using HiveQL


Copy link to this message
-
Joining two tables using HiveQL
Hi Everyone,

  CREATE EXTERNAL TABLE IF NOT EXISTS Table1   (This is the MAIN table
through which comparisons need to be made)
    (
    BUYER_ID BIGINT,
    ITEM_ID BIGINT,
    CREATED_TIME STRING
    )

 And this is the data in the above first table

*    **BUYER_ID**  |  **ITEM_ID**         |      **CREATED_TIME**   *
*
----------------------+---------------------------+-----------------------------------------
*
*     1015826235        220003038067           2012-06-21 07:57:39 *
*     1015826235        300003861266           2012-06-21 21:11:12 *
*     1015826235        140002997245           2012-06-14 20:10:16 *
*     1015826235        200002448035           2012-06-08 22:02:17 *
*     1015826235      *260003553381*         *2002-01-30 23:12:18* *

This is Second table in Hive- It also contains information about the items
we are purchasing.

    CREATE EXTERNAL TABLE IF NOT EXISTS Table2
    (
    USER_ID BIGINT,
    PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
    )

And this is the data in the above second table (Table2)-

    ***USER_ID**         |       **PURCHASED_ITEM***
*
-------------------------------+---------------------------------------------
*
*    1015826235
 [{"product_id":220003038067,"timestamps":"1340321132000"},
 {"product_id":300003861266,"timestamps":"1340271857000"},     *
*
{"product_id":140002997245,"timestamps":"1339694926000"},
 {"product_id":200002448035,"timestamps":"1339172659000"}]*
 **Compare Table2 with Table1 on USER_ID and BUYER_ID.**
 Find the `BUYER_ID(USER_ID)` and as well as those `ITEM_ID` and
`CREATED_TIME` which are missing from `Table2` after comparing from
`Table1` on `BUYER_ID`.

So If you look `Table2` data this(last) `ITEM_ID(PRODUCT_ID) 260003553381`
and `CREATED_TIME(TIMESTAMPS)  2002-01-30 23:12:18` is missing from Table2.
So I want to show the below result using the HiveQL query.
 * **BUYER_ID or USER_ID**      |             **ITEM_ID**       |
**CREATED_TIME**         |          **TIMESTAMP_OF_CREATED_TIME***
*
-----------------------------------------------------+------------------------------------+----------------------------------------------+----------------------------------------------------------------
*
*              1015826235                                260003553381
           2002-01-30 23:12:18                                1012457538*
Can anyone help me with this. As I am new to HiveQL so having lot of
Problem. I wrote the below HiveQL, but it is not working the way I wanted
to. Can anyone suggest me what wrong I am doing wrong?

**Updated:-**

I have written this query, but it doesn't working the way I wanted to.

    SELECT
      Table1.buyer_id, Table1.item_id, Table1.created_time,
      UNIX_TIMESTAMP(Table1.created_time)
    FROM ( SELECT user_id,prodID
           FROM Table2 test2
           LATERAL VIEW explode( test2.purchased_item.product_id )
prodTable AS prodID
         ) prodTable
    RIGHT OUTER JOIN Table1 ON prodTable.user_id = Table1.buyer_id;
*Raihan Jamal*
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB