Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive Join with distinct rows


Copy link to this message
-
Hive Join with distinct rows
Hi,

I have 2 tables, T1 and T2 of structure columns= {ID, url, timestamp}

In T1 ID's are repeatable. But T2 ID's are kind of primary key hence only
unique values are present.

I want to join both tables T1 & T2 such that select all the unique ID's in
T1 which are not in  T2

I have written 2 queries:

Query1: select distinct T1.ID from T1 LEFT OUTER JOIN T2 on T1.ID=T2.ID AND
T2.ID=NULL

this gives me expected results, but with the below I am getting all the
rows in T1

Query2: select distinct(T1.ID), T1.url, T1.timestamp from T1 LEFT OUTER
JOIN T2 on T1.ID=T2.ID AND T2.ID=NULL

Can some one point me how to achieve: select all the unique ID's in T1
which are not in T2

Thanks,
Praveen
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB