Hive >> mail # user >> Improving self join time

Re: Improving self join time
Hmm. Would this not fall under the general problem of identifying duplicates?

Would something like this meet your needs? (untested)

select  -- outer query finds the ids for the duplicates
       a.key
from (  -- inner query lists duplicate values
       select
         count(*) as cnt,
         value
       from
         foo
       group by
         value
       having
         count(*) > 1
     ) z
     join foo a on (a.value = z.value);

Here, table foo is your table of elements, key is your id, and value is your element.
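To make the two steps concrete, here is a small in-memory Python sketch of the same logic (an illustration, not Hive itself). It assumes the table fits in memory as a list of (key, value) pairs; the function name duplicate_ids and the sample rows are hypothetical. The idea it shows: rather than joining the whole table to itself, first aggregate to the set of values that occur more than once, then join the table only against that (usually much smaller) set to recover the ids.

```python
from collections import Counter


def duplicate_ids(rows):
    """Return the keys (ids) of rows whose value appears more than once.

    rows: list of (key, value) pairs, standing in for table foo (hypothetical data).
    """
    # Inner query: group by value, keep values with count(*) > 1.
    counts = Counter(value for _, value in rows)
    dup_values = {value for value, cnt in counts.items() if cnt > 1}

    # Outer query: join foo back against the duplicated values, keeping each row's key.
    return [key for key, value in rows if value in dup_values]


if __name__ == "__main__":
    foo = [(1, "apple"), (2, "pear"), (3, "apple"), (4, "plum"), (5, "pear"), (6, "fig")]
    print(duplicate_ids(foo))
```

Running it prints the ids of the rows holding "apple" and "pear", the only repeated values. Using a set for the duplicated values keeps the join-back step a single pass over the rows, which is the in-memory analogue of joining against the small aggregated result z.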
On Thu, Mar 20, 2014 at 7:03 AM, Jeff Storey <[EMAIL PROTECTED]> wrote: