Hive >> mail # user >> Improving self join time
Re: Improving self join time
Hmm. Wouldn't this fall under the general problem of identifying
duplicates?

Would something like this meet your needs? (untested)

select  -- outer query finds the ids of the rows whose value is duplicated
    a.key
from (  -- inner query lists the values that occur more than once
    select
        count(*) as cnt,
        value
    from
        foo
    group by
        value
    having
        count(*) > 1
    ) z
    join foo a on (a.value = z.value)
;

where table foo is your elements table,
key is your id,
and value is your element.
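
If the join itself is what costs you the time, a windowed count might let you skip it. This is only a sketch (also untested), and it assumes your Hive version supports windowing functions (0.11 or later, as far as I know). It uses the same placeholder names foo, key and value as above; the alias t and the column name cnt are made up for illustration.

```sql
select
    key
from (  -- tag every row with the number of rows sharing its value
    select
        key,
        count(*) over (partition by value) as cnt
    from
        foo
    ) t
where
    cnt > 1  -- keep only rows whose value appears more than once
;
```

This should return the same ids as the join version while reading foo only once, though whether it is actually faster will depend on your data and cluster.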
On Thu, Mar 20, 2014 at 7:03 AM, Jeff Storey <[EMAIL PROTECTED]> wrote: