
Hive >> mail # user >> Improving self join time


Re: Improving self join time
Hmm. Wouldn't this fall under the general problem of identifying duplicates?
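To make the duplicates idea concrete, here is a minimal in-memory sketch in Python. The function name `find_duplicate_keys` and the sample rows are hypothetical. It groups ids by value in one pass, which corresponds to GROUP BY value. It then keeps only the groups with more than one id, which corresponds to HAVING count(*) > 1.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Return, sorted, every key whose value appears more than once.

    rows: iterable of (key, value) pairs, a hypothetical stand-in for
    the rows of table foo.
    """
    # Group keys by value (the GROUP BY value step).
    keys_by_value = defaultdict(list)
    for key, value in rows:
        keys_by_value[value].append(key)
    # Keep only groups with more than one key (the HAVING count(*) > 1 step).
    return sorted(
        key
        for keys in keys_by_value.values()
        if len(keys) > 1
        for key in keys
    )


rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "d")]
print(find_duplicate_keys(rows))  # [1, 2, 3, 5]
```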

Would something like this meet your needs? (untested)

select  -- outer query finds the ids for the duplicates
    a.key

from (  -- inner query lists duplicate values
     select
       count(*) as cnt,
       value
     from
        foo
     group by
        value
     having
       count(*) > 1
     ) z
     join foo a on (a.value = z.value)
;

Here foo stands for your elements table, key for your id column, and
value for your element column.
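You can sanity-check the query without a Hive cluster by running it against a small in-memory SQLite table. This sketch assumes the query uses only standard SQL, so SQLite should give the same result as Hive here. The sample rows are made up. The ORDER BY was added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption:
# the query is plain SQL, so it runs unchanged on SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "d")],
)

QUERY = """
SELECT a.key                     -- outer query finds the ids for the duplicates
FROM (                           -- inner query lists duplicate values
    SELECT count(*) AS cnt, value
    FROM foo
    GROUP BY value
    HAVING count(*) > 1
) z
JOIN foo a ON (a.value = z.value)
ORDER BY a.key                   -- added only for deterministic output
"""

dup_keys = [row[0] for row in conn.execute(QUERY)]
print(dup_keys)  # [1, 2, 3, 5]
```

Each duplicated id appears once in the result. If you also need to see which ids share a value, add a.value to the outer select list.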
On Thu, Mar 20, 2014 at 7:03 AM, Jeff Storey <[EMAIL PROTECTED]> wrote:
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB