Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive QL - NOT IN, NOT EXIST


Copy link to this message
-
Re: Hive QL - NOT IN, NOT EXIST
@Peter  Does the query plan demonstrate that the 3Meg row table is being
map-joined and the 400M table streamed through? That is what you want: but
you might either need to fiddle with hints to get it to happen

Details:
    Read uuids s of feed into  in-memory map on all nodes (mapjoin)
    Stream the 400M message records through the in-memory maps, copying
id's from the "all feed uuids"  map to a  "matched feed uuid's map for
entries that have matches in the messages

     Note: this way the 400M rows are only read once on the cluster.

You can see whether hive can manage this or if you write a custom m/r job
to do it.

2013/5/5 Peter Chu <[EMAIL PROTECTED]>

> It works but it takes a very long time because the subqueries in NOT IN
> contains 400 million rows (the message table in the example) and the feed
> table contains 3 million rows.
>
> SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message);
>
> > Date: Sun, 5 May 2013 20:25:15 -0700
> > From: [EMAIL PROTECTED]
> > Subject: Re: Hive QL - NOT IN, NOT EXIST
> > To: [EMAIL PROTECTED]
>
> >
> >
> > --- On Sun, 5/5/13, Peter Chu <[EMAIL PROTECTED]> wrote:
> >
> > > I am wondering if there is any way to do this without resorting to
> > > using left outer join and finding nulls.
> >
> > I have found this to be an acceptable substitute. Is it not working for
> you?
> >
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB