Hadoop >> mail # user >> Comparing two logs, finding missing records


Re: Comparing two logs, finding missing records
I believe you meant:

SELECT * FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
WHERE LOG2.recordid IS NULL;

(This would produce the set of records that are in LOG1 but not present in
LOG2.)

In Pig, we have to add an additional filter with an "is null" condition after
the join.
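
A minimal sketch of that filter, not tested against your data. It assumes
tab-separated logs at the hypothetical paths 'log1' and 'log2', with recordid
as the first field and a hypothetical second field named payload; swap in your
own loader and schema.

```pig
-- Hypothetical paths and schema; replace PigStorage with your own loader.
LOG1 = LOAD 'log1' USING PigStorage('\t') AS (recordid:chararray, payload:chararray);
LOG2 = LOAD 'log2' USING PigStorage('\t') AS (recordid:chararray, payload:chararray);

-- Keep every LOG1 row; the LOG2 fields are null where no match exists.
joined = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;

-- Rows without a LOG2 match are the records missing from the second log.
missing = FILTER joined BY LOG2::recordid IS NULL;

-- Project only the LOG1 columns for the report.
result = FOREACH missing GENERATE LOG1::recordid, LOG1::payload;
DUMP result;
```

After the join, the fields of each input are addressed with the :: prefix
(LOG1::recordid, LOG2::recordid), which is why the filter names
LOG2::recordid explicitly.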

~Rajesh.B

On Mon, Jun 27, 2011 at 6:34 AM, Bharath Mundlapudi <[EMAIL PROTECTED]> wrote:

> SQL:
>
> SELECT * FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid;
>
>
> PIG:
> data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
> DUMP data;
>
>
> If you need more PIG help, please post in PIG email alias.
>
> -Bharath
>
>
> ________________________________
> From: Mark Kerzner <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Bharath Mundlapudi <
> [EMAIL PROTECTED]>
> Sent: Sunday, June 26, 2011 5:50 PM
> Subject: Re: Comparing two logs, finding missing records
>
>
> Bharath,
>
> what would a Pig query look like?
>
> Thank you,
> Mark
>
>
> On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[EMAIL PROTECTED]>
> wrote:
>
> If you have a SerDe or a Pig loader for your log format, Pig or Hive with a
> join will probably be the quicker solution.
> >
> >-Bharath
> >
> >
> >
> >________________________________
> >From: Mark Kerzner <[EMAIL PROTECTED]>
> >To: Hadoop Discussion Group <[EMAIL PROTECTED]>
> >Sent: Saturday, June 25, 2011 9:39 PM
> >Subject: Comparing two logs, finding missing records
> >
> >
> >Hi,
> >
> >I have two logs which should have all the records for the same record_id,
> >in other words, if this record_id is found in the first log, it should
> >also be found in the second one. However, I suspect that the second log is
> >filtered out, and I need to find the missing records. Anything is allowed:
> >MapReduce job, Hive, Pig, and even a NoSQL database.
> >
> >Thank you.
> >
> >It is also a good time to express my thanks to all the members of the
> >group who are always very helpful.
> >
> >Sincerely,
> >Mark
>

--
~Rajesh.B