Hadoop >> mail # user >> Comparing two logs, finding missing records


Re: Comparing two logs, finding missing records
Thank you, Bharath. Tomorrow I will get the reaction to my solution from the
person who actually posed the problem to me, and then we will see what
details I might have missed.

Mark

On Sun, Jun 26, 2011 at 8:04 PM, Bharath Mundlapudi <[EMAIL PROTECTED]> wrote:

> SQL:
> SELECT LOG1.* FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
> WHERE LOG2.recordid IS NULL;
>
> Pig:
> data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
> missing = FILTER data BY LOG2::recordid IS NULL;
> DUMP missing;
>
> If you need more Pig help, please post to the Pig mailing list.
>
> -Bharath
>
> ------------------------------
> *From:* Mark Kerzner <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Bharath Mundlapudi <[EMAIL PROTECTED]>
> *Sent:* Sunday, June 26, 2011 5:50 PM
> *Subject:* Re: Comparing two logs, finding missing records
>
> Bharath,
>
> what would a Pig query look like?
>
> Thank you,
> Mark
>
> On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[EMAIL PROTECTED]> wrote:
>
> If you have a SerDe (for Hive) or a Pig loader for your log format, Pig or
> Hive with a join will probably be the quicker solution.
>
> -Bharath
>
>
>
> ________________________________
> From: Mark Kerzner <[EMAIL PROTECTED]>
> To: Hadoop Discussion Group <[EMAIL PROTECTED]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs that should contain the same record_ids; in other words,
> if a record_id is found in the first log, it should also be found in the
> second one. However, I suspect that the second log has been filtered, and
> I need to find the missing records. Anything is allowed: a MapReduce job,
> Hive, Pig, or even a NoSQL database.
>
> Thank you.
>
> It is also a good time to express my thanks to all the members of the group
> who are always very helpful.
>
> Sincerely,
> Mark
>
>
>
>
>
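
A left outer join of LOG1 against LOG2 keeps every LOG1 row, so the records
missing from the second log are the ones whose LOG2 side comes back NULL.
The following is only a minimal sketch of that idea on a small scale, not
the solution anyone in the thread actually ran. It assumes the record ids
have already been parsed out of both logs, and it uses an in-memory SQLite
database as a stand-in for Hive. The table and column names (LOG1, LOG2,
recordid) follow the thread; the function name find_missing_record_ids is
hypothetical.

```python
import sqlite3


def find_missing_record_ids(log1_ids, log2_ids):
    """Return the record ids present in LOG1 but absent from LOG2, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE LOG1 (recordid TEXT)")
        conn.execute("CREATE TABLE LOG2 (recordid TEXT)")
        conn.executemany("INSERT INTO LOG1 VALUES (?)", [(r,) for r in log1_ids])
        conn.executemany("INSERT INTO LOG2 VALUES (?)", [(r,) for r in log2_ids])
        # Left outer join, then keep only the rows with no match in LOG2.
        rows = conn.execute(
            "SELECT DISTINCT LOG1.recordid "
            "FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid "
            "WHERE LOG2.recordid IS NULL "
            "ORDER BY LOG1.recordid"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # Sample ids made up for illustration only.
    print(find_missing_record_ids(["r1", "r2", "r3"], ["r1", "r3"]))
```

DISTINCT is there because a record id may appear more than once in the
first log; drop it if every missing occurrence should be reported.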