Hadoop, mail # user - Comparing two logs, finding missing records


Re: Comparing two logs, finding missing records
Mark Kerzner 2011-06-26, 22:24
Interesting, Bharath, I will look at these.

Mark

On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi
<[EMAIL PROTECTED]> wrote:

> If you have a SerDe or a Pig loader for your log format, Pig or Hive with
> a join will probably be the quicker solution.
>
> -Bharath
>
>
>
> ________________________________
> From: Mark Kerzner <[EMAIL PROTECTED]>
> To: Hadoop Discussion Group <[EMAIL PROTECTED]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs that should contain the same set of record_ids: if a
> record_id appears in the first log, it should also appear in the second.
> However, I suspect that records have been filtered out of the second log,
> and I need to find the missing ones. Anything is allowed: a MapReduce job,
> Hive, Pig, or even a NoSQL database.
>
> Thank you.
>
> It is also a good time to express my thanks to all the members of the group
> who are always very helpful.
>
> Sincerely,
> Mark
>
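
For reference, the join suggested above amounts to an anti-join on record_id: keep the ids that occur in the first log and have no match in the second. Below is a minimal single-machine sketch of that logic in Python, not a Hadoop job. It assumes each log line starts with the record_id followed by a tab; that layout and the names record_id_of and missing_record_ids are hypothetical, since the thread does not describe the log format.

```python
from typing import Iterable, List


def record_id_of(line: str, sep: str = "\t") -> str:
    """Return the record_id of a log line.

    Assumption: the record_id is the first field of the line and fields
    are separated by `sep`. Adjust this to the real log format.
    """
    return line.rstrip("\n").split(sep, 1)[0]


def missing_record_ids(first_log: Iterable[str],
                       second_log: Iterable[str],
                       sep: str = "\t") -> List[str]:
    """Return the sorted record_ids present in first_log but not in second_log."""
    # Collect every id seen in the second log, skipping blank lines.
    ids_in_second = {record_id_of(line, sep) for line in second_log if line.strip()}
    # Collect every id seen in the first log, skipping blank lines.
    ids_in_first = {record_id_of(line, sep) for line in first_log if line.strip()}
    # Set difference is the anti-join: ids with no match in the second log.
    return sorted(ids_in_first - ids_in_second)


if __name__ == "__main__":
    first = ["r1\tpayload a\n", "r2\tpayload b\n", "r3\tpayload c\n"]
    second = ["r1\tpayload a\n", "r3\tpayload c\n"]
    print(missing_record_ids(first, second))
```

This holds both sets of ids in memory, so it only suits logs whose ids fit on one machine. At larger scale the same idea maps onto a left outer join in Hive or Pig that keeps the rows where the second log's side is null.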