In your mapper, emit the line number as the key and the line contents as
the value. In your reducer, check whether the two values for a key
match; i.e., if you are comparing two files, there will be two values
for each line number. If the values do not match, increment a counter to
track the number of mismatched lines and write those lines to the
output file. If the values for a key match, do nothing; there is no need
even to write to the output dir.
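
To make the flow concrete, here is a rough sketch in plain Python that simulates the map, shuffle, and reduce steps locally. It is not Hadoop code: the function names (map_lines, reduce_line, diff_files) and the "a"/"b" file tags are made up for illustration, and tagging each value with its source file is an extra assumption so the reducer can tell the two sides apart. Also note that Hadoop's default TextInputFormat gives the mapper a byte offset as the key rather than a line number, so in a real job the line number would have to be obtained some other way.

```python
from collections import defaultdict


def map_lines(file_tag, lines):
    """Mapper: emit (line_number, (file_tag, line_contents)) for each line."""
    for line_no, text in enumerate(lines, start=1):
        yield line_no, (file_tag, text)


def reduce_line(line_no, values):
    """Reducer: return the differing pair, or None when both values match."""
    by_file = dict(values)    # {"a": text, "b": text}; a side may be missing
    left = by_file.get("a")   # None if file a has no such line
    right = by_file.get("b")  # None if file b has no such line
    if left == right:
        return None           # matching lines produce no output
    return line_no, left, right


def diff_files(lines_a, lines_b):
    """Simulate the job: group by key, reduce, and count the mismatches."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for key, value in map_lines("a", lines_a):
        grouped[key].append(value)
    for key, value in map_lines("b", lines_b):
        grouped[key].append(value)

    mismatches = []
    for line_no in sorted(grouped):
        result = reduce_line(line_no, grouped[line_no])
        if result is not None:
            mismatches.append(result)
    # len(mismatches) plays the role of the job counter
    return len(mismatches), mismatches
```

Calling diff_files(["x", "y", "z"], ["x", "Y"]) reports two mismatches: line 2 differs, and line 3 exists only in the first file. Keep in mind this is a line-by-line comparison; unlike the Linux diff command, it does not realign the files after an inserted or deleted line.
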
On Tue, Mar 20, 2012 at 2:01 PM, botma lin <[EMAIL PROTECTED]> wrote:
> Hi, all
> I'm a newbie to Hadoop.
> I'm trying to compare two large files and get the difference between
> them, like the diff command in Linux.
> However, the mapred API can only get one record at a time, so how can I
> get the corresponding records in the two files and compare them using the mapred API?