I believe a reduce-side join is what you are looking for.
You can use MultipleInputs to set up a reduce-side join for this.
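To make the idea concrete, here is a rough sketch of the join logic in plain Java, with no Hadoop dependency so it runs on its own. The class name ReduceSideJoinSketch, the tags "A" and "B", and the whitespace/punctuation tokenizer are my own assumptions for illustration. In a real job, each document would get its own mapper (registered through MultipleInputs.addInputPath) that emits the word as key and a source tag as value, and the framework would do the grouping that shuffle() fakes here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of a reduce-side join on words.
public class ReduceSideJoinSketch {

    // "Map" step: emit (word, sourceTag) for every word in one document.
    static List<Map.Entry<String, String>> map(String tag, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(w, tag));
            }
        }
        return out;
    }

    // "Shuffle" step: group all tags by word, as the framework does
    // between the map and reduce phases.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" step: keep a word only if it was seen under both tags.
    // The value holds the count in document A and the count in document B.
    static Map<String, int[]> reduce(Map<String, List<String>> grouped) {
        Map<String, int[]> overlap = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            int a = 0;
            int b = 0;
            for (String tag : e.getValue()) {
                if (tag.equals("A")) a++; else b++;
            }
            if (a > 0 && b > 0) {
                overlap.put(e.getKey(), new int[] {a, b});
            }
        }
        return overlap;
    }

    // Runs the three steps over two documents and returns the shared words.
    static Map<String, int[]> overlap(String docA, String docB) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>(map("A", docA));
        pairs.addAll(map("B", docB));
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        Map<String, int[]> result =
            overlap("the quick brown fox", "the lazy brown dog the end");
        // Print each shared word with its count in A and in B.
        result.forEach((w, c) -> System.out.println(w + "\t" + c[0] + "\t" + c[1]));
    }
}
```

The point of doing it this way is that nothing has to be held in memory or handed from one job to the next: both documents go through a single job, and the reducer sees every occurrence of a word from both sides at once. Your overlap measure can then be computed from the per-word counts.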
From: jamal sasha <[EMAIL PROTECTED]>
Date: Mon, 14 Jan 2013 15:44:25
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: probably very stupid question
Probably a very lame question.
I have two documents and I want to find the overlap between them in
MapReduce fashion, and then compare the overlap (let's say I have some
measure to do that).
So this is what I am thinking:
1) Run the normal wordcount job on one document (
2) But rather than saving a file, save everything in a HashMap
3) Pass that HashMap along to the second wordcount MapReduce program and
then, as I am processing the second document, check the words against the
HashMap to find whether each word is present or not.
So, something like this
1) HashMap<String, Boolean> hm = runStepOne(); <-- map reduce job
2) runStepTwo(hm);
How do I do this in Hadoop?
I know there can be some other hacks, but what I am trying to achieve is to get
comfortable with the Java framework.
So, from the above link, how do I save the data structure instead of a file?
How do I pass the datastructure as an argument?