probably very stupid question
jamal sasha, 2013-01-14, 23:44
Hi,
Probably a very lame question. I have two documents, and I want to find the overlap between them in MapReduce fashion and then compare the overlap (let's say I have some measure to do that). So this is what I am thinking:

1) Run the normal wordcount job on one document ( https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-times-a-word-appeared-in-a-file-using-map-reduce-framework ).
2) But rather than saving the output to a file, save everything in a HashMap(word, true).
3) Pass that HashMap to a second wordcount MapReduce program and, while processing the second document, check each word against the HashMap to see whether it is present.

So, something like this:

    HashMap<String, Boolean> hm = runStepOne();   // step one: a MapReduce job
    runStepTwo(hm);                               // step two: uses hm while counting the second document

How do I do this in Hadoop? I know there are other hacks, but what I am trying to achieve is to get comfortable with the Java framework. So, starting from the job in the link above: how do I save the data structure instead of a file, and how do I pass the data structure as an argument?
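For illustration, here is a plain-Java sketch of the two-step logic described above, run in a single JVM with no Hadoop classes. The class and method names (`OverlapSketch`, `wordCount`, `overlap`) are hypothetical. A `Set<String>` stands in for the `HashMap<String, Boolean>`, since the boolean value carries no extra information. One caveat, stated as an assumption about the usual setup: in Hadoop the map tasks run in separate JVMs, so an in-memory map built in the driver cannot simply be handed to the second job as an argument. A common pattern is to have job one write its output to HDFS and have job two's mapper load that file in its `setup()` method (for example via the distributed cache). The sketch below only shows what that loaded lookup structure would be used for.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class OverlapSketch {

    // Stands in for the first wordcount job: "map" each token to (word, 1)
    // and "reduce" by summing the counts per word. Hypothetical helper.
    static Map<String, Integer> wordCount(String document) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : document.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;         // split may yield a leading empty token
            counts.merge(token, 1, Integer::sum);  // reduce step: sum per word
        }
        return counts;
    }

    // Stands in for the second job: count the second document's words and
    // keep only those found in the lookup set built from the first document.
    static Map<String, Integer> overlap(String document, Set<String> known) {
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Integer> e : wordCount(document).entrySet()) {
            if (known.contains(e.getKey())) {
                shared.put(e.getKey(), e.getValue());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String docA = "the quick brown fox";
        String docB = "the lazy brown dog";
        // Plays the role of the HashMap(word, true) from step one.
        Set<String> wordsInA = new HashSet<>(wordCount(docA).keySet());
        // TreeMap only to make the printed order deterministic.
        System.out.println(new TreeMap<>(overlap(docB, wordsInA))); // prints {brown=1, the=1}
    }
}
```

The resulting map of shared words (with their counts in the second document) is what an overlap measure would then be computed from.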