MapReduce, mail # user - chaining (the output of) jobs/ reducers

chaining (the output of) jobs/ reducers
Adrian CAPDEFIER 2013-09-12, 13:36

My application requires 2 distinct processing steps (reducers) to be
performed on the input data. The first operation changes the key values,
so records that had different keys in step 1 can end up having the same
key in step 2.

The heavy lifting of the operation is in step 1; step 2 only combines
records whose keys were changed.

In short the overview is:
Sequential file -> Step 1 -> Step 2 -> Output.
To implement this in Hadoop, it seems that I need to create a separate job
for each step.

I assumed there would be some sort of job management under Hadoop to link
Jobs 1 and 2, but the only thing I could find was related to job scheduling,
with nothing on how to synchronize the input/output of the linked jobs.

The only crude solution that I can think of is to use a temporary file
under HDFS, but even so I'm not sure if this will work.

The overview of the process would be:
Sequential Input (lines) => Job A[Mapper (key1, value1) => ChainReducer
(key2, value2)] => Temporary file => Job B[Mapper (key2, value2) => Reducer
(key2, value3)] => output.
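
To make the data flow above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API; it only simulates the two jobs in memory. The class name TwoStepChain, the methods jobA and jobB, the summing reducer, and the first-letter re-keying rule are all hypothetical and chosen purely for illustration. The in-memory list returned by jobA stands in for the temporary file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class TwoStepChain {

    // Stand-in for shuffle + reduce: group records by key and sum each group.
    static Map<String, Integer> groupAndSum(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    // Job A: reduce by key1, then emit each result under a new key (key2).
    // Different key1 values may map to the same key2.
    static List<Map.Entry<String, Integer>> jobA(
            List<Map.Entry<String, Integer>> input,
            Function<String, String> rekey) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (Map.Entry<String, Integer> e : groupAndSum(input).entrySet()) {
            intermediate.add(Map.entry(rekey.apply(e.getKey()), e.getValue()));
        }
        return intermediate; // plays the role of the temporary file
    }

    // Job B: regroup the intermediate records by key2 and combine them.
    static Map<String, Integer> jobB(List<Map.Entry<String, Integer>> intermediate) {
        return groupAndSum(intermediate);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("apple", 1),
                Map.entry("apple", 2),
                Map.entry("avocado", 3),
                Map.entry("banana", 4));

        // Hypothetical re-keying rule: the new key is the first letter.
        List<Map.Entry<String, Integer>> intermediate =
                jobA(input, k -> k.substring(0, 1));

        System.out.println(intermediate);       // records after step 1
        System.out.println(jobB(intermediate)); // records after step 2
    }
}
```

In this sketch "apple" and "avocado" leave Job A as two separate records that share the new key "a", which is why a second grouping pass (Job B) is needed before they can be combined.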

Is there a better way to pass the output from Job A as input to Job B (e.g.
using network streams or some built-in Java classes that don't do disk

The temporary file solution will work in a single node configuration, but
I'm not sure about an MPP config.

Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or
both jobs run on all 4 nodes - will HDFS be able to automagically
redistribute the records between nodes or does this need to be coded