Hadoop, mail # general - Fwd: debugging hadoop streaming programs (first code)


jamal sasha 2012-11-20, 01:18
Forgot reducer :)

---------- Forwarded message ----------
From: jamal sasha <[EMAIL PROTECTED]>
Date: Mon, Nov 19, 2012 at 8:17 PM
Subject: debugging hadoop streaming programs (first code)
To: [EMAIL PROTECTED]

Hi,
  This is my first attempt at learning the MapReduce abstraction.

My problem is as follows. I have a text file with rows like this:

id1, id2, date, time, mrps, code, code2

3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
Now what I want to do is count the number of transactions
happening in every half hour between 7 am and 11 am.

So here are the intervals:

7:00-7:30 -> 0
7:30-8:00 -> 1
8:00-8:30 -> 2
...
10:30-11:00 -> 7
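One way to compute that bucket index from the time field (a sketch; the `interval_index` name and the choice to return `None` for times outside the window are my assumptions, not from the attached code):

```python
from datetime import datetime

def interval_index(time_str):
    """Map an HH:MM:SS.fff time to its half-hour bucket between 7:00 and 11:00.

    Returns 0 for 7:00-7:29, 1 for 7:30-7:59, ..., 7 for 10:30-10:59,
    and None for times outside the 7-11 window.
    """
    t = datetime.strptime(time_str.strip(), "%H:%M:%S.%f")
    if not 7 <= t.hour < 11:
        return None
    # Two buckets per hour past 7:00; the second half-hour adds 1.
    return (t.hour - 7) * 2 + (1 if t.minute >= 30 else 0)
```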

So ultimately what I am doing is creating a 2d dictionary

d[id2][interval] = count_transactions.
My mappers and reducers are attached (sample input also).
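Since the attachments do not survive in the archive, here is a minimal sketch of what such a streaming mapper/reducer pair might look like. The field positions, function names, and tab-separated key format are assumptions; in a real streaming job each function would live in its own script, read `sys.stdin`, and `print` its output.

```python
# Hypothetical sketch of the mapper/reducer pair; the real attachments are
# not reproduced in the archive, so field positions and bucketing are assumed.

def mapper(lines):
    """Emit "id2<TAB>interval<TAB>1" for rows between 7:00 and 11:00."""
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 7:
            continue  # skip blank or malformed rows
        id2, time_str = fields[1], fields[3]
        try:
            hour, minute = int(time_str[:2]), int(time_str[3:5])
        except ValueError:
            continue  # non-numeric time field (e.g. the header row)
        if 7 <= hour < 11:
            interval = (hour - 7) * 2 + (1 if minute >= 30 else 0)
            yield "%s\t%d\t1" % (id2, interval)

def reducer(sorted_lines):
    """Sum the counts per (id2, interval) key from sorted mapper output."""
    current, count = None, 0
    for line in sorted_lines:
        id2, interval, val = line.strip().split("\t")
        key = (id2, interval)
        if key == current:
            count += int(val)
        else:
            if current is not None:
                yield "%s\t%s\t%d" % (current[0], current[1], count)
            current, count = key, int(val)
    if current is not None:
        yield "%s\t%s\t%d" % (current[0], current[1], count)
```

Once each function is wrapped in its own stdin-reading script, the pair can be tested locally with the `cat ... | sort | ...` pipeline shown below.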

The code runs just fine if I run it via

cat input.txt | python mapper.py | sort | python reducer.py

and gives me the output, but when I run it on the cluster it throws an
error that is not helpful (basically the terminal just says the job was
unsuccessful, reason NA).
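For context, a typical streaming invocation looks something like the following (the jar location and HDFS paths here are placeholders that vary by distribution and setup, not taken from the post). Common causes of an unhelpful "job unsuccessful" are forgetting to ship the scripts with `-file`, a missing `#!/usr/bin/env python` shebang, or an exception inside the script; the real Python traceback usually only appears in the per-task stderr logs viewable through the JobTracker web UI, not on the submitting terminal.

```shell
# Hypothetical invocation; adjust the streaming jar path and HDFS paths
# for your own cluster.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /user/jamal/input.txt \
    -output /user/jamal/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```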

Any suggestions on what I am doing wrong?
Jamal