Re: debugging hadoop streaming programs (first code)
jamal sasha 2012-11-20, 13:33
Hi,
If I just use pipes, the code runs fine. The issue only appears when I deploy it on the cluster... :( Any suggestions on how to debug it?

On Tue, Nov 20, 2012 at 7:42 AM, Mahesh Balija <[EMAIL PROTECTED]> wrote:

> Hi Jamal,
>
> You can debug your MapReduce program, if it is written in Java,
> by running your MR job in LocalRunner mode via Eclipse.
> You can also add debug statements (or even S.O.Ps) to your code
> so that you can check where your job fails.
>
> I am NOT sure about Python, but one suggestion: can you run your
> Python code (map unit and reduce unit) locally on your input data and
> see whether your logic has any issues?
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
> On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> This is my first attempt to learn the MapReduce abstraction.
>>
>> My problem is as follows.
>> I have a text file as follows:
>> id 1, id2, date,time,mrps,code,code2
>>
>> 3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
>> 3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
>>
>> What I want to do is count the number of transactions happening
>> in every half hour between 7 am and 11 am.
>>
>> So here are the intervals:
>>
>> 7-7:30 -> 0
>> 7:30-8 -> 1
>> 8-8:30 -> 2
>> ....
>> 10:30-11 -> 7
>>
>> So ultimately what I am doing is creating a 2D dictionary:
>>
>> d[id2][interval] = count_transactions
>>
>> My mappers and reducers are attached (sample input also).
>>
>> The code runs just fine if I run it via
>>
>> cat input.txt | python mapper.py | sort | python reducer.py
>>
>> This gives me the output, but when I run it on the cluster it throws
>> an error that is not helpful (basically, on the terminal it says
>> job unsuccessful, reason NA).
>>
>> Any suggestions on what I am doing wrong?
>>
>> Jamal
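
For reference, here is a minimal sketch of the half-hour counting described in the quoted message. The attached mapper.py and reducer.py are not reproduced in this thread, so this is not the poster's code. The function names (interval_index, map_lines, reduce_lines), the "id2,interval" key layout, and the choice to skip unparseable lines are all assumptions made for illustration. Lines that cannot be parsed (such as the header row) are reported on stderr rather than raising, since output written to stderr by a streaming task normally ends up in that task's logs, which may say more than "reason NA".

```python
import sys
from itertools import groupby

START_MINUTES = 7 * 60    # 07:00, start of the first interval
END_MINUTES = 11 * 60     # 11:00, end of the last interval
SLOT_MINUTES = 30         # width of each interval


def interval_index(time_field):
    """Return 0..7 for a time in [07:00, 11:00), or None outside that window."""
    hours, minutes = time_field.strip().split(":")[:2]
    total = int(hours) * 60 + int(minutes)
    if not START_MINUTES <= total < END_MINUTES:
        return None
    return (total - START_MINUTES) // SLOT_MINUTES


def map_lines(lines, log=sys.stderr):
    """Emit 'id2,interval<TAB>1' for every transaction inside the window."""
    for line_number, line in enumerate(lines, start=1):
        fields = [field.strip() for field in line.split(",")]
        try:
            id2, time_field = fields[1], fields[3]
            slot = interval_index(time_field)
        except (IndexError, ValueError) as exc:
            # Assumption: bad lines are skipped and logged, not fatal.
            log.write("skipping line %d (%r): %s\n" % (line_number, line, exc))
            continue
        if slot is not None:
            yield "%s,%d\t1" % (id2, slot)


def reduce_lines(sorted_lines):
    """Sum the counts for each 'id2,interval' key; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (key, sum(int(count) for _, count in group))


def run_locally(lines):
    """Mimic `cat input.txt | python mapper.py | sort | python reducer.py`."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

Note that both sample rows in the quoted message (12:39 and 13:23) fall outside the 7 am to 11 am window, so a sketch like this would emit nothing for them. Putting id2 and the interval together in the key, before the tab, is one way to make sure all counts for the same pair arrive together at the reducer.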