Fwd: debugging hadoop streaming programs (first code)
Forgot reducer :)

---------- Forwarded message ----------
From: jamal sasha <[EMAIL PROTECTED]>
Date: Mon, Nov 19, 2012 at 8:17 PM
Subject: debugging hadoop streaming programs (first code)
To: [EMAIL PROTECTED]

Hi,
  This is my first attempt at learning the MapReduce abstraction.

My problem is as follows. I have a text file with these columns:

id1, id2, date, time, mrps, code, code2

3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
Now what I want to do is count the number of transactions
occurring in each half-hour interval between 7 am and 11 am.

So here are the intervals:

7:00-7:30 -> 0
7:30-8:00 -> 1
8:00-8:30 -> 2
...
10:30-11:00 -> 7

So ultimately what I am doing is building a 2D dictionary:

d[id2][interval] = count_transactions
My mappers and reducers are attached (sample input also).
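For context, the mapper logic boils down to something like this (a minimal sketch of the idea, not the exact attached code; the "id2|interval" key format is just one way to encode the pair):

#!/usr/bin/env python
# mapper.py: sketch; assumes comma-separated columns:
# id1, id2, date, time, mrps, code, code2
import sys

def interval_index(time_str):
    # Map an "HH:MM:SS.mmm" string to a half-hour slot between
    # 7:00 and 11:00 (0 through 7); None outside that window.
    try:
        hh, mm = time_str.split(':')[:2]
        minutes = int(hh) * 60 + int(mm)
    except ValueError:
        return None
    if 7 * 60 <= minutes < 11 * 60:
        return (minutes - 7 * 60) // 30
    return None

for line in sys.stdin:
    fields = [f.strip() for f in line.split(',')]
    if len(fields) < 7:
        continue  # skip blank, header, or malformed lines
    idx = interval_index(fields[3])
    if idx is not None:
        # key is "id2|interval", value is a count of 1; streaming
        # splits key from value at the first tab
        print('%s|%d\t1' % (fields[1], idx))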

The code runs just fine if I run it via

cat input.txt | python mapper.py | sort | python reducer.py
That gives me the expected output, but when I run it on the
cluster it throws an unhelpful error (basically the terminal just
says job unsuccessful, reason NA).
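For completeness, here is the rough shape of the reducer (again a minimal sketch, assuming the "id2|interval<TAB>count" lines from the mapper sketch above; both the local sort and the Hadoop shuffle deliver them grouped by key):

#!/usr/bin/env python
# reducer.py: sketch; input lines look like "id2|interval\tcount",
# sorted by key, so consecutive equal keys can be summed directly.
import sys

current_key = None
count = 0
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if not value:
        continue  # skip malformed lines
    if key != current_key:
        if current_key is not None:
            print('%s\t%d' % (current_key, count))
        current_key = key
        count = 0
    count += int(value)
if current_key is not None:
    print('%s\t%d' % (current_key, count))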

Any suggestions on what I am doing wrong?
Jamal