Hi Marcos,

Thanks for the pointers. I have been thinking along similar lines, but I have a doubt about one point:

I will have a separate data file for every interval. For example, suppose I have a 5-minute interval file that contains data for 2 hours and 10 minutes. In this scenario I want to process the 2 hours of data with the hours job and the remaining 10 minutes of data with the mins job. Since I will provide my data file as input to the MR jobs, I think the original file needs to be split into 2 files: HourFile and MinsFile. HourFile will contain the data for 2 hours and MinsFile will contain the data for 10 minutes.

I have implemented the file splitting with a simple Java class, but I think it involves too many I/O operations. If I could do this splitting in MR as well, or in some other efficient way, that would be good, because the original data files can be huge and the initial splitting of the files will itself take too much time.
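To make the idea concrete, here is a minimal sketch of the routing decision in plain Java (no Hadoop dependency), so that the HourFile/MinsFile split becomes a per-record decision rather than a separate pre-processing pass. The class name IntervalRouter, the method names, and the millisecond timestamps are hypothetical, and it assumes that whole hours are counted from the start of the file rather than aligned to clock hours. In a real job this check could live inside a mapper, with something like Hadoop's MultipleOutputs writing the two groups to separate outputs; that part is an assumption and is not shown here.

```java
// Sketch only: decides whether a record belongs to the "hours" or "mins" part.
// All names here are hypothetical and chosen for illustration.
class IntervalRouter {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // End (exclusive) of the last whole hour covered by the file,
    // assuming hours are counted from the file's first timestamp.
    static long wholeHourBoundary(long fileStartMs, long fileEndMs) {
        long wholeHours = (fileEndMs - fileStartMs) / HOUR_MS;
        return fileStartMs + wholeHours * HOUR_MS;
    }

    // Records before the boundary go to the hours job, the rest to the mins job.
    static String route(long recordTsMs, long boundaryMs) {
        return recordTsMs < boundaryMs ? "hours" : "mins";
    }

    public static void main(String[] args) {
        long start = 0L;
        long end = 130L * 60L * 1000L; // file covers 2 hours 10 minutes
        long boundary = wholeHourBoundary(start, end);
        int hours = 0;
        int mins = 0;
        // One record every 5 minutes, as in the 5-minute interval file.
        for (long ts = start; ts < end; ts += 5L * 60L * 1000L) {
            if (route(ts, boundary).equals("hours")) {
                hours++;
            } else {
                mins++;
            }
        }
        System.out.println("hours records: " + hours + ", mins records: " + mins);
    }
}
```

With this approach the original file is read once, and no intermediate HourFile or MinsFile has to be written before the jobs start.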

Please suggest.
Thanks

-----Original Message-----
From: Marcos Ortiz [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 26, 2012 7:40 PM
To: [EMAIL PROTECTED]
Cc: Stuti Awasthi
Subject: Re: Query Regarding design MR job for Billing

Well, first, you can design 6 MR jobs:
1- for 5 mins interval
2- for 1 hour
3- for 1 day
4- for 1 month
5- for 1 year
6- and a last one for any interval

If, as you say, you have to do a different calculation for each interval, this could be a solution (at least, I think so).
You can read the "design patterns" for MapReduce algorithms proposed by Jimmy Lin and Chris Dyer in their book "Data-Intensive Text Processing with MapReduce".
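
As a rough illustration of the one-job-per-interval idea above, here is a small Java sketch of a lookup that picks which job to run for a given interval. The class name BillingJobSelector, the enum values, and the interval labels ("5min", "1hour", and so on) are all hypothetical; the actual job classes and how they are submitted are not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: maps an interval label to one of the six jobs listed above.
// Labels and names are hypothetical, chosen for illustration.
class BillingJobSelector {
    enum JobKind { FIVE_MINUTES, HOUR, DAY, MONTH, YEAR, ANY_INTERVAL }

    private static final Map<String, JobKind> BY_LABEL = new HashMap<>();
    static {
        BY_LABEL.put("5min", JobKind.FIVE_MINUTES);
        BY_LABEL.put("1hour", JobKind.HOUR);
        BY_LABEL.put("1day", JobKind.DAY);
        BY_LABEL.put("1month", JobKind.MONTH);
        BY_LABEL.put("1year", JobKind.YEAR);
    }

    // Any label without a dedicated job falls back to the generic job.
    static JobKind select(String intervalLabel) {
        return BY_LABEL.getOrDefault(intervalLabel, JobKind.ANY_INTERVAL);
    }

    public static void main(String[] args) {
        System.out.println(select("1hour"));
        System.out.println(select("15min"));
    }
}
```

Keeping the selection in one place means a new interval only needs a new entry in the table and its own job, without touching the others.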

Regards
On 02/27/2012 05:39 AM, Stuti Awasthi wrote:
Marcos Luis Ortíz Valmaseda
  Senior Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186

End the injustice, FREEDOM NOW FOR OUR FIVE COMPATRIOTS WHO ARE UNJUSTLY HELD IN US PRISONS!
http://www.antiterroristas.cu
http://justiciaparaloscinco.wordpress.com