MapReduce user mailing list: Cumulative value using mapreduce


Earlier messages in this thread:
Sarath  2012-10-04, 13:58
Bertrand Dechoux  2012-10-04, 16:20
Ted Dunning  2012-10-04, 17:52
java8964 java8964  2012-10-04, 19:02
Bertrand Dechoux  2012-10-04, 21:21
Re: Cumulative value using mapreduce
Thanks for all your responses. As suggested, I will go through the
documentation once again.

But just to clarify, this is not my first map-reduce program. I've
already written a map-reduce job for our product which does filtering and
transformation of financial data. This is a new requirement we've got.
I have also implemented the logic for calculating the cumulative sums,
but the output is not coming out as desired. I feel I'm not doing it the
right way and am missing something, so I thought of getting some quick
help from the mailing list.

As an example, say we have records as below -
Txn ID    Txn Date     Cr/Dr Indicator    Amount
1001      9/22/2012    CR                 1000
1002      9/25/2012    DR                 500
1003      10/1/2012    DR                 1500
1004      10/4/2012    CR                 2000
When this file is processed, the logic should append the below 2 columns
to the output for each record above -
CR Cumulative Amount    DR Cumulative Amount
1000                    0
1000                    500
1000                    2000
3000                    2000
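
For illustration, a rough sketch of reducer-side logic that would produce
the two columns above (the class name and record layout are assumptions
for this example; it expects tab-separated input and that all records
reach a single reducer already sorted by transaction date):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch only: keeps two running totals and appends them to every record.
    public class CumulativeSumReducer extends Reducer<Text, Text, NullWritable, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            long crTotal = 0;
            long drTotal = 0;
            for (Text value : values) {
                // Expected record layout: TxnID <TAB> TxnDate <TAB> CR/DR <TAB> Amount
                String[] fields = value.toString().split("\t");
                long amount = Long.parseLong(fields[3]);
                if ("CR".equals(fields[2])) {
                    crTotal += amount;
                } else {
                    drTotal += amount;
                }
                // One output line per input record, with both running totals appended.
                context.write(NullWritable.get(),
                        new Text(value.toString() + "\t" + crTotal + "\t" + drTotal));
            }
        }
    }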
Hope the problem is clear now. Please provide your suggestions on the
approach to the solution.

Regards,
Sarath.

On Friday 05 October 2012 02:51 AM, Bertrand Dechoux wrote:
> I indeed didn't catch the cumulative sum part. Then I guess it begs
> for what-is-often-called-a-secondary-sort, if you want to compute
> different cumulative sums during the same job. It can be more or less
> easy to implement depending on which API/library/tool you are using.
> Ted's comments on performance are spot on.
>
> Regards
>
> Bertrand
>
> On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
>
>     I did the cumulative sum in a HIVE UDF, as one of the projects
>     for my employer.
>
>     1) You need to decide the grouping elements for your cumulative
>     sum. For example, an account, a department, etc. In the mapper,
>     combine this information into your emitted key.
>     2) If you don't have any grouping requirement and just want a
>     cumulative sum over all your data, then send all the data to one
>     common key, so it will all go to the same reducer.
>     3) When you calculate the cumulative sum, does the output need to
>     be in a particular order? If so, you need to do a secondary sort, so
>     the data arrives at the reducer in the order you want.
>     4) In the reducer, just do the sum and emit a value for every
>     original record (not one per key).
>
>     I would suggest you do this in a HIVE UDF, as it is much easier,
>     if you can build a HIVE schema on top of your data.
>
>     Yong
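
A rough sketch of points 1) to 3) above (all names here are made up for
illustration; it keys every record with a composite "group <TAB> date" Text
so that a partitioner and grouping comparator can handle the secondary sort,
and it uses a single constant group because the example has no per-account
grouping):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch only: emits ("ALL" <TAB> sortable-date) as the key and the whole
    // record as the value, so the shuffle sorts records by date while a custom
    // partitioner/grouping comparator (see the job wiring sketch further down)
    // keeps them in a single reduce group.
    public class TxnMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Expected input: TxnID <TAB> TxnDate <TAB> CR/DR <TAB> Amount
            String[] fields = line.toString().split("\t");
            String sortableDate = toSortableDate(fields[1]);   // 9/22/2012 -> 20120922
            // "ALL" = one common group (point 2); emit the account or department
            // here instead if you need per-group cumulative sums (point 1).
            context.write(new Text("ALL" + "\t" + sortableDate), line);
        }

        // Normalise M/d/yyyy into yyyyMMdd so the default Text ordering is chronological.
        private String toSortableDate(String mdy) {
            String[] p = mdy.split("/");
            return p[2] + String.format("%02d%02d",
                    Integer.parseInt(p[0]), Integer.parseInt(p[1]));
        }
    }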
>
>     ------------------------------------------------------------------------
>     From: [EMAIL PROTECTED]
>     Date: Thu, 4 Oct 2012 18:52:09 +0100
>     Subject: Re: Cumulative value using mapreduce
>     To: [EMAIL PROTECTED]
>
>
>     Bertrand is almost right.
>
>     The only difference is that the original poster asked about
>     cumulative sum.
>
>     This can be done in the reducer exactly as Bertrand described, except
>     for two points that make it different from word count:
>
>     a) you can't use a combiner
>
>     b) the output of the program is as large as the input so it will
>     have different performance characteristics than aggregation
>     programs like wordcount.
>
>     Bertrand's key recommendation to go read a book is the most
>     important advice.
>
>     On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
>         Hi,
>
>         It sounds like a
>         1) group information by account
>         2) compute sum per account
>
>         If that is not the case, you should be a bit more precise about
>         your context.
>
>         This computation looks like a small variant of wordcount. If you
>         do not know how to do it, you should read books about Hadoop
>         MapReduce and/or an online tutorial. Yahoo's is old but still a
>         nice read to begin with:
>         http://developer.yahoo.com/hadoop/tutorial/
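
For completeness, a sketch of driver wiring for the secondary sort discussed
in this thread (hypothetical names matching the mapper and reducer sketches
above; written against the org.apache.hadoop.mapreduce API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CumulativeSumJob {

        // Partition on the group part of "group <TAB> date" so a group never
        // splits across reducers.
        public static class GroupPartitioner extends Partitioner<Text, Text> {
            @Override
            public int getPartition(Text key, Text value, int numPartitions) {
                String group = key.toString().split("\t")[0];
                return (group.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }

        // Group reduce() calls on the group part only, so every record of a group
        // arrives in one Iterable, already sorted by the full "group <TAB> date" key.
        public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(Text.class, true);
            }
            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                String groupA = a.toString().split("\t")[0];
                String groupB = b.toString().split("\t")[0];
                return groupA.compareTo(groupB);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cumulative-sum");
            job.setJarByClass(CumulativeSumJob.class);
            job.setMapperClass(TxnMapper.class);
            job.setReducerClass(CumulativeSumReducer.class);
            job.setPartitionerClass(GroupPartitioner.class);
            job.setGroupingComparatorClass(GroupComparator.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            job.setNumReduceTasks(1);   // a single global running total needs one reducer
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With a single reducer and a single constant group the partitioner is
redundant, but it starts to matter as soon as the key carries a real account
or department. Note there is no combiner, as Ted points out above.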
Later messages in this thread:
Bertrand Dechoux  2012-10-05, 06:38
Ted Dunning  2012-10-05, 05:50
Steve Loughran  2012-10-05, 14:43
Jane Wayne  2012-10-05, 15:21
Jane Wayne  2012-10-05, 15:31
java8964 java8964  2012-10-05, 14:03
Sarath  2012-10-19, 06:03