MapReduce, mail # user - Cumulative value using mapreduce


Re: Cumulative value using mapreduce
Ted Dunning 2012-10-05, 05:50
The answer is really the same.  Your problem is just using a goofy
representation for negative numbers (after all, negative numbers are a
relatively new concept in accounting).

You still need to use the account number as the key and the date as a sort
key.  Many financial institutions also process all debits before credits on
a particular day in order to maximize overdraft fees, so you may want to use
the CR/DR field as a secondary key in the sort.
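
As a very rough sketch (not from the thread; TxnKey and its fields are made-up
names), a composite key for such a secondary sort in the plain Java MapReduce
API could look something like the code below. You would still need a custom
partitioner and grouping comparator that look only at the account number, so
that all of an account's transactions land in the same reduce call.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key: partition/group on account, order within an
// account by date, then by the CR/DR indicator. hashCode/equals omitted for brevity.
public class TxnKey implements WritableComparable<TxnKey> {
    private Text account = new Text();
    private LongWritable txnDate = new LongWritable();  // e.g. days since epoch (assumption)
    private Text crDr = new Text();                     // "CR" or "DR"

    public void set(String acct, long date, String indicator) {
        account.set(acct);
        txnDate.set(date);
        crDr.set(indicator);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        account.write(out);
        txnDate.write(out);
        crDr.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        account.readFields(in);
        txnDate.readFields(in);
        crDr.readFields(in);
    }

    @Override
    public int compareTo(TxnKey other) {
        int cmp = account.compareTo(other.account);
        if (cmp == 0) cmp = txnDate.compareTo(other.txnDate);
        // Lexicographic order puts "CR" before "DR"; flip this comparison if
        // debits should be processed first, as described above.
        if (cmp == 0) cmp = crDr.compareTo(other.crDr);
        return cmp;
    }
}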

Then the addition is field driven.  Add to one sum or the other and always
add both sums to the record.
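
And a reducer sketch of that field-driven addition (again only an illustration,
not code from the thread): it assumes the secondary sort above already delivers
one account's transactions in date order, and that each value is a tab-separated
record of transaction id, date, CR/DR indicator and amount.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Keeps a running CR sum and DR sum and appends both to every output record.
public class CumulativeSumReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text account, Iterable<Text> txns, Context context)
            throws IOException, InterruptedException {
        double crSum = 0;
        double drSum = 0;
        for (Text txn : txns) {
            // Assumed value layout: "<txnId>\t<date>\t<CR|DR>\t<amount>"
            String[] fields = txn.toString().split("\t");
            String indicator = fields[2];
            double amount = Double.parseDouble(fields[3]);
            if ("CR".equals(indicator)) {
                crSum += amount;
            } else {
                drSum += amount;
            }
            // One output record per input record, with both cumulative sums appended.
            context.write(account, new Text(txn.toString() + "\t" + crSum + "\t" + drSum));
        }
    }
}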

On Fri, Oct 5, 2012 at 5:56 AM, Sarath <
[EMAIL PROTECTED]> wrote:

>  Thanks for all your responses. As suggested, I will go through the
> documentation once again.
>
> But just to clarify, this is not my first map-reduce program. I've already
> written a map-reduce for our product which does filtering and
> transformation of the financial data. This is a new requirement we've got.
> I have also implemented the logic of calculating the cumulative sums. But the
> output is not coming out as desired, and I feel I'm not doing it the right way
> and am missing something. So I thought of taking quick help from the mailing list.
>
> As an example, say we have records as below -
>   Txn ID | Txn Date  | Cr/Dr Indicator | Amount
>   1001   | 9/22/2012 | CR              | 1000
>   1002   | 9/25/2012 | DR              | 500
>   1003   | 10/1/2012 | DR              | 1500
>   1004   | 10/4/2012 | CR              | 2000
>
> When this file is passed through, the logic should append the below 2 columns
> to the output for each record above -
>   CR Cumulative Amount | DR Cumulative Amount
>   1000                 | 0
>   1000                 | 500
>   1000                 | 2000
>   3000                 | 2000
>
> Hope the problem is clear now. Please provide your suggestions on the
> approach to the solution.
>
> Regards,
> Sarath.
>
> On Friday 05 October 2012 02:51 AM, Bertrand Dechoux wrote:
>
> I indeed didn't catch the cumulative sum part. Then I guess it begs for
> what-is-often-called-a-secondary-sort, if you want to compute different
> cumulative sums during the same job. It can be more or less easy to
> implement depending on which API/library/tool you are using. Ted's comments
> on performance are spot on.
>
>  Regards
>
>  Bertrand
>
> On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
>
>>  I did the cumulative sum in the HIVE UDF, as one of the project for my
>> employer.
>>
>>  1) You need to decide the grouping elements for your cumulative sum. For
>> example, an account, a department, etc. In the mapper, combine this
>> information into your emitted key (see the mapper sketch after this list).
>> 2) If you don't have any grouping requirement and you just want a cumulative
>> sum over all your data, then send all the data to one common key, so it
>> will all go to the same reducer.
>> 3) When you calculate the cumulative sum, does the output need to be in a
>> particular order? If so, you need a secondary sort, so the data arrives at
>> the reducer in the order you want.
>> 4) In the reducer, just do the sum, emitting one value per original record
>> (not per key).
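
For illustration only, a minimal mapper along the lines of steps 1-3 might look
like the sketch below. The input layout is an assumption (tab-separated account,
date, CR/DR indicator and amount) and GroupingMapper is a made-up name.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits the grouping element (here an account number) as the key and passes the
// rest of the record through as the value.
public class GroupingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Assumed input layout: "<account>\t<date>\t<CR|DR>\t<amount>"
        String[] fields = line.toString().split("\t");
        // Grouping element (step 1). For one global cumulative sum (step 2),
        // emit a single constant key instead, so everything reaches one reducer.
        outKey.set(fields[0]);
        // Date, indicator and amount travel in the value; a composite key plus
        // secondary sort (step 3) is needed if the reducer must see them in date order.
        outValue.set(fields[1] + "\t" + fields[2] + "\t" + fields[3]);
        context.write(outKey, outValue);
    }
}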
>>
>>  I would suggest you do this in a HIVE UDF, as it is much easier, if
>> you can build a HIVE schema on top of your data.
>>
>>  Yong
>>
>>  ------------------------------
>> From: [EMAIL PROTECTED]
>> Date: Thu, 4 Oct 2012 18:52:09 +0100
>> Subject: Re: Cumulative value using mapreduce
>> To: [EMAIL PROTECTED]
>>
>>
>> Bertrand is almost right.
>>
>>  The only difference is that the original poster asked about cumulative
>> sum.
>>
>>  This can be done in the reducer exactly as Bertrand described, except for
>> two points that make it different from word count:
>>
>>  a) you can't use a combiner
>>
>>  b) the output of the program is as large as the input so it will have
>> different performance characteristics than aggregation programs like
>> wordcount.
>>
>>  Bertrand's key recommendation to go read a book is the most important
>> advice.
>>
>> On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>  It sounds like a