Re: Cumulative value using mapreduce
Indeed, I didn't catch the cumulative sum part. Then I guess it calls for
what is often called a secondary sort, if you want to compute different
cumulative sums during the same job. It can be more or less easy to
implement depending on which API/library/tool you are using. Ted's comments
on performance are spot on.
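
For what it's worth, here is a rough sketch of what that secondary sort wiring
can look like with the plain Java MapReduce API. The TransactionKey class and
its account/sequence fields are only illustrative names, not anything from the
earlier posts:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Composite key: the natural key (account) plus the field the reducer
    // should see the values ordered by (a sequence number or timestamp).
    public class TransactionKey implements WritableComparable<TransactionKey> {
        Text account = new Text();
        LongWritable sequence = new LongWritable();

        public void write(DataOutput out) throws IOException {
            account.write(out);
            sequence.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            account.readFields(in);
            sequence.readFields(in);
        }

        // Full sort order: by account first, then by sequence, so each
        // group reaches the reducer in cumulative-sum order.
        public int compareTo(TransactionKey other) {
            int cmp = account.compareTo(other.account);
            return cmp != 0 ? cmp : sequence.compareTo(other.sequence);
        }
    }

    // Partition on the natural key only, so all records of one account
    // end up on the same reducer.
    class AccountPartitioner extends Partitioner<TransactionKey, Text> {
        @Override
        public int getPartition(TransactionKey key, Text value, int numPartitions) {
            return (key.account.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Group on the natural key only, so a single reduce() call sees all
    // the (already sorted) records of one account.
    class AccountGroupingComparator extends WritableComparator {
        protected AccountGroupingComparator() {
            super(TransactionKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((TransactionKey) a).account.compareTo(((TransactionKey) b).account);
        }
    }

    // In the driver:
    //   job.setPartitionerClass(AccountPartitioner.class);
    //   job.setGroupingComparatorClass(AccountGroupingComparator.class);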

Regards

Bertrand

On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:

> I did the cumulative sum in a HIVE UDF, as one of the projects for my
> employer.
>
> 1) You need to decide the grouping elements for your cumulative sum. For
> example, an account, a department, etc. In the mapper, combine this
> information into your emit key.
> 2) If you don't have any grouping requirement and you just want a cumulative
> sum over all your data, then send all the data to one common key, so it
> will all go to the same reducer.
> 3) When you calculate the cumulative sum, does the output need to be in a
> particular order? If so, you need a secondary sort, so the data arrives at
> the reducer in the order you want.
> 4) In the reducer, just do the running sum and emit a value per original
> record (not per key); see the sketch after this list.
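>
> A rough sketch of that reduce side (the class name and the "amount<TAB>C|D"
> value layout are only assumptions for illustration; it assumes steps 1-3
> already group the data and deliver the values in the wanted order):
>
>     import java.io.IOException;
>     import org.apache.hadoop.io.NullWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Reducer;
>
>     // Input key: the grouping key from step 1 (e.g. the account).
>     // Input values: one per original record, already ordered by the
>     // secondary sort from step 3, each formatted as "amount<TAB>C|D".
>     public class CumulativeSumReducer extends Reducer<Text, Text, Text, NullWritable> {
>
>         @Override
>         protected void reduce(Text key, Iterable<Text> values, Context context)
>                 throws IOException, InterruptedException {
>             // Running totals restart for every group.
>             double creditSum = 0.0;
>             double debitSum = 0.0;
>             for (Text value : values) {
>                 String[] fields = value.toString().split("\t");
>                 double amount = Double.parseDouble(fields[0]);
>                 if ("C".equals(fields[1])) {
>                     creditSum += amount;
>                 } else {
>                     debitSum += amount;
>                 }
>                 // One output line per input record, with the cumulative
>                 // credit and debit appended.
>                 context.write(new Text(value + "\t" + creditSum + "\t" + debitSum),
>                         NullWritable.get());
>             }
>         }
>     }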
>
> I would suggest you do this in a HIVE UDF, as it is much easier, if you
> can build a HIVE schema on top of your data.
>
> Yong
>
> ------------------------------
> From: [EMAIL PROTECTED]
> Date: Thu, 4 Oct 2012 18:52:09 +0100
> Subject: Re: Cumulative value using mapreduce
> To: [EMAIL PROTECTED]
>
>
> Bertrand is almost right.
>
> The only difference is that the original poster asked about a cumulative sum.
>
> This can be done in the reducer exactly as Bertrand described, except for two
> points that make it different from wordcount:
>
> a) you can't use a combiner
>
> b) the output of the program is as large as the input so it will have
> different performance characteristics than aggregation programs like
> wordcount.
>
> Bertrand's key recommendation to go read a book is the most important
> advice.
>
> On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> It sounds like:
> 1) group information by account
> 2) compute sum per account
>
> If that is not the case, you should be a bit more precise about your context.
>
> This computation looks like a small variant of wordcount. If you do not know
> how to do it, you should read a book about Hadoop MapReduce and/or an online
> tutorial. Yahoo's is old but still a nice read to begin with:
> http://developer.yahoo.com/hadoop/tutorial/
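>
> For the wordcount-like map side, a minimal sketch (the comma-separated
> account,amount,indicator record layout is only an assumption for
> illustration):
>
>     import java.io.IOException;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Mapper;
>
>     // Key every record by its account so the reduce side can aggregate
>     // per account, just like wordcount keys by word.
>     public class TransactionMapper extends Mapper<LongWritable, Text, Text, Text> {
>
>         @Override
>         protected void map(LongWritable offset, Text line, Context context)
>                 throws IOException, InterruptedException {
>             // Assumed input layout: account,amount,indicator
>             String[] fields = line.toString().split(",");
>             // Pass the amount and the credit/debit indicator through as the value.
>             context.write(new Text(fields[0]), new Text(fields[1] + "\t" + fields[2]));
>         }
>     }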
>
> Regards,
>
> Bertrand
>
>
> On Thu, Oct 4, 2012 at 3:58 PM, Sarath <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a file which has some financial transaction data. Each transaction
> has an amount and a credit/debit indicator.
> I want to write a mapreduce program which computes the cumulative credit &
> debit amounts at each record
> and appends these values to the record before dumping it into the output file.
>
> Is this possible? How can I achieve this? Where should I put the logic for
> computing the cumulative values?
>
> Regards,
> Sarath.
>
>
>
>
> --
> Bertrand Dechoux
>
>
>
--
Bertrand Dechoux