MapReduce user mailing list: Cumulative value using mapreduce


Sarath 2012-10-04, 13:58
Bertrand Dechoux 2012-10-04, 16:20
Ted Dunning 2012-10-04, 17:52
java8964 java8964 2012-10-04, 19:02
Re: Cumulative value using mapreduce
I indeed didn't catch the cumulative sum part. In that case I guess it calls
for what is often called a secondary sort, if you want to compute different
cumulative sums during the same job. How easy it is to implement depends on
which API/library/tool you are using. Ted's comments on performance are spot
on.
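To make the secondary-sort idea concrete, here is a minimal in-memory sketch in plain Java. It is not the Hadoop API: the sort on a composite key (account, seq) stands in for the shuffle, and "a new group starts when the account changes" stands in for a grouping comparator. The record types `Txn` and `Row`, their fields, and the use of a sequence number `seq` to recover the original record order are all hypothetical, chosen for illustration. In a real Hadoop job this would roughly correspond to a composite `WritableComparable` key, a partitioner that hashes only the account, and `Job.setGroupingComparatorClass`. The sequence number could be, for example, the byte offset that `TextInputFormat` supplies as the map input key, assuming a single input file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Hypothetical input record: grouping key (account), secondary sort key
    // (position in the original input), and a signed amount.
    public record Txn(String account, long seq, long amount) {}

    // Hypothetical output row: the original record plus its running total.
    public record Row(Txn txn, long cumulative) {}

    public static List<Row> cumulativeByAccount(List<Txn> input) {
        // Stand-in for the shuffle: sort on the composite key (account, seq).
        List<Txn> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Txn::account).thenComparingLong(Txn::seq));

        List<Row> out = new ArrayList<>();
        String currentAccount = null;
        long running = 0;
        for (Txn t : sorted) {
            // Stand-in for the grouping comparator: group on account only,
            // so the running sum resets whenever the account changes.
            if (!t.account().equals(currentAccount)) {
                currentAccount = t.account();
                running = 0;
            }
            running += t.amount();
            // Emit one row per input record, not one per key.
            out.add(new Row(t, running));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
                new Txn("B", 2, 5), new Txn("A", 3, -20), new Txn("A", 1, 100),
                new Txn("B", 1, 10), new Txn("A", 2, 50));
        for (Row r : cumulativeByAccount(txns)) {
            System.out.println(r.txn().account() + "\t" + r.txn().seq() + "\t"
                    + r.txn().amount() + "\t" + r.cumulative());
        }
    }
}
```

With the sample input, account A gets running totals 100, 150, 130 and account B gets 10, 15. Each account's sum only makes sense because its records are visited in `seq` order, which is exactly what the secondary sort guarantees.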

Regards

Bertrand

On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:

> I implemented the cumulative sum as a Hive UDF, as one of the projects for
> my employer.
>
> 1) Decide which fields group your cumulative sum, for example an account
> or a department. In the mapper, combine these fields into your emitted key.
> 2) If you don't have any grouping requirement and just want a cumulative
> sum over all your data, emit every record with one common key so that all
> of them go to the same reducer.
> 3) Does the cumulative sum need to follow a particular record order? If so,
> you need a secondary sort so that the reducer receives the values in the
> order you want.
> 4) In the reducer, keep a running sum and emit one value per original
> record (not per key).
>
> I would suggest doing this as a Hive UDF, as it is much easier, if you can
> build a Hive schema on top of your data.
>
> Yong
>
> ------------------------------
> From: [EMAIL PROTECTED]
> Date: Thu, 4 Oct 2012 18:52:09 +0100
> Subject: Re: Cumulative value using mapreduce
> To: [EMAIL PROTECTED]
>
>
> Bertrand is almost right.
>
> The only difference is that the original poster asked about a cumulative
> sum.
>
> This can be done in the reducer exactly as Bertrand described, except for
> two points that make it different from word count:
>
> a) you can't use a combiner
>
> b) the output of the program is as large as the input so it will have
> different performance characteristics than aggregation programs like
> wordcount.
>
> Bertrand's key recommendation to go read a book is the most important
> advice.
>
> On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> It sounds like you need to:
> 1) group information by account
> 2) compute the sum per account
>
> If that is not the case, please give more details about your context.
>
> This computation looks like a small variant of word count. If you do not
> know how to do it, you should read books about Hadoop MapReduce and/or
> online tutorials. Yahoo's is old but still a nice read to begin with:
> http://developer.yahoo.com/hadoop/tutorial/
>
> Regards,
>
> Bertrand
>
>
> On Thu, Oct 4, 2012 at 3:58 PM, Sarath <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a file containing financial transaction data. Each transaction has
> an amount and a credit/debit indicator.
> I want to write a MapReduce program that computes the cumulative credit and
> debit amounts at each record and appends these values to the record before
> writing it to the output file.
>
> Is this possible? How can I achieve this? Where should I put the logic for
> computing the cumulative values?
>
> Regards,
> Sarath.
>
>
>
>
> --
> Bertrand Dechoux
>
>
>
--
Bertrand Dechoux
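For Sarath's original case (one running credit total and one running debit total appended to every record, with no grouping), the approach from the thread sends everything to one common key so a single reducer sees all records, and that reducer emits one line per input record. Below is a minimal plain-Java sketch of only that reducer-side loop, not Hadoop code. It assumes a hypothetical line format `id,amount,C` or `id,amount,D` with integer amounts, and it assumes the lines already arrive in original file order (for example via a secondary sort on a record position). The method name `appendRunningTotals` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RunningCreditDebit {
    /**
     * Sketch of the reduce-side loop for the single common key.
     * Hypothetical input lines look like "id,amount,C" or "id,amount,D"
     * and are assumed to be in original file order already.
     * Each output line is the input line with the running credit and
     * running debit totals appended.
     */
    public static List<String> appendRunningTotals(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        long credit = 0;
        long debit = 0;
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length != 3) {
                throw new IllegalArgumentException("expected 3 fields: " + line);
            }
            long amount = Long.parseLong(f[1].trim());
            switch (f[2].trim()) {
                case "C" -> credit += amount;
                case "D" -> debit += amount;
                default -> throw new IllegalArgumentException("bad indicator: " + line);
            }
            // One output line per input record, so the output is as large as the input.
            out.add(line + "," + credit + "," + debit);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("t1,100,C", "t2,40,D", "t3,25,C", "t4,10,D");
        appendRunningTotals(input).forEach(System.out::println);
    }
}
```

This also shows Ted's two points. A combiner could only pre-add amounts on the map side, which would discard the per-record totals the output needs. And because every input line yields an output line, the job's output is as large as its input, unlike word count.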
Sarath 2012-10-05, 04:56
Bertrand Dechoux 2012-10-05, 06:38
Ted Dunning 2012-10-05, 05:50
Steve Loughran 2012-10-05, 14:43
Jane Wayne 2012-10-05, 15:21
Jane Wayne 2012-10-05, 15:31
java8964 java8964 2012-10-05, 14:03
Sarath 2012-10-19, 06:03