Re: Is the order of collected outputs in the map step preserved til the reduce step?
No, the framework makes no such guarantee. The only guarantee is at
the key-sort level: each call to reduce() receives exactly one key
together with all of its associated values (in no particular order),
and within each reducer the keys are presented in sorted order.

However, you can meet this kind of requirement with the secondary
sort technique:
http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
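
To give a rough idea of what secondary sort does, here is a minimal
plain-Java sketch that only simulates the shuffle; it is not Hadoop
code and not taken from the posts above. The class and field names
(SecondarySortSketch, MapOutput, naturalKey, timestamp) are made up
for illustration. The idea is that the mapper emits a composite key
(natural key plus timestamp), records are sorted on the full composite
key, and they are grouped for reduce() on the natural key alone, so
each group's values come out in timestamp order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical map output record: the composite key is
    // (naturalKey, timestamp); value is the payload.
    public static final class MapOutput {
        final String naturalKey;
        final long timestamp;
        final String value;

        public MapOutput(String naturalKey, long timestamp, String value) {
            this.naturalKey = naturalKey;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Sort order over the full composite key: natural key first,
    // then timestamp. This plays the role of the sort comparator.
    static final Comparator<MapOutput> SORT_ORDER =
        Comparator.comparing((MapOutput r) -> r.naturalKey)
                  .thenComparingLong(r -> r.timestamp);

    // Simulated shuffle: sort on the composite key, then group on the
    // natural key only (the role of the grouping comparator). Each
    // list holds the values one reduce() call would iterate over.
    public static Map<String, List<String>> shuffle(List<MapOutput> mapOutputs) {
        List<MapOutput> sorted = new ArrayList<>(mapOutputs);
        sorted.sort(SORT_ORDER);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (MapOutput r : sorted) {
            groups.computeIfAbsent(r.naturalKey, k -> new ArrayList<>())
                  .add(r.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Map outputs deliberately given out of chronological order.
        List<MapOutput> outputs = List.of(
            new MapOutput("sensorB", 30L, "b@30"),
            new MapOutput("sensorA", 20L, "a@20"),
            new MapOutput("sensorB", 5L, "b@5"),
            new MapOutput("sensorA", 10L, "a@10"));
        // prints {sensorA=[a@10, a@20], sensorB=[b@5, b@30]}
        System.out.println(shuffle(outputs));
    }
}
```

In a real job the same three roles would be filled by a composite
WritableComparable key, a partitioner that looks only at the natural
key, and the sort and grouping comparators set on the Job; the posts
linked above walk through an actual implementation.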

On Thu, May 10, 2012 at 3:06 PM, Björn-Elmar Macek
<[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I am currently working with a data set that is chronologically ordered
> (every data element has a timestamp, and the timestamps are monotonically
> increasing). Please correct me if I am mistaken, but the data should
> "arrive" at the mapper in chronological order, right? But is the order in
> which I push values to the output preserved, so that the Iterator passed
> as a parameter to the reduce function also yields the values in
> chronological order?
>
> Thank you in advance for your help!
>
> Best regards,
> Björn

--
Harsh J