

Re: Sending data to all reducers
Sorry to ask so many questions, but the answers will help the user list
give you better advice, as this is not a typical MR use case.

- Do you foresee each reducer storing the data on the local file system of
the machine it runs on?
- Do you need to use specific input formats for the job, or is it really
just text files?
- Are the input files on the HDFS, or are you (e.g.) reading from HBase, or
some other source?

If your data is on HDFS, and if it is just text files, have you considered
a simple HDFS getmerge (hadoop fs -getmerge) on each machine?  Several tools
(e.g. Fabric) could trigger the getmerge on every machine for you.
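
As a rough sketch of that idea, the Python below builds the getmerge command
and runs it on each machine over plain ssh. It uses ssh via subprocess rather
than Fabric, and it assumes key-based ssh login and a hadoop client on every
node. The host names and paths are hypothetical placeholders, not anything
from your cluster.

```python
import shlex
import subprocess


def getmerge_command(hdfs_dir, local_file):
    """Build the `hadoop fs -getmerge` argument list for one machine."""
    return ["hadoop", "fs", "-getmerge", hdfs_dir, local_file]


def ssh_command(host, remote_argv):
    """Wrap a remote command in an ssh call (assumes key-based login)."""
    remote = " ".join(shlex.quote(arg) for arg in remote_argv)
    return ["ssh", host, remote]


def merge_on_all(hosts, hdfs_dir, local_file, run=subprocess.check_call):
    """Run getmerge on every host in turn.

    `run` is injectable so the commands can be printed instead of executed.
    """
    for host in hosts:
        run(ssh_command(host, getmerge_command(hdfs_dir, local_file)))


if __name__ == "__main__":
    # Hypothetical hosts and paths; run=print makes this a dry run.
    merge_on_all(["node1", "node2"], "/data/input",
                 "/tmp/input-merged.txt", run=print)
```

Passing run=print gives a dry run that only shows the ssh commands. Leaving
the default in place executes them one host at a time and stops at the first
failure.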

The problem with using MR for this is that you would be circumventing (if
it is possible at all) the job scheduler, which tries to balance the load
across the cluster.

Cheers,
Tim

On Thu, Aug 23, 2012 at 10:47 AM, Hamid Oliaei <[EMAIL PROTECTED]> wrote:

> exactly!!
>
>