


Re: Sending data to all reducers
Then I think you might be best off running a getmerge (hadoop fs -getmerge)
on each client.  How you trigger that is up to you, but something like
Fabric [1] might help.  Others might propose different solutions, but MR
doesn't sound like a natural fit to me.

I would expect this is the very fastest way of getting the data locally.
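
Something like this minimal fabfile might do it (the hostnames and HDFS
paths below are just placeholders; adjust them to your cluster):

# fabfile.py -- a rough sketch; run "fab -P pull_output" to execute the
# task on every listed host in parallel (Fabric 1.3+).
from fabric.api import env, run

env.hosts = ['client1', 'client2', 'client3']  # placeholder client machines

def pull_output():
    # Merge all part files of the job output into a single local file.
    run('hadoop fs -getmerge /user/hamid/job-output /tmp/merged-output')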

There is one alternative you might consider: have whatever is producing the
input files set the replication factor to the number of machines.  That way
all of the data will be local to every machine, although it will likely
still be split into multiple files (part-00000, part-00001, etc.)
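
If you go that route, raising the replication on files that already exist
is just a -setrep call; for instance (the factor and path here are
placeholders):

# set_replication.py -- raise replication to 10 (use your machine count)
# and wait (-w) until the extra replicas have actually been created.
import subprocess

subprocess.check_call(
    ['hadoop', 'fs', '-setrep', '-w', '10', '/user/hamid/input'])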

I hope this helps,
Tim

[1] http://docs.fabfile.org/en/1.4.3/index.html

On Thu, Aug 23, 2012 at 1:08 PM, Hamid Oliaei <[EMAIL PROTECTED]> wrote:

> Hi,
>
> First of all, thank you, Tim, for your time.
>
> The answer to your first question is yes.
> My inputs are triples (sub, pre, obj), stored on HDFS.
> The problem is: after running some MR jobs, data is generated on all of
> the machines, and I want each machine to send part of it to the others in
> as little time as possible, for use in the next phase.
> I know this doesn't fit naturally with the MR model, but it was the first
> solution that came to mind, and I would be glad to hear other suggestions.
>
> Regards,
> Hamid
>
>
>