Then I think your best bet is to run a getmerge (hadoop fs -getmerge) on
each client. How you trigger that is up to you, but something like Fabric
might help. Others might propose different solutions, but MR doesn't sound
like a natural fit for this to me.
I would expect this to be the fastest way of getting the data locally.
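
As a rough sketch of the getmerge idea (untested against a real cluster), here is one way to trigger it from a single machine using plain ssh instead of Fabric. The host names and paths are made up for illustration, and it assumes passwordless ssh and the hadoop command on each client's PATH:

```python
import shlex
import subprocess

# Hypothetical values, not from this thread: adjust to your cluster.
HOSTS = ["node1", "node2", "node3"]
HDFS_DIR = "/user/hamid/output"
LOCAL_FILE = "/tmp/merged.txt"


def getmerge_command(hdfs_dir, local_file):
    """Build the `hadoop fs -getmerge` command, which concatenates the
    files under hdfs_dir into one file on the local disk."""
    return ["hadoop", "fs", "-getmerge", hdfs_dir, local_file]


def ssh_command(host, command):
    """Wrap a command so that it runs on a remote host over ssh."""
    remote = " ".join(shlex.quote(arg) for arg in command)
    return ["ssh", host, remote]


def run_on_all(hosts, command, dry_run=True):
    """Run the command on each host in turn.

    With dry_run=True the ssh invocations are only printed.
    """
    for host in hosts:
        full = ssh_command(host, command)
        if dry_run:
            print(" ".join(full))
        else:
            subprocess.check_call(full)


if __name__ == "__main__":
    run_on_all(HOSTS, getmerge_command(HDFS_DIR, LOCAL_FILE))
```

Passing dry_run=False would actually execute the commands, one host after another. Fabric would give you much the same thing with parallel execution and nicer error handling.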
There is one alternative you might consider: set the replication factor
for whatever is producing the input files to the number of machines. That
way every block will have a local replica on every machine, although the
data will likely be split into multiple files (part-00000 etc.).
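
If it helps, here is a small sketch of one way to raise the replication after the files have been written, using hadoop fs -setrep. This is a variation on what I describe above (setting dfs.replication on the producing job would avoid the extra step), and the path and machine count are made up:

```python
import subprocess


def setrep_command(hdfs_path, replicas, recursive=True):
    """Build a `hadoop fs -setrep` command.

    -R applies the change to everything under a directory and
    -w waits until the target replication has been reached.
    """
    cmd = ["hadoop", "fs", "-setrep"]
    if recursive:
        cmd.append("-R")
    cmd += ["-w", str(replicas), hdfs_path]
    return cmd


def replicate_everywhere(hdfs_path, num_machines):
    """Set the replication factor to the number of machines, so that
    each datanode ends up holding a replica of every block."""
    subprocess.check_call(setrep_command(hdfs_path, num_machines))


# Hypothetical usage, assuming a 10-node cluster:
# replicate_everywhere("/user/hamid/output", 10)
```

Bear in mind that this multiplies the disk space used by the number of machines, so it only makes sense for fairly small data.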
I hope this helps,
On Thu, Aug 23, 2012 at 1:08 PM, Hamid Oliaei <[EMAIL PROTECTED]> wrote:
> First of all, thank you, Tim, for your time.
> The answer to the first question is yes.
> My inputs are triples (sub, pre, obj), and they are stored on HDFS.
> The problem is this: after running some MR jobs, some data is generated
> on every machine, and I want each machine to send part of it to the
> others in minimum time, for use in the next phase.
> I know this goes against the nature of MR, but it was the first
> solution that came to mind, and I would be glad to hear other suggestions.