I'm not sure I fully understand your description of the problem, but
you might want to check out the JIRA ticket for making the replica
placement algorithms in HDFS pluggable
(https://issues.apache.org/jira/browse/HADOOP-3799) and add your use
case to the discussion there.
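In the meantime, the assignment scheme you describe (user A: first file on
node 1, second file on node 2, and so on) is just per-user round-robin over
the node list. A minimal sketch of that logic, outside of HDFS — all names
here are hypothetical, this is not an existing Hadoop API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: spread each user's files evenly across a fixed
// list of nodes by cycling through the list per user. A real solution
// would hook this into a pluggable placement policy (HADOOP-3799)
// rather than compute it client-side.
public class RoundRobinPlacement {
    private final List<String> nodes;
    private final Map<String, Integer> nextIndex = new HashMap<>();

    public RoundRobinPlacement(List<String> nodes) {
        this.nodes = nodes;
    }

    // Returns the node that should hold the given user's next file,
    // cycling through the node list so the files spread evenly.
    public String placeNextFile(String user) {
        int i = nextIndex.merge(user, 1, Integer::sum) - 1;
        return nodes.get(i % nodes.size());
    }
}
```

With three nodes, user A's first three files land on node1, node2, node3
in turn, and user B independently starts again at node1 — matching the
layout described below.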
On Tue, Feb 10, 2009 at 5:05 AM, Rasit OZDAS <[EMAIL PROTECTED]> wrote:
> We have thousands of files, each dedicated to a user. (Each user has
> access to other users' files, but they do this not very often.)
> Each user runs map-reduce jobs on the cluster.
> So we should separate his/her files equally across the cluster,
> so that every machine can take part in the process (assuming he/she is
> the only user running jobs).
> For this we should initially copy files to specified nodes:
> User A : first file : Node 1, second file: Node 2, .. etc.
> User B : first file : Node 1, second file: Node 2, .. etc.
> I know Hadoop also creates replicas, but with our solution at least one
> copy of each file will be in the right place
> (or we're willing to control other replicas too).
> Rebalancing is also not a problem, assuming it takes into account how
> heavily each machine is in use.
> It even helps for a better organization of files.
> How can we copy files to specified nodes?
> Or do you have a better solution for us?
> I couldn't find a solution to this, probably such an option doesn't exist.
> But I wanted to take an expert's opinion about this.
> Thanks in advance.