Hadoop, mail # user - Copying a file to specified nodes


Re: Copying a file to specified nodes
Jeff Hammerbacher 2009-02-10, 20:50
Hey Rasit,

I'm not sure I fully understand your description of the problem, but
you might want to check out the JIRA ticket for making the replica
placement algorithms in HDFS pluggable
(https://issues.apache.org/jira/browse/HADOOP-3799) and add your use
case there.

Regards,
Jeff

On Tue, Feb 10, 2009 at 5:05 AM, Rasit OZDAS <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> We have thousands of files, each dedicated to a user.  (Each user can
> access other users' files, but does so only rarely.)
> Each user runs map-reduce jobs on the cluster.
> So we should spread his/her files evenly across the cluster,
> so that every machine can take part in the process (assuming he/she is
> the only user running jobs).
> For this we should initially copy files to specified nodes:
> User A :   first file : Node 1, second file: Node 2, .. etc.
> User B :   first file : Node 1, second file: Node 2, .. etc.
>
> I know Hadoop also creates replicas, but in our solution at least one
> copy of each file will be in the right place
> (and we're willing to control the other replicas too).
>
> Rebalancing is not a problem either, assuming it takes into account
> how heavily each machine is used.
> It would even help organize the files better.
>
> How can we copy files to specified nodes?
> Or do you have a better solution for us?
>
> I couldn't find a solution to this; probably no such option exists.
> But I wanted to get an expert's opinion on it.
>
> Thanks in advance,
> Rasit
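
For reference, the per-user layout described in the quoted message (first file on Node 1, second file on Node 2, and so on) amounts to a round-robin assignment. The sketch below only computes that intended mapping in plain Java. It does not call any HDFS API, and the class name, method name, file names, and node names are all hypothetical. Actually pinning a file to a node would depend on a placement hook such as the one proposed in HADOOP-3799.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: computes a desired file-to-node plan only.
// It does not talk to HDFS and cannot enforce the placement.
public class RoundRobinPlacement {

    // Assigns the i-th file of one user to node (i mod number of nodes),
    // so that user's files are spread evenly over all nodes.
    static Map<String, String> assign(List<String> files, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        // LinkedHashMap keeps the files in their original order.
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            plan.put(files.get(i), nodes.get(i % nodes.size()));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Illustrative names only (assumptions, not from the thread).
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        List<String> userAFiles = Arrays.asList("a-0", "a-1", "a-2", "a-3");

        // Print the plan for one user; every user starts again at the first node.
        for (Map.Entry<String, String> e : assign(userAFiles, nodes).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because each user's sequence starts again at the first node, the earliest nodes receive slightly more files whenever a user's file count is not a multiple of the node count. A per-user starting offset would be one way to even that out.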