Re: Very large file copied to cluster, and the copy fails. All blocks bad
Saptarshi Guha 2009-02-13, 05:51
> Did you run the copy command from machine A?
> I had to have the client doing the copy either on the master or on an "off-cluster"
Thanks! I uploaded it from an off-cluster machine (i.e., one not
participating in the HDFS) and it worked splendidly.
On Thu, Feb 12, 2009 at 11:03 PM, TCK <[EMAIL PROTECTED]> wrote:
I believe that if you do the copy from an hdfs client that is on the
same machine as a data node, then for each block the primary copy
always goes to that data node, and only the replicas get distributed
among other data nodes. I ran into this issue once -- I had to have
the client doing the copy either on the master or on an "off-cluster"
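To make the placement behaviour described above concrete, here is a toy model in Python. It is only a sketch of the rule as stated in this thread (with replication 1, a writer that is also a data node keeps every block locally), not Hadoop code. The 64 MB block size, the node names, and the random choice for an off-cluster writer are all assumptions for illustration.

```python
import random
from collections import Counter

BLOCK_MB = 64  # assumed block size; the thread does not state it


def place_blocks(file_mb, datanodes, writer, seed=0):
    """Toy model of block placement with replication 1.

    If the writer is itself a data node, every block goes to that node.
    Otherwise each block goes to a randomly chosen data node (a
    simplification of whatever the name node really does).
    """
    rng = random.Random(seed)
    num_blocks = -(-file_mb // BLOCK_MB)  # ceiling division
    placement = Counter()
    for _ in range(num_blocks):
        target = writer if writer in datanodes else rng.choice(datanodes)
        placement[target] += 1
    return placement


# Seven data nodes as in the original question; the names are hypothetical.
datanodes = ["A", "B", "C", "D", "E", "F", "G"]
file_mb = 42 * 1024  # the 42 GB file

on_cluster = place_blocks(file_mb, datanodes, writer="A")
off_cluster = place_blocks(file_mb, datanodes, writer="client")

print("copy run on A:       ", dict(on_cluster))
print("copy run off-cluster:", dict(off_cluster))
```

In this model the copy started on A puts the whole file on A, while the off-cluster copy spreads the blocks over all the data nodes, which matches what was observed in this thread.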
> --- On Thu, 2/12/09, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
> From: Saptarshi Guha <[EMAIL PROTECTED]>
> Subject: Very large file copied to cluster, and the copy fails. All blocks bad
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Thursday, February 12, 2009, 9:50 PM
> I have a 42 GB file on the local fs (call the machine A) which I need
> to copy to HDFS (replication 1). According to the HDFS web tracker, the
> cluster has 208 GB across 7 machines.
> Note, the machine A has about 80 GB total, so there is no place to
> store copies of the file.
> Running the command bin/hadoop dfs -put /local/x /remote/tmp/ fails,
> with all blocks being bad. This is not surprising, since the file is
> copied entirely to the HDFS region that resides on A. Had the file
> been copied across all machines, this would not have failed.
> I have more experience with MapReduce and not much with the HDFS side
> of things.
> Is there a configuration option I'm missing that forces the file to be
> split across the machines (when it is being copied)?
> Saptarshi Guha - [EMAIL PROTECTED]
Saptarshi Guha - [EMAIL PROTECTED]