-
Very large file copied to cluster, and the copy fails. All blocks bad
Saptarshi Guha 2009-02-13, 02:50
Hello, I have a 42 GB file on the local filesystem of one machine (call it machine A) which I need to copy to HDFS (replication 1). According to the HDFS web tracker, the HDFS has 208 GB across 7 machines. Note that machine A has about 80 GB in total, so there is no room on it to store copies of the file. The command bin/hadoop dfs -put /local/x /remote/tmp/ fails, with all blocks being bad. This is not surprising, since the file is copied entirely to the HDFS region that resides on A. Had the file been copied across all machines, this would not have failed.
I have more experience with MapReduce and not much with the HDFS side of things. Is there a configuration option I'm missing that forces the file to be split across the machines (when it is being copied)? -- Saptarshi Guha - [EMAIL PROTECTED]
-
Re: Very large file copied to cluster, and the copy fails. All blocks bad
TCK 2009-02-13, 04:03
Did you run the copy command from machine A? I believe that if you do the copy from an hdfs client that is on the same machine as a data node, then for each block the primary copy always goes to that data node, and only the replicas get distributed among other data nodes. I ran into this issue once -- I had to have the client doing the copy either on the master or on an "off-cluster" node. -TCK
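[Editor's sketch] The placement behaviour described above can be illustrated with a toy model. This is not HDFS code; it is a minimal simulation that assumes replication 1, a 64 MB block size, hypothetical node names, and the simplified rule that a client running on a data node writes every block to that node, while an off-cluster client spreads blocks at random. The function `place_blocks` and the host name `gateway` are made up for the illustration.

```python
import random

BLOCK_MB = 64   # assumed block size, not stated in the thread
FILE_GB = 42    # file size from the original post
DATANODES = ["A", "B", "C", "D", "E", "F", "G"]  # 7 machines; names are hypothetical


def place_blocks(num_blocks, datanodes, client_host, rng):
    """Return {node: block_count} for a replication factor of 1.

    Toy rule (an assumption that follows the description above): if the
    client runs on a data node, the single copy of every block lands on
    that node; otherwise each block goes to a randomly chosen data node.
    """
    counts = {node: 0 for node in datanodes}
    for _ in range(num_blocks):
        if client_host in counts:
            target = client_host
        else:
            target = rng.choice(datanodes)
        counts[target] += 1
    return counts


def blocks_to_gb(blocks):
    """Convert a block count to gigabytes under the assumed block size."""
    return blocks * BLOCK_MB / 1024


num_blocks = FILE_GB * 1024 // BLOCK_MB
rng = random.Random(0)  # fixed seed so the run is repeatable

on_cluster = place_blocks(num_blocks, DATANODES, "A", rng)
off_cluster = place_blocks(num_blocks, DATANODES, "gateway", rng)

print("client on A:        ", {n: blocks_to_gb(c) for n, c in on_cluster.items()})
print("client off-cluster: ", {n: blocks_to_gb(c) for n, c in off_cluster.items()})
```

Under these assumptions, the on-cluster run puts all 42 GB on machine A, which already holds the 42 GB source file on an 80 GB disk, while the off-cluster run leaves each node with only a fraction of the file.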
-
Re: Very large file copied to cluster, and the copy fails. All blocks bad
Saptarshi Guha 2009-02-13, 05:51
> Did you run the copy command from machine A?
Yes, exactly.
> I had to have the client doing the copy either on the master or on an "off-cluster"
Thanks! I uploaded it from an off-cluster machine (i.e., one not participating in the HDFS) and it worked splendidly.
Regards,
Saptarshi
-- Saptarshi Guha - [EMAIL PROTECTED]