Re: Very large file copied to cluster, and the copy fails. All blocks bad
> Did you run the copy command from machine A?
Yes, exactly.
> I had to have the client doing the copy either on the master or on an "off-cluster" node.
Thanks! I uploaded it from an off-cluster node (i.e., not participating in
HDFS) and it worked splendidly.
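
For anyone else who runs into this, the steps were roughly the following (assuming the off-cluster machine has HADOOP_CONF_DIR, or fs.default.name, pointed at the cluster's namenode; the fsck line is just a sanity check):

  # run the copy from a machine that is not one of the datanodes
  bin/hadoop dfs -put /local/x /remote/tmp/

  # sanity check: confirm the blocks are spread across the datanodes
  bin/hadoop fsck /remote/tmp/x -files -blocks -locations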

Regards
Saptarshi
On Thu, Feb 12, 2009 at 11:03 PM, TCK <[EMAIL PROTECTED]> wrote:
>
> I believe that if you do the copy from an hdfs client that is on the
> same machine as a data node, then for each block the primary copy
> always goes to that data node, and only the replicas get distributed
> among other data nodes. I ran into this issue once -- I had to have
> the client doing the copy either on the master or on an "off-cluster"
> node.
> -TCK
>
>
>
> --- On Thu, 2/12/09, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
> From: Saptarshi Guha <[EMAIL PROTECTED]>
> Subject: Very large file copied to cluster, and the copy fails. All blocks bad
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Thursday, February 12, 2009, 9:50 PM
>
> Hello,
> I have a 42 GB file on the local fs (call the machine A) which I need
> to copy to HDFS (replication 1); according to the HDFS web tracker it
> has 208 GB across 7 machines.
> Note, machine A has about 80 GB total, so there is no place to
> store copies of the file.
> Using the command bin/hadoop dfs -put /local/x /remote/tmp/ fails,
> with all blocks being bad. This is not surprising, since the file is
> copied entirely to the HDFS region that resides on A. Had the file
> been copied across all the machines, this would not have failed.
>
> I have more experience with MapReduce and not much with the HDFS side
> of things.
> Is there a configuration option I'm missing that forces the file to be
> split across the machines (when it is being copied)?
> --
> Saptarshi Guha - [EMAIL PROTECTED]
>
>
>
>

--
Saptarshi Guha - [EMAIL PROTECTED]
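
A footnote on the placement question above: as far as I know there is no configuration option that changes this behavior; when the client writing the file is itself a datanode, HDFS puts the first replica on that local node by design, so with replication 1 the whole file lands on machine A. Two commands that make the resulting skew visible (nothing here is specific to this cluster):

  # per-datanode capacity and usage, before and after the -put
  bin/hadoop dfsadmin -report

  # if a copy does fit but ends up lopsided, the balancer can spread the blocks
  bin/hadoop balancer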