Hadoop, mail # user - Why do some blocks refuse to replicate...?


Felix GV 2013-03-28, 20:00
MARCOS MEDRADO RUBINELLI 2013-03-28, 20:45
Felix GV 2013-03-28, 21:23

Re: Why do some blocks refuse to replicate...?
Tapas Sarangi 2013-03-29, 00:25

On Mar 28, 2013, at 7:13 PM, Felix GV <[EMAIL PROTECTED]> wrote:

> I'm using the version of hadoop in CDH 4.2, which is a version of Hadoop 2.0 with a bunch of patches on top...
>
> I've tried copying one block and its .meta file to one of my new DN, then restarted the DN service, and it did pick up the missing block and replicate it properly within the new slaves. So... that's great, but it's also super annoying, I don't want to have to pick and choose each block manually. Although I guess I could parse the output of fsck, figure out where the blocks are and script the whole thing.
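A very rough, untested sketch of that manual copy (the data-directory layout, block IDs, and host name are assumptions; check dfs.datanode.data.dir on both nodes first):

# copy one block file and its .meta from an old DN to a new one, then restart the new DN
scp /data/dfs/dn/current/BP-.../current/finalized/blk_1234567890 \
    /data/dfs/dn/current/BP-.../current/finalized/blk_1234567890_1001.meta \
    new-dn-01:/data/dfs/dn/current/BP-.../current/finalized/
ssh new-dn-01 'service hadoop-hdfs-datanode restart'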

This should flush out all the corrupt or missing blocks:

hadoop fsck <path to HDFS dir> -files -blocks -locations | egrep "CORRUPT|MISSING"

You can put this in a for loop and copy them to another node. It only takes a little scripting, along the lines of the sketch below.
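A minimal, untested sketch of that loop (the HDFS path and the per-file action are placeholders; the grep/awk filtering is approximate, so eyeball the fsck output too):

# dump fsck output, pull out the affected file paths, then handle each one
hadoop fsck /path/to/hdfs/dir -files -blocks -locations > /tmp/fsck.out
grep -E "CORRUPT|MISSING" /tmp/fsck.out | awk '{print $1}' | sort -u > /tmp/bad_files.txt

while read -r f; do
  echo "affected file: $f"
  # replace the echo with your fix of choice, e.g. re-copying the source data,
  # or bumping replication: hadoop fs -setrep -w 3 "$f"
done < /tmp/bad_files.txt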

------------
>
> I'm now trying to rsync all of the data from an old node to a new one, and see if it's gonna be able to pick that up, but I'm afraid the subdir structure might not port over nicely to the new node. Also, this is acceptable today to save me from picking each block manually (or coming up with a script) because I don't have that much data on the old node, but if I had gotten into that situation with a large amount of data, that would not have been a very good solution...
>
> I'll report back when I've made some more progress...
>
> --
> Felix
>
>
> On Thu, Mar 28, 2013 at 7:01 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> Which Hadoop version are you using?
>
> On Mar 29, 2013 5:24 AM, "Felix GV" <[EMAIL PROTECTED]> wrote:
> >
> > Yes, I didn't specify how I was testing my changes, but basically, here's what I did:
> >
> > My hdfs-site.xml file was modified to include a reference to a file containing a list of all datanodes (via dfs.hosts) and a reference to a file containing decommissioned nodes (via dfs.hosts.exclude). After that, I just changed these files, not hdfs-site.xml.
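For reference, a minimal hdfs-site.xml sketch of that setup (the file paths are assumptions, not taken from this thread):

<!-- rough sketch: point the NameNode at an include file and an exclude file -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.hosts</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.hosts.exclude</value>
</property>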
> >
> > I first added all my old nodes in the dfs.hosts.exclude file, did hdfs dfsadmin -refreshNodes, and most of the data replicated correctly.
> >
> > I then tried removing all old nodes from the dfs.hosts file, did hdfs dfsadmin -refreshNodes, and I saw that I now had a couple of corrupt and missing blocks (60 of them).
> >
> > I re-added all the old nodes in the dfs.hosts file, and removed them gradually, each time doing the refreshNodes or restarting the NN, and I narrowed it down to three datanodes in particular, which seem to be the three nodes where all of those 60 blocks are located.
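An untested sketch of that decommission-and-refresh cycle (the host name and file paths are assumptions):

# 1. add an old datanode to the exclude file and have the NameNode re-read it
echo "old-dn-01.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude
hdfs dfsadmin -refreshNodes

# 2. watch the node's state and wait until it reports "Decommissioned"
hdfs dfsadmin -report

# 3. only then remove it from the dfs.hosts include file and refresh again
hdfs dfsadmin -refreshNodes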
> >
> > Is it possible, perhaps, that these three nodes are completely incapable of replicating what they have (because they're corrupt or something), and so every block was replicated from other nodes, but the blocks that happened to be located on these three nodes are... doomed? I can see the data in those blocks in the NN hdfs browser, so I guess it's not corrupted... I also tried pinging the new nodes from those old ones and it works too, so I guess there is no network partition...
> >
> > I'm in the process of increasing replication factor above 3, but I don't know if that's gonna do anything...
> >
> > --
> > Felix
> >
> >
> > On Thu, Mar 28, 2013 at 4:45 PM, MARCOS MEDRADO RUBINELLI <[EMAIL PROTECTED]> wrote:
> >>
> >> Felix,
> >>
> >> After changing hdfs-site.xml, did you run "hadoop dfsadmin -refreshNodes"? That should have been enough, but you can try increasing the replication factor of these files, waiting for them to be replicated to the new nodes, and then setting it back to its original value.
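An untested sketch of that bump-and-restore approach (the path and replication factors are placeholders):

# raise replication so copies land on the new nodes; -w waits for completion
hadoop fs -setrep -w 5 /path/to/affected/files

# once the extra replicas exist, drop back to the original factor
hadoop fs -setrep 3 /path/to/affected/files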
> >>
> >> Cheers,
> >> Marcos
> >>
> >>
> >> On 28-03-2013 17:00, Felix GV wrote:
> >>>
> >>> Hello,
> >>>
> >>> I've been running a virtualized CDH 4.2 cluster. I now want to migrate all my data to another (this time physical) set of slaves and then stop using the virtualized slaves.
> >>>
> >>> I added the new physical slaves in the cluster, and marked all the old virtualized slaves as decommissioned using the dfs.hosts.exclude setting in hdfs-site.xml.
Felix GV 2013-04-13, 18:43