|
Aji Janis
2012-08-10, 18:38
Mohammad Tariq
2012-08-10, 18:43
Ted Dunning
2012-08-10, 18:55
anil gupta
2012-08-10, 19:12
Mohammad Tariq
2012-08-10, 19:16
Harsh J
2012-08-12, 12:43
Harsh J
2012-08-12, 12:45
Mohammad Tariq
2012-08-12, 17:47
Arun C Murthy
2012-08-12, 19:07
Aji Janis
2012-08-13, 13:57
Harsh J
2012-08-13, 14:55
Harsh J
2012-08-13, 15:42
Steve Loughran
2012-08-13, 16:10
Jeffrey Buell
2012-08-14, 00:15
|
-
Hadoop hardware failure recoveryAji Janis 2012-08-10, 18:38
I am very new to Hadoop. I am considering setting up a Hadoop cluster
consisting of 5 nodes where each node has 3 internal hard drives. I understand HDFS has a configurable redundancy feature but what happens if an entire drive crashes (physically) for whatever reason? How does Hadoop recover, if it can, from this situation? What else should I know before setting up my cluster this way? Thanks in advance.
-
Re: Hadoop hardware failure recoveryMohammad Tariq 2012-08-10, 18:43
Hello Aji,
Hadoop's redundancy feature allows data to be replicated over the entire cluster. So, even if entire disk is gone or even the entire machine for that matter, your data is still there in other node(s). But, we need to keep one thing in mind that the 'master' node is the single point of failure in a Hadoop cluster. If the machine running master process(es) is down, you are trapped. For more detail you can visit the home page at : redundancy feature Regards, Mohammad Tariq On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > I am very new to Hadoop. I am considering setting up a Hadoop cluster > consisting of 5 nodes where each node has 3 internal hard drives. I > understand HDFS has a configurable redundancy feature but what happens if an > entire drive crashes (physically) for whatever reason? How does Hadoop > recover, if it can, from this situation? What else should I know before > setting up my cluster this way? Thanks in advance. > >
-
Re: Hadoop hardware failure recoveryTed Dunning 2012-08-10, 18:55
Hadoop's file system was (mostly) copied from the concepts of Google's old
file system. The original paper is probably the best way to learn about that. http://research.google.com/archive/gfs.html On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > I am very new to Hadoop. I am considering setting up a Hadoop cluster > consisting of 5 nodes where each node has 3 internal hard drives. I > understand HDFS has a configurable redundancy feature but what happens if > an entire drive crashes (physically) for whatever reason? How does Hadoop > recover, if it can, from this situation? What else should I know before > setting up my cluster this way? Thanks in advance. > > >
-
Re: Hadoop hardware failure recoveryanil gupta 2012-08-10, 19:12
Hi Aji,
Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of production quality yet(its in Alpha). Namenode use to be a Single Point of Failure in releases prior to Hadoop 2.0.0. HTH, Anil Gupta On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Hadoop's file system was (mostly) copied from the concepts of Google's old > file system. > > The original paper is probably the best way to learn about that. > > http://research.google.com/archive/gfs.html > > > > On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > >> I am very new to Hadoop. I am considering setting up a Hadoop cluster >> consisting of 5 nodes where each node has 3 internal hard drives. I >> understand HDFS has a configurable redundancy feature but what happens if >> an entire drive crashes (physically) for whatever reason? How does Hadoop >> recover, if it can, from this situation? What else should I know before >> setting up my cluster this way? Thanks in advance. >> >> >> > -- Thanks & Regards, Anil Gupta
-
Re: Hadoop hardware failure recoveryMohammad Tariq 2012-08-10, 19:16
Very correctly said by Anil. Actually Hadoop HA is not yet production
ready and you are about to begin your Hadoop journey, so just thought of not mentioning it. If you want to use HA, just pull it from the trunk and do a build. Regards, Mohammad Tariq On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <[EMAIL PROTECTED]> wrote: > Hi Aji, > > Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then > Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of > production quality yet(its in Alpha). > Namenode use to be a Single Point of Failure in releases prior to Hadoop > 2.0.0. > > HTH, > Anil Gupta > > > On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> >> Hadoop's file system was (mostly) copied from the concepts of Google's old >> file system. >> >> The original paper is probably the best way to learn about that. >> >> http://research.google.com/archive/gfs.html >> >> >> >> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: >>> >>> I am very new to Hadoop. I am considering setting up a Hadoop cluster >>> consisting of 5 nodes where each node has 3 internal hard drives. I >>> understand HDFS has a configurable redundancy feature but what happens if an >>> entire drive crashes (physically) for whatever reason? How does Hadoop >>> recover, if it can, from this situation? What else should I know before >>> setting up my cluster this way? Thanks in advance. >>> >>> >> > > > > -- > Thanks & Regards, > Anil Gupta
-
Re: Hadoop hardware failure recoveryHarsh J 2012-08-12, 12:43
Mohammad,
The HA feature is very much functional, if you've tried it. I would not say it is "not production ready", for I see it in use at several places already, including my humbly small MBP (two local NNs, just for the fun of it) :) On Sat, Aug 11, 2012 at 12:46 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote: > Very correctly said by Anil. Actually Hadoop HA is not yet production > ready and you are about to begin your Hadoop journey, so just thought > of not mentioning it. If you want to use HA, just pull it from the > trunk and do a build. > > Regards, > Mohammad Tariq > > > On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <[EMAIL PROTECTED]> wrote: >> Hi Aji, >> >> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then >> Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of >> production quality yet(its in Alpha). >> Namenode use to be a Single Point of Failure in releases prior to Hadoop >> 2.0.0. >> >> HTH, >> Anil Gupta >> >> >> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: >>> >>> Hadoop's file system was (mostly) copied from the concepts of Google's old >>> file system. >>> >>> The original paper is probably the best way to learn about that. >>> >>> http://research.google.com/archive/gfs.html >>> >>> >>> >>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: >>>> >>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster >>>> consisting of 5 nodes where each node has 3 internal hard drives. I >>>> understand HDFS has a configurable redundancy feature but what happens if an >>>> entire drive crashes (physically) for whatever reason? How does Hadoop >>>> recover, if it can, from this situation? What else should I know before >>>> setting up my cluster this way? Thanks in advance. >>>> >>>> >>> >> >> >> >> -- >> Thanks & Regards, >> Anil Gupta -- Harsh J
-
Re: Hadoop hardware failure recoveryHarsh J 2012-08-12, 12:45
Aji,
Your question seems to be about data (blocks) loss. Hadoop would recover by detecting the blocks you've lost from that disk failure (auto) and re-replicate their copies from the other available redundant copies in the cluster (Controlled via the configurable replication factor). On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > I am very new to Hadoop. I am considering setting up a Hadoop cluster > consisting of 5 nodes where each node has 3 internal hard drives. I > understand HDFS has a configurable redundancy feature but what happens if an > entire drive crashes (physically) for whatever reason? How does Hadoop > recover, if it can, from this situation? What else should I know before > setting up my cluster this way? Thanks in advance. > > -- Harsh J
-
Re: Hadoop hardware failure recoveryMohammad Tariq 2012-08-12, 17:47
Hello Harsh,
Thanks a lot for keeping me updated. @Aji : Apologies for misguiding you unintentionally. Regards, Mohammad Tariq On Sun, Aug 12, 2012 at 6:15 PM, Harsh J <[EMAIL PROTECTED]> wrote: > Aji, > > Your question seems to be about data (blocks) loss. Hadoop would > recover by detecting the blocks you've lost from that disk failure > (auto) and re-replicate their copies from the other available > redundant copies in the cluster (Controlled via the configurable > replication factor). > > On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > > I am very new to Hadoop. I am considering setting up a Hadoop cluster > > consisting of 5 nodes where each node has 3 internal hard drives. I > > understand HDFS has a configurable redundancy feature but what happens > if an > > entire drive crashes (physically) for whatever reason? How does Hadoop > > recover, if it can, from this situation? What else should I know before > > setting up my cluster this way? Thanks in advance. > > > > > > > > -- > Harsh J >
-
Re: Hadoop hardware failure recoveryArun C Murthy 2012-08-12, 19:07
Yep, hadoop-2 is alpha but is progressing nicely...
However, if you have access to some 'enterprise HA' utilities (VMWare or Linux HA) you can get *very decent* production-grade high-availability in hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce). Arun On Aug 10, 2012, at 12:12 PM, anil gupta wrote: > Hi Aji, > > Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of production quality yet(its in Alpha). > Namenode use to be a Single Point of Failure in releases prior to Hadoop 2.0.0. > > HTH, > Anil Gupta > > On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > Hadoop's file system was (mostly) copied from the concepts of Google's old file system. > > The original paper is probably the best way to learn about that. > > http://research.google.com/archive/gfs.html > > > > On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: > I am very new to Hadoop. I am considering setting up a Hadoop cluster consisting of 5 nodes where each node has 3 internal hard drives. I understand HDFS has a configurable redundancy feature but what happens if an entire drive crashes (physically) for whatever reason? How does Hadoop recover, if it can, from this situation? What else should I know before setting up my cluster this way? Thanks in advance. > > > > > > > -- > Thanks & Regards, > Anil Gupta -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
-
Re: Hadoop hardware failure recoveryAji Janis 2012-08-13, 13:57
Thank you everyone for all the feedback and suggestions. Its good to know
these details as I move forward. Piling on to the question, I am curious if any of you have experience with Accumulo (a requirement for me hence not optional). I was wondering if the data loss (physical crash of the hard drive) in this case would be resolved by Hadoop (HDFS I should say). Any suggestions and/or where I could find some specs on this would be really appreciated! Thank you again for all the pointers. -Aji On Sun, Aug 12, 2012 at 3:07 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > Yep, hadoop-2 is alpha but is progressing nicely... > > However, if you have access to some 'enterprise HA' utilities (VMWare or > Linux HA) you can get *very decent* production-grade high-availability in > hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce). > > Arun > > On Aug 10, 2012, at 12:12 PM, anil gupta wrote: > > Hi Aji, > > Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha > then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not > of production quality yet(its in Alpha). > Namenode use to be a Single Point of Failure in releases prior to Hadoop > 2.0.0. > > HTH, > Anil Gupta > > On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]>wrote: > >> Hadoop's file system was (mostly) copied from the concepts of Google's >> old file system. >> >> The original paper is probably the best way to learn about that. >> >> http://research.google.com/archive/gfs.html >> >> >> >> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: >> >>> I am very new to Hadoop. I am considering setting up a Hadoop cluster >>> consisting of 5 nodes where each node has 3 internal hard drives. I >>> understand HDFS has a configurable redundancy feature but what happens if >>> an entire drive crashes (physically) for whatever reason? How does Hadoop >>> recover, if it can, from this situation? What else should I know before >>> setting up my cluster this way? Thanks in advance. >>> >>> >>> >> > > > -- > Thanks & Regards, > Anil Gupta > > > -- > Arun C. Murthy > Hortonworks Inc. > http://hortonworks.com/ > > >
-
Re: Hadoop hardware failure recoveryHarsh J 2012-08-13, 14:55
Aji,
The best place would be to ask on Apache Accumulo's own user lists, subscrib-able at http://accumulo.apache.org/mailing_list.html That said, if Accumulo bases itself on HDFS, then its data safety should be the same or nearly the same as what HDFS itself can offer. Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a working hsync() API that allows you to write files with guarantee that it has been written to the disk (like the fsync() *nix call). You can read some more about this at an earlier thread: http://search-hadoop.com/m/ATVOETSy4X1 HTH, and do let us know what you find on the Accumulo side. On Mon, Aug 13, 2012 at 7:27 PM, Aji Janis <[EMAIL PROTECTED]> wrote: > Thank you everyone for all the feedback and suggestions. Its good to know > these details as I move forward. > > Piling on to the question, I am curious if any of you have experience with > Accumulo (a requirement for me hence not optional). I was wondering if the > data loss (physical crash of the hard drive) in this case would be resolved > by Hadoop (HDFS I should say). Any suggestions and/or where I could find > some specs on this would be really appreciated! > > > Thank you again for all the pointers. > -Aji > > > > > > > > > On Sun, Aug 12, 2012 at 3:07 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: >> >> Yep, hadoop-2 is alpha but is progressing nicely... >> >> However, if you have access to some 'enterprise HA' utilities (VMWare or >> Linux HA) you can get *very decent* production-grade high-availability in >> hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce). >> >> Arun >> >> On Aug 10, 2012, at 12:12 PM, anil gupta wrote: >> >> Hi Aji, >> >> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha >> then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not >> of production quality yet(its in Alpha). >> Namenode use to be a Single Point of Failure in releases prior to Hadoop >> 2.0.0. >> >> HTH, >> Anil Gupta >> >> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <[EMAIL PROTECTED]> >> wrote: >>> >>> Hadoop's file system was (mostly) copied from the concepts of Google's >>> old file system. >>> >>> The original paper is probably the best way to learn about that. >>> >>> http://research.google.com/archive/gfs.html >>> >>> >>> >>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <[EMAIL PROTECTED]> wrote: >>>> >>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster >>>> consisting of 5 nodes where each node has 3 internal hard drives. I >>>> understand HDFS has a configurable redundancy feature but what happens if an >>>> entire drive crashes (physically) for whatever reason? How does Hadoop >>>> recover, if it can, from this situation? What else should I know before >>>> setting up my cluster this way? Thanks in advance. >>>> >>>> >>> >> >> >> >> -- >> Thanks & Regards, >> Anil Gupta >> >> >> -- >> Arun C. Murthy >> Hortonworks Inc. >> http://hortonworks.com/ >> >> > -- Harsh J
-
Re: Hadoop hardware failure recoveryHarsh J 2012-08-13, 15:42
Hey Steve,
Interesting, thanks for pointing that out! I didn't know that it disables this by default :) On Mon, Aug 13, 2012 at 8:38 PM, Steve Loughran <[EMAIL PROTECTED]> wrote: > > > On 13 August 2012 07:55, Harsh J <[EMAIL PROTECTED]> wrote: >> >> >> >> Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a >> working hsync() API that allows you to write files with guarantee that >> it has been written to the disk (like the fsync() *nix call). > > > A guarantee that the OS thinks it's been written to HDD. > > For anyone using Hadoop or any other program (e.g MySQL) in a virtualized > environment , even when the OS thinks it has flushed a virtual disk -know > that you may have set some VM params to say "when we said "flush to disk" we > meant it": > https://forums.virtualbox.org/viewtopic.php?t=13661 > > > -- Harsh J
-
Re: Hadoop hardware failure recoverySteve Loughran 2012-08-13, 16:10
On 13 August 2012 08:42, Harsh J <[EMAIL PROTECTED]> wrote:
> Hey Steve, > > Interesting, thanks for pointing that out! I didn't know that it > disables this by default :) > > It's always something to watch out for: someone implementing a disk FS, OS, VM environment discovering that they get great benchmark numbers if they make flushing async, and thinking "most people don't need it anyway". That's mostly true -and some programs over-flush-, but if you do want to be sure your data is saved to disk, these people are being dangerous rather than helpful I don't think it's an issue if you are saving to network mounted storage -which can include the storage of the host OS. If you do some experiments, NFS chat to the host system is usually as fast as working with a virtual HDD in the same host OS -which has extra layers of indirection. Virtual HDDs can get fragmented even when the VM thinks it's just allocated big linear blocks -you need to defrag the virtual HDD then the physical disk image to correct that.
-
RE: Hadoop hardware failure recoveryJeffrey Buell 2012-08-14, 00:15
This is never an issue on vSphere. The ESXi hypervisor does not send the completion interrupt back to the guest until the IO is finished, so if the guest OS thinks an IO is flushed to disk, it really is flushed to disk. hsync() will work in a ESXi VM exactly like in a native OS.
The physical storage layer might lie about completion (e.g., most SANs with redundant battery-backed caches), but this applies equally to native and virtualized OSes. It is always tempting to implement some kind of write caching in the virtualization layer to try to improve storage performance, but of course this comes at the cost of safety and predictability. Jeff From: Steve Loughran [mailto:[EMAIL PROTECTED]] Sent: Monday, August 13, 2012 8:08 AM To: [EMAIL PROTECTED] Subject: Re: Hadoop hardware failure recovery On 13 August 2012 07:55, Harsh J <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote: Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a working hsync() API that allows you to write files with guarantee that it has been written to the disk (like the fsync() *nix call). A guarantee that the OS thinks it's been written to HDD. For anyone using Hadoop or any other program (e.g MySQL) in a virtualized environment , even when the OS thinks it has flushed a virtual disk -know that you may have set some VM params to say "when we said "flush to disk" we meant it": https://forums.virtualbox.org/viewtopic.php?t=13661 |