Re: risks of using Hadoop
Losing the name node does not necessarily mean losing data. You should always have your name node write a copy of its metadata to an NFS server to guard against that. Also, while name node unavailability is a risk, outages are not very common in practice.
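
As a rough sketch of that setup (an assumption about how it is usually done on 0.20.x, not something spelled out in this thread): the dfs.name.dir property in hdfs-site.xml accepts a comma-separated list of directories, and the name node replicates its metadata into each one. The Python below only renders such a property; the helper name and both paths are hypothetical, with the second path standing in for an NFS mount.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Build the <property> element listing every name node metadata directory."""
    if len(dirs) < 2:
        # A single directory gives no redundancy for the metadata.
        raise ValueError("list at least two directories, e.g. one local and one on NFS")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    # The value is a comma-separated list with no spaces between entries.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical paths: a local disk plus an NFS mount point.
prop = name_dir_property(["/data/1/dfs/nn", "/mnt/nfs/dfs/nn"])
xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)
```

The printed element would go inside the <configuration> block of hdfs-site.xml on the name node. Later Hadoop releases renamed the property to dfs.namenode.name.dir, so check the name against the version you run.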

-Joey

On Sep 17, 2011, at 19:38, Tom Deutsch <[EMAIL PROTECTED]> wrote:

> I disagree, Brian - data loss and system downtime (both potentially non-trivial) should not be taken lightly. Use cases and thus availability requirements do vary, but I would not encourage anyone to shrug them off as "overblown", especially as Hadoop becomes more production-oriented in its use.
>
> ---------------------------------------
> Sent from my Blackberry so please excuse typing and spelling errors.
>
>
> ----- Original Message -----
> From: Brian Bockelman [[EMAIL PROTECTED]]
> Sent: 09/17/2011 05:11 PM EST
> To: [EMAIL PROTECTED]
> Subject: Re: risks of using Hadoop
>
>
>
>
> On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote:
>
>> Hi Kobina,
>>
>> Here are some experiences with DFS that may be helpful to you.
>>
>> 1. Selecting the correct version.
>>   I would recommend using a 0.20.x version. It is a fairly stable and well-tested version, and the one other organizations prefer.
>> Don't go for the 0.21 version. It is not a stable release, so using it is a risk.
>>
>> 2. You should test thoroughly with your customer operations.
>> (Of course you will do this :-))
>>
>> 3. The 0.20.x versions have a single point of failure (SPOF).
>>  If the NameNode goes down, you will lose the data. One way of recovering is by using the SecondaryNameNode. You can recover the data up to the last checkpoint, but manual intervention is required.
>> In the latest trunk, the SPOF will be addressed by HDFS-1623.
>>
>> 4. 0.20.x NameNodes cannot scale. Federation changes are included in the latest versions (I think in 0.22). This may not be a problem for your cluster, but please consider this aspect as well.
>>
>
> With respect to (3) and (4) - these are often completely overblown for many Hadoop use cases.  If you use Hadoop as originally designed (large scale batch data processing), these likely don't matter.
>
> If you're looking at some of the newer use cases (low latency stuff or time-critical processing), or if you architect your solution poorly (lots of small files), these issues become relevant.  Another case where I see folks get frustrated is using Hadoop as a "plain old batch system"; for non-data workflows, it doesn't measure up against specialized systems.
>
> You really want to make sure that Hadoop is the best tool for your job.
>
> Brian