HDFS, mail # user - How to rebuild the shared edits directory


Re: How to rebuild the shared edits directory
Todd Lipcon 2012-07-25, 18:51
Hi Jeff,

I don't know the HP offerings very well myself, but I know some of our
customers are successfully using lower end NetApp devices.

You should also be aware that work on the NAS-less shared storage is
well under way: HDFS-3077. So if your timeline is more than a few
months out to production, you may consider waiting for it to get your
HA setup running.

-Todd

On Tue, Jul 24, 2012 at 12:05 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
> Todd or anyone who knows,
>
> I'm reviving an old thread because we are collocating into a data center
> rather than just using the cloud.  You mentioned "We currently require the
> NFS directory to be highly available itself. This is achievable with even
> pretty inexpensive NAS devices from your vendor of choice."  What hardware
> would you suggest that would give us an HA filer?  Specifically we are going
> all HP in the colo.
>
> I've looked around and was unable to find any suggestions.  The docs just
> say "high-quality dedicated NAS appliance."  Any suggestions would be great!
>
> https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419
>
> Thanks,
> ~Jeff
>
>
> On 5/8/2012 6:49 PM, Todd Lipcon wrote:
>>
>> Hi Jeff,
>>
>> Check out HDFS-3077. We'll probably need the most help when it comes
>> time to do testing. Any testing you can do on the current HA solution,
>> non-ideal as it may be, is also immensely valuable. For example, if
>> you can reproduce the case where it didn't exit upon loss of shared
>> edits, that would also be a bug which would hit the quorum-based
>> solution.
>>
>> Thanks
>> -Todd
>>
>> On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for being patient and listening to my rants.  I'm excited to see
>>> hdfs continue to move forward.  If the organization I'm working for were
>>> willing to spend some resources to help speed this process up, where should
>>> we start looking?  I'm sure there are quite a few jiras on these issues.
>>>
>>> Most of what we've done with the hadoop ecosystem has been zookeeper and
>>> hbase related.
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>>
>>> On 5/8/2012 2:46 PM, Todd Lipcon wrote:
>>>>
>>>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> It seems the NN was originally written with the assumption that disks
>>>>> fail and stuff happens.  Hence the ability to have multiple directories
>>>>> store your NN data even though each directory is most likely redundant /
>>>>> HA.
>>>>>
>>>>> [start rant]
>>>>>
>>>>> My opinion is that it is a step backwards that the shared edits directory
>>>>> wasn't written with the same assumptions.  If any one problem can take out
>>>>> your cluster then it isn't HA.  So allowing a single nfs failure to take
>>>>> down your cluster and saying "make nfs HA" just seems to move the HA
>>>>> problem, not solve it.  I would expect a true HA solution to be completely
>>>>> self-contained within the hadoop ecosystem.  All machines fail...eventually,
>>>>> and it needs to be planned for.  At a minimum, a failure of the shared edits
>>>>> should only disable failover and provide a recovery mechanism; ideally the
>>>>> NN should have been rewritten to be a cluster (similar to zookeeper or ceph)
>>>>> to enable HA.
>>>>>
>>>>> [end rant]
>>>>
>>>> Like I said earlier in the thread, work is already under way on this
>>>> and should be complete within a number of months.
>>>>
>>>> In many practical deployments, what we have already can provide
>>>> complete HA. In others, like the AWS example you mentioned, we need a
>>>> bit more, and we're working on it. Hang on a bit longer and it will be
>>>> good to go.
>>>>
>>>> -Todd
>>>>
>>>>> Sorry for the rant.  I just really want to see HDFS become complete HA

Todd Lipcon
Software Engineer, Cloudera