HDFS >> mail # user >> How to rebuild the shared edits directory


Re: How to rebuild the shared edits directory
Hi Jeff,

I don't know the HP offerings very well myself, but I know some of our
customers are successfully using lower-end NetApp devices.

You should also be aware that work on the NAS-less shared storage is
well under way: HDFS-3077. So if your timeline is more than a few
months out to production, you may consider waiting for it to get your
HA setup running.

-Todd

On Tue, Jul 24, 2012 at 12:05 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
> Todd or anyone who knows,
>
> I'm reviving an old thread because we are colocating in a data center
> rather than just using the cloud.  You mentioned "We currently require the
> NFS directory to be highly available itself. This is achievable with even
> pretty inexpensive NAS devices from your vendor of choice."  What hardware
> would you suggest that would give us an HA filer?  Specifically, we are going
> all HP in the colo.
>
> I've looked around and was unable to find any suggestions.  The docs just
> say "high-quality dedicated NAS appliance."  Any suggestions would be great!
>
> https://ccp.cloudera.com/display/CDH4DOC/HDFS+High+Availability+Hardware+Configuration
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419
>
> Thanks,
> ~Jeff
>
>
> On 5/8/2012 6:49 PM, Todd Lipcon wrote:
>>
>> Hi Jeff,
>>
>> Check out HDFS-3077. We'll probably need the most help when it comes
>> time to do testing. Any testing you can do on the current HA solution,
>> non-ideal as it may be, is also immensely valuable. For example, if
>> you can reproduce the case where it didn't exit upon loss of shared
>> edits, that would also be a bug which would hit the quorum-based
>> solution.
>>
>> Thanks
>> -Todd
>>
>> On Tue, May 8, 2012 at 4:20 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for being patient and listening to my rants.  I'm excited to see
>>> HDFS continue to move forward.  If the organization I'm working for were
>>> willing to spend some resources to help speed this process up, where
>>> should we start looking?  I'm sure there are quite a few JIRAs on these
>>> issues.
>>>
>>> Most of what we've done with the Hadoop ecosystem has been ZooKeeper and
>>> HBase related.
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>>
>>> On 5/8/2012 2:46 PM, Todd Lipcon wrote:
>>>>
>>>> On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> It seems the NN was originally written with the assumption that disks
>>>>> fail and stuff happens.  Hence the ability to have multiple directories
>>>>> store your NN data even though each directory is most likely redundant /
>>>>> HA.
>>>>>
>>>>> [start rant]
>>>>>
>>>>> My opinion is that it is a step backwards that the shared edits directory
>>>>> wasn't written with the same assumptions.  If any one problem can take
>>>>> out your cluster then it isn't HA.  Allowing a single NFS failure to take
>>>>> down your cluster, and then saying "make NFS HA," just seems to move the
>>>>> HA problem, not solve it.  I would expect a true HA solution to be
>>>>> completely self-contained within the Hadoop ecosystem.  All machines fail
>>>>> eventually, and that needs to be planned for.  At a minimum, a failure of
>>>>> the shared edits should only disable failover and provide a recovery
>>>>> mechanism; ideally the NN should have been rewritten to be a cluster
>>>>> (similar to ZooKeeper or Ceph) to enable HA.
>>>>>
>>>>> [end rant]
>>>>
>>>> Like I said earlier in the thread, work is already under way on this
>>>> and should be complete within a number of months.
>>>>
>>>> In many practical deployments, what we have already can provide
>>>> complete HA. In others, like the AWS example you mentioned, we need a
>>>> bit more, and we're working on it. Hang on a bit longer and it will be
>>>> good to go.
>>>>
>>>> -Todd
>>>>
>>>>> Sorry for the rant.  I just really want to see HDFS become complete HA

Todd Lipcon
Software Engineer, Cloudera