Re: CheckPoint Node
Jean-Marc (sorry if I've been spelling your name wrong),

HBase 0.94 already supports Hadoop 2 and works well with it, if that
is your only concern. You only need to use the right download (or, if
you compile it yourself, pass the -Dhadoop.profile=23 Maven option).

You will need to restart the NameNode for changes to the dfs.name.dir
property to take effect. A reasonably fast disk helps keep edit log
writes quick (each write is only a few bytes), but a large or SSD-class
disk is not required. An external disk would also work (instead of an
NFS mount), as long as it is reliable.
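Adding a disk means listing one more directory in the property's comma-separated value; the NameNode keeps a full copy of its image and edits in every listed directory. The following Python sketch builds the corresponding hdfs-site.xml `<property>` element. The function name `name_dir_property` and both paths are hypothetical, chosen only for illustration; on 0.23+/2.x you would pass `key="dfs.namenode.name.dir"` instead of the 1.x default.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs, key="dfs.name.dir"):
    """Build an hdfs-site.xml <property> element listing every metadata directory.

    The value is a comma-separated list; the NameNode keeps a full copy of
    its image and edits in each listed directory.
    """
    if not dirs:
        raise ValueError("at least one metadata directory is required")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = key
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical paths: the existing local disk plus a newly attached external disk.
prop = name_dir_property(["/data/1/dfs/nn", "/mnt/external/dfs/nn"])
print(ET.tostring(prop, encoding="unicode"))
```

The printed element can be pasted inside the `<configuration>` block of hdfs-site.xml before restarting the NameNode.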

You do not need to copy data manually: just ensure that the user the
NameNode process runs as owns the directory, and the NameNode will
populate the empty directory on startup.
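Because the only manual prerequisite is ownership, a quick pre-restart check can save a failed startup. This is a minimal Python sketch, assuming a Unix host (it uses the `pwd` module); `check_name_dir` and `owner_name` are hypothetical helpers, and the user name `hdfs` in the demo is only an example of a NameNode service account, not something the thread specifies.

```python
import os
import pwd
import tempfile


def owner_name(uid):
    """Map a numeric uid to a user name, falling back to the uid itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid with no passwd entry (common in containers)
        return str(uid)


def check_name_dir(path, nn_user):
    """List the problems that would stop the NameNode from using `path`.

    An empty result means the directory exists and is owned by `nn_user`,
    the account the NameNode process runs as.
    """
    if not os.path.isdir(path):
        return [f"{path} is missing or is not a directory"]
    owner = owner_name(os.stat(path).st_uid)
    if owner != nn_user:
        return [f"{path} is owned by {owner}, not {nn_user}"]
    return []


if __name__ == "__main__":
    # Demo on a throwaway directory owned by the current (effective) user.
    demo_dir = tempfile.mkdtemp()
    me = owner_name(os.geteuid())
    print(check_name_dir(demo_dir, me))
    print(check_name_dir(demo_dir, "hdfs"))  # hypothetical NameNode user
```

The check deliberately stops at ownership; it does not copy or pre-create any metadata, since the NameNode fills the empty directory itself.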

Operationally speaking, if one of the two disks fails, the NN web UI
(and the metrics as well) will indicate this (see the bottom of the NN
UI page for an example of what I am talking about). The NN will keep
running on the one remaining disk, but it's not a good idea to let it
run that way for long without fixing or replacing the failed disk,
because you lose redundancy in the meantime.
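The NN web UI and metrics are the authoritative place to see a failed storage directory. As a rough complement for an external monitor, the Python sketch below classifies the directories listed in a comma-separated dfs.name.dir value. It treats a directory as usable if it exists and is writable by the current process; that is an assumption of this sketch, not the NameNode's own failure detection, and `redundancy_status` is a hypothetical name.

```python
import os
import tempfile


def split_name_dirs(value):
    """Split a comma-separated dfs.name.dir value into trimmed, non-empty paths."""
    return [d.strip() for d in value.split(",") if d.strip()]


def redundancy_status(value):
    """Classify the configured metadata directories as OK / WARNING / CRITICAL.

    A directory counts as usable if it exists and the current process can
    write to it; this is only a rough external check, not the NameNode's own
    storage-failure detection.
    """
    dirs = split_name_dirs(value)
    usable = [d for d in dirs if os.path.isdir(d) and os.access(d, os.W_OK)]
    if not usable:
        return "CRITICAL: no usable metadata directory"
    failed = len(dirs) - len(usable)
    if failed:
        return f"WARNING: {failed} of {len(dirs)} directories unusable"
    return f"OK: {len(usable)} usable directories"


if __name__ == "__main__":
    # Demo: one real temporary directory and one path that does not exist.
    good = tempfile.mkdtemp()
    missing = os.path.join(good, "failed-disk")
    print(redundancy_status(f"{good},{missing}"))
```

A WARNING here corresponds to the situation described above: the NN keeps running, but on fewer copies than configured, so the failed disk should be replaced soon.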

On Thu, Nov 22, 2012 at 11:59 PM, Jean-Marc Spaggiari
<[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Again, thanks a lot for all those details.
>
> I read the previous link and I totally understand the HA NameNode. I
> already have a zookeeper quorum (3 servers) that I will be able to
> re-use. However, I'm running HBase 0.94.2 which is not yet compatible
> (I think) with Hadoop 2.0.x. So I will have to go with a non-HA
> NameNode until I can migrate to a stable 0.96 HBase version.
>
> Can I "simply" add one directory to dfs.name.dir and restart
> my namenode? Is it going to feed all the required information into this
> directory? Or do I need to copy the data of the existing one into the
> new one before I restart it? Also, does it need a fast transfer rate?
> Or will an external hard drive (quick to move to another server if
> required) be enough?
>
>
> 2012/11/22, Harsh J <[EMAIL PROTECTED]>:
>> Please follow the tips provided at
>> http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3F and
>> http://wiki.apache.org/hadoop/FAQ#If_the_NameNode_loses_its_only_copy_of_the_fsimage_file.2C_can_the_file_system_be_recovered_from_the_DataNodes.3F
>>
>> In short, if you use a non-HA NameNode setup:
>>
>> - Yes, the NN is a vital persistence point in running HDFS, and its
>> data should be stored redundantly for safety.
>> - You should, in production, configure your NameNode's image and edits
>> disk (dfs.name.dir in 1.x+, or dfs.namenode.name.dir in 0.23+/2.x+) to
>> be a dedicated one with adequate free space for gradual growth, and
>> should configure multiple disks (with one off-machine NFS point highly
>> recommended for easy recovery) for adequate redundancy.
>>
>> If you instead use an HA NameNode setup (I'd highly recommend doing
>> this since it is now available), having more than one NameNode plus
>> the journal log mount or quorum setup automatically acts as a
>> safeguard for the FS metadata.
>>
>> On Thu, Nov 22, 2012 at 11:03 PM, Jean-Marc Spaggiari
>> <[EMAIL PROTECTED]> wrote:
>>> Hi Harsh,
>>>
>>> Thanks for pointing me to this link. I will take a close look at it.
>>>
>>> So with 1.x and 0.23.x, what's the impact on the data if the namenode
>>> server's hard drive dies? Is there any critical data stored locally? Or do I
>>> simply need to build a new namenode, start it and restart all my
>>> namenodes to get my data back?
>>>
>>> I can deal with my application not being available, but losing data
>>> can be a bigger issue.
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2012/11/22, Harsh J <[EMAIL PROTECTED]>:
>>>> Hey Jean,
>>>>
>>>> Neither the 1.x nor the 0.23.x release line has NameNode HA features.
>>>> The current 2.x releases carry HA-NN abilities, and this is documented
>>>> at
>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html.
>>>
Harsh J
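The replies in this thread tie two things to the Hadoop release line: the property that holds the NameNode image and edits directories (dfs.name.dir in 1.x, dfs.namenode.name.dir in 0.23+/2.x) and whether NameNode HA exists at all (absent in 1.x and 0.23.x, present in 2.x). When scripting against clusters on different lines, a small lookup can make that explicit. This Python sketch is illustrative only: `namenode_facts` and `NameNodeFacts` are hypothetical names, only the three release lines discussed here are covered, and only the leading major.minor numbers of the version string are parsed.

```python
from typing import NamedTuple


class NameNodeFacts(NamedTuple):
    name_dir_key: str  # property that lists the image/edits directories
    has_ha: bool       # whether NameNode HA is available in the release line


def namenode_facts(version):
    """Summarize the per-release-line differences mentioned in this thread.

    Only the 1.x, 0.23.x and 2.x lines are covered; other versions raise.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 1:
        return NameNodeFacts("dfs.name.dir", has_ha=False)
    if major == 0 and minor == 23:
        return NameNodeFacts("dfs.namenode.name.dir", has_ha=False)
    if major == 2:
        return NameNodeFacts("dfs.namenode.name.dir", has_ha=True)
    raise ValueError(f"release line not covered by this thread: {version}")


# Example version strings, one per release line discussed above.
for v in ("1.0.4", "0.23.5", "2.0.2-alpha"):
    print(v, namenode_facts(v))
```

Raising on unknown versions is a deliberate choice: silently guessing a property name for a release line the thread does not cover would be worse than failing loudly.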