-
Hadoop doesnt use Replication Level of Namenode
Ralf Heyde 2011-09-12, 22:01
Hello,
I've written an HDFS client which works pretty well.
But on my NameNode I configured a replication level of 2, while the config on my client holds a value of 1.
If I now write a file from my HDFS client to HDFS, it gets a replication value of 1.
I know that I can manually set the property to 2 in my Java code, but:
Is there any way or workaround to use/get the configuration of the NameNode?
My current workaround is copying all configuration files from the NameNode to the local client, but that's *** .
Does anybody have an idea?
Thanks, Ralf
-
Re: Hadoop doesnt use Replication Level of Namenode
Bharath Mundlapudi 2011-09-13, 01:17
If you don't set any replication factor, the HDFS client will use the default setting unless you edit the client confs.
There are three ways you can play around with replication:
1. Java API
2. Client confs
3. Cluster confs (NameNode conf)
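As a rough illustration of why Ralf's file ends up with replication 1: the client decides the replication factor before it ever talks to the NameNode. The sketch below is a simplified model using only the JDK, not Hadoop code; the class and method names are hypothetical. In the real API the explicit value would be passed through calls such as FileSystem.create(Path, short) or FileSystem.setReplication(Path, short), and the configured value comes from the dfs.replication property in the client's config files.

```java
import java.util.Map;

public class ReplicationResolution {
    // Default for dfs.replication when no config file sets it.
    static final short BUILT_IN_DEFAULT = 3;

    /**
     * Models how a client picks the replication factor for a new file.
     * explicit:   value passed through the Java API, or null if none was given.
     * clientConf: properties loaded from the client's own config files.
     * The NameNode's own dfs.replication value is deliberately not an input.
     */
    public static short resolve(Short explicit, Map<String, String> clientConf) {
        if (explicit != null) {
            return explicit;                      // 1. Java API wins
        }
        String configured = clientConf.get("dfs.replication");
        if (configured != null) {
            return Short.parseShort(configured.trim()); // 2. client confs
        }
        return BUILT_IN_DEFAULT;                  // 3. built-in default
    }

    public static void main(String[] args) {
        Map<String, String> clientConf = Map.of("dfs.replication", "1");
        // Ralf's situation: the client config says 1, so the file gets 1.
        System.out.println(resolve(null, clientConf));       // prints 1
        // Setting the value in code overrides the client config.
        System.out.println(resolve((short) 2, clientConf));  // prints 2
    }
}
```

The cluster-side conf only matters for clients that actually load it, which is why copying the NameNode's files to the client changes the result.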
-Bharath
-
Re: Hadoop doesnt use Replication Level of Namenode
Harsh J 2011-09-13, 04:02
Ralf,
There is currently no way to 'fetch' a config. The NameNode's config is available at the NNHOST:WEBPORT/conf page, which you can perhaps save as a resource (dynamically) and load into your Configuration instance. This might lead to slow start-ups of your clients, but would give you the result you want. Apart from this hack, the only other ways are the ones Bharath mentioned.
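A minimal sketch of that hack, assuming the /conf page returns the usual Hadoop XML layout (configuration/property/name/value). It uses only the JDK so it can run without Hadoop on the classpath; the class name RemoteConfReader is hypothetical, and the sample XML stands in for the real HTTP response. With Hadoop available, handing the same stream to Configuration.addResource(InputStream) should achieve the same thing without manual parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RemoteConfReader {

    /** Parses Hadoop-style configuration XML into a name/value map. */
    public static Map<String, String> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = childText(p, "name");
            String value = childText(p, "value");
            if (name != null && value != null) {
                result.put(name.trim(), value.trim());
            }
        }
        return result;
    }

    /** Returns the text of the first child element with the given tag, or null. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // In real use the stream would come from the NameNode, e.g.
        //   new java.net.URL("http://NNHOST:WEBPORT/conf").openStream()
        // A literal sample is used here so the sketch runs offline.
        String sample =
            "<configuration>"
          + "<property><name>dfs.replication</name><value>2</value></property>"
          + "</configuration>";
        Map<String, String> conf =
            parse(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(conf.get("dfs.replication")); // prints 2
    }
}
```

The fetched value can then be applied on the client side, for example by setting dfs.replication on the client's Configuration before opening the FileSystem.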
You can also write a simple service that 'serves' you the latest, loaded configs from its location (you can use ZK for availability of such a service). And all your code can use this location to get its configuration objects from. This is another centralized way, I suppose. IIRC someone on the list had something like this in place as well.
Take it as a form of flexibility. Client-side configurations are flexible so that each client node can submit with its own properties, based on its needs. You get finer control, but I agree that at least a base-minimum server config could be auto-discovered. That feature isn't present yet. Feel free to open or comment on existing JIRAs surrounding this, and if possible, patches welcome! :)
-- Harsh J
-
Re: Hadoop doesnt use Replication Level of Namenode
Steve Loughran 2011-09-13, 09:53
On 13/09/11 05:02, Harsh J wrote:
> Ralf,
>
> There is no current way to 'fetch' a config at the moment. You have
> the NameNode's config available at NNHOST:WEBPORT/conf page which you
> can perhaps save as a resource (dynamically) and load into your
> Configuration instance, but apart from this hack the only other ways
> are the ones Bharath mentioned. This might lead to slow start ups of
> your clients, but would give you the result you want.
I've done it in a modified version of Hadoop; all it takes is a servlet in the NN. It even served up the live addresses and ports the NN was running on, even if these weren't known in advance.
-
Re: Hadoop doesnt use Replication Level of Namenode
Edward Capriolo 2011-09-13, 14:56
On Tue, Sep 13, 2011 at 5:53 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> I've done it a modified version of Hadoop, all it takes is a servlet in the
> NN. It even served up the live data of the addresses and ports a NN was
> running on, even if it didn't know in advance.

Another technique: if you are using a single replication factor on all files, you can mark the property as <final>true</final> in the configuration of the NameNode and DataNode. This will always override the client settings.

However, in general it is best to manage client configurations as carefully as you manage the server ones, and to ensure that you give clients the configuration they MUST use (via puppet/cfengine etc.). Essentially, do not count on clients to get them right, because the risk is too high if they are set wrong, i.e. your situation: "I thought everything was replicated 3 times".
-
Re: Hadoop doesnt use Replication Level of Namenode
Joey Echeverria 2011-09-13, 20:52
That won't work for the replication level, as that is entirely a client-side config. You can partially control it by setting the maximum replication level.
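To make "partially control" concrete: the maximum is set on the NameNode through the dfs.replication.max property, and as far as I recall the NameNode rejects a request outside its configured bounds rather than silently adjusting it. The sketch below is a JDK-only model of that check, not the actual NameNode code; the class name and the message wording are assumptions for illustration.

```java
import java.io.IOException;

public class ReplicationBounds {

    /**
     * Models the server-side bounds check on a client's requested replication.
     * min and max stand for the NameNode's configured limits (assumed inputs).
     * A request outside the limits fails; it is not rewritten to a valid value.
     */
    public static void verify(short requested, short min, short max) throws IOException {
        if (requested > max) {
            throw new IOException(
                "Requested replication " + requested + " exceeds maximum " + max);
        }
        if (requested < min) {
            throw new IOException(
                "Requested replication " + requested + " is less than the required minimum " + min);
        }
    }

    public static void main(String[] args) {
        short min = 1;
        short max = 2;
        try {
            verify((short) 1, min, max); // accepted: 1 is within [1, 2]
            System.out.println("replication 1 accepted");
            verify((short) 3, min, max); // rejected: above the maximum
            System.out.println("replication 3 accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that a maximum of 2 still lets a client write with replication 1, so this caps over-replication but does not by itself give Ralf the server-side default he is asking for.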
-Joey
-- Joseph Echeverria Cloudera, Inc. 443.305.9434