Re: how to get the time of a hadoop cluster, v0.20.2
Michael Segel 2013-05-18, 15:54
Then you have a problem where the solution is more about people management than technology.
All of your servers should be using NTP. At a minimum, you have one server that gets the time from a national (government) time server, and then have all of the machines in that Data Center use that machine as their NTP server, or you can have all machines by default use the government server for NTP.
You can also buy your own clock server that syncs to either GPS or national time servers via a radio signal.
But you have a problem of staff that is either unwilling or unable to do their job.
You can either take a carrot or a stick approach.
I suggest maybe bribing them with a bottle of scotch. (That seems to be the liquid lubricant that works universally these days, unless of course they don't drink...)
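If you want to measure how far a box has drifted without touching its NTP setup, something along these lines might work as a starting point. It is only a sketch: the class and method names are made up for illustration, it speaks plain SNTP over UDP port 123 using nothing but the JDK, it ignores the NTP era rollover in 2036, and it assumes the NTP server you point it at is reachable from the machine in question.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

/** Minimal SNTP client sketch; class and method names are illustrative only. */
final class SntpProbe {
    // Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
    static final long NTP_TO_UNIX_SECONDS = 2208988800L;
    static final int NTP_PORT = 123;
    static final int PACKET_SIZE = 48;
    // The server's transmit timestamp occupies bytes 40..47 of the reply.
    static final int TRANSMIT_TIMESTAMP_OFFSET = 40;

    /** Builds a client request: leap indicator 0, version 3, mode 3 (client). */
    static byte[] buildRequest() {
        byte[] packet = new byte[PACKET_SIZE];
        packet[0] = 0x1B;
        return packet;
    }

    /** Converts the 64-bit NTP timestamp at the given offset to Unix epoch milliseconds. */
    static long readTimestampMillis(byte[] packet, int offset) {
        long seconds = 0;
        long fraction = 0;
        for (int i = 0; i < 4; i++) {
            seconds = (seconds << 8) | (packet[offset + i] & 0xFF);
            fraction = (fraction << 8) | (packet[offset + 4 + i] & 0xFF);
        }
        // The fraction field counts units of 2^-32 seconds.
        long millis = (fraction * 1000L) >>> 32;
        return (seconds - NTP_TO_UNIX_SECONDS) * 1000L + millis;
    }

    /** Returns (server time - local time) in milliseconds for the given NTP host. */
    static long queryOffsetMillis(String host, int timeoutMillis) throws IOException {
        DatagramSocket socket = new DatagramSocket();
        try {
            socket.setSoTimeout(timeoutMillis);
            byte[] request = buildRequest();
            InetAddress address = InetAddress.getByName(host);
            long sent = System.currentTimeMillis();
            socket.send(new DatagramPacket(request, request.length, address, NTP_PORT));
            byte[] buffer = new byte[PACKET_SIZE];
            DatagramPacket response = new DatagramPacket(buffer, buffer.length);
            socket.receive(response);
            long received = System.currentTimeMillis();
            if (response.getLength() < PACKET_SIZE) {
                throw new IOException("short SNTP reply");
            }
            long serverMillis = readTimestampMillis(buffer, TRANSMIT_TIMESTAMP_OFFSET);
            // Compare against the midpoint of the round trip to reduce the effect of network delay.
            return serverMillis - (sent + received) / 2;
        } finally {
            socket.close();
        }
    }
}
```

Calling `SntpProbe.queryOffsetMillis(host, 2000)` from the machine in question, with `host` set to whatever NTP server the cluster syncs against, would give a rough idea of how far that machine is off. It is no substitute for actually running NTP on it.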
On May 17, 2013, at 9:13 AM, Jane Wayne <[EMAIL PROTECTED]> wrote:
> and please remember, i stated that although the hadoop cluster uses NTP,
> the "server" (the machine that is not a part of the hadoop cluster) cannot
> be assumed to be using NTP (and in fact, doesn't).
> On Fri, May 17, 2013 at 10:10 AM, Jane Wayne <[EMAIL PROTECTED]> wrote:
>> "if NTP is correctly used"
>> that's the key statement. in several of our clusters, NTP setup is kludgy.
>> note that the professionals administering the cluster are different from
>> "us" the engineers. so, there's a lot of red tape to go through to get
>> something, trivial or not, fixed. we have noticed that NTP is not set up
>> correctly (using default GMT timezone, for example). without explaining all
>> the tedious details, this mismatch of date/time (of the hadoop cluster to
>> the server machine) is causing some pains.
>> i'm not sure i agree with "the local OS time from your server machine will
>> be the best estimation." that doesn't make sense.
>> but what i want to achieve is very simple. as stated before, i just want
>> to ask the namenode or jobtracker, "hey, what date/time do you have?"
>> unfortunately for me, as niels pointed out, this query is not possible via
>> the hadoop api.
>> thanks for helping, though.
>> On Fri, May 17, 2013 at 10:02 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>> For hadoop, 'cluster time' is the local OS time. You might want to get the
>>> time of the namenode machine but indeed if NTP is correctly used, the
>>> OS time from your server machine will be the best estimation. If you
>>> request the time from the namenode machine, you will be penalized by the
>>> delay of your request.
>>> On Fri, May 17, 2013 at 3:17 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
>>>>> i have another computer (which i have referred to as a server, since it is
>>>>> running tomcat), and this computer is NOT a part of the hadoop cluster (it
>>>>> doesn't run any of the hadoop daemons), but does submit jobs to the
>>>>> cluster via a JEE webapp interface. i need to check that the time on this
>>>>> computer is in sync with the time on the hadoop cluster. when i say
>>>>> that "the time is in sync", there is a defined tolerance/threshold
>>>>> difference in date/time that i am willing to accept (e.g. the date/time
>>>>> should be the same down to the minute).
>>>> If you ensure (using NTP) that all your servers have the same time then you
>>>> can simply query your local server for the time and you have the correct
>>>> answer to your question.
>>>> You are searching for a solution in the Hadoop API (where this does not
>>>> exist) when the solution is present at a different level.
>>>> Best regards / Met vriendelijke groeten,
>>>> Niels Basjes
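
The request-delay concern and the one-minute tolerance from the original question can be combined in a small check. What follows is a sketch only. It assumes you already have some way to read the other machine's clock as epoch milliseconds. The `RemoteClock` interface is a hypothetical placeholder for that, not something the Hadoop API provides. The offset estimate brackets the remote read with two local reads and compares against their midpoint (the idea behind Cristian's algorithm), which assumes the network delay is roughly symmetric.

```java
/** Clock comparison sketch; all names here are illustrative, not part of any Hadoop API. */
final class ClockCheck {
    /** Hypothetical hook: however you obtain the remote machine's epoch milliseconds. */
    interface RemoteClock {
        long currentTimeMillis();
    }

    /**
     * Estimates (remote - local) in milliseconds, assuming the remote clock was
     * read roughly halfway through the request.
     */
    static long estimateOffsetMillis(RemoteClock remote) {
        long before = System.currentTimeMillis();
        long remoteMillis = remote.currentTimeMillis();
        long after = System.currentTimeMillis();
        return remoteMillis - (before + after) / 2;
    }

    /** True when the two clocks differ by no more than the accepted tolerance. */
    static boolean inSync(long offsetMillis, long toleranceMillis) {
        return Math.abs(offsetMillis) <= toleranceMillis;
    }
}
```

With a one-minute tolerance the check would be `ClockCheck.inSync(offset, 60000L)`. Because both sides are compared as epoch milliseconds, a differing timezone setting on either machine does not affect the result; only a genuinely wrong clock does.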