> What is meant by 'cluster time'? And what do you want to achieve?
let me try to clarify. i have a hadoop cluster (e.g. name node, data nodes,
job tracker, task trackers, etc...). all the nodes in this hadoop cluster
use ntp to sync time.
i have another computer (which i have referred to as a server, since it is
running tomcat), and this computer is NOT a part of the hadoop cluster (it
doesn't run any of the hadoop daemons), but does submit jobs to the hadoop
cluster via a JEE webapp interface. i need to check that the time on this
computer is in sync with the time on the hadoop cluster. when i say "check
that the time is in sync", there is a defined tolerance/threshold
difference in date/time that i am willing to accept (e.g. the date/time
should be the same down to the minute).
so, using niels' link, i can get the time on the "server" (the computer that
is running tomcat and is not a part of the hadoop cluster). that solves the
first 1/3 of the problem.
how do i get the time of the hadoop cluster? this is the second 1/3 of the
problem.
the last 1/3 of the problem, for me, is to then take the time on the
"server", denote this as A, the time on the hadoop cluster, denote this as
B, and subtract them,
C = | A - B |
and then i want to see if C < threshold.
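a minimal sketch of that last step in java, assuming A and B have already been obtained somehow (how to obtain B from the cluster is still the open question). the class name ClockSkewCheck, its method names, and the one-minute threshold are hypothetical placeholders for illustration only; java.time needs java 8 or later.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the Hadoop API.
class ClockSkewCheck {

    // C = | A - B |
    static Duration skew(Instant serverTime, Instant clusterTime) {
        return Duration.between(serverTime, clusterTime).abs();
    }

    // true when C < threshold
    static boolean inSync(Instant serverTime, Instant clusterTime, Duration threshold) {
        return skew(serverTime, clusterTime).compareTo(threshold) < 0;
    }

    public static void main(String[] args) {
        Instant a = Instant.now();       // A: time on the "server"
        Instant b = a.plusSeconds(42);   // B: stand-in value; the real cluster time is assumed to come from elsewhere
        Duration threshold = Duration.ofMinutes(1); // assumed tolerance

        System.out.println("skew (s): " + skew(a, b).getSeconds());
        System.out.println("in sync:  " + inSync(a, b, threshold));
    }
}
```

the comparison is strict (C < threshold), so a skew of exactly one minute would count as out of sync with the values above.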
by "cluster time", i am assuming, per my understanding, that the hadoop
cluster (all its nodes) somehow has a notion of "the time" (maybe i'm
wrong). now, i know it is unlikely that the date/time on all the hadoop
nodes will match exactly, down to the second or millisecond (similar to
what you have stated). but, at least, the date/time on the nodes should
agree down to the minute (i think that's a reasonably fair condition to
expect). but even if that's not the case, that's ok, because that's not
really what i'm trying to check (my goal is not to ensure time sync; my
goal is to probe the date/time from the cluster and compare it to the
"server").
so, is there a way to programmatically (via the hadoop API) get the hadoop
cluster's date/time? or can i get the date/time via the hadoop API from
just the name node or job tracker? (preferably the latter).
On Thu, May 16, 2013 at 12:46 PM, Michael Segel wrote:
> Uhm... sort of...
> Niels is essentially correct, and for most of us, just starting an
> NTPd on a server that syncs with a government clock and then having your
> local servers sync to that... will be enough. However... in more detail...
> Time is relative. ;-)
> Ok... being a bit more serious...
> There are two things you have to consider... What is meant by 'cluster
> time'? And what do you want to achieve?
> Each machine in the cluster has its own clock. These will still have a
> certain amount of drift throughout the day.
> So you can set up your own NTP server. You can either run NTPd and sync
> to a known government clock, or you can spend money and buy an atomic clock
> for your servers or machine room.
> (See http://www.atomic-clock.galleon.eu.com/ )
> Then periodically throughout the day, via cron, have the machines in your
> machine room sync to the local NTP server.
> This way all of your machines will have the same and correct time.
> So this will sync the clocks to a degree, but then drift sets in.
> Of course you also need to set up a machine to sync from... my vote would
> be the Name node. ;-)
> On May 16, 2013, at 10:34 AM, Niels Basjes <[EMAIL PROTECTED]> wrote:
> > If you make sure that everything uses NTP then this becomes an irrelevant
> > distinction.
> > On Thu, May 16, 2013 at 4:01 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
> >> yes, but that gets the current time on the server, not the hadoop
> >> cluster. i need to be able to probe the date/time of the hadoop cluster.
> >> On Tue, May 14, 2013 at 5:09 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
> >>> I made a typo. I meant API (instead of SPI).
> >>> Have a look at this for more information: