Time keeps drifting


  1. #1
    Join Date
    Mar 2005
    Location
    US
    Posts
    300

    Time keeps drifting

    I have 4 identical PCs running openSUSE 10.2, and 3 of the 4 keep losing time. 1 minute after starting ntp the time can be up to 10 secs or more out, and it continues to drift.

    The /etc/ntp.conf file is identical on all 4 machines. Example ntp.conf:

    restrict default noquery notrust nomodify
    restrict 127.0.0.1
    restrict 192.168.199.0 mask 255.255.255.0
    server 127.127.1.0
    fudge 127.127.1.0 stratum 3
    driftfile /var/lib/ntp/drift/ntp.drift
    logfile /var/log/ntp
    server 192.168.199.220

    The server address is our default network time server. I have tried changing to another, but the result is the same.

    Sometimes, after restarting the ntp daemon, the time stays stable for a few weeks at a time, but eventually it seems to go completely haywire!

    All other PCs (both Windows & Linux) keep rock steady time from our time server.

    I'm wondering if it might be a NIC driver issue?

    Any help appreciated.

  2. #2
    Join Date
    Oct 2002
    Location
    Earth
    Posts
    1,616
    "Time keeps on slippin' slippin' slippin'... into the future." Steve Miller

    Just had to throw that in.

    Sorry, I don't have an answer.
    -------------
    Folding is Fun

    I thought I made a mistake once, but, of course, I was mistaken.

  3. #3
    Join Date
    Dec 2002
    Location
    Montreal Canada
    Posts
    383
    CMOS battery on the motherboard, maybe?

    Linux Counter


    Debian "Lenny"
    Mandriva 2010.2

    "Where am I?" "In the Village." "What do you want?" "Information.""Whose side are you on?" "That would be telling.... We want information. Information! INFORMATION!"

  4. #4
    Join Date
    Jan 2004
    Location
    Los Angeles CA
    Posts
    185
    I had this with Gentoo for a bit, but updates fixed it after a while. Still no idea why it was happening.

    Oh yeah, it was Gentoo64.

  5. #5
    Join Date
    Jan 2004
    Location
    Singapore
    Posts
    355
    I used to have a similar problem on Gentoo as well. Can't remember exactly how I fixed it; it was something about resetting the drift file back to 0.0 0.0 0.0 and stopping the system clock from writing to the drift file.
    Registered Linux User #388117

  6. #6
    Join Date
    Mar 2005
    Location
    US
    Posts
    300
    Doesn't work when set to 'Random Pool of Servers' either.

    Maybe this is a cause: 'Can't open /var/lib/ntp/drift.ntp.drift.TEMP: operation not permitted'
    Last edited by fishface; 06-08-2007 at 09:04 AM.

  7. #7
    Join Date
    Jul 2003
    Location
    Spokane, Washington
    Posts
    580
    It sounds like you have a number of good leads, and I have no more ideas, but I am curious: which way are they drifting, and are they all off by the same amount?

    PS. I would check that file permission error first, then the CMOS battery. I'm glad those batteries aren't soldered to the clock chip any more.

  8. #8
    Join Date
    Dec 2001
    Location
    Greenville, SC
    Posts
    68
    One thing I would do is periodically update one or all of them from not only internal time sources, but include two or three external sources as well. I keep my desktop system at home tuned in to an NTP server on the network. My computer clocks are more accurate than the clocks in my home, though all are pretty good.

    Now if you need millisecond, microsecond, or nanosecond accuracy you have to get proprietary extensions to NTP (that cost money) and invest in both hardware and software that interact with the real clocks. Some sites use radio transmitters to get signals from the "official" time clocks then minimize network latency, getting the result to a very nearby server, which is the site NTP server. Sites that really care about accuracy have both redundancy and latency reducing features to lock in on an accurate time.

    You probably don't need all of this, but it may give you some ideas - do use at least one trustworthy time source.
    Brian W. Masinick
    Masinick at Yahoo Dot Com

  9. #9
    Join Date
    Mar 2005
    Location
    US
    Posts
    300
    I'll check the CMOS batteries. Could be a bad batch I guess, as the machines are only 4 months old.

    I will do more logging of the time drift, but it appears to be always losing time, almost immediately after starting the ntp service. At least it is now consistently broken!

    I use Samba to map GID/UIDs to our Windows DC for single sign on, so any drift that is too severe will cause logon issues. It's a real headache, this.

    Our DC is our internal network time server, it gets its time source from 3 different external public ntp servers and I have not had a problem with time keeping for at least 5 years, it has always been rock steady. I feel very confident that the problem is local to the 3 machines as all other machines seem fine (around 60).

  10. #10
    Join Date
    Mar 2005
    Location
    US
    Posts
    300

    Well, they are always losing time, that is for sure.

    I did the following.

    Stopped ntp
    Started it
    Time is correct - briefly
    Immediately set the hardware clock to the system time using 'hwclock --systohc'

    Used 'hwclock --show' to cross-check the hardware clock against another PC whose time is correct. The hardware clock is accurate; however, after only a few minutes, the 'date' command shows the system time is already nearly 10 seconds adrift!

    So, system time is screwed - hardware is fine.
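
    To put a number on that, here is a rough sketch of the arithmetic in Python. The function name and the "10 seconds over 5 minutes" sample are illustrative assumptions, and the 500 ppm figure is the commonly documented limit on the frequency error ntpd will try to correct, so treat the comparison as a rule of thumb rather than a diagnosis.

```python
# Rough drift-rate arithmetic. The sample numbers are assumptions based on
# "nearly 10 seconds adrift after a few minutes".

# ntpd is generally documented as correcting frequency errors up to about 500 ppm.
NTPD_MAX_FREQ_PPM = 500


def drift_ppm(offset_seconds, elapsed_seconds):
    """Return the average clock frequency error in parts per million."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return offset_seconds / elapsed_seconds * 1_000_000


if __name__ == "__main__":
    ppm = drift_ppm(10, 5 * 60)  # assumed: 10 s lost in 5 minutes
    print(f"average drift: {ppm:.0f} ppm")
    if abs(ppm) > NTPD_MAX_FREQ_PPM:
        print("well beyond what ntpd can normally slew out")
    else:
        print("within ntpd's usual correction range")
```

    If the result is in the tens of thousands of ppm, as it is for these sample numbers, the system clock itself is running at the wrong rate, which would point away from a simple NTP configuration problem.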


    I would have thought this should be a very simple thing to get right, and I'm starting to lose patience with it.

    Update: I have configured the system clock to sync the hardware clock, which it does, so the hardware clock is now perfectly in sync with the incorrect system time!
    Last edited by fishface; 06-17-2007 at 03:52 AM.

  11. #11
    Join Date
    Mar 2005
    Location
    US
    Posts
    300
    Still got a problem and have no idea - it appears I'm not the only one having issues with it.

    'I've installed openSUSE 10.2 on my machine and the clock is always slow, something like an hour back per day ( ! )
    NTP is configured to run, and each time I restart NTP the clock is fine, only it looks as if it runs only on boot, and my machine being a linux machine, is rebooted once in a long time.
    Checking the connection to the NTP server show that all is well (well, after all, if I run ntpdate <server> I get the real time ), but running ntpd -c peers will have the server there with no character to signify synchronization.'

    Checked the time on pc70, 71, 72 & 73 today, results as follows:

    pc 70 out by miles, behind, days!
    pc 71 bang on time
    pc 72 out by miles, behind, days!
    pc 73 bang on time

    So, I took pc72 offline, took it out of the server room and set it up in the lab. First off, the hardware clock was miles out, so I set it to GMT. Rebooted again, still in the lab, and tested again: time bang on. Left it for 5 minutes or so, and the time was still bang on; if there was a problem the time would have drifted within 5 minutes.

    Placed pc72 back in the server room, rebooted, and the time drifts again within the first minute. Decided to swap its network cable with the one connected to pc73, as I know pc73 has been keeping time. So, pc72 now has the cable that was attached to pc73 and vice versa. Rebooted: time still drifts, falling behind within a few minutes of booting. Booted pc73, which still has pc72's network cable attached, and the time is now out on pc73 as well as pc72.

    Looked at the ntp log file, and it looks like pc72 never even queries our ntp time server, 192.168.199.210, whereas pc73 does. Maybe this is a clue?

    Sample pc72 ntp log

    26 Jun 14:41:48 ntpd[3854]: synchronized to LOCAL(0), stratum 10
    26 Jun 14:41:48 ntpd[3854]: kernel time sync enabled 0001
    8 Jul 19:10:22 ntpd[3854]: ntpd exiting on signal 15
    9 Jul 08:46:40 ntpd[3744]: synchronized to LOCAL(0), stratum 10
    9 Jul 08:46:40 ntpd[3744]: kernel time sync enabled 0001
    9 Jul 08:48:18 ntpd[3744]: ntpd exiting on signal 15
    9 Jul 08:58:05 ntpd[3664]: synchronized to LOCAL(0), stratum 10
    9 Jul 08:58:05 ntpd[3664]: kernel time sync enabled 0001
    9 Jul 09:05:20 ntpd[3664]: ntpd exiting on signal 15
    9 Jul 09:14:07 ntpd[3649]: synchronized to LOCAL(0), stratum 10
    9 Jul 09:14:07 ntpd[3649]: kernel time sync enabled 0001

    Sample pc73 ntp log

    26 Jun 14:41:47 ntpd[3850]: kernel time sync enabled 0001
    26 Jun 14:42:52 ntpd[3850]: synchronized to 192.168.199.210, stratum 3
    4 Jul 18:23:36 ntpd[3850]: time reset +0.187859 s
    4 Jul 18:27:51 ntpd[3850]: synchronized to LOCAL(0), stratum 10
    4 Jul 18:27:57 ntpd[3850]: synchronized to 192.168.199.210, stratum 3
    5 Jul 17:39:06 ntpd[3850]: time reset +0.137770 s
    5 Jul 17:43:21 ntpd[3850]: synchronized to LOCAL(0), stratum 10
    5 Jul 17:44:26 ntpd[3850]: synchronized to 192.168.199.210, stratum 3
    9 Jul 09:08:20 ntpd[3850]: ntpd exiting on signal 15
    9 Jul 09:21:40 ntpd[3681]: synchronized to LOCAL(0), stratum 10
    9 Jul 09:21:40 ntpd[3681]: kernel time sync enabled 0001
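
    For comparing logs like these across several machines, a small script can pull out which source each ntpd process synchronized to. This is only a sketch: the function names are made up for illustration, and the regular expression assumes the "synchronized to X, stratum N" wording seen in the samples above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "9 Jul 08:46:40 ntpd[3744]: synchronized to LOCAL(0), stratum 10"
SYNC_RE = re.compile(r"ntpd\[(\d+)\]: synchronized to (\S+), stratum (\d+)")


def sync_sources(log_text):
    """Map each ntpd PID to the (source, stratum) pairs it synchronized to, in order."""
    sources = defaultdict(list)
    for line in log_text.splitlines():
        match = SYNC_RE.search(line)
        if match:
            pid, source, stratum = match.groups()
            sources[int(pid)].append((source, int(stratum)))
    return dict(sources)


def only_local(log_text):
    """True if at least one sync was logged and every sync was to the LOCAL driver."""
    syncs = [s for per_pid in sync_sources(log_text).values() for s in per_pid]
    return bool(syncs) and all(source.startswith("LOCAL") for source, _ in syncs)


if __name__ == "__main__":
    sample = (
        "9 Jul 08:46:40 ntpd[3744]: synchronized to LOCAL(0), stratum 10\n"
        "9 Jul 08:46:40 ntpd[3744]: kernel time sync enabled 0001\n"
    )
    print(sync_sources(sample))
    print("only ever synced to LOCAL:", only_local(sample))
```

    Run against the two samples, this would flag pc72 as only ever syncing to LOCAL(0), while pc73 shows 192.168.199.210 in between.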


    At the moment I'm at a loss as to what is causing this, all desktop PCs keep perfect time as do the Linux servers.

    I have configured scores of Linux machines with ntp and, until this latest batch of PCs, have never had a problem with it. However, we have also changed OS to openSUSE 10.2, which might be a factor, so I can't rule that out.

    Usually, it looks up the ntp server at boot, locks onto the time given out by the time server and keeps accurate time within seconds of reaching run level 5.

    I set the hardware clock up and then used hwclock --show and left it for a few minutes or so and it appears that the hardware clock keeps time.

    I then edited /etc/sysconfig/ntp and changed NTPD_ADJUST_CMOS_CLOCK to 'yes', then stopped and started NTP. As I expected, the hardware clock now syncs with the NTP time, but both drift after a few minutes.

    I'm going to install a PCI Gigabit NIC in case it is a network card/driver issue; they currently use the onboard Gigabit NIC (an Intel D965F + Intel Core2Duo E6600).

    It is odd how it seems to work fine if it is plugged into a 100 Mbps switch. They are normally set up and left in a server room where they connect to a Gigabit switch, so Gigabit to Gigabit screws up, but Gigabit to 100 Mbps works!
    Last edited by fishface; 07-09-2007 at 07:03 AM.

  12. #12
    Join Date
    Jul 2002
    Location
    Tallahassee, FL
    Posts
    512
    I don't know if it's related to your problem, but I had a similar issue on a Thinkpad T20. The clock would run twice as fast as normal. I was able to fix it by passing "no_timer_check" to the kernel through GRUB.

    Hope this helps you.
    Registered Linux User No. 321,742

    "At Harvard they have this policy where if you pass too many classes they ask you to leave."
    ---Richard M. Stallman

  13. #13
    Join Date
    Apr 2001
    Location
    SF Bay Area, CA
    Posts
    14,936
    Quote Originally Posted by fishface
    Sample pc72 ntp log

    26 Jun 14:41:48 ntpd[3854]: synchronized to LOCAL(0), stratum 10
    26 Jun 14:41:48 ntpd[3854]: kernel time sync enabled 0001
    8 Jul 19:10:22 ntpd[3854]: ntpd exiting on signal 15
    9 Jul 08:46:40 ntpd[3744]: synchronized to LOCAL(0), stratum 10
    9 Jul 08:46:40 ntpd[3744]: kernel time sync enabled 0001
    9 Jul 08:48:18 ntpd[3744]: ntpd exiting on signal 15
    9 Jul 08:58:05 ntpd[3664]: synchronized to LOCAL(0), stratum 10
    9 Jul 08:58:05 ntpd[3664]: kernel time sync enabled 0001
    9 Jul 09:05:20 ntpd[3664]: ntpd exiting on signal 15
    9 Jul 09:14:07 ntpd[3649]: synchronized to LOCAL(0), stratum 10
    9 Jul 09:14:07 ntpd[3649]: kernel time sync enabled 0001
    Looks to me like your ntpd config file isn't right. For whatever reason, ntpd is using the local clock as its only time source, not the server that's presumably configured in its config file.

    Double-check its config file (or post that here, but if there are any authentication parameters stored in it, strip them out first -- I don't know if they're allowed in there or not). Also ensure it can ping the NTP server (hey, you never know). Also double check that whenever the network goes down and comes back up, that ntpd restarts -- it may not be usable until it restarts.

    (It's possible that it depends on whether your IP address changes; I know that restarting my PPPoE connection kills ntpd on that machine (which is syncing out over that connection). It has something to do with the source IP addresses not getting reset.)

    It may also help to move the ntpd start script to the very end of the boot sequence, but that's a bit of a hack. (I'm thinking that maybe something after the ntpd start script takes the interface down. This won't apply if you've restarted ntpd manually, though (without restarting the whole machine).)

  14. #14
    Join Date
    Mar 2005
    Location
    US
    Posts
    300
    Pings work, as do nslookup, host etc. In fact, when using YaST to configure NTP, it says the server responds and is OK!

    I agree, it does look as if it is not even trying to get an external time source, even though the ntp.conf has one stated.

    Everything is internal to our network, with no firewalls on these machines. I have configured them to run Torque, so they have quite a slimmed-down software install. Torque runs fine, thankfully.

    Sample pc72 ntp.conf

    ## Undisciplined Local Clock. This is a fake driver intended for backup
    ## and when no outside source of synchronized time is available.
    ##
    server 127.127.1.0
    # local clock (LCL)
    fudge 127.127.1.0 stratum 10
    # LCL is unsynchronized

    ##
    ## Outside source of synchronized time
    ##
    ## server xx.xx.xx.xx # IP address of server

    ##
    ## Miscellaneous stuff
    ##

    driftfile /var/lib/ntp/drift/ntp.drift
    # path for drift file

    logfile /var/log/ntp
    server 192.168.199.210
    # alternate log file
    # logconfig =syncstatus + sysevents
    # logconfig =all

    # statsdir /tmp/ # directory for statistics files
    # filegen peerstats file peerstats type day enable
    # filegen loopstats file loopstats type day enable
    # filegen clockstats file clockstats type day enable

    #
    # Authentication stuff
    #
    # keys /etc/ntp.keys # path for keys file
    # trustedkey 1 2 3 4 5 6 14 15 # define trusted keys
    # requestkey 15 # key (7) for accessing server variables
    # controlkey 15 # key (6) for accessing server variables


    Sample pc73 ntp.conf

    ## Undisciplined Local Clock. This is a fake driver intended for backup
    ## and when no outside source of synchronized time is available.
    ##
    server 127.127.1.0
    # local clock (LCL)
    fudge 127.127.1.0 stratum 10
    # LCL is unsynchronized

    ##
    ## Outside source of synchronized time
    ##
    ## server xx.xx.xx.xx # IP address of server

    ##
    ## Miscellaneous stuff
    ##

    driftfile /var/lib/ntp/drift/ntp.drift
    # path for drift file

    logfile /var/log/ntp
    server 192.168.199.210
    # alternate log file
    # logconfig =syncstatus + sysevents
    # logconfig =all

    # statsdir /tmp/ # directory for statistics files
    # filegen peerstats file peerstats type day enable
    # filegen loopstats file loopstats type day enable
    # filegen clockstats file clockstats type day enable

    #
    # Authentication stuff
    #
    # keys /etc/ntp.keys # path for keys file
    # trustedkey 1 2 3 4 5 6 14 15 # define trusted keys
    # requestkey 15 # key (7) for accessing server variables
    # controlkey 15 # key (6) for accessing server variables



    I'm getting the feeling it is hardware related, as this is the only common element at the moment. Another new machine, which has a quad-core CPU, on the same switch in the same server room as the suspect ones, but with a different make of mainboard (Gigabyte instead of Intel), has been fine for the last 2 days.

    Other aspects of networking work fine: DNS, Samba (used for single sign on).
    Last edited by fishface; 07-10-2007 at 04:48 AM.

  15. #15
    Join Date
    Apr 2001
    Location
    SF Bay Area, CA
    Posts
    14,936
    Try commenting out those "undisciplined local clock" lines -- anything referring to 127.x.x.x should be removed. There's no point in having ntpd update the local clock with data from the local clock.

    It's possible that it's just choosing the local clock over the remote machine (perhaps because it doesn't recognize the remote machine, or perhaps because it doesn't get far enough in the config file, or perhaps because the fudge line isn't honored for whatever reason).
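
    To see at a glance what ntpd is actually being told to use, it can help to list only the uncommented server and fudge lines. A minimal sketch, assuming the plain "keyword arguments" layout of the ntp.conf posted above; both function names are hypothetical.

```python
def active_directives(conf_text, keywords=("server", "fudge")):
    """Return uncommented config lines whose first word is one of keywords."""
    found = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing and whole-line comments
        if line and line.split()[0] in keywords:
            found.append(line)
    return found


def local_clock_lines(conf_text):
    """Return active lines that refer to a 127.127.x.x reference-clock address."""
    return [
        line
        for line in active_directives(conf_text)
        if any(token.startswith("127.127.") for token in line.split()[1:])
    ]


if __name__ == "__main__":
    sample = (
        "server 127.127.1.0\n"
        "# local clock (LCL)\n"
        "fudge 127.127.1.0 stratum 10\n"
        "## server xx.xx.xx.xx # IP address of server\n"
        "logfile /var/log/ntp\n"
        "server 192.168.199.210\n"
    )
    print("active:", active_directives(sample))
    print("local clock:", local_clock_lines(sample))
```

    The lines reported by local_clock_lines are the ones to comment out for this test, leaving only the real server entry active.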

    And now that I think about it, it's also possible that the target machine is denying traffic from these machines; I had some issues with the win 2k server ntp service when trying to talk to it from ntpd. It would work for a few hours, then suddenly stop responding (likely because ntpd was acting like a real NTP client, and syncing the time about every 10 minutes; normal win clients sync maybe once a day). Maybe it'd be worth doing a packet capture (NTP traffic is UDP port 123), to ensure a response is coming from the remote machine when ntpd starts.
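
    Alongside a packet capture, a bare client-mode query can show whether the server answers on UDP 123 at all. This is a minimal sketch using only Python's standard library, not a replacement for ntpd or ntpdate: the function names are made up, it sends a simple SNTP-style version 3 client request, and it reads only the mode, stratum and transmit timestamp from the reply.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def build_request():
    """Build a 48-byte client request: LI=0, VN=3, Mode=3, all other fields zero."""
    return b"\x1b" + 47 * b"\x00"


def parse_response(data):
    """Extract mode, stratum and transmit time (Unix seconds) from an NTP reply."""
    if len(data) < 48:
        raise ValueError("NTP packet too short")
    mode = data[0] & 0x07
    stratum = data[1]
    seconds, fraction = struct.unpack("!II", data[40:48])  # transmit timestamp
    transmit_time = seconds - NTP_EPOCH_OFFSET + fraction / 2**32
    return {"mode": mode, "stratum": stratum, "transmit_time": transmit_time}


def query(server, timeout=5.0):
    """Send one request and return the parsed reply; raises on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, NTP_PORT))
        data, _ = sock.recvfrom(512)
    return parse_response(data)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print(query(sys.argv[1]))
```

    Saved under any name (say ntp_probe.py, a placeholder) and run with the time server's address as the only argument, a timeout would suggest the server is not replying to this machine, while a reply with mode 4 means a server response came back.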
