How to debug ntp issues?

NTP has long been the de facto protocol that computers use to synchronize their clocks over a network and keep accurate time, often to within about 10 milliseconds. The NTP daemon, ntpd, is the reference implementation and can be found running on almost all Linux (and Unix) systems. This may change in the future, though, as Chrony is set to replace ntpd as the default NTP client in Fedora 16. Nevertheless, many systems use ntpd, and I don’t see it going away any time soon.

In this post, we will take a brief look at how the ntp daemon works and look at ways to debug some common issues.

When the ntp service first starts, a clock selection process begins, with the daemon polling the servers configured in ntp.conf at 64-second intervals. Depending on the configuration, this process can take 5 to 10 minutes. To check the status, run the following:

# ntpq
ntpq> peers
     remote           refid           st t when poll reach   delay   offset  jitter
=======================================================================================
*time.ferea.org       8.16.24.15       2 u  972 1024  377   28.066   -0.181   4.126
+dg1.rieta.net        15.15.26.3       3 u  467 1024  377  141.664  -23.531   0.140
 mighty.poclabs.      .STEP.          16 u    - 1024    0    0.000    0.000   0.000
 LOCAL(0)             .LOCL.          10 l   32   64  377    0.000    0.000   0.001

During the clock selection process, the refid column should read .INIT. and the st (stratum) column should read 16.

The * indicates that this association is the chosen NTP source (the system peer).
The + indicates that this source is a candidate for selection (strictly speaking, a peer is an NTP server on the same stratum).
A blank in the first column indicates that the source has been rejected, for example because it is unreachable (reach 0, stratum 16).
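
The columns are easier to reason about once they are pulled apart. The sketch below is only an illustration: the function name and dictionary keys are made up, and the tally table reflects my reading of the ntpq documentation (only the blank, + and * appear in the output above). The one fact it leans on is that reach is an octal shift register covering the last eight polls, so 377 means all eight were answered.

```python
# Tally characters as I understand them from the ntpq documentation.
TALLY = {
    " ": "rejected",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "system peer",
    "o": "PPS peer",
}

FIELDS = ["remote", "refid", "st", "t", "when", "poll",
          "reach", "delay", "offset", "jitter"]


def parse_peer_line(line):
    """Split one data row of the ntpq peers output into named fields."""
    # The first character is the tally code; the rest is whitespace-separated.
    peer = dict(zip(FIELDS, line[1:].split()))
    peer["tally"] = TALLY.get(line[0], "unknown")
    # reach is printed in octal; each bit is one of the last eight polls.
    reach_bits = int(peer["reach"], 8)
    peer["replies"] = bin(reach_bits).count("1")
    peer["last_poll_ok"] = bool(reach_bits & 1)
    return peer


row = "*time.ferea.org       8.16.24.15       2 u  972 1024  377   28.066   -0.181   4.126"
info = parse_peer_line(row)
print(info["remote"], info["tally"], info["replies"], "of 8 polls answered")
```

A reach value that is stuck at 0, or that keeps dropping bits, is usually the first hint of a firewall or routing problem.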

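If the local clock is off by more than 1000 seconds, ntpd will not set the clock; by default it logs a panic message and exits (starting ntpd with -g lifts this limit for the first correction). The time can then be set manually with the “date” command or with “ntpdate”: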

# ntpdate time.ferea.org
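
As a rough mental model of how ntpd reacts to an offset, the sketch below classifies an offset in seconds. The 1000-second panic threshold is the one mentioned above; the 0.128-second step threshold is the ntpd default as far as I know. The function name is hypothetical, and the real daemon also waits before stepping, which this sketch ignores.

```python
STEP_THRESHOLD = 0.128    # seconds; ntpd's default step threshold, as far as I know
PANIC_THRESHOLD = 1000.0  # seconds; beyond this ntpd refuses to set the clock


def expected_action(offset_seconds):
    """Return a rough guess at what ntpd will do with a given offset."""
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD:
        return "panic"  # set the clock by hand, or start ntpd with -g
    if magnitude > STEP_THRESHOLD:
        return "step"   # clock is set in one jump (simplified)
    return "slew"       # clock is adjusted gradually


for offset in (0.003607, 5.0, 1500.0):
    print(offset, expected_action(offset))
```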

If no NTP server gets selected, list the associations:

ntpq> as

ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 29581  9624   yes   yes  none  sys.peer   reachable  1
  2 29582  9014   yes   yes  none  candidat   reachable  1
  4 29583  8000   yes   yes  none    reject
  5 29584  9024   yes   yes  none    reject   reachable  2

The associations shown above correspond to the entries shown by the peers command. Most of the fields are self-explanatory, except the status column, which is a hexadecimal status word. The peer status word tables in the NTP documentation explain how to decipher it.
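
For a quick look without the tables, the status word can be split by hand. This is a minimal sketch based on my understanding of the peer status word layout (five status bits, a three-bit selection code, a four-bit event counter and a four-bit event code); the names are illustrative, and the exact meanings can vary between ntpd versions.

```python
# Status bits in the high byte, as I understand the peer status word.
PEER_STATUS_BITS = [
    (0x80, "conf"),     # persistent (configured) association
    (0x40, "authenb"),  # authentication enabled
    (0x20, "auth"),     # authentication ok
    (0x10, "reach"),    # host reachable
    (0x08, "bcast"),    # broadcast association
]

# Selection codes, indexed by the three-bit select field.
SELECT = ["reject", "falsetick", "excess", "outlier",
          "candidate", "backup", "sys.peer", "pps.peer"]


def decode_peer_status(word):
    """Decode a four-digit hexadecimal peer status word such as '9624'."""
    value = int(word, 16)
    status = value >> 8
    return {
        "flags": [name for bit, name in PEER_STATUS_BITS if status & bit],
        "select": SELECT[status & 0x07],
        "event_count": (value >> 4) & 0x0F,
        "event_code": value & 0x0F,
    }


print(decode_peer_status("9624"))
```

For 9624 this gives the flags conf and reach with the selection sys.peer, which matches the first row of the table above.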

Pass the “assID” of the association you want to inspect to the rv command:

ntpq> rv 29583

assID=62236 status=9014 reach, conf, 1 event, event_reach,
srcadr=192.168.23.1, srcport=123, dstadr=192.168.247.11, dstport=123,
leap=00, stratum=3, precision=-6, rootdelay=218.750,
rootdispersion=1381.516, refid=24.1.4.14, reach=377, unreach=0,
hmode=3, pmode=4, hpoll=10, ppoll=10, flash=400 peer_dist, keyid=0,
ttl=0, offset=-29.750, delay=0.316, dispersion=30.400, jitter=1.136,
reftime=d1e4505b.d456f5b0  Thu, Aug  4 2011  0:55:23.829,
org=d1e4c793.e477ba4b  Thu, Aug  4 2011  9:24:03.892,
rec=d1e4c793.ec1fc3ac  Thu, Aug  4 2011  9:24:03.922,
xmt=d1e4c793.ec0b133c  Thu, Aug  4 2011  9:24:03.922,
filtdelay=     0.32    0.40    0.33    0.45    0.42    0.42    0.33    0.38,
filtoffset=  -29.75  -30.89  -29.97  -30.11  -30.15  -29.20  -30.25  -30.36,
filtdisp=     15.63   31.00   46.38   61.75   77.14   92.52  107.91  123.28

The flash code in the rv command output gives the reason the NTP source was rejected:

flash=400 peer_dist

This flash code corresponds to “distance threshold exceeded”. The NTP documentation lists all the flash codes.
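
The flash value is a hexadecimal bit mask, so more than one test can fail at once. The sketch below decodes it using the bit assignments I know from the ntpq documentation for recent ntpd releases; older releases may number the tests differently, so treat the table as an assumption and check it against your version. The function name is made up.

```python
# Flash bits and their short names (assumed from the ntpq documentation).
FLASH_BITS = [
    (0x0001, "pkt_dup"),       # duplicate packet
    (0x0002, "pkt_bogus"),     # bogus packet
    (0x0004, "pkt_unsync"),    # server not synchronized
    (0x0008, "pkt_denied"),    # access denied
    (0x0010, "pkt_auth"),      # authentication failure
    (0x0020, "pkt_stratum"),   # invalid leap or stratum
    (0x0040, "pkt_header"),    # header distance exceeded
    (0x0080, "pkt_autokey"),   # Autokey sequence error
    (0x0100, "pkt_crypto"),    # Autokey protocol error
    (0x0200, "peer_stratum"),  # invalid header or stratum
    (0x0400, "peer_dist"),     # distance threshold exceeded
    (0x0800, "peer_loop"),     # synchronization loop
    (0x1000, "peer_unreach"),  # unreachable or not selectable
]


def decode_flash(code):
    """Return the names of all tests flagged in a hexadecimal flash value."""
    value = int(code, 16)
    return [name for bit, name in FLASH_BITS if value & bit]


print(decode_flash("400"))   # the value from the rv output above
print(decode_flash("1600"))  # several tests failing at once
```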

Also, check the following variables:

rootdispersion=1381.516
dispersion=30.400
jitter=1.136

Dispersion is an estimate of error, reported here in milliseconds. A large value suggests that the NTP server is not a reliable source, and can point to conditions such as severe packet loss or network congestion.
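
To see why peer_dist fires for this sample, it helps to add the error terms up. The sketch below follows my understanding of how ntpd estimates the root synchronization distance: half of the round-trip delays plus the dispersions and the jitter, leaving out the small term that grows with the time since the last update. The 1500 ms limit is an assumption (it is the default I know from recent releases, and older ones may use a lower value), and the function name is invented for illustration.

```python
MAX_DISTANCE_MS = 1500.0  # assumed default distance threshold; may differ by version


def root_distance_ms(delay, rootdelay, dispersion, rootdispersion, jitter):
    """Approximate root synchronization distance in milliseconds.

    Simplified: the term that ages the estimate since the last update is omitted.
    """
    return (delay + rootdelay) / 2 + dispersion + rootdispersion + jitter


# Values copied from the rv output above.
distance = root_distance_ms(delay=0.316, rootdelay=218.750,
                            dispersion=30.400, rootdispersion=1381.516,
                            jitter=1.136)
print(f"{distance:.1f} ms, over the assumed limit: {distance > MAX_DISTANCE_MS}")
```

With these numbers the sum comes to roughly 1522.6 ms, and nearly all of it is rootdispersion, so the problem lies upstream of this server rather than on the local network path.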

Another useful aid is to run ntpdate with the -d (debug) switch, which goes through all the steps without adjusting the local clock:

# ntpdate -d time.rhl.com

17 Oct 00:20:51 ntpdate[26388]: ntpdate 4.2.2p1@1.1570-o Thu Nov 26 11:34:35 UTC 2009 (1)
Looking for host time.rhl.com and service ntp
host found : time.rhl.com
transmit(66.125.13.54)
receive(66.125.13.54)
transmit(66.125.13.54)
receive(66.125.13.54)
transmit(66.125.13.54)
receive(66.125.13.54)
transmit(66.125.13.54)
receive(66.125.13.54)
transmit(66.125.13.54)
server 66.125.13.54, port 123
stratum 1, precision -16, leap 00, trust 000
refid [CDMA], delay 0.32297, dispersion 0.00040
transmitted 4, in filter 4
reference time:    d245a5fe.2fdfe09b  Mon, Oct 17 2011  0:20:38.187
originate timestamp: d245a60c.e2117d1e  Mon, Oct 17 2011  0:20:52.883
transmit timestamp:  d245a60c.b9c9b413  Mon, Oct 17 2011  0:20:52.725
filter delay:  0.32361  0.32382  0.32297  0.32619
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.003892 0.004005 0.003607 0.004972
         0.000000 0.000000 0.000000 0.000000
delay 0.32297, dispersion 0.00040
offset 0.003607
17 Oct 00:20:53 ntpdate[26388]: adjust time server 66.187.233.4 offset 0.003607 sec
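
The hexadecimal values in front of each date are raw NTP timestamps: 32 bits of seconds since 1 January 1900, a dot, and 32 bits of fraction. Converting them yourself is handy when comparing logs from hosts in different time zones. A minimal sketch, with a made-up function name:

```python
from datetime import datetime, timedelta, timezone

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_hex_to_datetime(stamp):
    """Convert an NTP timestamp such as 'd245a60c.e2117d1e' to UTC."""
    seconds_hex, fraction_hex = stamp.split(".")
    seconds = int(seconds_hex, 16) - NTP_TO_UNIX
    fraction = int(fraction_hex, 16) / 2**32
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole + timedelta(seconds=fraction)


# The originate timestamp from the ntpdate output above.
print(ntp_hex_to_datetime("d245a60c.e2117d1e"))
```

The result is 18:50:52 UTC on 16 October 2011, which would be consistent with the 0:20:52 shown above if that host printed local time at UTC+5:30.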

Most, if not all, NTP issues can be resolved with the information gathered from the above commands.

Do you have any tips on debugging ntp problems?

5 thoughts on “How to debug ntp issues?”

  1. This is a nice enhancement to the man page of ntpd, but how exactly do I debug ntp problems? How can I tell ntpd to log each attempt to synchronize its clock with an uplink server?

    I’ve got the following problem:
    – ntpdate updates the local time without any problems: ntpdate -b -d -s ntpserver.mydomain.com
    – but ntpd tells me, that the very same server (ntpserver.mydomain.com) is not reachable (flash=1600)

    How can I tell ntpd to log every single step it does and what the outcome is?
    ntpd’s just saying: ‘It does not work’. But that’s no help to me.

  2. I was having a problem with ntpd synchronisation.
    I googled around a lot and found several articles and tutorials, besides, of course, RTFM.
    Your article was the one that gave me the answer.
    Thanks.

  3. Hi, thanks for this article, it was helpful and solved my reject and unreachable error messages when executing the ntpq -as command. I was not successful changing the “time2” option in the /etc/ntp.conf file (x instead of *) before GPS. This is the output of ntpq -p:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *GPS_NMEA(0)     .GPS.            0 l    4   16  377    0.000   -7.721   0.945
    oPPS(0)          .PPS.            0 l    3   16  377    0.000   -0.005   0.004
    +auth1.xs4all.nl 193.79.237.14    2 u   59   64  177   19.475   -0.036   1.500
    +213.109.127.195 193.79.237.14    2 u   12   64  377   24.753   -1.592   1.068
    +ntp4.bit.nl     .PPS.            1 u   14   64  377   21.438   -1.827   1.084
    +ran.as65342.net 192.36.144.23    2 u   12   64  377   21.323   -0.246   1.168
    +nyx.jonathanj.n 193.79.237.14    2 u    8   64  377   20.922   -2.093   0.978

    And below configuration of /etc/ntp.conf:

    # NMEA /dev/gps0, RMC, 9600
    server 127.127.20.0 mode 17 minpoll 4 maxpoll 4 prefer
    fudge 127.127.20.0 flag1 0 flag3 0 time2 0.115

    # Atom/PPS (/dev/pps0)
    server 127.127.22.0 minpoll 4 maxpoll 4
    fudge 127.127.22.0 flag3 0 refid PPS

    Do you have any thoughts on the offset and delay of the GPS_NMEA driver?

    Thanks in advance,

    Michiel
