NTP - Network Time Protocol
stenn 1.1     | Leap Second Smearing with NTP
stenn 1.1     | -----------------------------
stenn 1.1     | 
stenn 1.2     | By Martin Burnicki
stenn 1.2     | with some edits by Harlan Stenn
stenn 1.2     | 
stenn 1.2     | The NTP software protocol and its reference implementation, ntpd, were
stenn 1.2     | originally designed to distribute UTC time over a network as accurately as
stenn 1.2     | possible.
stenn 1.2     | 
stenn 1.2     | Unfortunately, leap seconds are scheduled to be inserted into or deleted
stenn 1.2     | from the UTC time scale at irregular intervals to keep the UTC time scale
stenn 1.2     | synchronized with the Earth's rotation.  Deletions haven't happened yet,
stenn 1.2     | but insertions have happened more than two dozen times.
stenn 1.2     | 
stenn 1.2     | The problem is that POSIX requires 86400 seconds in a day, and there is no
stenn 1.2     | prescribed way to handle leap seconds in POSIX.
stenn 1.2     | 
stenn 1.2     | Whenever a leap second is to be handled, ntpd either:
stenn 1.2     | 
stenn 1.2     | - passes the leap second announcement down to the OS kernel (if the OS
stenn 1.2     | supports this) and the kernel handles the leap second automatically, or
stenn 1.2     | 
stenn 1.2     | - applies the leap second correction itself.
stenn 1.2     | 
stenn 1.2     | NTP servers also pass a leap second warning flag down to their clients via
stenn 1.2     | the normal NTP packet exchange, so clients also become aware of an
stenn 1.2     | approaching leap second, and can handle the leap second appropriately.
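
The warning flag mentioned above travels in the two-bit Leap Indicator (LI)
field, which occupies the top two bits of the first octet of an NTP packet
(see RFC 5905).  The following is only a minimal sketch of how a client could
read that field; the function name parse_first_octet is made up for this
illustration and is not part of ntpd.

```python
# Leap Indicator values defined by the NTP packet format (RFC 5905).
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (insert)",
    2: "last minute of the day has 59 seconds (delete)",
    3: "clock not synchronized",
}


def parse_first_octet(octet):
    """Split the first octet of an NTP packet into (LI, version, mode)."""
    li = (octet >> 6) & 0x3       # top two bits: leap indicator
    version = (octet >> 3) & 0x7  # next three bits: protocol version
    mode = octet & 0x7            # low three bits: association mode
    return li, version, mode


# 0x64 is an NTPv4 server reply that announces a leap second insertion.
li, version, mode = parse_first_octet(0x64)
print(li, version, mode, "->", LEAP_INDICATOR[li])
```

An LI value of 1 corresponds to the 'leap=01' and 'leap_add_sec' status that
ntpq reports on a server with a pending insertion.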
stenn 1.2     | 
stenn 1.1     | 
stenn 1.1     | The Problem on Unix-like Systems
stenn 1.1     | --------------------------------
stenn 1.2     | If a leap second is to be inserted then in most Unix-like systems the OS
stenn 1.2     | kernel just steps the time back by 1 second at the beginning of the leap
stenn 1.2     | second, so the last second of the UTC day is repeated and thus duplicate
stenn 1.2     | timestamps can occur.
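
To make the duplicate timestamps concrete, here is a small simulation of the
whole-second readings an application would see across an inserted leap
second.  It is a simplified sketch, not kernel code: the helper name
clock_readings_across_insert is hypothetical, and it simply repeats the last
second of the day as described above.

```python
import calendar


def clock_readings_across_insert(midnight, before=2, after=2):
    """Whole-second readings of a clock that repeats the last second of the day."""
    readings = list(range(midnight - before, midnight))  # ..., 23:59:58, 23:59:59
    readings.append(midnight - 1)                        # 23:59:60 reads as 23:59:59 again
    readings.extend(range(midnight, midnight + after))   # 00:00:00, 00:00:01, ...
    return readings


# Midnight following the leap second inserted at the end of June 30, 2015.
midnight = calendar.timegm((2015, 7, 1, 0, 0, 0))
readings = clock_readings_across_insert(midnight)
duplicates = [t for i, t in enumerate(readings) if i > 0 and t == readings[i - 1]]
print(readings)
print("duplicate timestamps:", duplicates)
```

Any application that assumes strictly increasing timestamps will trip over
the repeated value.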
stenn 1.2     | 
stenn 1.2     | Unfortunately there are lots of applications which get confused if the
stenn 1.2     | system time is stepped back, e.g. due to a leap second insertion.  Thus,
stenn 1.2     | many users have been looking for ways to avoid this, and have tried to
stenn 1.2     | introduce workarounds which may or may not work properly.
stenn 1.2     | 
stenn 1.2     | So even though these Unix kernels normally can handle leap seconds, the way
stenn 1.2     | they do this is not optimal for applications.
stenn 1.2     | 
stenn 1.2     | One good way to handle the leap second is to use ntp_gettime() instead of
stenn 1.2     | the usual calls, because ntp_gettime() includes a "clock state" variable
stenn 1.2     | that will actually tell you if the time you are receiving is OK or not and,
stenn 1.2     | if it is OK, whether the current second is an in-progress leap second.  But
stenn 1.2     | even though this mechanism has been available for about 20 years, almost
stenn 1.2     | nobody uses it.
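
As an illustration, ntp_gettime() can be reached from a script through the C
library.  The sketch below assumes a 64-bit Linux system with glibc, and the
structure layout is an assumption based on glibc's struct ntptimeval; other
platforms may differ.  The state codes are the TIME_* constants from
<sys/timex.h>.  The helper names read_clock_state and describe_state are
made up for this example.

```python
import ctypes
import ctypes.util

# Clock state codes returned by ntp_gettime(), as defined in <sys/timex.h>.
TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, TIME_ERROR = range(6)

STATE_TEXT = {
    TIME_OK: "clock synchronized, no leap second pending",
    TIME_INS: "leap second will be inserted at the end of the day",
    TIME_DEL: "leap second will be deleted at the end of the day",
    TIME_OOP: "leap second insertion in progress",
    TIME_WAIT: "leap second has just occurred",
    TIME_ERROR: "clock not synchronized, time is not reliable",
}


class Timeval(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class NtpTimeval(ctypes.Structure):
    # Assumed layout: glibc's struct ntptimeval on 64-bit Linux.
    _fields_ = [
        ("time", Timeval),
        ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long),
        ("tai", ctypes.c_long),
        ("reserved", ctypes.c_long * 4),
    ]


def read_clock_state():
    """Call ntp_gettime() and return (state, seconds, tai_offset)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    ntv = NtpTimeval()
    state = libc.ntp_gettime(ctypes.byref(ntv))
    return state, ntv.time.tv_sec, ntv.tai


def describe_state(state):
    """Translate a clock state code into readable text."""
    return STATE_TEXT.get(state, "unknown state %d" % state)
```

An application would call read_clock_state() wherever it currently reads the
time, and treat TIME_OOP as "this second is the leap second" and TIME_ERROR
as "do not trust this timestamp".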
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | NTP Client for Windows Contains a Workaround
stenn 1.1     | --------------------------------------------
stenn 1.2     | The Windows system time knows nothing about leap seconds, so for many years
stenn 1.2     | the Windows port of ntpd has provided a workaround where the system time is
stenn 1.2     | slewed by the client to compensate for the leap second.
stenn 1.2     | 
stenn 1.2     | Thus it is not required to use a smearing NTP server for Windows clients,
stenn 1.2     | but of course the smearing server approach also works.
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | The Leap Smear Approach
stenn 1.1     | -----------------------
stenn 1.2     | For the reasons mentioned above, some support for leap smearing has
stenn 1.2     | recently been implemented in ntpd.  This means that to insert a leap second
stenn 1.2     | an NTP server adds a certain increasing "smear" offset to the real UTC time
stenn 1.2     | sent to its clients, so that after some predefined interval the leap second
stenn 1.2     | offset is compensated.  The smear interval should be long enough,
stenn 1.2     | e.g. several hours, so that NTP clients can easily follow the clock drift
stenn 1.2     | caused by the smeared time.
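
The idea of an increasing smear offset can be sketched as a function of the
time elapsed since the start of the smear interval.  This document does not
say which curve ntpd uses, so both shapes below are assumptions made for
illustration only: a straight line, and a cosine-shaped curve of the kind
Google described for its own smear.  The function name smear_offset is
hypothetical.

```python
import math


def smear_offset(elapsed, interval, shape="linear"):
    """Offset in seconds added to UTC, `elapsed` seconds into the smear interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    x = min(max(elapsed / interval, 0.0), 1.0)  # clamp: 0 before, 1 after
    if shape == "linear":
        fraction = x
    elif shape == "cosine":
        fraction = (1.0 - math.cos(math.pi * x)) / 2.0  # slow start and finish
    else:
        raise ValueError("unknown shape: %r" % shape)
    return -fraction  # inserted leap second: served time falls behind UTC


for hours in (0, 6, 12, 18, 24):
    t = hours * 3600
    print(hours, round(smear_offset(t, 86400), 3),
          round(smear_offset(t, 86400, "cosine"), 3))
```

Both shapes start at 0 and end at -1 s; they differ only in how fast the
offset changes in between, which is what the clients have to follow.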
stenn 1.2     | 
stenn 1.2     | During the period while the leap smear is being performed, ntpd will include
stenn 1.2     | a specially-formatted 'refid' in time packets that contain "smeared" time.
stenn 1.2     | This refid is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the
stenn 1.2     | smear value.
stenn 1.2     | 
stenn 1.2     | With this approach the time an NTP server sends to its clients still matches
stenn 1.2     | UTC before the leap second, up to the beginning of the smear interval, and
stenn 1.2     | again corresponds to UTC after the insertion of the leap second has
stenn 1.2     | finished, at the end of the smear interval.  By examining the first byte of
stenn 1.2     | the refid, one can also determine if the server is offering smeared time or
stenn 1.2     | not.
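
A monitoring script could use the refid to detect smeared time as sketched
below.  The check of the first byte follows directly from the description
above.  The decoding of the remaining 24 bits is an assumption: this document
does not spell out the bit layout, but treating them as a two's complement
fixed-point number with 2 integer bits and 22 fraction bits reproduces the
example given at the end of this document (254.196.88.176 for an offset of
-0.932087 s).  The function name decode_smear_refid is made up.

```python
def decode_smear_refid(refid):
    """Return the smear offset in seconds, or None if refid is not a smear refid."""
    parts = refid.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None  # e.g. a stratum-1 refid such as "MRS"
    octets = [int(p) for p in parts]
    if octets[0] != 254:
        return None  # first byte is not 254: no smeared time on offer
    raw = (octets[1] << 16) | (octets[2] << 8) | octets[3]
    if raw & 0x800000:           # sign bit of the 24-bit value
        raw -= 1 << 24           # undo two's complement
    return raw / float(1 << 22)  # assumed: 22 fraction bits


print(decode_smear_refid("254.196.88.176"))
print(decode_smear_refid("192.0.2.1"))
```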
stenn 1.2     | 
stenn 1.2     | Of course, clients which receive the "smeared" time from an NTP server don't
stenn 1.2     | have to (and in fact must not) care about the leap second anymore.  Smearing
stenn 1.2     | is completely transparent to the clients, and the clients don't even notice
stenn 1.2     | there's a leap second.
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | Pros and Cons of the Smearing Approach
stenn 1.1     | --------------------------------------
stenn 1.1     | The disadvantages of this approach are:
stenn 1.1     | 
stenn 1.2     | - During the smear interval the time provided by smearing NTP servers
stenn 1.2     | differs significantly from UTC, and thus from the time provided by normal,
stenn 1.2     | non-smearing NTP servers.  The difference can be up to 1 second, depending
stenn 1.2     | on the smear algorithm.
stenn 1.2     | 
stenn 1.2     | - Since smeared time differs from true UTC, and many applications require
stenn 1.2     | correct legal time (UTC), there may be legal consequences to using smeared
stenn 1.2     | time.  Make sure you check to see if this requirement affects you.
stenn 1.2     | 
stenn 1.2     | However, for applications where it's only important that all computers have
stenn 1.2     | the same time and a temporary offset of up to 1 s to UTC is acceptable, a
stenn 1.2     | better approach may be to slew the time in a well-defined way, over a
stenn 1.2     | certain interval, which is what we call smearing the leap second.
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | The Motivation to Implement Leap Smearing
stenn 1.1     | -----------------------------------------
stenn 1.2     | Here is some historical background for ntpd, related to smearing/slewing
stenn 1.2     | time.
stenn 1.2     | 
stenn 1.2     | Up to ntpd 4.2.4, if kernel support for leap seconds was either not
stenn 1.2     | available or was not enabled, ntpd didn't care about the leap second at all.
stenn 1.2     | So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
stenn 1.2     | sudden 1 s offset after the leap second and normally would have stepped the
stenn 1.2     | time by -1 s a few minutes later.  However, 'ntpd -x' does not step the time
stenn 1.2     | but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
stenn 1.2     | to complete.  This could be considered a bug, but it was certainly only
stenn 1.2     | accidental behavior.
stenn 1.2     | 
stenn 1.2     | However, as we learned in the discussion in http://bugs.ntp.org/2745, this
stenn 1.2     | behavior was very much appreciated since the time was never stepped back,
stenn 1.2     | even though the start of the slewing was somewhat undefined and depended on
stenn 1.2     | the poll interval: the system time was off by 1 second for several minutes
stenn 1.2     | before slewing even started.
stenn 1.2     | 
stenn 1.2     | In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
stenn 1.2     | midnight to insert a leap second, if kernel support was not used.
stenn 1.2     | Unfortunately this also happened if ntpd was started with -x, so the folks
stenn 1.2     | who expected that the time was never stepped when ntpd was run with -x found
stenn 1.2     | this wasn't true anymore, and again from the discussion in NTP bug 2745 we
stenn 1.2     | learn that there were even some folks who patched ntpd to get the 4.2.4
stenn 1.2     | behavior back.
stenn 1.2     | 
stenn 1.2     | In 4.2.8 the leap second code was rewritten and some enhancements were
stenn 1.2     | introduced, but the resulting code still showed the behavior of 4.2.6,
stenn 1.2     | i.e. ntpd with -x would still step the time.  This has only recently been
stenn 1.2     | fixed in the current ntpd stable code, but this fix is only available with a
stenn 1.2     | certain patch level of ntpd 4.2.8.
stenn 1.2     | 
stenn 1.2     | So a possible solution for users who were looking for a way to get past the
stenn 1.2     | leap second without the time being stepped could have been to check the
stenn 1.2     | version of ntpd installed on each of their systems.  If it's still 4.2.4, be
stenn 1.2     | sure to start the client ntpd with -x.  If it's 4.2.6 or 4.2.8, -x won't
stenn 1.2     | help unless you have a patched ntpd version instead of the original
stenn 1.2     | version, so you'd need to upgrade to the current -stable code to be able to
stenn 1.2     | run ntpd with -x and get the desired result.  Either way, you'd still have
stenn 1.2     | to check, update, and configure every single machine in your network that
stenn 1.2     | runs ntpd.
stenn 1.2     | 
stenn 1.2     | Google's leap smear approach is a very efficient solution for this, for
stenn 1.2     | sites that do not require correct timestamps for legal purposes.  You just
stenn 1.2     | have to take care that your NTP servers support leap smearing and configure
stenn 1.2     | those few servers accordingly.  If the smear interval is long enough for
stenn 1.2     | NTP clients to follow the smeared time, it doesn't matter at all which
stenn 1.2     | version of ntpd is installed on a client machine; it just works, and it even
stenn 1.2     | works around kernel bugs triggered by the leap second.
stenn 1.2     | 
stenn 1.2     | Since all clients follow the same smeared time the time difference between
stenn 1.2     | the clients during the smear interval is as small as possible, compared to
stenn 1.2     | the -x approach.  The current leap second code in ntpd determines the point
stenn 1.2     | in system time when the leap second is to be inserted, and given a
stenn 1.2     | particular smear interval it's easy to determine the start point of the
stenn 1.2     | smearing, and the smearing is finished when the leap second ends, i.e. the
stenn 1.2     | next UTC day begins.
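
The start point follows from simple arithmetic: the smear ends at the
beginning of the next UTC day and starts one smear interval earlier.  The
sketch below shows this with hypothetical helper names (smear_window,
in_smear_window); it is not taken from the ntpd sources.  Note that the
datetime type used here cannot represent the leap second 23:59:60 itself.

```python
from datetime import datetime, timedelta, timezone


def smear_window(leap_instant, interval_s):
    """Return (start, end) of the smear for a leap second ending at leap_instant."""
    return leap_instant - timedelta(seconds=interval_s), leap_instant


def in_smear_window(now, leap_instant, interval_s):
    """True while the server would be handing out smeared time."""
    start, end = smear_window(leap_instant, interval_s)
    return start <= now < end


# The leap second of June 30, 2015 ends when July 1, 2015 begins in UTC.
leap = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
start, end = smear_window(leap, 86400)
print(start.isoformat(), "->", end.isoformat())
```

With an interval of 86400 s the smear covers exactly the UTC day that ends
with the leap second.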
stenn 1.2     | 
stenn 1.2     | The maximum error doesn't exceed what you'd get with the old smearing caused
stenn 1.2     | by -x in ntpd 4.2.4, so users who could accept the old behavior should also
stenn 1.2     | be able to accept the smearing at the server side.
stenn 1.2     | 
stenn 1.2     | In order to affect the local timekeeping as little as possible the leap
stenn 1.2     | smear support currently implemented in ntpd does not affect the internal
stenn 1.2     | system time at all.  Only the timestamps and refid in outgoing reply packets
stenn 1.2     | *to clients* are modified by the smear offset, so this makes sure the basic
stenn 1.2     | functionality of ntpd is not accidentally broken.  Also peer packets
stenn 1.2     | exchanged with other NTP servers are based on the real UTC system time and
stenn 1.2     | the normal refid, as usual.
stenn 1.2     | 
stenn 1.2     | The leap smear implementation is optionally available in ntp-4.2.8p3 and
stenn 1.2     | later, and the changes can be tracked via http://bugs.ntp.org/2855.
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | Using NTP's Leap Second Smearing
stenn 1.1     | --------------------------------
stenn 1.2     | - Leap Second Smearing MUST NOT be used for public servers, e.g. servers
stenn 1.2     | provided by metrology institutes, or servers participating in the NTP pool
stenn 1.2     | project.  There would be a high risk that NTP clients get the time from a
stenn 1.2     | mixture of smearing and non-smearing NTP servers which could result in
stenn 1.2     | undefined client behavior.  Instead, leap second smearing should only be
stenn 1.2     | configured on time servers providing dedicated clients with time, if all
stenn 1.2     | those clients can accept smeared time.
stenn 1.2     | 
stenn 1.2     | - Leap Second Smearing is NOT configured by default.  The only way to get
stenn 1.2     | this behavior is to invoke the ./configure script from the NTP source code
stenn 1.2     | package with the --enable-leap-smear parameter before the executables are
stenn 1.2     | built.
stenn 1.2     | 
stenn 1.2     | - Even if ntpd has been compiled to enable leap smearing support, leap
stenn 1.2     | smearing is only done if explicitly configured.
stenn 1.2     | 
stenn 1.2     | - The leap smear interval should be at least several hours long, and up to
stenn 1.2     | 1 day (86400s).  If the interval is too short then the smear offset is
stenn 1.2     | applied too quickly for clients to follow.  86400s (1 day) is a good
stenn 1.2     | choice.
stenn 1.2     | 
stenn 1.2     | - If several NTP servers are set up for leap smearing then the *same* smear
stenn 1.2     | interval should be configured on each server.
stenn 1.2     | 
stenn 1.2     | - Smearing NTP servers DO NOT send a leap second warning flag in replies to
stenn 1.2     | client time requests.  Since the leap second is applied gradually the
stenn 1.2     | clients don't even notice there's a leap second being inserted, and thus no
stenn 1.2     | log message or similar indication of the leap second will be visible on the
stenn 1.2     | clients.
stenn 1.2     | 
stenn 1.2     | - Since clients don't (and must not) become aware of the leap second at all,
stenn 1.2     | clients getting the time from a smearing NTP server MUST NOT be configured
stenn 1.2     | to use a leap second file.  If they had a leap second file they would apply
stenn 1.2     | the leap second twice: the smeared one from the server, plus another one
stenn 1.2     | inserted by themselves due to the leap second file.  As a result, the
stenn 1.2     | additional correction would soon be detected and corrected/adjusted.
stenn 1.2     | 
stenn 1.2     | - Clients MUST NOT be configured to poll both smearing and non-smearing NTP
stenn 1.2     | servers at the same time.  During the smear interval they would get
stenn 1.2     | different times from different servers and wouldn't know which server(s) to
stenn 1.2     | accept.
stenn 1.1     | 
stenn 1.1     | 
stenn 1.1     | Setting Up A Smearing NTP Server
stenn 1.1     | --------------------------------
stenn 1.2     | If an NTP server should perform leap smearing then the leap smear interval
stenn 1.2     | (in seconds) needs to be specified in the NTP configuration file ntp.conf,
stenn 1.2     | e.g.:
stenn 1.2     | 
stenn 1.2     |  leapsmearinterval 86400
stenn 1.2     | 
stenn 1.2     | Please keep in mind the leap smear interval should be between several hours
stenn 1.2     | and 24 hours long.  With shorter values clients may not be able to follow
stenn 1.2     | the drift caused by the smeared time, and with longer values the discrepancy
stenn 1.2     | between system time and UTC will cause more problems when reconciling
stenn 1.2     | timestamp differences.
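
A rough way to judge an interval is to look at the frequency offset it
imposes on the clients.  The sketch below assumes a linear smear and a
client slew limit of 500 ppm; that limit is inferred from the figure quoted
earlier for 'ntpd -x' (1 s slewed in 33 minutes and 20 seconds, i.e. 2000 s),
and the helper names are made up for this example.

```python
MAX_SLEW_PPM = 500.0  # assumed limit: 1 s slewed in 2000 s, as with 'ntpd -x'


def smear_rate_ppm(interval_s, leap_s=1.0):
    """Frequency offset, in parts per million, implied by a linear smear."""
    return leap_s * 1e6 / interval_s


def slew_duration_s(offset_s=1.0, rate_ppm=MAX_SLEW_PPM):
    """Time needed to slew away offset_s at a constant rate."""
    return offset_s * 1e6 / rate_ppm


for interval in (2000, 3600, 6 * 3600, 86400):
    print(interval, "s ->", round(smear_rate_ppm(interval), 2), "ppm")
print("slewing 1 s at the limit takes", slew_duration_s(), "s")
```

An interval of 86400 s keeps the rate below 12 ppm, far away from the
assumed limit, which is one reason why a full day is a comfortable choice.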
stenn 1.2     | 
stenn 1.2     | When ntpd starts and a smear interval has been specified then a log message
stenn 1.2     | is generated, e.g.:
stenn 1.2     | 
stenn 1.2     |  ntpd[31120]: config: leap smear interval 86400 s
stenn 1.2     | 
stenn 1.2     | While ntpd is running with a leap smear interval specified, the command:
stenn 1.2     | 
stenn 1.2     |  ntpq -c rv
stenn 1.2     | 
stenn 1.2     | reports the smear status, e.g.:
stenn 1.2     | 
stenn 1.2     | # ntpq -c rv
stenn 1.2     | associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
stenn 1.2     | version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
stenn 1.2     | processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
stenn 1.2     | precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
stenn 1.2     | reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
stenn 1.2     | clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
stenn 1.2     | tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
stenn 1.2     | clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
stenn 1.2     | expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087
stenn 1.2     | 
stenn 1.2     | In the example above 'leapsmearinterval' reports the configured leap smear
stenn 1.2     | interval all the time, while the 'leapsmearoffset' value is 0 outside the
stenn 1.2     | interval and changes from 0 to -1000 ms over the interval.  So this can be
stenn 1.2     | used to monitor if and how the time sent to clients is smeared.  With a
stenn 1.2     | leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
stenn 1.2     | packets would be 254.196.88.176.
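
For monitoring, the two smear variables can be pulled out of the ntpq output
with a few lines of script.  The sketch below parses a fragment of the
example output above and then derives the refid from the offset.  The
parsing is a simplification that only handles the 'name=value' pairs, and
the refid encoding (2 integer bits plus 22 fraction bits in two's
complement) is an assumption chosen because it reproduces the refid quoted
above; neither function is part of ntpq or ntpd.

```python
import re


def parse_rv(text):
    """Parse 'name=value' pairs from ntpq -c rv output into a dict of strings."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}


def smear_refid(offset_s):
    """Encode a smear offset as 254.x.y.z (assumed: 2 integer + 22 fraction bits)."""
    raw = int(round(offset_s * (1 << 22))) & 0xFFFFFF  # 24-bit two's complement
    return "254.%d.%d.%d" % (raw >> 16, (raw >> 8) & 0xFF, raw & 0xFF)


# Last two lines of the example output shown above.
sample = (
    "clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,\n"
    "expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087"
)
rv = parse_rv(sample)
offset_s = float(rv["leapsmearoffset"]) / 1000.0  # ntpq reports milliseconds
print(rv["leapsmearinterval"], offset_s, smear_refid(offset_s))
```

For the sample values this yields the refid 254.196.88.176 given above.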