How MySQL Computes Seconds_Behind_Master

Source

How Seconds_Behind_Master is computed

Below is the comment in the source code that explains how the delay is computed.

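// Located near the definition of clock_diff_with_master in rpl_mi.h (checked in both 5.6.34 and 5.7.22; the delay formula is identical in both versions)
// According to this comment, the replication delay is: clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master
// i.e. the slave's current system (host) time, minus the timestamp of the event the slave's SQL thread is executing, minus the clock difference between the master and slave hosts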
/*
    The difference in seconds between the clock of the master and the clock of
    the slave (second - first). It must be signed as it may be <0 or >0.
    clock_diff_with_master is computed when the I/O thread starts; for this the
    I/O thread does a SELECT UNIX_TIMESTAMP() on the master.
    "how late the slave is compared to the master" is computed like this:
    clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master

 */
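// clock_diff_with_master is the host clock difference between slave and master. It is computed only once, when the I/O thread starts; every later calculation of Seconds_Behind_Master reuses it, and it is recomputed each time the I/O thread restarts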
 long clock_diff_with_master;  

// master_row[0] holds the result of SELECT UNIX_TIMESTAMP() that the slave ran on the master; clock_diff_with_master is the resulting slave-minus-master clock difference
 mi->clock_diff_with_master=
     (long) (time((time_t*) 0) - strtoul(master_row[0], 0, 10));

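// The I/O thread startup path in rpl_slave.cc shows where the difference is computed:
     start_slave_thread->                    // START SLAVE starts the replication threads
           handle_slave_io->                 // main function of the I/O thread
               get_master_version_and_clock  // computes the slave-master clock difference (clock_diff_with_master)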
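To make the sign convention concrete, the assignment to mi->clock_diff_with_master can be isolated into a standalone function. This is a minimal sketch, not MySQL code: compute_clock_diff is a hypothetical name, and it assumes the master's reply arrives as a decimal string, the way master_row[0] does. A positive result means the slave's clock is ahead of the master's.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Hypothetical helper mirroring the assignment to mi->clock_diff_with_master.
// slave_now:      the slave's time(0) when the I/O thread starts.
// master_unix_ts: text returned by SELECT UNIX_TIMESTAMP() on the master.
// Result > 0: the slave's clock is ahead; < 0: the slave's clock is behind.
long compute_clock_diff(std::time_t slave_now, const char* master_unix_ts) {
  const std::time_t master_now =
      static_cast<std::time_t>(std::strtoul(master_unix_ts, nullptr, 10));
  return static_cast<long>(slave_now - master_now);
}

// Usage:
//   assert(compute_clock_diff(1005, "1000") == 5);   // slave 5 s ahead
//   assert(compute_clock_diff(995, "1000") == -5);   // slave 5 s behind
```

Converting the parsed value to time_t before subtracting keeps the arithmetic signed; the original line subtracts an unsigned long and relies on the final cast to long to recover negative differences.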

Below is the source code, with its comments, that decides which value Seconds_Behind_Master reports.

/*
  The pseudo code to compute Seconds_Behind_Master:  // pseudo code describing how Seconds_Behind_Master is computed
  if (SQL thread is running)  // call this condition "if one"
  {
    if (SQL thread processed all the available relay log)  // call this condition "if two"
    {
      if (IO thread is running)  // call this condition "if three"
         print 0;  // if one, two and three are all true, the delay is reported as 0
      else
         print NULL;  // if one and two are true but three is false, the delay is reported as NULL
    }
    else
      compute Seconds_Behind_Master;  // if one is true but two is false, the formula computes the delay
  }
  else
    print NULL;  // if one is false, the delay is reported as NULL
*/

if (mi->rli->slave_running)
{
 /*
    Check if SQL thread is at the end of relay log
    Checking should be done using two conditions
    condition1: compare the log positions and
    condition2: compare the file names (to handle rotation case)
 */
 if ((mi->get_master_log_pos() == mi->rli->get_group_master_log_pos()) &&
     (!strcmp(mi->get_master_log_name(), mi->rli->get_group_master_log_name())))
 {
   if (mi->slave_running == MYSQL_SLAVE_RUN_CONNECT)
     protocol->store(0LL);
   else
     protocol->store_null();
 }
 else
 {
   long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp)
                    - mi->clock_diff_with_master);
   /*
     Apparently on some systems time_diff can be <0. Here are possible
     reasons related to MySQL:
     - the master is itself a slave of another master whose time is ahead.
     - somebody used an explicit SET TIMESTAMP on the master.
     Possible reason related to granularity-to-second of time functions
     (nothing to do with MySQL), which can explain a value of -1:
     assume the master's and slave's time are perfectly synchronized, and
     that at slave's connection time, when the master's timestamp is read,
     it is at the very end of second 1, and (a very short time later) when
     the slave's timestamp is read it is at the very beginning of second
     2. Then the recorded value for master is 1 and the recorded value for
     slave is 2. At SHOW SLAVE STATUS time, assume that the difference
     between timestamp of slave and rli->last_master_timestamp is 0
     (i.e. they are in the same second), then we get 0-(2-1)=-1 as a result.
     This confuses users, so we don't go below 0: hence the max().

     last_master_timestamp == 0 (an "impossible" timestamp 1970) is a
     special marker to say "consider we have caught up".
   */
   protocol->store((longlong)(mi->rli->last_master_timestamp ?
                                max(0L, time_diff) : 0));  // time_diff is effectively the final Seconds_Behind_Master value; a negative result is clamped to 0
 }
}
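The pseudo code and the real code above can be folded into one pure function, which makes each branch easy to test. This is a minimal sketch, not MySQL's implementation: ReplicaSnapshot and seconds_behind_master are hypothetical names, each field's comment names the MySQL member it stands in for, and std::nullopt plays the role of protocol->store_null().

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical snapshot of the values the code above reads.
struct ReplicaSnapshot {
  bool sql_running;                   // mi->rli->slave_running
  bool io_connected;                  // mi->slave_running == MYSQL_SLAVE_RUN_CONNECT
  std::string io_file;                // mi->get_master_log_name()
  unsigned long long io_pos;          // mi->get_master_log_pos()
  std::string sql_file;               // mi->rli->get_group_master_log_name()
  unsigned long long sql_pos;         // mi->rli->get_group_master_log_pos()
  std::time_t last_master_timestamp;  // mi->rli->last_master_timestamp
  long clock_diff_with_master;        // mi->clock_diff_with_master
};

// Returns std::nullopt where SHOW SLAVE STATUS would print NULL.
std::optional<long long> seconds_behind_master(const ReplicaSnapshot& s,
                                               std::time_t slave_now) {
  if (!s.sql_running) return std::nullopt;  // SQL thread not running: NULL
  // SQL thread has applied everything the I/O thread has fetched
  // (compare both position and file name to handle rotation).
  const bool caught_up = s.io_pos == s.sql_pos && s.io_file == s.sql_file;
  if (caught_up)
    return s.io_connected ? std::optional<long long>(0) : std::nullopt;
  const long time_diff =
      static_cast<long>(slave_now - s.last_master_timestamp) -
      s.clock_diff_with_master;
  // last_master_timestamp == 0 is the "consider we have caught up" marker.
  if (s.last_master_timestamp == 0) return 0;
  return std::max(0L, time_diff);  // negative values are shown as 0
}

// Usage:
//   ReplicaSnapshot s{true, true, "mysql-bin.000002", 400,
//                     "mysql-bin.000002", 300, 1000, 0};
//   assert(seconds_behind_master(s, 1030) == 30);
```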

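According to the source comment, the replication delay is computed as

clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master

That is: the slave's current system (host) time, minus the timestamp of the event the slave's SQL thread is executing, minus the clock difference between the master and slave hosts.

clock_diff_with_master is the clock difference between the slave and master hosts. It is computed only once, when the I/O thread starts; every later calculation of Seconds_Behind_Master reuses it, and it is recomputed each time the I/O thread restarts.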

  long time_diff= ((long)(time(0) - mi->rli->last_master_timestamp)
                   - mi->clock_diff_with_master);

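This line computes the long variable time_diff: the slave's current time (time(0)) minus the master-side timestamp of the last event the SQL thread has executed (mi->rli->last_master_timestamp), minus the clock difference between slave and master (mi->clock_diff_with_master). The result is how many seconds that event lags behind the slave's clock, corrected for the skew between the two hosts.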
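A short worked example shows why the clock difference is subtracted, and how the "-1" case described in the source comment arises. The sketch is illustrative only: raw_time_diff is a hypothetical name for the expression above, evaluated before the max(0L, ...) clamp.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: the raw formula, before MySQL clamps it to >= 0.
long raw_time_diff(std::time_t slave_now, std::time_t last_master_timestamp,
                   long clock_diff_with_master) {
  return static_cast<long>(slave_now - last_master_timestamp) -
         clock_diff_with_master;
}

// Example 1: the slave's clock runs 10 s ahead (clock_diff_with_master = 10).
// An event written at master time 1000 is examined when the slave's clock
// reads 1015, which is master time 1005, so the real lag is 5 s:
//   raw_time_diff(1015, 1000, 10) == 5
//
// Example 2 (the -1 case from the source comment): the master's timestamp is
// read at the very end of second 1 and the slave's at the start of second 2,
// so clock_diff_with_master = 1. If slave_now and last_master_timestamp later
// fall in the same second:
//   raw_time_diff(2000, 2000, 1) == -1   // displayed as 0 after max(0L, ...)
```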

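What the displayed values mean

Shown as NULL

(1) When the slave has no pending updates, the field shows 0 if both the I/O and SQL threads are Yes, and NULL if either thread is not Yes.

(2) The field shows NULL if the slave's SQL thread is not running, or if the SQL thread is running and has applied all relay log but the I/O thread is not running.

(3) If the I/O thread has stopped but some relay log has not been replayed yet, a delay value is still shown; the field turns to NULL only after all relay log has been replayed.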

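Shown as 0

(1) If both the SQL and I/O threads are running but idle (the SQL thread has replayed all the relay log written by the I/O thread), the field shows 0.

(2) When the slave has no pending updates and both the I/O and SQL threads are Yes, the field shows 0.

Shown as a number

The number is the delay in seconds, computed as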

clock_of_slave - last_timestamp_executed_by_SQL_thread - clock_diff_with_master

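Limitations of this calculation

Network latency

In practice, this field measures the gap, in seconds, between the slave's SQL thread and its I/O thread. If the network between master and slave is fast, the binlog the I/O thread reads is very close to the master's latest binlog, so the computed value can serve as the data delay between the two. If the network is slow, however, the SQL thread may be replaying events very close to those the I/O thread has read, while the I/O thread itself lags far behind the master's latest binlog. The value is then unreliable: the field may show 0 even though the slave is far behind the master. On slow networks, therefore, this value cannot be trusted.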

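Avoiding misjudgments caused by the network

The replica's SQL thread may frequently catch up with a slow-reading I/O thread, so Seconds_Behind_Master often shows 0 even when the I/O thread is behind the source. In other words, the column is only useful on fast networks.

So how do you tell whether the network is to blame? It depends on whether the delay is in the I/O thread or in the SQL thread.

Determining whether the I/O thread or the SQL thread is lagging

Compare the following two pairs of values.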

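Pair 1: ( File , Position ) vs. ( Master_Log_File , Read_Master_Log_Pos )

Here:

( File , Position ) is the master's current binlog position, as reported by SHOW MASTER STATUS on the master.
( Master_Log_File , Read_Master_Log_Pos ) is the position in the master's binlog of the event the I/O thread is currently receiving, as reported by SHOW SLAVE STATUS on the slave.

If ( File , Position ) is ahead of ( Master_Log_File , Read_Master_Log_Pos ), the I/O thread is lagging.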

Pair 2: ( Master_Log_File , Read_Master_Log_Pos ) vs. ( Relay_Master_Log_File , Exec_Master_Log_Pos )

Here, ( Relay_Master_Log_File, Exec_Master_Log_Pos ) is the position in the master's binlog of the event the SQL thread is currently replaying.

If ( Relay_Master_Log_File, Exec_Master_Log_Pos ) is behind ( Master_Log_File, Read_Master_Log_Pos ), the SQL thread is lagging.
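Both checks are mechanical, so a monitoring script can run them. The sketch below assumes you have already fetched the three coordinate pairs from SHOW MASTER STATUS and SHOW SLAVE STATUS, and that all binlog file names share one basename followed by a numeric extension (e.g. mysql-bin.000012); BinlogCoord, compare_coord and the other names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical binlog coordinate: (file name, position within that file).
struct BinlogCoord {
  std::string file;        // e.g. "mysql-bin.000012"
  unsigned long long pos;  // byte offset within the file
};

// Assumes names of the form <basename>.<number>; comparing the numeric
// extension keeps ".1000000" after ".999999". Throws if it is not numeric.
unsigned long long file_seq(const std::string& file) {
  return std::stoull(file.substr(file.rfind('.') + 1));
}

// Negative if a is behind b, 0 if equal, positive if a is ahead of b.
int compare_coord(const BinlogCoord& a, const BinlogCoord& b) {
  const unsigned long long fa = file_seq(a.file), fb = file_seq(b.file);
  if (fa != fb) return fa < fb ? -1 : 1;
  if (a.pos != b.pos) return a.pos < b.pos ? -1 : 1;
  return 0;
}

// master: (File, Position)                              from SHOW MASTER STATUS
// io:     (Master_Log_File, Read_Master_Log_Pos)        from SHOW SLAVE STATUS
// sql:    (Relay_Master_Log_File, Exec_Master_Log_Pos)  from SHOW SLAVE STATUS
bool io_thread_lagging(const BinlogCoord& master, const BinlogCoord& io) {
  return compare_coord(master, io) > 0;
}

bool sql_thread_lagging(const BinlogCoord& io, const BinlogCoord& sql) {
  return compare_coord(sql, io) < 0;
}
```

If io_thread_lagging is true while Seconds_Behind_Master shows 0, the slave is waiting on the network rather than on the SQL thread, which is exactly the misleading case described above. Since the two statements run on different servers, the master's position keeps moving between the reads, so a small I/O gap is normal.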

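Host clocks can be changed

Even if the master and slave servers' clocks differ, the field is computed correctly as long as neither clock is changed after the slave's replication threads start. If a server's clock is changed afterwards, the clocks drift relative to the stored difference and the computed value becomes unreliable.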

Avoiding misjudgments caused by host clocks

Check the current time on each host with the date command.

From the host shell, run:
date


Then log in to the database and run:
select now(),unix_timestamp(),from_unixtime(unix_timestamp());


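Make sure both outputs match the actual current time.

If the system time is wrong, set it to the correct current time and then restart the replication threads.

As noted earlier, clock_diff_with_master is the clock difference between the slave and master hosts. It is computed only once, when the I/O thread starts; every later calculation of Seconds_Behind_Master reuses it, and it is recomputed each time the I/O thread restarts. The master and slave clocks therefore do not need to be exactly in sync. Each only needs to be reasonably close to the real time, and restarting the replication threads overwrites clock_diff_with_master with a fresh value.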
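A small simulation shows why a clock change after the I/O thread starts skews the result until the next restart. This is a simplified model under the stated assumption that the difference is captured once at start and reused; IoThreadModel and its members are hypothetical names, not MySQL code.

```cpp
#include <cassert>
#include <ctime>

// Simplified model (assumption, not MySQL code): the clock difference is
// captured once when the I/O thread starts and reused afterwards.
struct IoThreadModel {
  long clock_diff_with_master = 0;

  // Called on I/O thread (re)start: slave clock minus master clock.
  void start(std::time_t slave_now, std::time_t master_now) {
    clock_diff_with_master = static_cast<long>(slave_now - master_now);
  }

  // Unclamped delay according to the formula.
  long delay(std::time_t slave_now, std::time_t last_master_timestamp) const {
    return static_cast<long>(slave_now - last_master_timestamp) -
           clock_diff_with_master;
  }
};

// Usage:
//   IoThreadModel m;
//   m.start(1000, 1000);               // clocks agree at start
//   m.delay(1060, 1000);               // slave clock moved 60 s forward: 60
//   m.start(1060, 1000);               // restart the I/O thread
//   m.delay(1060, 1000);               // back to 0
```

The 60 seconds reported in the middle step is entirely an artifact of the clock change: no replication delay exists, and restarting the I/O thread makes it disappear.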

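Large transactions

While the SQL thread replays a large transaction, its timestamp effectively stops advancing: applying the transaction's events takes a long time, and although a large transaction may contain many events, they may all carry the same timestamp. By the formula, the reported delay keeps growing during this period whether or not the master receives new writes, so the value is unreliable. This explains the pattern where, after writes on the master stop, the slave's delay climbs to a peak and then suddenly drops to 0.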
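The climb-then-drop pattern follows directly from the formula. The toy model below is a sketch under these assumptions: a single large transaction whose events all carry one master timestamp is the last thing in the relay log, both threads stay running, and the clocks are in sync (clock_diff_with_master = 0). big_transaction_delays is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumptions in the lead-in): the SQL thread starts applying a
// transaction stamped event_ts at time apply_start and needs apply_seconds.
// Returns the value shown for each whole second from apply_start to
// apply_start + apply_seconds, inclusive.
std::vector<long> big_transaction_delays(long event_ts, long apply_start,
                                         long apply_seconds) {
  std::vector<long> shown;
  for (long now = apply_start; now <= apply_start + apply_seconds; ++now) {
    const bool finished = now >= apply_start + apply_seconds;
    // While applying, last_master_timestamp stays at event_ts, so the value
    // grows by one each second. Once the transaction is applied, the SQL
    // thread has caught up with the I/O thread and 0 is shown directly.
    shown.push_back(finished ? 0 : std::max(0L, now - event_ts));
  }
  return shown;
}

// Usage: big_transaction_delays(100, 101, 5) yields 1, 2, 3, 4, 5, 0.
```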

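Fluctuation under multi-threaded replication

With multi-threaded replication, this value is computed from the timestamp of the event at Exec_Master_Log_Pos, so it may not reflect the position of the transaction most recently committed on the slave.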
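A simplified model illustrates why. It assumes Exec_Master_Log_Pos can only move past a prefix of transactions that have all committed (a "low-water mark"), and that the delay is taken from the first transaction not yet committed; MySQL's real checkpointing has more detail than this. Trx and mts_reference_ts are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical record for one transaction, in master binlog order.
struct Trx {
  long master_ts;  // timestamp the events carry from the master
  bool committed;  // already committed by some worker thread on the slave
};

// Simplified model: the reference timestamp comes from the first
// transaction that is not yet committed. Later transactions may already be
// committed by other workers, but they do not move the value.
// Returns std::nullopt when every transaction is committed (caught up).
std::optional<long> mts_reference_ts(const std::vector<Trx>& trxs) {
  for (const Trx& t : trxs)
    if (!t.committed) return t.master_ts;
  return std::nullopt;
}
```

For example, with transactions stamped 100, 101, 105 and 109 where only the one at 101 is still running, the reference timestamp is 101. At slave time 110 the formula would report 9 seconds even though the transaction stamped 109 has already committed.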

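The slave is not read_only

If clients connect to the slave and update data directly, the value may fluctuate randomly: some events come from the master and some from direct updates on the slave, and the value is affected by the events those direct updates produce.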

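Summary

When the master and slave host clocks differ, the clock difference between them is computed when the I/O thread first starts and subtracted in every later delay calculation, so the delay is still reported correctly. However, this difference is only computed at I/O thread start. If either host's clock is changed after the I/O thread has started, the formula yields an unreliable delay. Restarting the I/O thread fixes this, because the difference is recomputed on restart.
Each time the delay is computed (that is, each time SHOW SLAVE STATUS runs), the Seconds_Behind_Master result passes through a few checks (already explained in the pseudo code comment in the source section above; restated here):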

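If both the I/O and SQL threads are Yes and the SQL thread has nothing to do (no events waiting to be executed), the delay is reported as 0 directly without applying the formula; otherwise the formula is used. (So under this condition, the delay does not keep growing while the master writes no binlog events.)

If the SQL thread is Yes and relay log already read by the I/O thread has not been fully applied, the formula is used regardless of whether the I/O thread is running. Once the SQL thread has replayed all relay log, the delay is reported as NULL if the I/O thread is not Yes.

At any time, if the SQL thread is not Yes, the delay is reported as NULL. A negative computed delay is clamped to 0.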