springcloud 和apollo配置中心一起使用时踩到的一个坑

场景描述：今天收到线上一个服务的instance deregistered的告警邮件，在springboot admin上确实那个instance不在线了。于是去服务器上jps看了一下，进程正常的，但是/health端点信息显示eureka为down。查看error.log，并没有错误日志，查看warn.log，发现里面有些许的警告日志如下：[2018-09-20T19:12:24.159...

xiao_jun_0820

11578人浏览 · 2018-09-23 18:06:39

xiao_jun_0820 · 2018-09-23 18:06:39 发布

场景描述：

今天收到线上一个服务的instance deregistered的告警邮件，在springboot admin上确实那个instance不在线了。于是去服务器上jps看了一下，进程正常的，但是/health端点信息显示eureka为down。查看error.log，并没有错误日志，查看warn.log，发现里面有些许的警告日志如下：

[2018-09-20T19:12:24.159+08:00] WARN: [TID: N/A] [Apollo-Config-1] DiscoveryClient.java:1291 - Saw local status change event StatusChangeEvent [timestamp=1537441944159, current=DOWN, previous=UP]
[2018-09-20T19:12:24.160+08:00] WARN: [TID: N/A] [Apollo-Config-1] DiscoveryClient.java:1291 - Saw local status change event StatusChangeEvent [timestamp=1537441944160, current=UP, previous=DOWN]
[2018-09-20T19:12:24.160+08:00] WARN: [TID: N/A] [DiscoveryClient-InstanceInfoReplicator-0] DiscoveryClient.java:1291 - Saw local status change event StatusChangeEvent [timestamp=1537441944160, current=DOWN, previous=UP]
[2018-09-20T19:12:24.160+08:00] WARN: [TID: N/A] [DiscoveryClient-InstanceInfoReplicator-0] InstanceInfoReplicator.java:98 - Ignoring onDemand update due to rate limiter

打开InstanceInfoReplicator.java查看了一下源代码，发现是由于单位分钟内向eureka server同步次数超过限制，导致有一个StatusChangeEvent被忽略掉了。

从上面的警告日志也可以看出一些端倪，就是在很短的时间内，收到了2个StatusChangeEvent事件，一次是从UP-》DOWN，一次是从DOWN->UP。apollo自身也使用了eureka，所以apollo也处理了这个事件两次（第一条和第二条日志，在Apollo-Config线程组中）.然后springcloud内部也处理了这个事件(在DiscoveryClient-InstanceInfoReplicator线程组中)，但是只处理了一个从UP->DOWN的事件，另外一个从DOWN->UP的事件由于rate limiter限制导致被忽略了，最终导致这个instance的health永远是down状态。

以下是处理statusChangeEvent的代码片段，来自com.netflix.discovery.DiscoveryClient#initScheduledTasks方法：

 private void initScheduledTasks() {
        if (clientConfig.shouldFetchRegistry()) {
            // registry cache refresh timer
            int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
            int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
            scheduler.schedule(
                    new TimedSupervisorTask(
                            "cacheRefresh",
                            scheduler,
                            cacheRefreshExecutor,
                            registryFetchIntervalSeconds,
                            TimeUnit.SECONDS,
                            expBackOffBound,
                            new CacheRefreshThread()
                    ),
                    registryFetchIntervalSeconds, TimeUnit.SECONDS);
        }

        if (clientConfig.shouldRegisterWithEureka()) {
            int renewalIntervalInSecs = instanceInfo.getLeaseInfo().getRenewalIntervalInSecs();
            int expBackOffBound = clientConfig.getHeartbeatExecutorExponentialBackOffBound();
            logger.info("Starting heartbeat executor: " + "renew interval is: " + renewalIntervalInSecs);

            // Heartbeat timer
            scheduler.schedule(
                    new TimedSupervisorTask(
                            "heartbeat",
                            scheduler,
                            heartbeatExecutor,
                            renewalIntervalInSecs,
                            TimeUnit.SECONDS,
                            expBackOffBound,
                            new HeartbeatThread()
                    ),
                    renewalIntervalInSecs, TimeUnit.SECONDS);

            // InstanceInfo replicator
            instanceInfoReplicator = new InstanceInfoReplicator(
                    this,
                    instanceInfo,
                    clientConfig.getInstanceInfoReplicationIntervalSeconds(),
                    2); // burstSize

            statusChangeListener = new ApplicationInfoManager.StatusChangeListener() {
                @Override
                public String getId() {
                    return "statusChangeListener";
                }

                @Override
                public void notify(StatusChangeEvent statusChangeEvent) {
                    if (InstanceStatus.DOWN == statusChangeEvent.getStatus() ||
                            InstanceStatus.DOWN == statusChangeEvent.getPreviousStatus()) {
                        // log at warn level if DOWN was involved
                        logger.warn("Saw local status change event {}", statusChangeEvent);
                    } else {
                        logger.info("Saw local status change event {}", statusChangeEvent);
                    }
                    instanceInfoReplicator.onDemandUpdate();
                }
            };

            if (clientConfig.shouldOnDemandUpdateStatusChange()) {
                applicationInfoManager.registerStatusChangeListener(statusChangeListener);
            }

            instanceInfoReplicator.start(clientConfig.getInitialInstanceInfoReplicationIntervalSeconds());
        } else {
            logger.info("Not registering with Eureka server per configuration");
        }
    }

中的StatusChangeListener的notify方法。里面调用了instanceInfoReplicator.onDemandUpdate()方法，尝试通过令牌桶算法来限流:

 public boolean onDemandUpdate() {
        if (rateLimiter.acquire(burstSize, allowedRatePerMinute)) {
            if (!scheduler.isShutdown()) {
                scheduler.submit(new Runnable() {
                    @Override
                    public void run() {
                        logger.debug("Executing on-demand update of local InstanceInfo");
    
                        Future latestPeriodic = scheduledPeriodicRef.get();
                        if (latestPeriodic != null && !latestPeriodic.isDone()) {
                            logger.debug("Canceling the latest scheduled update, it will be rescheduled at the end of on demand update");
                            latestPeriodic.cancel(false);
                        }
    
                        InstanceInfoReplicator.this.run();
                    }
                });
                return true;
            } else {
                logger.warn("Ignoring onDemand update due to stopped scheduler");
                return false;
            }
        } else {
            logger.warn("Ignoring onDemand update due to rate limiter");
            return false;
        }
    }

警告日志中的最后一行日志就是来自这个方法。

源码里面burstSize写死了等于2，没法改变，allowedRatePerMinute可以通过：

eureka.client.instance-info-replication-interval-seconds

参数来改变，默认是30秒同步一次。

每分钟限制同步多少次是在InstanceInfoReplicator类的构造函数中计算的:

    InstanceInfoReplicator(DiscoveryClient discoveryClient, InstanceInfo instanceInfo, int replicationIntervalSeconds, int burstSize) {
        this.discoveryClient = discoveryClient;
        this.instanceInfo = instanceInfo;
        this.scheduler = Executors.newScheduledThreadPool(1,
                new ThreadFactoryBuilder()
                        .setNameFormat("DiscoveryClient-InstanceInfoReplicator-%d")
                        .setDaemon(true)
                        .build());

        this.scheduledPeriodicRef = new AtomicReference<Future>();

        this.started = new AtomicBoolean(false);
        this.rateLimiter = new RateLimiter(TimeUnit.MINUTES);
        this.replicationIntervalSeconds = replicationIntervalSeconds;
        this.burstSize = burstSize;

        this.allowedRatePerMinute = 60 * this.burstSize / this.replicationIntervalSeconds;
        logger.info("InstanceInfoReplicator onDemand update allowed rate per min is {}", allowedRatePerMinute);
    }

this.allowedRatePerMinute = 60* this.burstSize / this.replicationIntervalSeconds ,默认为: 60*2/30=4。所以导致了第四个StatusChangedEvent被丢弃了。

那我建议在和apollo一起使用时，将eureka.client.instance-info-replication-interval-seconds改成24，即每分钟最多可以执行60*2/24=5次复制instance信息的请求，这样就可能避免该问题的发生。

向您推荐>>Eolink开发者社区

权威｜前沿｜技术｜干货｜国内首个API全生命周期开发者社区

更多推荐

深入理解 Mocha 测试框架：从零实现一个 Mocha

前言什么是自动化测试自动化测试在很多团队中都是Devops环节中很难执行起来的一个环节，主要原因在于测试代码的编写工作很难抽象，99%的场景都需要和业务强绑定，而且写测试代码的编写工作量往往比编写实际业务代码的工作量更多。在一些很多业务场景中投入产出比很低，适合写自动化测试的应该是那些中长期业务以及一些诸如组件一样的基础库。自动化测试是个比较大的概念，其中分类也比较多，比如单元测试，端对端测试，集

云原生

ELK实现containerd的容器日志采集展示【基于logging的全栈监测】

企业级ELK Stack构建介绍

云原生

(20200916 Solved)docker-compose up创建容器自动退出

问题描述如题，创建容器后自动退出了。并且docker start container无效解决方案原因是缺失了控制终端的配置，需要在docker-compose.yml中增加tty:true ，有时候这样也不行，需要再增加一个command:/bin/bash，命令不一定是这个，需要是一个不会退出的命令，然后用-d后台启动容器。Referencesdocker-compose启动容器后自动退出...