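Versions:
Spring Boot 1.5.4.RELEASE
Spring Cloud Dalston.RELEASE
This article discusses microservices that register with a Eureka registry and are reached through a load-balancing Zuul gateway, and how to stop such a service without users noticing.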
kill -9
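This force-kills the process. First, whatever the microservice is doing at that moment is interrupted. Second, the instance is never taken offline in the Eureka registry, so Zuul, as a Eureka Client, still holds routing information for it and keeps calling it. Those HTTP requests return 500, and the underlying exception is Connection refused.
With default settings, the worst-case wait in this situation is:
90s (the instance's lease on Eureka Server expires)
+ 30s (Eureka Server refreshes its service list into the read-only cache ReadOnlyMap, which Eureka Clients read by default)
+ 30s (Zuul, as a Eureka Client, fetches the service list every 30 seconds by default)
+ 30s (Ribbon's default interval for refreshing its ServerList)
= 180s, i.e. 3 minutes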
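The arithmetic above can be written as a small calculation, which makes it easier to see what each layer contributes. This is only a sketch that sums the four default intervals quoted in the text; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Adds up the cache layers that delay discovery of a killed instance. */
final class StaleRouteWindow {

    /** Returns the worst-case delay as the sum of every layer's interval. */
    static Duration worstCase(Map<String, Duration> layers) {
        Duration total = Duration.ZERO;
        for (Duration interval : layers.values()) {
            total = total.plus(interval);
        }
        return total;
    }

    /** The four default intervals listed in the text. */
    static Map<String, Duration> defaults() {
        Map<String, Duration> layers = new LinkedHashMap<>();
        layers.put("lease expiry on Eureka Server", Duration.ofSeconds(90));
        layers.put("Eureka Server read-only cache refresh", Duration.ofSeconds(30));
        layers.put("Zuul (Eureka Client) registry fetch", Duration.ofSeconds(30));
        layers.put("Ribbon ServerList refresh", Duration.ofSeconds(30));
        return layers;
    }

    public static void main(String[] args) {
        System.out.println(worstCase(defaults()).getSeconds() + "s");
    }
}
```

Shortening any one interval in the map shrinks the total by the same amount, which is why tuning the cache intervals matters for the later approaches.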
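Summary:
This approach prevents in-flight tasks from finishing, and it neither removes the instance from Eureka Server nor gives Eureka Clients time to refresh their service lists. Calls through Zuul therefore still reach the stopped service and fail with 500. Not recommended.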
First, kill is equivalent to kill -15. According to the description in man kill:
The command kill sends the specified signal to the specified process or process group. If no signal is specified, the TERM signal is sent.
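That is, kill without an explicit signal sends TERM (termination). kill -l lists the mapping between signal numbers and signal names, and shows that kill -15 is SIGTERM, the TERM signal.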
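When a JVM process receives the TERM signal, it runs its registered Shutdown Hooks, and a Spring Boot microservice registers one at startup. Calling the /shutdown endpoint directly is essentially the same as going through the Shutdown Hook. So whether you use kill, kill -15, or the /shutdown endpoint, you end up running the same close logic as the Shutdown Hook registered with the JVM.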
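To make the mechanism concrete, here is a minimal stand-alone sketch of registering a JVM shutdown hook with the standard Runtime API. It is not Spring's code; the class name, thread name and printed messages are made up for illustration.

```java
/** Registers a JVM shutdown hook, the mechanism that SIGTERM (kill / kill -15) triggers. */
final class ShutdownHookDemo {

    /** Wraps the cleanup task in a thread and registers it with the JVM. */
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "demo-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        register(() -> System.out.println("hook: releasing resources before exit"));
        System.out.println("running; send SIGTERM (kill <pid>) or wait for normal exit");
        Thread.sleep(5_000);
        // On normal exit or SIGTERM the hook runs; after kill -9 it never gets the chance.
    }
}
```

Running it and sending kill to the process should print the hook message, while kill -9 terminates the JVM without it.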
Note:
Enabling the /shutdown endpoint requires the following configuration:
endpoints.shutdown.enabled = true
endpoints.shutdown.sensitive = false
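Everything comes down to one question: what does the Shutdown Hook actually do?
Searching the project for uses of Runtime.getRuntime().addShutdownHook(Thread shutdownHook) shows that Ribbon registers a few Shutdown Hooks, but those are not our concern here. What matters is that Spring's abstract application context class, AbstractApplicationContext, registers a Shutdown Hook for the whole Spring container. The logic that runs when this hook fires is in AbstractApplicationContext#doClose():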
    //## org.springframework.context.support.AbstractApplicationContext#registerShutdownHook
    /**
     * Register a shutdown hook with the JVM runtime, closing this context
     * on JVM shutdown unless it has already been closed at that time.
     * <p>Delegates to {@code doClose()} for the actual closing procedure.
     * @see Runtime#addShutdownHook
     * @see #close()
     * @see #doClose()
     */
    @Override
    public void registerShutdownHook() {
        if (this.shutdownHook == null) {
            // No shutdown hook registered yet.
            // Register the shutdown hook; the thread actually calls doClose()
            this.shutdownHook = new Thread() {
                @Override
                public void run() {
                    synchronized (startupShutdownMonitor) {
                        doClose();
                    }
                }
            };
            Runtime.getRuntime().addShutdownHook(this.shutdownHook);
        }
    }

    //## org.springframework.context.support.AbstractApplicationContext#doClose
    /**
     * Actually performs context closing: publishes a ContextClosedEvent and
     * destroys the singletons in the bean factory of this application context.
     * <p>Called by both {@code close()} and a JVM shutdown hook, if any.
     * @see org.springframework.context.event.ContextClosedEvent
     * @see #destroyBeans()
     * @see #close()
     * @see #registerShutdownHook()
     */
    protected void doClose() {
        if (this.active.get() && this.closed.compareAndSet(false, true)) {
            if (logger.isInfoEnabled()) {
                logger.info("Closing " + this);
            }

            // Unregister the registered MBean
            LiveBeansView.unregisterApplicationContext(this);

            try {
                // Publish shutdown event.
                // Publishes ContextClosedEvent; listeners for this event run their own logic
                publishEvent(new ContextClosedEvent(this));
            }
            catch (Throwable ex) {
                logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
            }

            // Stop all Lifecycle beans, to avoid delays during individual destruction.
            // Calls stop() on all Lifecycle beans
            try {
                getLifecycleProcessor().onClose();
            }
            catch (Throwable ex) {
                logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
            }

            // Destroy all cached singletons in the context's BeanFactory.
            // Destroys all singleton beans
            destroyBeans();

            // Close the state of this context itself.
            closeBeanFactory();

            // Let subclasses do some final clean-up if they wish...
            // Calls the subclass's onClose(), e.g. EmbeddedWebApplicationContext#onClose()
            onClose();

            this.active.set(false);
        }
    }
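The key points of AbstractApplicationContext#doClose() are that it publishes a ContextClosedEvent and calls stop() on every Lifecycle bean. ContextClosedEvent has many listeners, and many beans implement the Lifecycle interface, but only one of them matters here: EurekaAutoServiceRegistration, which both listens for ContextClosedEvent and implements the Lifecycle interface.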
    //## org.springframework.cloud.netflix.eureka.serviceregistry.EurekaAutoServiceRegistration
    public class EurekaAutoServiceRegistration implements AutoServiceRegistration, SmartLifecycle, Ordered {

        // stop() from the Lifecycle interface
        @Override
        public void stop() {
            this.serviceRegistry.deregister(this.registration);
            this.running.set(false); // set the Lifecycle running flag to false
        }

        // ContextClosedEvent listener
        @EventListener(ContextClosedEvent.class)
        public void onApplicationEvent(ContextClosedEvent event) {
            // register in case meta data changed
            stop();
        }
    }
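As shown above, both the ContextClosedEvent listener and the Lifecycle implementation in EurekaAutoServiceRegistration call stop(). Although stop() is called twice, various state checks prevent the work from being repeated. For example, EurekaServiceRegistry#deregister() consists of two operations: setting the instance status to DOWN, and EurekaClient#shutdown(). Once the status has been set to DOWN, a later call that leaves it unchanged does not trigger another status replication request, and EurekaClient#shutdown() checks the AtomicBoolean isShutdown flag before doing anything.
Let us look at EurekaServiceRegistry#deregister() in detail: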
    //## org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry#deregister
    @Override
    public void deregister(EurekaRegistration reg) {
        if (reg.getApplicationInfoManager().getInfo() != null) {
            if (log.isInfoEnabled()) {
                log.info("Unregistering application " + reg.getInstanceConfig().getAppname()
                        + " with eureka with status DOWN");
            }

            // Change the instance status; this immediately triggers a status replication request
            reg.getApplicationInfoManager().setInstanceStatus(InstanceInfo.InstanceStatus.DOWN);

            //TODO: on deregister or on context shutdown
            // Shut down the EurekaClient
            reg.getEurekaClient().shutdown();
        }
    }
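It involves two main steps:
ApplicationInfoManager#setInstanceStatus(DOWN): changing the instance status fires the StatusChangeListener, and the status replicator InstanceInfoReplicator sends a status update request to Eureka Server. In fact, a status update and the Eureka Client's first registration both call DiscoveryClient.register(), and both send POST /eureka/apps/appID to Eureka Server; only the instance status in the request body differs. After this step, the instance status shown on the Eureka Server page becomes DOWN.
EurekaClient.shutdown(): shutting down the whole Eureka Client consists of the following steps: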
    @PreDestroy
    @Override
    public synchronized void shutdown() {
        if (isShutdown.compareAndSet(false, true)) {
            logger.info("Shutting down DiscoveryClient ...");

            // 1. Unregister the StatusChangeListener
            if (statusChangeListener != null && applicationInfoManager != null) {
                applicationInfoManager.unregisterStatusChangeListener(statusChangeListener.getId());
            }

            // 2. Stop all scheduled threads (instance status replication, heartbeat,
            //    client cache refresh, monitor threads)
            cancelScheduledTasks();

            // If APPINFO was registered
            // 3. Deregister the instance from Eureka Server
            if (applicationInfoManager != null && clientConfig.shouldRegisterWithEureka()) {
                applicationInfoManager.setInstanceStatus(InstanceStatus.DOWN);
                unregister();
            }

            // 4. Shut down the remaining components
            if (eurekaTransport != null) {
                eurekaTransport.shutdown();
            }
            heartbeatStalenessMonitor.shutdown();
            registryStalenessMonitor.shutdown();

            logger.info("Completed shut down of DiscoveryClient");
        }
    }
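unregister() performs the deregistration. It calls AbstractJerseyEurekaHttpClient#cancel(), which sends DELETE /eureka/v2/apps/appID/instanceID to Eureka Server. Once the DELETE succeeds, the instance no longer appears in the service list on the Eureka Server page. Note: the scheduled heartbeat thread was already stopped in the step before deregistration; otherwise the next heartbeat after deregistration would bring the service back online.
kill, kill -15 and the /shutdown endpoint all run the Shutdown Hook logic and trigger deregistration of the Eureka instance. That part is fine, since the first step of a graceful shutdown is to deregister the instance from the Eureka registry. The key problem is that shutdown, besides deregistering the Eureka instance, also stops the service immediately, while both Eureka Server and Zuul (as a Eureka Client) still hold stale caches that have not been refreshed. Their service lists still contain the deregistered instance, so further calls through Zuul fail with 500, with a Connection refused exception underneath. This approach is therefore not recommended.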
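In addition, deregistration involves two operations, the DOWN status update and the actual removal, and they run on two separate threads. Depending on which thread finishes first, what Eureka Server shows can differ, but the end result is the same: once the caches have refreshed, the instance is no longer called.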
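As noted earlier, stop() is reached twice during shutdown (once from the ContextClosedEvent listener and once from the Lifecycle callback), yet the deregistration work runs only once because of guards such as isShutdown.compareAndSet(false, true). The following stand-alone sketch illustrates that guard pattern only; the class and the counter are hypothetical and are not part of the Eureka client.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrates a shutdown method that is safe to call more than once. */
final class OnceOnlyShutdown {
    private final AtomicBoolean isShutdown = new AtomicBoolean(false);
    private final AtomicInteger deregisterCalls = new AtomicInteger();

    /** May be called from both an event listener and a lifecycle callback. */
    void shutdown() {
        // Only the first caller flips false -> true and performs the work.
        if (isShutdown.compareAndSet(false, true)) {
            deregisterCalls.incrementAndGet(); // stands in for the real deregistration
        }
    }

    /** Number of times the guarded work actually ran. */
    int deregisterCalls() {
        return deregisterCalls.get();
    }
}
```

Because compareAndSet is atomic, the guard also holds when the two callers run on different threads.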
First, enabling the /pause endpoint requires the following configuration:
endpoints.pause.enabled = true
endpoints.pause.sensitive = false
PauseEndpoint is an inner class of RestartEndpoint:
    //## Restart endpoint
    @ConfigurationProperties("endpoints.restart")
    @ManagedResource
    public class RestartEndpoint extends AbstractEndpoint<Boolean>
            implements ApplicationListener<ApplicationPreparedEvent> {

        // Pause endpoint
        @ConfigurationProperties("endpoints")
        public class PauseEndpoint extends AbstractEndpoint<Boolean> {

            public PauseEndpoint() {
                super("pause", true, true);
            }

            @Override
            public Boolean invoke() {
                if (isRunning()) {
                    pause();
                    return true;
                }
                return false;
            }
        }

        // The pause operation
        @ManagedOperation
        public synchronized void pause() {
            if (this.context != null) {
                this.context.stop();
            }
        }
    }
As shown above, the /pause endpoint ultimately calls stop() on the Spring application context:
    //## org.springframework.context.support.AbstractApplicationContext#stop
    @Override
    public void stop() {
        // 1. Call stop() on every bean that implements the Lifecycle interface
        getLifecycleProcessor().stop();

        // 2. Publish the ContextStoppedEvent
        publishEvent(new ContextStoppedEvent(this));
    }
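Looking through the source, I found no useful ContextStoppedEvent listener, so the stop logic lives entirely in the stop() methods of the Lifecycle implementations.
getLifecycleProcessor().stop() and the getLifecycleProcessor().onClose() called by shutdown in approach 2 share the same internal logic: both call DefaultLifecycleProcessor#stopBeans(), which in turn calls stop() on the Lifecycle implementations, as follows: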
    //## DefaultLifecycleProcessor
    @Override
    public void stop() {
        stopBeans();
        this.running = false;
    }

    @Override
    public void onClose() {
        stopBeans();
        this.running = false;
    }
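So the /pause endpoint shares part of its logic with shutdown. Both rely on EurekaServiceRegistry#deregister(), which performs, in order:
Status update to DOWN: calls DiscoveryClient#register() and sends POST /eureka/apps/appID to Eureka Server; only the instance status in the request body differs from a first registration. After this step, the instance status on the Eureka Server page becomes DOWN.
EurekaClient.shutdown(): calls AbstractJerseyEurekaHttpClient#cancel(), which sends DELETE /eureka/v2/apps/appID/instanceID to Eureka Server. Once the DELETE succeeds, the instance no longer appears in the service list on the Eureka Server page. Note: the scheduled heartbeat thread was already stopped in the step before deregistration; otherwise the next heartbeat after deregistration would bring the service back online.
The /pause endpoint can be used to take a service offline from Eureka Server. Unlike shutdown, it does not stop the whole service and make it unavailable; it only deregisters from Eureka Server. On Eureka Server the instance ends up either removed or shown as DOWN, and the Eureka Client's scheduled threads are stopped as well, so nothing re-registers the instance. You can therefore sleep for a while, until Eureka Clients such as Zuul have refreshed and dropped the instance, and then stop the microservice. That gives a graceful shutdown (to stop the microservice, use the /shutdown endpoint or a brute-force kill -9).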
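The pause, wait, stop sequence just described could be scripted roughly as follows. This is a sketch under assumptions, not a tested deployment tool: the management address http://localhost:8080, the 90-second wait (30s server cache + 30s client fetch + 30s Ribbon refresh, the defaults quoted earlier) and the class name are all illustrative, and it assumes both endpoints accept an empty-body POST without authentication. It needs Java 11+ for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Sketch: deregister through /pause, wait for caches to refresh, then stop the process. */
final class PauseThenShutdown {

    /** Builds an empty-body POST to an actuator endpoint such as /pause or /shutdown. */
    static HttpRequest post(String baseUrl, String endpoint) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + endpoint))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080";     // assumed management address
        Duration cacheWait = Duration.ofSeconds(90);  // assumed: 30s + 30s + 30s default refreshes

        HttpClient client = HttpClient.newHttpClient();
        // 1. Deregister from Eureka Server while the process keeps running.
        client.send(post(baseUrl, "/pause"), HttpResponse.BodyHandlers.ofString());
        // 2. Give Eureka Server, Zuul and Ribbon time to drop the instance.
        Thread.sleep(cacheWait.toMillis());
        // 3. Stop the service.
        client.send(post(baseUrl, "/shutdown"), HttpResponse.BodyHandlers.ofString());
    }
}
```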
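Note:
In the version I tested, once a service has been taken offline with the /pause endpoint, the /resume endpoint cannot bring it back online. If you want to re-register the service during a release, the only option is to restart the microservice. Stopping the entire Spring container with stop() just to take the service offline from Eureka Server is also rather heavy-handed.
The reason /resume cannot bring the service back is as follows. The endpoint does call AbstractApplicationContext#start() --> EurekaAutoServiceRegistration#start() --> EurekaServiceRegistry#register(), but all of the Eureka Client's scheduled threads, such as status replication and heartbeat, were stopped earlier. On re-registration, maybeInitializeClient(eurekaRegistration) tries to restart the EurekaClient but does not succeed (probably a bug in this version), so the UP status is never sent to Eureka Server.
In short: this approach can take a service offline, but cannot bring it back online.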
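First, in the version I use, the /service-registry endpoint is enabled by default but is sensitive, meaning it requires authentication.
I tried to find a way to set sensitive to false for /service-registry alone, but found none in this version. The endpoint is a ServiceRegistryEndpoint auto-configured by ServiceRegistryAutoConfiguration, and the isSensitive() method of this MvcEndpoint is hard-coded to return true, with no configuration property or extension point. The security auto-configuration class ManagementWebSecurityAutoConfiguration then requires authentication for every endpoint with sensitive==true, through Spring Security's httpSecurity.authorizeRequests().xxx.authenticated(). So far the only way I have found to allow unauthenticated access is management.security.enabled=false, which turns off authentication for all endpoints.
# Access the /service-registry endpoint without authentication
management.security.enabled=false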
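The /service-registry endpoint is implemented by ServiceRegistryEndpoint, which exposes two request mappings, one GET and one POST, both on the instance-status path under /service-registry. The GET request returns the instance's local status and overriddenStatus; the POST request calls Eureka Server to change the current instance's status.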
    //## org.springframework.cloud.client.serviceregistry.endpoint.ServiceRegistryEndpoint
    @ManagedResource(description = "Can be used to display and set the service instance status using the service registry")
    @SuppressWarnings("unchecked")
    public class ServiceRegistryEndpoint implements MvcEndpoint {

        private final ServiceRegistry serviceRegistry;

        private Registration registration;

        public ServiceRegistryEndpoint(ServiceRegistry<?> serviceRegistry) {
            this.serviceRegistry = serviceRegistry;
        }

        public void setRegistration(Registration registration) {
            this.registration = registration;
        }

        @RequestMapping(path = "instance-status", method = RequestMethod.POST)
        @ResponseBody
        @ManagedOperation
        public ResponseEntity<?> setStatus(@RequestBody String status) {
            Assert.notNull(status, "status may not by null");

            if (this.registration == null) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND).body("no registration found");
            }

            this.serviceRegistry.setStatus(this.registration, status);
            return ResponseEntity.ok().build();
        }

        @RequestMapping(path = "instance-status", method = RequestMethod.GET)
        @ResponseBody
        @ManagedAttribute
        public ResponseEntity getStatus() {
            if (this.registration == null) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND).body("no registration found");
            }

            return ResponseEntity.ok().body(this.serviceRegistry.getStatus(this.registration));
        }

        @Override
        public String getPath() {
            return "/service-registry";
        }

        @Override
        public boolean isSensitive() {
            return true;
        }

        @Override
        public Class<? extends Endpoint<?>> getEndpointType() {
            return null;
        }
    }
The POST request is the one we care about. As shown above, it calls EurekaServiceRegistry.setStatus() to update the instance status:
    //## org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry
    public class EurekaServiceRegistry implements ServiceRegistry<EurekaRegistration> {

        // Update the status
        @Override
        public void setStatus(EurekaRegistration registration, String status) {
            InstanceInfo info = registration.getApplicationInfoManager().getInfo();

            // If the requested status is CANCEL_OVERRIDE, call EurekaClient.cancelOverrideStatus()
            //TODO: howto deal with delete properly?
            if ("CANCEL_OVERRIDE".equalsIgnoreCase(status)) {
                registration.getEurekaClient().cancelOverrideStatus(info);
                return;
            }

            // Otherwise call EurekaClient.setStatus()
            //TODO: howto deal with status types across discovery systems?
            InstanceInfo.InstanceStatus newStatus = InstanceInfo.InstanceStatus.toEnum(status);
            registration.getEurekaClient().setStatus(newStatus, info);
        }
    }
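EurekaServiceRegistry.setStatus() can send two kinds of request to Eureka Server, through EurekaClient.setStatus() and EurekaClient.cancelOverrideStatus() respectively. Each is analyzed below:
EurekaClient.setStatus(): sends PUT /eureka/apps/appID/instanceID/status?value=xxx to Eureka Server. This is the REST API the registry exposes for "Take instance out of service". It updates the instance's state on the Eureka Server side (both status and overriddenstatus) and is typically used during a release to take a service offline by setting it to OUT_OF_SERVICE.
EurekaClient.cancelOverrideStatus(): sends DELETE /eureka/v2/apps/appID/instanceID/status to Eureka Server to clear the override status. The officially documented form is DELETE /eureka/v2/apps/appID/instanceID/status?value=UP, where value=UP is optional and suggests which status to fall back to after overriddenstatus has been reset to UNKNOWN. The version I use does not send this optional value=UP parameter, so after the request Eureka Server has status=UNKNOWN and overriddenstatus=UNKNOWN. An UNKNOWN override behaves differently from other overrides, though: heartbeats still have no effect on it, but a registration (equivalent to a UP status update) does bring the service back online.
To summarize:
The /service-registry endpoint can update the instance status to OUT_OF_SERVICE. After the server-side and client-side caches have refreshed, the service is no longer called, and stopping the process at that point, through the /shutdown endpoint or a brute-force kill -9, achieves a graceful shutdown.
The /service-registry endpoint can also be called with status CANCEL_OVERRIDE. The logic is in EurekaServiceRegistry.setStatus(), and it is equivalent to calling the Eureka Server API DELETE /eureka/v2/apps/appID/instanceID/status directly, which leaves the server with status=UNKNOWN and overriddenstatus=UNKNOWN.
Calling the /service-registry endpoint with status UP sets status=UP and overriddenstatus=UP on the server. This can bring the service online temporarily, but overriddenstatus=UP still has to be cleared with the DELETE request from the previous item (DELETE /eureka/apps/appID/instanceID/status?value=UP), which is cumbersome, so it is not recommended.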
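A call to the local endpoint could look roughly like the sketch below. The path /service-registry/instance-status follows from the getPath() and @RequestMapping values in the ServiceRegistryEndpoint source shown earlier; the base URL, the text/plain content type and the class name are assumptions for illustration, and the endpoint is assumed to be reachable without authentication. It needs Java 11+ for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds requests that ask the local service to change its own status in Eureka. */
final class InstanceStatusRequests {

    /**
     * Builds a POST whose body is the desired status, for example
     * OUT_OF_SERVICE, CANCEL_OVERRIDE or UP.
     */
    static HttpRequest setStatus(String baseUrl, String status) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/service-registry/instance-status"))
                .header("Content-Type", "text/plain") // assumed; the handler reads the raw body as a String
                .POST(HttpRequest.BodyPublishers.ofString(status))
                .build();
    }
}
```

Sending is then a matter of passing the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), for example with setStatus("http://localhost:8080", "OUT_OF_SERVICE").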
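In practice, the following sequence is recommended:
1. Call the /service-registry endpoint to set the status to OUT_OF_SERVICE.
2. Sleep for the cache refresh time + the processing time of a single request.
The cache refresh time covers Eureka Server refreshing its read-only cache, the Eureka Client refreshing its local service list, and Ribbon refreshing its ServerList. Each defaults to 30s and can be shortened as appropriate:
# Eureka Server settings
eureka.server.responseCacheUpdateIntervalMs=5000
eureka.server.eviction-interval-timer-in-ms=5000
# Eureka Client settings
eureka.client.registryFetchIntervalSeconds=5
ribbon.ServerListRefreshInterval=5000
The single-request processing time is there in case the service still has requests in flight.
3. Call the /service-registry endpoint to set the status to CANCEL_OVERRIDE. This sends the DELETE overriddenstatus request to the server, leaving status=UNKNOWN and overriddenstatus=UNKNOWN there.
4. Stop the service with the /shutdown endpoint or a brute-force kill -9.
5. After deploying the new version, start the service; it registers with Eureka Server and its status becomes UP.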
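The five steps above, including the sleep in step 2, can be laid out as a small plan that a deployment script would follow. This sketch only computes the wait and lists the steps; it sends nothing. The class name, the step strings and the 10-second request time used in main are illustrative assumptions, while the 5-second intervals come from the tuned settings above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Lays out the recommended offline sequence and the sleep it needs. */
final class GracefulOfflinePlan {

    /** Step 2: cache refresh time across all three layers plus the longest in-flight request. */
    static Duration drainTime(Duration serverCache, Duration clientFetch,
                              Duration ribbonRefresh, Duration longestRequest) {
        return serverCache.plus(clientFetch).plus(ribbonRefresh).plus(longestRequest);
    }

    /** Returns the five steps in the order they should run. */
    static List<String> steps(Duration drain) {
        List<String> steps = new ArrayList<>();
        steps.add("POST /service-registry/instance-status OUT_OF_SERVICE");
        steps.add("sleep " + drain.getSeconds() + "s");
        steps.add("POST /service-registry/instance-status CANCEL_OVERRIDE");
        steps.add("POST /shutdown");
        steps.add("deploy and start the new version");
        return steps;
    }

    public static void main(String[] args) {
        Duration drain = drainTime(Duration.ofSeconds(5), Duration.ofSeconds(5),
                Duration.ofSeconds(5), Duration.ofSeconds(10)); // 10s request time is assumed
        steps(drain).forEach(System.out::println);
    }
}
```

With the default 30-second intervals instead of the tuned 5-second ones, the same calculation gives a much longer sleep, which is the practical reason for shortening the cache settings.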
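Everything above is really a wrapper, on the Eureka client side, around the Eureka Server REST API. Because a Eureka Client service includes actuator, it gains a set of endpoints, and some of those endpoints take the instance offline by calling the REST API that Eureka Server exposes.
The Eureka REST API includes: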
Operation | HTTP action | Description |
---|---|---|
Register new application instance | POST /eureka/apps/appID | Input: JSON/XML payload; HTTP Code: 204 on success |
De-register application instance | DELETE /eureka/apps/appID/instanceID | HTTP Code: 200 on success |
Send application instance heartbeat | PUT /eureka/apps/appID/instanceID | HTTP Code: 200 on success; 404 if instanceID doesn't exist |
Query for all instances | GET /eureka/apps | HTTP Code: 200 on success; Output: JSON/XML |
Query for all appID instances | GET /eureka/apps/appID | HTTP Code: 200 on success; Output: JSON/XML |
Query for a specific appID/instanceID | GET /eureka/apps/appID/instanceID | HTTP Code: 200 on success; Output: JSON/XML |
Query for a specific instanceID | GET /eureka/instances/instanceID | HTTP Code: 200 on success; Output: JSON/XML |
Take instance out of service | PUT /eureka/apps/appID/instanceID/status?value=OUT_OF_SERVICE | HTTP Code: 200 on success; 500 on failure |
Move instance back into service (remove override) | DELETE /eureka/apps/appID/instanceID/status?value=UP (the value=UP is optional; it is used as a suggestion for the fallback status after the override is removed) | HTTP Code: 200 on success; 500 on failure |
Update metadata | PUT /eureka/apps/appID/instanceID/metadata?key=value | HTTP Code: 200 on success; 500 on failure |
Query for all instances under a particular vip address | GET /eureka/vips/vipAddress | HTTP Code: 200 on success (Output: JSON/XML); 404 if the vipAddress does not exist |
Query for all instances under a particular secure vip address | GET /eureka/svips/svipAddress | HTTP Code: 200 on success (Output: JSON/XML); 404 if the svipAddress does not exist |
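Most of the non-query operations were already covered in the analysis of the Eureka Client endpoints above. Calling the Eureka Server REST API is the most direct route, but releases are now mostly driven by deployment tools such as Jenkins, where every operation runs inside a script. Good as the Eureka Server API is, its URLs all contain appID and instanceID, which makes it hard for a generic script to assemble them. And unlike a local service endpoint, which can be reached at localhost or 127.0.0.1, the Eureka Server address has to be specified, so the overall setup is somewhat more complex. In a company with well-standardized conventions, though, it is still a good option.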
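To show what a script has to assemble when it talks to Eureka Server directly, here is a small URL-builder sketch based on the paths in the table above. The server address, application name and instance ID in the usage note are placeholders, the class is hypothetical, and it assumes the IDs are already URL-safe.

```java
/** Builds Eureka Server REST URLs from the pieces a deployment script must supply. */
final class EurekaRestUrls {
    private final String serverUrl;

    /** @param serverUrl base URL including the /eureka context, without a trailing slash */
    EurekaRestUrls(String serverUrl) {
        this.serverUrl = serverUrl;
    }

    private String instance(String appId, String instanceId) {
        return serverUrl + "/apps/" + appId + "/" + instanceId;
    }

    /** Target of the PUT that takes an instance out of service. */
    String takeOutOfService(String appId, String instanceId) {
        return instance(appId, instanceId) + "/status?value=OUT_OF_SERVICE";
    }

    /** Target of the DELETE that removes the override and suggests UP as the fallback. */
    String removeOverride(String appId, String instanceId) {
        return instance(appId, instanceId) + "/status?value=UP";
    }

    /** Target of the DELETE that de-registers the instance. */
    String deregister(String appId, String instanceId) {
        return instance(appId, instanceId);
    }
}
```

For example, new EurekaRestUrls("http://eureka-host:8761/eureka").takeOutOfService("ORDER-SERVICE", "host1:order-service:8080") yields the PUT target for that instance. All three inputs have to be known to the script, which is the difficulty described above.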
References: