现象

使用Spring Cloud Gateway时,在线上偶尔出现异常,但是很少且很难复现。

在压测的最后几条日志也会出现,当时没有放在心上,线上是不是会有error告警,故此花时间研究一番。

[4eee5339] Error [reactor.netty.ReactorNetty$InternalNettyException: java.nio.channels.ClosedChannelException] for HTTP GET "/xxx", but ServerHttpResponse already committed (200 OK)

分析

观察日志异常提示为:请求的channel被关闭了,导致异常,但是后端的处理已经返回200了,在流出过程中发现了异常。

处理过程如下:

  • Spring Framework 5开始使用反应式编程(Reactive Programming,也有叫响应式)
  • 当结果处理过程中发现request的channel已被关闭,出现ClosedChannelException异常时,会使用onErrorResume处理异常,调用handleUnresolvedError方法处理,handleUnresolvedError方法中经过检查发现不是内部错误,而且外部请求断开了,输出对应的错误日志。
  • package org.springframework.web.server.adapter;
    ...
    public class HttpWebHandlerAdapter extends WebHandlerDecorator implements HttpHandler {
    ...
    
    @Override
    	public Mono<Void> handle(ServerHttpRequest request, ServerHttpResponse response) {
    		if (this.forwardedHeaderTransformer != null) {
    			request = this.forwardedHeaderTransformer.apply(request);
    		}
    		ServerWebExchange exchange = createExchange(request, response);
    
    		LogFormatUtils.traceDebug(logger, traceOn ->
    				exchange.getLogPrefix() + formatRequest(exchange.getRequest()) +
    						(traceOn ? ", headers=" + formatHeaders(exchange.getRequest().getHeaders()) : ""));
    
    		return getDelegate().handle(exchange)
    				.doOnSuccess(aVoid -> logResponse(exchange))
    				.onErrorResume(ex -> handleUnresolvedError(exchange, ex)) //处理过程发现异常
    				.then(Mono.defer(response::setComplete));
    	}
    
    ..
    
    /**
    * 处理异常过程
    */
    private Mono<Void> handleUnresolvedError(ServerWebExchange exchange, Throwable ex) {
    		ServerHttpRequest request = exchange.getRequest();
    		ServerHttpResponse response = exchange.getResponse();
    		String logPrefix = exchange.getLogPrefix();
    
    		// Sometimes a remote call error can look like a disconnected client.
    		// Try to set the response first before the "isDisconnectedClient" check.
    
    		if (response.setStatusCode(HttpStatus.INTERNAL_SERVER_ERROR)) {
    			logger.error(logPrefix + "500 Server Error for " + formatRequest(request), ex);
    			return Mono.empty();
    		}
    		else if (isDisconnectedClientError(ex)) {
    			if (lostClientLogger.isTraceEnabled()) {
    				lostClientLogger.trace(logPrefix + "Client went away", ex);
    			}
    			else if (lostClientLogger.isDebugEnabled()) {
    				lostClientLogger.debug(logPrefix + "Client went away: " + ex +
    						" (stacktrace at TRACE level for '" + DISCONNECTED_CLIENT_LOG_CATEGORY + "')");
    			}
    			return Mono.empty();
    		}
    		else {
                //日志异常输出的内容
    			// After the response is committed, propagate errors to the server...
    			logger.error(logPrefix + "Error [" + ex + "] for " + formatRequest(request) +
    					", but ServerHttpResponse already committed (" + response.getStatusCode() + ")");
    			return Mono.error(ex);
    		}
    	}
    }
  • 观察nginx日志,发现对应的请求状态码为499
  • nginx状态499表示客户端主动断开了连接,

处理办法

方法1:

参考Nginx状态码499原因分析和处理

方法2:

貌似在reactor-netty项目正在修复中,关注issue:https://github.com/reactor/reactor-netty/pull/1106

PS:待验证……汗

Logo

权威|前沿|技术|干货|国内首个API全生命周期开发者社区

更多推荐