令人头疼的Connection Reset
背景:
要爬取某网站的数据,数据每页10条,有很多页(形式如同table表格)。使用HttpClient 逐行逐页爬取数据,但在循环爬取多次时,总会在不确定的位置报错
在检查代码逻辑无果之后,开始疯狂百度,网上给出的解释:
服务器端因为某种原因关闭了Connection,而客户端依然在读写数据。
给出的解决方案是:
- 客户端和服务器统一使用TCP长连接或者短连接。
- 客户端关闭了连接,检查代码,并无关闭。
以上两种情况均无法解决,于是决定自己看错误源码:
int read(byte b[], int off, int length, int timeout) throws IOException { int n; // EOF already encountered if (eof) { return -1; } // connection reset if (impl.isConnectionReset()) { throw new SocketException("Connection reset"); } // bounds check if (length <= 0 || off < 0 || length > b.length - off) { if (length == 0) { return 0; } throw new ArrayIndexOutOfBoundsException("length == " + length + " off == " + off + " buffer length == " + b.length); } boolean gotReset = false; // acquire file descriptor and do the read FileDescriptor fd = impl.acquireFD(); try { n = socketRead(fd, b, off, length, timeout); if (n > 0) { return n; } } catch (ConnectionResetException rstExc) { gotReset = true; } finally { impl.releaseFD(); } /* * We receive a "connection reset" but there may be bytes still * buffered on the socket */ if (gotReset) { impl.setConnectionResetPending(); impl.acquireFD(); try { n = socketRead(fd, b, off, length, timeout); if (n > 0) { return n; } } catch (ConnectionResetException rstExc) { } finally { impl.releaseFD(); } } /* * If we get here we are at EOF, the socket has been closed, * or the connection has been reset. */ if (impl.isClosedOrPending()) { throw new SocketException("Socket closed"); } if (impl.isConnectionResetPending()) { impl.setConnectionReset(); } if (impl.isConnectionReset()) { throw new SocketException("Connection reset"); } eof = true; return -1;
根据图片中的提示信息,可以找到报错信息在倒数第4行,从后往前看,当n <= 0时,才会报错,然而
n = socketRead(fd, b, off, length, timeout);
认为是超时问题,故在代码中加入
但依旧未解决问题,最终通过手动捕捉SocketException异常,让异常发生时,重新请求该条记录,完成任务。
虽然问题解决了,但本质还是不理解为什么会导致错误,有明白的大佬麻烦指点一二。