We recently hit a crash of the Standby NameNode (SNN). The exception log from the time of the crash is shown below (sensitive details such as usernames and file paths have been redacted):
2016-08-13 02:35:54,472 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/x, replication=3,
mtime=1471026852077, atime=1471024921405, blockSize=268435456, blocks=[blk_1167720789_98154524], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=601867910]
java.io.IOException: File is not under construction: /x/x/x/x
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:797)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:409)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-08-13 02:35:54,473 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: File is not under construction: /x/x/x/x
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:797)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:409)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-08-13 02:35:54,474 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-08-13 02:35:54,476 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at x.x.x.x/x.x.x.x
************************************************************/
The Active NameNode (ANN) behaved normally. Among the JournalNodes (JN), one node logged a timeout exception during this period, while the other two logged nothing unusual.
Since this was the first occurrence of the fault, and the exception log pointed to a corrupt edit log entry, simply restarting the SNN was not a viable recovery. Searching Google for similar exceptions turned up a related bug report, though we could not confirm it was the same issue, as well as posts from people who had hit similar exceptions and shared their approach. So we started by analyzing the edits files on the ANN and the JNs. The edits files under the ANN looked like this (we did not preserve the scene, so this is a rough reconstruction from memory):
edits_inprogress_0000000000603915223
edits_0000000000601910596-0000000000603915222
edits_0000000000601867340-0000000000601910595
The edits files under the JN were as follows:
edits_inprogress_0000000000603915223
edits_0000000000601910596-0000000000603915222
edits_0000000000601867340-0000000000601910595
Under the QJM mechanism, the edits records on both sides should be identical. On startup, after loading its local FSImage file, the SNN replays from the JNs all edits records written since the last successful checkpoint; our SNN crashed precisely while replaying one of these OP records. From the exception in the SNN startup log we know that txid 601867910 is the transaction that cannot be replayed. That txid falls within the file edits_0000000000601867340-0000000000601910595, so that is the record we need to repair by hand.
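Locating which segment covers a given txid can be scripted instead of read off by eye. A minimal bash sketch (the segment file names are taken from the listing above; the helper name is ours):

```shell
# Given a txid and a list of finalized segment file names, print the
# segment whose [start, end] txid range covers it. bash only (10# arithmetic).
find_edits_segment() {
  local txid=$1; shift
  local f start end
  for f in "$@"; do
    case "$f" in edits_inprogress_*) continue ;; esac
    start=${f#edits_}; start=${start%-*}   # zero-padded start txid
    end=${f##*-}                           # zero-padded end txid
    if [ $((10#$start)) -le "$txid" ] && [ "$txid" -le $((10#$end)) ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

find_edits_segment 601867910 \
  edits_0000000000601867340-0000000000601910595 \
  edits_0000000000601910596-0000000000603915222
# → edits_0000000000601867340-0000000000601910595
```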
The tool for this is the Offline Edits Viewer (`hdfs oev`). It was introduced in 2.4 to make it easier for HDFS administrators to manage edits files. Its usage is as follows:
Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Offline edits viewer
Parse a Hadoop edits log file INPUT_FILE and save results
in OUTPUT_FILE.
Required command line arguments:
-i,--inputFile <arg> edits file to process, xml (case
insensitive) extension means XML format,
any other filename means binary format
-o,--outputFile <arg> Name of output file. If the specified
file exists, it will be overwritten,
format of the file is determined
by -p option
Optional command line arguments:
-p,--processor <arg> Select which type of processor to apply
against image file, currently supported
processors are: binary (native binary format
that Hadoop uses), xml (default, XML
format), stats (prints statistics about
edits file)
-h,--help Display usage information and exit
-f,--fix-txids Renumber the transaction IDs in the input,
so that there are no gaps or invalid transaction IDs.
-r,--recover When reading binary edit logs, use recovery
mode. This will give you the chance to skip
corrupt parts of the edit log.
-v,--verbose More verbose output, prints the input and
output filenames, for processors that write
to a file, also output to screen. On large
image files this will dramatically increase
processing time (default is false).
First we convert the binary edits file into an XML file, which is far easier to read and edit (!!! remember to back up the original file first):
bin/hdfs oev -p xml -i edits_0000000000601867340-0000000000601910595 -o edits_out.xml
Then open edits_out.xml and locate the OP record with txid 601867910:
<RECORD>
<OPCODE>OP_CLOSE</OPCODE>
<DATA>
<TXID>601867910</TXID>
<LENGTH>0</LENGTH>
<INODEID>0</INODEID>
<!-- replace the true path -->
<PATH>/x/x/x/x</PATH>
<REPLICATION>3</REPLICATION>
<MTIME>1471026852077</MTIME>
<ATIME>1471024921405</ATIME>
<BLOCKSIZE>268435456</BLOCKSIZE>
<CLIENT_NAME></CLIENT_NAME>
<CLIENT_MACHINE></CLIENT_MACHINE>
<BLOCK>
<BLOCK_ID>1167720789</BLOCK_ID>
<NUM_BYTES>2430</NUM_BYTES>
<GENSTAMP>98154524</GENSTAMP>
</BLOCK>
<PERMISSION_STATUS>
<USERNAME>datacenter</USERNAME>
<GROUPNAME>hdfs</GROUPNAME>
<MODE>416</MODE>
</PERMISSION_STATUS>
</DATA>
</RECORD>
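Rather than scrolling through a huge XML dump by hand, the record for a given txid can be pulled out with standard tools. A sketch (the helper name is ours; it assumes the one-tag-per-line layout that oev emits):

```shell
# Print the full <RECORD>…</RECORD> block containing the given txid.
# grep finds the <TXID> line; sed walks back to the enclosing <RECORD>.
print_record_for_txid() {
  local txid=$1 xml=$2
  local line start
  line=$(grep -n "<TXID>${txid}</TXID>" "$xml" | cut -d: -f1 | head -n1)
  [ -n "$line" ] || return 1
  # last <RECORD> line at or before the <TXID> line
  start=$(sed -n "1,${line}p" "$xml" | grep -n '<RECORD>' | tail -n1 | cut -d: -f1)
  sed -n "${start},/<\/RECORD>/p" "$xml"
}

# e.g. print_record_for_txid 601867910 edits_out.xml
```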
According to the exception thrown while the SNN was replaying the log, the file was not in the under-construction state when this OP_CLOSE operation was applied; presumably the file had already been closed or was in some other state. We can therefore replace this OP with a harmless operation type, such as OP_TIMES (see FSEditLogOpCodes.java for the full list of OP types), changing the record to the following:
<RECORD>
<OPCODE>OP_TIMES</OPCODE>
<DATA>
<TXID>601867910</TXID>
<LENGTH>0</LENGTH>
<PATH>/x/x/x/x</PATH>
<MTIME>1471026852077</MTIME>
<ATIME>1471024921405</ATIME>
</DATA>
</RECORD>
Save the XML, then use the OEV tool to convert it back into a binary edits file:
bin/hdfs oev -p binary -i edits_out.xml -o edits_binary
Copy it into the corresponding edits directory on each JN, replacing the original file:
cp edits_binary $JOURNAL/current/edits_0000000000601867340-0000000000601910595
We only need to replace the corresponding edits file on all 3 JNs (our cluster is configured with 3 JNs).
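The per-JN replacement can be scripted. A hedged sketch: the hostnames jn1..jn3 and the $JOURNAL path are placeholders for your own values, RUN=echo keeps it in dry-run mode so it only prints the commands, and each node's original segment is backed up before being overwritten:

```shell
# Dry-run generator for replacing the repaired segment on every JN.
SEGMENT=edits_0000000000601867340-0000000000601910595
JNS="jn1 jn2 jn3"   # placeholder JournalNode hostnames
RUN=echo            # dry run: print the commands instead of running them
for jn in $JNS; do
  # back up the original segment on the JN, then push the repaired one
  $RUN ssh "$jn" "cp \$JOURNAL/current/$SEGMENT \$JOURNAL/current/$SEGMENT.bak"
  $RUN scp edits_binary "$jn:\$JOURNAL/current/$SEGMENT"
done
```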
We then restarted the SNN normally. After loading its local fsimage file, it began syncing the not-yet-merged edits files from the JNs; some time later, a similar exception appeared again:
2016-08-13 12:08:59,415 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/x, replication=3,
mtime=1471026921572, atime=1471024921405, blockSize=268435456, blocks=[blk_1167720789_98161049], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=601907651]
java.io.IOException: File is not under construction: /x/x/x/x
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:955)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1407)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1473)
The file that triggered the exception was the same as before, only with a different txid, so we repeated the earlier procedure.
We restarted the SNN again. While it was replaying the edits from the JNs, a similar exception appeared yet again; this time both the offending file and the txid were different:
2016-08-13 13:14:21,740 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 72081/2004627 transactions completed. (4%)
2016-08-13 13:14:22,144 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/y/y, replication=3, mtime=1471027330308,
atime=1471024801459, blockSize=268435456, blocks=[blk_1167718951_98178216], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=602013871]
java.io.IOException: File is not under construction: /x/x/x/y/y
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:955)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1407)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1473)
2016-08-13 13:14:22,147 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
This time the txid is 602013871. From the first and last txids encoded in the edits file names, we can tell this problematic OP lives in a different edits file, edits_0000000000601910596-0000000000603915222.
As before, we used the OEV tool to convert it to XML, applied the same harmless replacement to the OP_CLOSE operation, and then converted it back to binary. This time, however, the conversion itself failed:
bin/hdfs oev -r -p binary -i edits_new.xml -o edits_binary_new
fromXml error decoding opcode null
{<PATH>{"/a/b/c.gz"}, <RPC_CALLID>{"5979"}, <RPC_CLIENTID>{"9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6"}, <TXID>{"603485951"}}
Encountered exception. Exiting: no entry found for BLOCK
org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for BLOCK
at org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:940)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:4091)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsXmlLoader.endElement(OfflineEditsXmlLoader.java:224)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsXmlLoader.loadEdits(OfflineEditsXmlLoader.java:92)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
Using the txid given in the exception message, we located the corresponding record:
<RECORD>
<OPCODE>OP_UPDATE_BLOCKS</OPCODE>
<DATA>
<TXID>603485951</TXID>
<!-- use x to replace the true path -->
<PATH>/x/x/x/x/x</PATH>
<RPC_CLIENTID>9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6</RPC_CLIENTID>
<RPC_CALLID>5979</RPC_CALLID>
</DATA>
</RECORD>
An OP_UPDATE_BLOCKS operation needs the corresponding BLOCK information to be redone correctly, so we suspected these entries had somehow been dropped. Looking at the surrounding context in the XML, we found an OP_ADD_BLOCK operation on this file just before it:
<RECORD>
<OPCODE>OP_ADD_BLOCK</OPCODE>
<DATA>
<TXID>603485950</TXID>
<!-- use x to replace the true path -->
<PATH>/x/x/x/x/x</PATH>
<BLOCK>
<BLOCK_ID>1168011667</BLOCK_ID>
<NUM_BYTES>0</NUM_BYTES>
<GENSTAMP>98411957</GENSTAMP>
</BLOCK>
<RPC_CLIENTID></RPC_CLIENTID>
<RPC_CALLID>-2</RPC_CALLID>
</DATA>
</RECORD>
So, naively: could we simply borrow that record to repair the broken OP? As a first attempt, I copied the BLOCK information directly into the OP_UPDATE_BLOCKS record:
<RECORD>
<OPCODE>OP_UPDATE_BLOCKS</OPCODE>
<DATA>
<TXID>603485951</TXID>
<!-- use x to replace the true path -->
<PATH>/x/x/x/x/x</PATH>
<BLOCK>
<BLOCK_ID>1168011667</BLOCK_ID>
<NUM_BYTES>0</NUM_BYTES>
<GENSTAMP>98411957</GENSTAMP>
</BLOCK>
<RPC_CLIENTID>9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6</RPC_CLIENTID>
<RPC_CALLID>5979</RPC_CALLID>
</DATA>
</RECORD>
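With hindsight, such broken records can be found up front instead of one conversion failure at a time. A sketch (the helper name is ours) that scans the oev XML for OP_UPDATE_BLOCKS records missing a `<BLOCK>` child, i.e. exactly the shape the converter rejected with "no entry found for BLOCK":

```shell
# Print the txid of every OP_UPDATE_BLOCKS record that has no <BLOCK>
# child; these are the records that would make "oev -p binary" bail out.
missing_block_txids() {
  awk '
    /<RECORD>/   { op = 0; txid = ""; hasblk = 0 }
    /<OPCODE>OP_UPDATE_BLOCKS</ { op = 1 }
    /<TXID>/     { t = $0; gsub(/[^0-9]/, "", t); txid = t }
    /<BLOCK>/    { hasblk = 1 }
    /<\/RECORD>/ { if (op && !hasblk) print txid }
  ' "$1"
}

# e.g. missing_block_txids edits_new.xml
```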
After saving the file and re-running the OEV command to convert the XML back to binary, the exception was gone.
Related reading: Recovering from an SNN crash after an EditLog replay exception (part 2)
This article comes from the NetEase Practitioner Community and is published with the authorization of its author, Jin Chuan.