Recovering from an SNN Crash After an EditsLog Replay Exception (Part 1)

阿凡达 2018-07-04 12:58

Exception Overview

We recently ran into an SNN crash. The exception logged when it went down is shown below (sensitive details such as usernames and file paths have been redacted):

2016-08-13 02:35:54,472 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/x, replication=3, 
mtime=1471026852077, atime=1471024921405, blockSize=268435456, blocks=[blk_1167720789_98154524], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=601867910]
java.io.IOException: File is not under construction: /x/x/x/x
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:797)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:409)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-08-13 02:35:54,473 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: File is not under construction: /x/x/x/x
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:797)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:409)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-08-13 02:35:54,474 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-08-13 02:35:54,476 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at  x.x.x.x/x.x.x.x
************************************************************/

The ANN behaved normally during this window; one of the JN nodes logged a timeout exception, while the other two JNs showed no abnormal output.

Exception Analysis

Since this was the first time we had seen this failure, and the log pointed to a bad edits record, simply restarting the SNN would not recover it. Searching Google for similar exceptions turned up a related bug report, though we could not confirm it was the same issue, as well as accounts from others who had hit similar exceptions and shared their approach. So we started by analyzing the edits files on the ANN and the JNs. The edits files under the ANN looked roughly like this (we did not preserve the scene, so this is reconstructed from memory):

edits_inprogress_0000000000603915223
edits_0000000000601910596-0000000000603915222
edits_0000000000601867340-0000000000601910595

The edits files on the JNs:

edits_inprogress_0000000000603915223
edits_0000000000601910596-0000000000603915222
edits_0000000000601867340-0000000000601910595

Under the QJM mechanism, the edits records on both sides should be identical. When the SNN starts, after loading its local fsimage it replays from the JNs every edits record written since the last successful checkpoint, and our SNN crashed precisely while replaying one of these OP records. The exception in the SNN startup log tells us that txid 601867910 is the record that cannot be replayed. That txid falls within edits_0000000000601867340-0000000000601910595, so this is the record we have to repair by hand.
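
Each finalized segment encodes its first and last txids in its file name (edits_<firstTxid>-<lastTxid>), so mapping a txid to its segment can be scripted. A minimal sketch in the shell (run from the edits directory; the TXID variable is our own placeholder):

TXID=601867910
for f in edits_*-*; do
  range=${f#edits_}
  first=${range%-*}; last=${range#*-}
  # force base 10 so the zero-padded txids are not read as octal
  if (( 10#$first <= TXID && TXID <= 10#$last )); then
    echo "txid $TXID is in $f"
  fi
done

For 601867910 this prints edits_0000000000601867340-0000000000601910595, matching the file identified above.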

Exception Handling

Offline Edits Viewer (oev)

This tool was introduced in 2.4 to make it easier for HDFS administrators to manage edits files. Usage:

Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Offline edits viewer
Parse a Hadoop edits log file INPUT_FILE and save results
in OUTPUT_FILE.
Required command line arguments:
-i,--inputFile <arg>   edits file to process, xml (case
                       insensitive) extension means XML format,
                       any other filename means binary format
-o,--outputFile <arg>  Name of output file. If the specified
                       file exists, it will be overwritten,
                       format of the file is determined
                       by -p option

Optional command line arguments:
-p,--processor <arg>   Select which type of processor to apply
                       against image file, currently supported
                       processors are: binary (native binary format
                       that Hadoop uses), xml (default, XML
                       format), stats (prints statistics about
                       edits file)
-h,--help              Display usage information and exit
-f,--fix-txids         Renumber the transaction IDs in the input,
                       so that there are no gaps or invalid  transaction IDs.
-r,--recover           When reading binary edit logs, use recovery 
                       mode.  This will give you the chance to skip 
                       corrupt parts of the edit log.
-v,--verbose           More verbose output, prints the input and
                       output filenames, for processors that write
                       to a file, also output to screen. On large
                       image files this will dramatically increase
                       processing time (default is false).

First, we convert the binary edits file into XML, which is far easier to read and edit (!!! make sure to back up the original file first):

bin/hdfs oev -p xml -i edits_0000000000601867340-0000000000601910595 -o edits_out.xml
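
Before editing anything, the stats processor listed in the usage above gives a cheap sanity check that the file parses cleanly, along with its opcode distribution (the output file name here is our choice):

bin/hdfs oev -p stats -i edits_0000000000601867340-0000000000601910595 -o edits.stats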

Then open edits_out.xml and locate the OP record with txid 601867910. On a dump of this size, grep gets you there quickly (a sketch):
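
grep -n "<TXID>601867910</TXID>" edits_out.xml

The record in question: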

  <RECORD>
    <OPCODE>OP_CLOSE</OPCODE>
    <DATA>
      <TXID>601867910</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>0</INODEID>
      <!-- real path redacted -->
      <PATH>/x/x/x/x</PATH>
      <REPLICATION>3</REPLICATION>
      <MTIME>1471026852077</MTIME>
      <ATIME>1471024921405</ATIME>
      <BLOCKSIZE>268435456</BLOCKSIZE>
      <CLIENT_NAME></CLIENT_NAME>
      <CLIENT_MACHINE></CLIENT_MACHINE>
      <BLOCK>
        <BLOCK_ID>1167720789</BLOCK_ID>
        <NUM_BYTES>2430</NUM_BYTES>
        <GENSTAMP>98154524</GENSTAMP>
      </BLOCK>
      <PERMISSION_STATUS>
        <USERNAME>datacenter</USERNAME>
        <GROUPNAME>hdfs</GROUPNAME>
        <MODE>416</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>

According to the exception the SNN threw during replay, the file was not in the under-construction state when this OP_CLOSE was applied; presumably it had already been closed or was in some other state. We can therefore replace the OP with a harmless type such as OP_TIMES, which does nothing more than set a file's modification and access times (see FSEditLogOpCodes.java for the full list of OP types). The record becomes:

  <RECORD>
    <OPCODE>OP_TIMES</OPCODE>
    <DATA>
      <TXID>601867910</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/x/x/x/x</PATH>
      <MTIME>1471026852077</MTIME>
      <ATIME>1471024921405</ATIME>
    </DATA>
  </RECORD>

Save the XML, then use the OEV tool to convert it back into a binary edits file:

bin/hdfs oev -p binary -i edits_out.xml -o edits_binary

Copy it into the JN's edits directory, replacing the original file:

cp edits_binary $JOURNAL/current/edits_0000000000601867340-0000000000601910595

The corresponding edits file has to be replaced on all three JNs (our cluster is configured with exactly three), as sketched below.
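
A minimal sketch of that rollout, assuming passwordless ssh and using jn1/jn2/jn3 as stand-ins for the JN hostnames (back up the original on each host before overwriting it):

for h in jn1 jn2 jn3; do
  # keep a .bak copy of the original segment on the JN, then push the repaired one
  ssh $h "cp $JOURNAL/current/edits_0000000000601867340-0000000000601910595{,.bak}"
  scp edits_binary $h:$JOURNAL/current/edits_0000000000601867340-0000000000601910595
done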

We then restarted the SNN normally. After loading its local fsimage it began syncing the not-yet-checkpointed edits from the JNs, and some time later a similar exception appeared again:

2016-08-13 12:08:59,415 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/x, replication=3, 
mtime=1471026921572, atime=1471024921405, blockSize=268435456, blocks=[blk_1167720789_98161049], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=601907651]
java.io.IOException: File is not under construction: /x/x/x/x
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:955)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1407)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1473)

The file triggering the exception was the same as before; only the txid differed. So we applied the same fix again.

After restarting the SNN once more, replaying the edits from the JNs failed yet again with a similar exception, except that this time both the file and the txid were different:

2016-08-13 13:14:21,740 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 72081/2004627 transactions completed. (4%)
2016-08-13 13:14:22,144 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/x/x/x/y/y, replication=3, mtime=1471027330308, 
atime=1471024801459, blockSize=268435456, blocks=[blk_1167718951_98178216], permissions=x:x:rw-r-----, aclEntries=null, clientName=, clientMachine=, opCode=OP_CLOSE, txid=602013871]
java.io.IOException: File is not under construction: /x/x/x/y/y
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:426)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:228)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:955)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:751)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:735)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1407)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1473)
2016-08-13 13:14:22,147 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage

This time the txid is 602013871. Checking it against the first and last txids encoded in the file names (the same range check as before) shows that this OP lives in the other edits file, edits_0000000000601910596-0000000000603915222.

As before, we used the OEV tool to convert that file to XML, replaced the offending OP_CLOSE with a harmless record, and converted it back to binary. This time, though, the conversion itself failed (note that per the usage text above, the -r recovery flag only applies when reading binary edit logs, so it does nothing for XML input):

bin/hdfs oev -r -p binary -i  edits_new.xml -o edits_binary_new
fromXml error decoding opcode null
{<PATH>{"/a/b/c.gz"}, <RPC_CALLID>{"5979"}, <RPC_CLIENTID>{"9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6"}, <TXID>{"603485951"}}
Encountered exception. Exiting: no entry found for BLOCK
org.apache.hadoop.hdfs.util.XMLUtils$InvalidXmlException: no entry found for BLOCK
    at org.apache.hadoop.hdfs.util.XMLUtils$Stanza.getChildren(XMLUtils.java:242)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$UpdateBlocksOp.fromXml(FSEditLogOp.java:940)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.decodeXml(FSEditLogOp.java:4091)
    at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsXmlLoader.endElement(OfflineEditsXmlLoader.java:224)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsXmlLoader.loadEdits(OfflineEditsXmlLoader.java:92)
    at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
    at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)

Using the txid reported in the error, we located the corresponding record:

  <RECORD>
    <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
    <DATA>
      <TXID>603485951</TXID>
      <!-- real path redacted -->
      <PATH>/x/x/x/x/x</PATH>
      <RPC_CLIENTID>9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6</RPC_CLIENTID>
      <RPC_CALLID>5979</RPC_CALLID>
    </DATA>
  </RECORD>

An OP_UPDATE_BLOCKS operation needs its BLOCK entries in order to be redone correctly, so the suspicion was that those entries had been lost from this record. The surrounding context in the XML is easy to pull up with grep's -B option (a sketch):
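
grep -n -B 25 "<TXID>603485951</TXID>" edits_new.xml

which turns up an OP_ADD_BLOCK on the same file at the immediately preceding txid: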

  <RECORD>
    <OPCODE>OP_ADD_BLOCK</OPCODE>
    <DATA>
      <TXID>603485950</TXID>
      <!-- real path redacted -->
      <PATH>/x/x/x/x/x</PATH>
      <BLOCK>
        <BLOCK_ID>1168011667</BLOCK_ID>
        <NUM_BYTES>0</NUM_BYTES>
        <GENSTAMP>98411957</GENSTAMP>
      </BLOCK>
      <RPC_CLIENTID></RPC_CLIENTID>
      <RPC_CALLID>-2</RPC_CALLID>
    </DATA>
  </RECORD>

So, naively, could we use that record to repair the broken one? As a first attempt, I copied the BLOCK information straight into the OP_UPDATE_BLOCKS record:

  <RECORD>
    <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
    <DATA>
      <TXID>603485951</TXID>
      <!-- real path redacted -->
      <PATH>/x/x/x/x/x</PATH>
      <BLOCK>
        <BLOCK_ID>1168011667</BLOCK_ID>
        <NUM_BYTES>0</NUM_BYTES>
        <GENSTAMP>98411957</GENSTAMP>
      </BLOCK>
      <RPC_CLIENTID>9c0d4fd2-901e-4953-ab5a-e6e74b5bf7b6</RPC_CLIENTID>
      <RPC_CALLID>5979</RPC_CALLID>
    </DATA>
  </RECORD>

After saving the XML, I re-ran the OEV command to convert it into a binary file, and this time the exception was gone.
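
Before pushing the regenerated file back to the JNs, it is worth round-tripping it through OEV once more to confirm it parses; a sketch using the xml and stats processors (output paths are our choice):

bin/hdfs oev -p xml -i edits_binary_new -o /tmp/edits_verify.xml
bin/hdfs oev -p stats -i edits_binary_new -o /tmp/edits_verify.stats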

Related reading: Recovering from an SNN Crash After an EditsLog Replay Exception (Part 2)

This article comes from the NetEase Practitioners Community and is published with the authorization of its author, 金川.