A Database Crash Caused by ORA-07445: More Than 65535 Bind Variables

达芬奇密码2018-07-03 09:15

Background

As Kaola's business has grown, the number of goods in the promotional activities configured by operations has kept increasing, and with that growth came a series of performance problems. Saving an activity with a large number of goods in the Kaola operations console could time out badly; both Dubbo timeouts and browser timeouts occurred regularly. To improve the user experience, these performance problems had to be addressed.

Code analysis revealed single-row INSERT, DELETE, and UPDATE statements being executed in a loop. Updating 3,000 rows meant 3,000 round trips; even if each statement ran quickly, the cumulative cost inevitably caused Dubbo or browser timeouts. Against this background we took on the task of optimizing this code. The idea was to replace the single-statement operations with batch operations. Common ways to batch in Oracle include:

  • IN: delete from table where id in (...)
  • UNION ALL: insert into ... select * from (sql union all sql)
  • PL/SQL block: begin ... end;
  • JDBC executeBatch

  • The first handles batch deletes and updates, but with a limitation: if each row needs different values, it cannot be used.
  • The second handles batch inserts but cannot handle updates.
  • The third, the PL/SQL block, is the most flexible: the begin ... end block can contain both insert and update statements.
  • The fourth is JDBC's built-in batching facility and is database-independent.
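To make the PL/SQL approach concrete: many statements are concatenated into one anonymous block, so a single round trip executes them all; each statement, however, still contributes its own bind markers, and the total bind count per statement is what eventually overflows Oracle's 65535 limit. A minimal sketch of how such a block might be assembled (the table and column names are hypothetical):

```java
// Build one anonymous PL/SQL block that updates many rows in a single
// round trip. With N rows and 2 binds per row, the block carries 2*N
// bind variables -- the count that can overflow the 65535 limit.
public class PlsqlBlockBuilder {

    public static String buildUpdateBlock(int rowCount) {
        StringBuilder sb = new StringBuilder("begin\n");
        for (int i = 0; i < rowCount; i++) {
            // hypothetical table/columns; each ? becomes a bind variable
            sb.append("  update activity_goods set price = ? where id = ?;\n");
        }
        sb.append("end;");
        return sb.toString();
    }

    // count the bind markers the block would carry
    public static long bindCount(String block) {
        return block.chars().filter(c -> c == '?').count();
    }
}
```

Batching 3,000 rows this way already puts 6,000 bind variables into one statement; wider rows get there even faster.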

For this optimization we chose the simplest option, the PL/SQL block. After the change, publishing about 3,200 activity goods in the pre-release environment took roughly 10 seconds, a clear improvement over the earlier browser timeouts. Just as we were congratulating ourselves, a fatal problem erupted: publishing an activity containing a large number of goods brought down the primary database of the main site.

Incident Log

     Tue Jul 11 15:41:58 2017
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x186016B, opiaba()+639] [flags: 0x0, count: 1]
Errors in file /home/oracle/app/diag/rdbms/global_xsb/global/trace/global_ora_23410.trc  (incident=403757):
ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x186016B] [SI_KERNEL(general_protection)] []
Incident details in: /home/oracle/app/diag/rdbms/global_xsb/global/incident/incdir_403757/global_ora_23410_i403757.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Jul 11 15:42:20 2017
Dumping diagnostic data in directory=[cdmp_20170711154220], requested by (instance=1, osid=23410), summary=[incident=403757].
Tue Jul 11 15:42:22 2017
Sweep [inc][403757]: completed
Sweep [inc2][403757]: completed
Tue Jul 11 15:43:09 2017
Errors in file /home/oracle/app/diag/rdbms/global_xsb/global/trace/global_pmon_21754.trc  (incident=384021):
ORA-00600: internal error code, arguments: [17147], [0x7001414E8], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/diag/rdbms/global_xsb/global/incident/incdir_384021/global_pmon_21754_i384021.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Jul 11 15:43:11 2017
Dumping diagnostic data in directory=[cdmp_20170711154311], requested by (instance=1, osid=21754 (PMON)), summary=[incident=384021].
Errors in file /home/oracle/app/diag/rdbms/global_xsb/global/trace/global_pmon_21754.trc:
ORA-00600: internal error code, arguments: [17147], [0x7001414E8], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 21754): terminating the instance due to error 472
Tue Jul 11 15:43:11 2017
opiodr aborting process unknown ospid (57181) as a result of ORA-1092
Tue Jul 11 15:43:11 2017
ORA-1092 : opitsk aborting process
Tue Jul 11 15:43:11 2017
opiodr aborting process unknown ospid (17365) as a result of ORA-1092

Oracle Bug

The crash matches a known Oracle defect in opiaba(), triggered when a single statement carries more than 65535 bind variables. The patch Oracle provides for this bug only fixes the instance crash; it still does not allow a statement to use more than 65535 bind parameters!
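A quick back-of-the-envelope check shows how easily a large activity hits the limit. The binds-per-row figure below is an assumption for illustration (an activity-goods row with twenty-odd bound columns), not a number from the incident:

```java
// Rough arithmetic: batching many rows into one statement can exceed
// Oracle's 65535 bind-variable limit surprisingly quickly.
public class BindLimitCheck {
    static final int MAX_BINDS = 65535; // per-statement limit

    // total bind variables when one statement carries `rows` rows
    // with `bindsPerRow` bound values each
    public static int totalBinds(int rows, int bindsPerRow) {
        return rows * bindsPerRow;
    }

    public static boolean exceedsLimit(int rows, int bindsPerRow) {
        return totalBinds(rows, bindsPerRow) > MAX_BINDS;
    }
}
```

At 3,200 rows, anything beyond 20 bound values per row already crosses the line (3200 × 21 = 67,200 > 65,535).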

Best Practice

   For large batch jobs, split the data into chunks of a fixed size and execute chunk by chunk. Whether you use IN, UNION ALL, or BEGIN ... END, problems arise once the volume passes a certain limit.
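The chunking itself is straightforward; a minimal sketch (the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Split a large list into fixed-size chunks so that each chunk's
// statement stays well below any per-statement limit (e.g. keep the
// binds per chunk far under 65535).
public class ChunkUtil {
    public static <T> List<List<T>> split(List<T> data, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < data.size(); i += chunkSize) {
            // subList returns a view; copy it if the source list will change
            chunks.add(data.subList(i, Math.min(i + chunkSize, data.size())));
        }
        return chunks;
    }
}
```

Each chunk is then executed (and, if desired, committed) on its own, which is exactly what the batch wrapper below does with a fixed batchSize.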

Solution

To solve the problem while staying clear of database-specific features, the only option left was JDBC's executeBatch, for which MyBatis offers a convenient wrapper. Batching comes with one precondition: you must not rely on per-statement return values. MyBatis's BatchExecutor returns the fixed value -2147482646 (Integer.MIN_VALUE + 1002) for every statement. (Native executeBatch() is no better: it may return -2, i.e. Statement.SUCCESS_NO_INFO, instead of the actual affected-row count.) Also note that if batching runs without a transaction, each statement auto-commits as it is issued and the batch degrades back into a loop of single calls; to preserve both atomicity and efficiency, the batch must run inside a transaction. The buffered statements are sent to the database via executeBatch() when the session flushes:

final SqlSession session = getGenericSqlSessionFactory().openSession(ExecutorType.BATCH);
try {
    for (ActivityGoods goods : goodsList) {
        session.update("updateActivityGoods", goods); // buffered, not executed yet
    }
    session.commit(); // flush: executeBatch() runs here, then commit
} finally {
    session.close();
}

PS: on transactions — when MyBatis is integrated with Spring and an enclosing transaction exists, the connection obtained is the same one bound to that transaction:

  public SqlSession openSession(ExecutorType execType) {
    return openSessionFromDataSource(execType, null, false);
  }

private SqlSession openSessionFromDataSource(ExecutorType execType, TransactionIsolationLevel level, boolean autoCommit) {
    Transaction tx = null;
    try {
      final Environment environment = configuration.getEnvironment();
      final TransactionFactory transactionFactory = getTransactionFactoryFromEnvironment(environment);
      // get the dataSource from the environment and create a transaction from it
      tx = transactionFactory.newTransaction(environment.getDataSource(), level, autoCommit);
      final Executor executor = configuration.newExecutor(tx, execType);
      return new DefaultSqlSession(configuration, executor, autoCommit);
    } catch (Exception e) {
      closeTransaction(tx); // may have fetched a connection so lets call close()
      throw ExceptionFactory.wrapException("Error opening session.  Cause: " + e, e);
    } finally {
      ErrorContext.instance().reset();
    }
  }

public SpringManagedTransaction(DataSource dataSource) {
    notNull(dataSource, "No DataSource specified");
    this.dataSource = dataSource;
  }




  private void openConnection() throws SQLException {
    // obtain the connection from the dataSource (the Spring-managed one if a transaction is active)
    this.connection = DataSourceUtils.getConnection(this.dataSource);
    this.autoCommit = this.connection.getAutoCommit();
    this.isConnectionTransactional = isConnectionTransactional(this.connection, this.dataSource);

    if (this.logger.isDebugEnabled()) {
      this.logger.debug(
          "JDBC Connection ["
              + this.connection
              + "] will"
              + (this.isConnectionTransactional ? " " : " not ")
              + "be managed by Spring");
    }
  }

Best Practice

With this wrapper, code that needs batch processing simply calls the method on the base class and does not have to manage the transaction itself. One caveat: Spring's @Transactional is proxy-based, so a self-invocation from inside the same class still bypasses the transaction.

@Repository("batchExecutorDaoImpl")
public class BatchExecutorDaoImpl extends BaseDaoImpl {


    @Transactional(rollbackFor=Exception.class) 
    public <T> void batch(List<T> dataList, int batchSize, final SingleProcessor<T> processor) {
        final SqlSession session = getGenericSqlSessionFactory().openSession(ExecutorType.BATCH);
        try {
            ListUtils.split(dataList, batchSize, new PageProcess<T>() {
                @Override
                public void process(List<T> pageIdList) {
                    for (T data : pageIdList) {
                        processor.process(session, data);
                    }
                    session.commit();
                }
            });
        } finally {
            session.close();
        }
    }

    public interface SingleProcessor<T> {
        void process(SqlSession sqlSession, T data);
    }

}


    public <T> void batchProcess(List<T> dataList,SingleProcessor<T> singleProcessor) {
        if(CollectionUtils.isEmpty(dataList)) {
            return ;
        }

        batchExecutorDaoImpl.batch(dataList, 100, singleProcessor);
    }


    @Override
    public Map<String, Long> batchInsertActivityGoods(List<ActivityGoods> list) {
        Map<String, Long> resultMap = Maps.newHashMap();
        if (CollectionUtils.isEmpty(list)) {
            return resultMap;
        }

        for (List<ActivityGoods> goodsList : ListUtils.subList(list, 20)) {

            List<Long> activityGoodsIdList = this.getSqlSession()
                    .selectList("promotion.ActivityGoodsEdit.batchSelectActivityGoodsPKID",goodsList.size());
            int i = 0;
            for (ActivityGoods goods : goodsList) {
                goods.setActivityGoodsId(activityGoodsIdList.get(i++));
                resultMap.put(goods.getSkuId(), goods.getActivityGoodsId());
            }
        }

        batchProcess(list, new SingleProcessor<ActivityGoods>() {

            @Override
            public void process(SqlSession sqlSession, ActivityGoods data) {
                sqlSession.insert("promotion.ActivityGoodsEdit.insertActivityGoods",data);
            }

        });

        return resultMap;
    }

This article comes from the NetEase Practitioner Community (网易实践者社区) and is published with the authorization of its author, Zhang Wei (张伟).