记一次Unipack越南文乱码Bug的修复

勿忘初心2018-12-05 13:24

此文已由作者左裕初授权网易云社区发布。

欢迎访问网易云社区,了解更多网易技术产品运营经验


问题

Unipack系统要保存一些越南文的包名,由于这是一个上古系统,处理起来一定要采取庖丁解牛的态度

思路

  1. 数据库支不支持(数据库层)

  2. 浏览器支持不支持(前端,客户端系统)

  3. 应用层支持不支持

数据库层

最重要的当然是我们的字段是采用什么编码的,可以采用如下方法进行查询

How do I see what character set a MySQL database / table / column is?

For Schemas:


SELECT default_character_set_name FROM information_schema.SCHEMATA  WHERE schema_name = "schemaname";

For Tables:


SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,        information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation   AND T.table_schema = "schemaname"   AND T.table_name = "tablename";

For Columns:


SELECT character_set_name FROM information_schema.`COLUMNS`  WHERE table_schema = "schemaname"   AND table_name = "tablename"   AND column_name = "columnname";

查出来是:


mysql> SELECT character_set_name FROM information_schema.`COLUMNS`  WHERE table_schema = "apkpack51"   AND table_name = "apk_params"   AND column_name = "param_value";   

 +--------------------+
| character_set_name |

+--------------------+

| utf8               |

+--------------------+

1 row in set (0.00 sec)

所以就是utf8咯。

下面我们看看MySQL的utf-8包不包含越南文的支持


mysql> show collation;

collation就相当于是指定了一套字符串的比对方法(大小之类的),对于建索引是很有意义的。

结果:


| utf8_general_ci          | utf8     |  33 | Yes     | Yes      |       1 |

| utf8_bin                 | utf8     |  83 |         | Yes      |       1 |

| utf8_unicode_ci          | utf8     | 192 |         | Yes      |       8 |

| utf8_icelandic_ci        | utf8     | 193 |         | Yes      |       8 |

| utf8_latvian_ci          | utf8     | 194 |         | Yes      |       8 |

| utf8_romanian_ci         | utf8     | 195 |         | Yes      |       8 |

| utf8_slovenian_ci        | utf8     | 196 |         | Yes      |       8 |

| utf8_polish_ci           | utf8     | 197 |         | Yes      |       8 |

| utf8_estonian_ci         | utf8     | 198 |         | Yes      |       8 |

| utf8_spanish_ci          | utf8     | 199 |         | Yes      |       8 |

| utf8_swedish_ci          | utf8     | 200 |         | Yes      |       8 |

| utf8_turkish_ci          | utf8     | 201 |         | Yes      |       8 |

| utf8_czech_ci            | utf8     | 202 |         | Yes      |       8 |

| utf8_danish_ci           | utf8     | 203 |         | Yes      |       8 |

| utf8_lithuanian_ci       | utf8     | 204 |         | Yes      |       8 |

| utf8_slovak_ci           | utf8     | 205 |         | Yes      |       8 |

| utf8_spanish2_ci         | utf8     | 206 |         | Yes      |       8 |

| utf8_roman_ci            | utf8     | 207 |         | Yes      |       8 |

| utf8_persian_ci          | utf8     | 208 |         | Yes      |       8 |

| utf8_esperanto_ci        | utf8     | 209 |         | Yes      |       8 |

| utf8_hungarian_ci        | utf8     | 210 |         | Yes      |       8 |

| utf8_sinhala_ci          | utf8     | 211 |         | Yes      |       8 |

| utf8_german2_ci          | utf8     | 212 |         | Yes      |       8 |

| utf8_croatian_ci         | utf8     | 213 |         | Yes      |       8 |

| utf8_unicode_520_ci      | utf8     | 214 |         | Yes      |       8 |

| utf8_vietnamese_ci       | utf8     | 215 |         | Yes      |       8 |

| utf8_general_mysql500_ci | utf8     | 223 |         | Yes      |       1 |

所以毫无疑问是支持越南文的。

浏览器(客户端系统)

这是请求的返回头,可以看出是utf-8编码的,所以从这个角度来说,客户端也是没有问题的


HTTP/1.1 200 OK
Date: Tue, 08 Dec 2015 09:11:09 GMT
Server: Apache/2.4.12 (Ubuntu)
X-Powered-By: PHP/5.6.11-1ubuntu3.1
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 11
Keep-Alive: timeout=5, max=91
Connection: Keep-Alive
Content-Type: application/text;charset=UTF-8

应用层

最麻烦的地方在于应用层,以一个restful接口为例,从数据库取出数据与数据库连接采取的编码有关,生成的json数据又和应用层采用的编码方法有关。确定这个根源的办法其实也很简单,采用控制变量法即可。最后我们发现在建立连接的时候有这么一句:

```php $this->conn = ADONewConnection('mysql');

this>conn>PConnect(db_host, dbuser,db_pass, $db_name);

$this->conn->Execute('SET NAMES GBK'); # 数据库使用GBK编码 ```

所以说,为什么越南文插入失败呢,就是因为数据库连接采用了GBK编码,而GBK编码无法啊编码越南文,所以也就无法以正确的姿态插入数据库。

utf-8(浏览器) -> gbk(应用层) -> utf-8(数据库)

为了不牵一发而动全身,我们还是采用了保守的方法,多取了一个utf-8的连接,这样在有需要的时候使用这个连接。

总结

Web的界面乱码是常见的问题,一般来说,只要抓住这三个点,感觉总能找到问题的根源。


免费领取验证码、内容安全、短信发送、直播点播体验包及云服务器等套餐

更多网易技术、产品、运营经验分享请点击