Analysis of File Corruption Caused by Memory Ballooning

During testing today we found that, after upgrading to the new package, data that previously could be inserted into Elasticsearch could no longer be inserted. I reduced the error to a single, reproducible line:

python -c "from pandas.formats.format import detect_console_encoding"

The error output:

File "/home/fantom/share/Python-2.7/lib/site-packages/pandas/__init__.py", line 37, in <module>
import pandas.core.config_init
File "/home/fantom/share/Python-2.7/lib/site-packages/pandas/core/config_init.py", line 18, in <module>
from pandas.formats.format import detect_console_encoding
File "/home/fantom/share/Python-2.7/lib/site-packages/pandas/formats/format.py", line 13, in <module>
from pandas.types.missing import isnull, notnull
File "/home/fantom/share/Python-2.7/lib/site-packages/pandas/types/missing.py", line 12, in <module>
from .common import (is_string_dtype, is_datetimelike,
File "/home/fantom/share/Python-2.7/lib/site-packages/pandas/types/common.py", line 8, in <module>
from pandas import lib, algos
File "pandas/algos.pyx", line 1, in init pandas.algos (pandas/algos.c:155772)
SystemError: Negative size passed to PyString_FromStringAndSize

Analysis 1 - Caused by the new changes?

Since the problem only appeared after the package upgrade, suspicion first centered on the newly changed code, but the SVN history showed nothing related. I also compared part of the Python code and found no differences between the old and new versions.

Analysis 2 - Starting from the source of the error

The failing location, pandas/algos.c:155772, is in code generated by a build-time tool (Cython). The library was installed through a package manager rather than built by us, so I had to look for sources on the official site: algos.pyx is there, but it does not line up with algos.c, and our build environment lacks the needed dependencies, so compiling it ourselves would be a hassle. I set the source-code route aside to see whether there was a simpler way.

Analysis 3 - Suspecting different libraries are being loaded

Since this is a test environment, someone might have installed a new library or changed an environment variable, so I printed sys.path and compared; the old and new environments were identical:

import sys
for i in sys.path:
    print i
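
For reference, the comparison itself can be done by dumping sys.path to a file on each machine and diffing the two dumps; the file names below are only placeholders:

python -c "import sys; print '\n'.join(sys.path)" > /tmp/syspath_old.txt   # run on the old environment
python -c "import sys; print '\n'.join(sys.path)" > /tmp/syspath_new.txt   # run on the new environment
diff /tmp/syspath_old.txt /tmp/syspath_new.txt                             # empty output means the paths match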

Analysis 4 - Debugging Python with pdb

Run the statement under pdb:

python -c "import pdb; pdb.set_trace(); from pandas.formats.format import detect_console_encoding"

Step in with s; the output:

> /home/fantom/share/Python-2.7/lib/site-packages/pandas/types/common.py(8)<module>()
-> from pandas import lib, algos
(Pdb) s
--Call--
> /home/fantom/share/Python-2.7/lib/site-packages/six-1.10.0-py2.7.egg/six.py(184)find_module()
-> def find_module(self, fullname, path=None):
(Pdb) n
> /home/fantom/share/Python-2.7/lib/site-packages/six-1.10.0-py2.7.egg/six.py(185)find_module()
-> if fullname in self.known_modules:
(Pdb) n
> /home/fantom/share/Python-2.7/lib/site-packages/six-1.10.0-py2.7.egg/six.py(188)find_module()
-> return None

After several tries I confirmed that executing one more statement at this point triggers the error. Printing the stack trace here does not reveal anything useful either:

(Pdb) bt
<string>(1)<module>()
/home/fantom/share/Python-2.7/lib/site-packages/pandas/__init__.py(37)<module>()
-> import pandas.core.config_init
/home/fantom/share/Python-2.7/lib/site-packages/pandas/core/config_init.py(18)<module>()
-> from pandas.formats.format import detect_console_encoding
/home/fantom/share/Python-2.7/lib/site-packages/pandas/formats/format.py(13)<module>()
-> from pandas.types.missing import isnull, notnull
/home/fantom/share/Python-2.7/lib/site-packages/pandas/types/missing.py(12)<module>()
-> from .common import (is_string_dtype, is_datetimelike,
/home/fantom/share/Python-2.7/lib/site-packages/pandas/types/common.py(8)<module>()
-> from pandas import lib, algos
> /home/fantom/share/Python-2.7/lib/site-packages/six-1.10.0-py2.7.egg/six.py(188)find_module()
-> return None

Continue and execute the return None:

(Pdb) s
--Return--
> /home/fantom/share/Python-2.7/lib/site-packages/six-1.10.0-py2.7.egg/six.py(188)find_module()->None
-> return None

And it errors out:

(Pdb) s
SystemError: 'Negative size passed to PyString_FromStringAndSize'
> /home/fantom/share/Python-2.7/lib/site-packages/pandas/types/common.py(8)<module>()
-> from pandas import lib, algos

This is frustrating: in my healthy environment, after the return None here, execution continues into the following code:

(Pdb) s
--Call--
> /home/fantom/share/Python-2.7/lib/site-packages/numpy/core/getlimits.py(250)__init__()
-> def __init__(self, int_type):

Clearly pdb's stepping is too coarse-grained to show exactly where the error occurs, but it does narrow down the range of suspects.

Analysis 5 - Debugging Python together with strace output

With pdb we get as far as the return None, and the very next step fails, so something must be happening in between that pdb is too coarse to show. I decided to try my luck with strace for a more detailed view. Fortunately pdb does not work the way gdb does (via ptrace), otherwise we could not debug and strace the same process at once.
In both environments I stepped to the return None, attached strace to the python process, and then typed s to execute the next statement. The strace output does indeed differ; let's compare.
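
The attach step looks roughly like this (the PID and output file name below are placeholders):

ps aux | grep "python -c"            # find the PID of the python process sitting at the pdb prompt
strace -p <PID> -o /tmp/trace.log    # attach strace; do the same on the other host with a different log name
# then go back to pdb, type s, and compare the two trace logs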

The normal (working) case:

stat("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos", 0x7fffee5b2150) = -1 ENOENT (No such file or directory)
open("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos.so", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=8162301, ...}) = 0
open("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos.so", O_RDONLY|O_CLOEXEC) = 9
read(9, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200$\1\0\0\0\0\0"..., 832) = 832
fstat(9, {st_mode=S_IFREG|0644, st_size=8162301, ...}) = 0
mmap(NULL, 3853896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 9, 0) = 0x7f48d4521000
mprotect(0x7f48d46bd000, 2097152, PROT_NONE) = 0
mmap(0x7f48d48bd000, 53248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 9, 0x19c000) = 0x7f48d48bd000
mmap(0x7f48d48ca000, 15944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f48d48ca000
close(9) = 0
brk(0) = 0x121b000
brk(0x1265000) = 0x1265000
write(1, "--Call--\n", 9) = 9
stat("/home/fantom/share/Python-2.7/lib/site-packages/numpy/core/getlimits.py", {st_mode=S_IFREG|0644, st_size=9904, ...}) = 0
open("/home/fantom/share/Python-2.7/lib/site-packages/numpy/core/getlimits.py", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=9904, ...}) = 0
fstat(9, {st_mode=S_IFREG|0644, st_size=9904, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f48f1755000
read(9, "\"\"\"Machine limits for Float32 an"..., 8192) = 8192
read(9, "lid integer data type.\")\n\n de"..., 4096) = 1712
read(9, "", 4096) = 0
close(9) = 0
munmap(0x7f48f1755000, 4096) = 0
write(1, "> /home/fantom/share/Python-2.7/"..., 89) = 89
write(1, "-> def __init__(self, int_type):"..., 33) = 33

The abnormal (failing) case:

stat("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos", 0x7ffff01e83c0) = -1 ENOENT (No such file or directory)
open("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos.so", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=8162301, ...}) = 0
open("/home/fantom/share/Python-2.7/lib/site-packages/pandas/algos.so", O_RDONLY|O_CLOEXEC) = 9
read(9, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200$\1\0\0\0\0\0"..., 832) = 832
fstat(9, {st_mode=S_IFREG|0644, st_size=8162301, ...}) = 0
mmap(NULL, 3853896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 9, 0) = 0x7f25ec120000
mprotect(0x7f25ec2bc000, 2097152, PROT_NONE) = 0
mmap(0x7f25ec4bc000, 53248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 9, 0x19c000) = 0x7f25ec4bc000
mmap(0x7f25ec4c9000, 15944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f25ec4c9000
close(9) = 0
close(8) = 0
write(1, "SystemError: 'Negative size pass"..., 66) = 66
write(1, "> /home/fantom/share/Python-2.7/"..., 86) = 86

We can see that both cases load /home/fantom/share/Python-2.7/lib/site-packages/pandas/algos.so. In the normal case, execution goes on to three further reads (pdb reading the next source file, getlimits.py):

read(9, "\"\"\"Machine limits for Float32 an"..., 8192) = 8192
read(9, "lid integer data type.\")\n\n de"..., 4096) = 1712
read(9, "", 4096) = 0

In the abnormal case there are no such reads; the handles are closed and the error is printed right after algos.so is mapped.

Suspicion now falls on the file algos.so itself.

Comparing the two files: their MD5s differ. I downloaded both copies and diffed them, finding that in the broken environment the file contains several zeroed-out regions, as shown in the figure:
(figure: binary diff of the good and bad algos.so)

The zeroed region measures 0x19fff - 0x16fff = 12K.
Such a round number reminded me of a case I had investigated before, in which virtual-machine memory reclaim flushed out file-system cache pages; 12K is an exact multiple of the page size.
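
One way to locate and measure such zeroed spans without a graphical diff tool is cmp, which lists every differing byte offset; the file names here are placeholders for the good and bad copies:

getconf PAGESIZE                                                  # typically 4096 on x86-64
cmp -l algos.so.good algos.so.bad | awk '{print $1}' | head -1    # first differing byte offset
cmp -l algos.so.good algos.so.bad | awk '{print $1}' | tail -1    # last differing byte offset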

I asked around: the test department's environment is indeed hosted on virtual machines. Based on the earlier conclusion, this kind of problem should go away after a reboot, and indeed a reboot fixed it.

Analysis of the Original Bug

Let's revisit the original bug: a docker file got corrupted in a QEMU virtualized environment, and the rm command inside it crashed on every invocation:

docker-host ~ # /var/lib/docker/overlay2/51ad45a29186d1a9b9bed169ff495bc451e6895b8142b21eadfac23fd4789c72/diff/bin/rm
Segmentation fault

On docker-host, the disk layout is as follows:

└─vda4 254:4 0 458.5G 0 part
├─acc--vg-log 253:0 0 32.1G 0 lvm /var/log
├─acc--vg-docker 253:1 0 96.3G 0 lvm /var/lib/docker
├─acc--vg-data 253:2 0 64.2G 0 lvm /sf/k8s/data
└─acc--vg-volume 253:3 0 128.4G 0 lvm /sf/k8s/volume

vda4 is a disk partition, and the vda name indicates a virtio disk.
vda4 is divided into four LVM logical volumes; the rm file sits on the second one, acc--vg-docker.

Reading the rm file the normal way

dd if=/var/lib/docker/overlay2/51ad45a29186d1a9b9bed169ff495bc451e6895b8142b21eadfac23fd4789c72/diff/bin/rm of=./read_fs_rm

This actually reads from the file cache; adding iflag=direct here would disturb the evidence we want to preserve.

Reading the rm file by offset from the logical volume (acc--vg-docker)

The file system is xfs; xfs_bmap shows the file's starting position:

docker-host ~ # xfs_bmap -v /var/lib/docker/overlay2/51ad45a29186d1a9b9bed169ff495bc451e6895b8142b21eadfac23fd4789c72/diff/bin/rm
/var/lib/docker/overlay2/51ad45a29186d1a9b9bed169ff495bc451e6895b8142b21eadfac23fd4789c72/diff/bin/rm:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..119]: 151446672..151446791 3 (9360..9479) 120

The starting position is 151446672 * 512 = 77540696064; this is the byte offset of the rm file within the current partition (the logical volume):

dd if=/dev/acc-vg/docker of=./read_lv_rm bs=1 count=60072 skip=77540696064
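
The arithmetic is easy to sanity-check in the shell; xfs_bmap -v reports BLOCK-RANGE in 512-byte basic blocks, hence the factor of 512:

echo $((151446672 * 512))    # 77540696064, the byte offset of rm inside the logical volume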

Reading the rm file by offset from the physical partition (vda4)

Four logical volumes were created on the physical partition vda4:

docker-host ~ # dmsetup table
acc--vg-data: 0 134610944 linear 254:4 269223936
acc--vg-docker: 0 201916416 linear 254:4 67307520 <--
acc--vg-volume: 0 269238272 linear 254:4 403834880
acc--vg-log: 0 67305472 linear 254:4 2048

The logical volume acc--vg-docker starts at offset 67307520 (sectors) within the physical partition vda4. Combined with the file's offset of 151446672 sectors within the logical volume, the byte offset of rm relative to vda4 is (67307520 + 151446672) * 512 = 112002146304.

dd if=/dev/vda4 bs=1 count=60072 of=./read_pv_rm skip=112002146304
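
Again, the combined offset can be checked directly:

echo $(( (67307520 + 151446672) * 512 ))    # 112002146304, the byte offset of rm within vda4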

Preliminary conclusion

Comparing the MD5s of the files shows that the copy read through the xfs file system is the corrupted rm, while the copies read from everywhere else are correct.

docker-host ~ # md5sum *
4ecc7b84fd459f2f8ab1da28bce231d4 read_fs_rm (read via the xfs file system)
2314b8d72ef05e807b99a0963a117692 read_lv_rm (read by offset from the logical volume acc--vg-docker)
2314b8d72ef05e807b99a0963a117692 read_pv_rm (read by offset from the physical partition vda4)
2314b8d72ef05e807b99a0963a117692 rm (the known-good rm provided for comparison)
4ecc7b84fd459f2f8ab1da28bce231d4 rm-bug (the known-bad rm provided for comparison)

This essentially narrows the fault down to either the file-system cache or a bug in xfs itself.

The cause at the virtual-machine level

This problem had occurred once before; because that environment was destroyed, no cause was ever determined. Later a similar log entry turned up:

docker-host ~ # docker logs sxfdcloud
/sf/k8s/lib/scripts/hostutils.sh: line 5: 9843 Segmentation fault rm -rf "${PREFIX}/var/k8sdns"

Following the trail of clues in the logs, suspicion finally settled on the balloon. The log analysis proceeds as follows.

  1. 17-08-02 17:26 balloon_auto causes the guest's available memory to shrink sharply

    [17-08-02 17:26:00]***********************************
    total used free shared buffers cached
    Mem: 31966052 28770748 3195304 0 158244 19168896
    -/+ buffers/cache: 9443608 22522444
    Swap: 31457276 0 31457276
    [17-08-02 17:28:01]***********************************
    total used free shared buffers cached
    Mem: 31966052 30989192 976860 0 159524 20696388
    -/+ buffers/cache: 10133280 21832772
    Swap: 31457276 1270364 30186912
  2. 17-08-02 18:45 Another balloon inflation, this time overdone, causing memory allocation failures inside the guest

    Aug 2 18:45:42 localhost kernel: [38623.583998] vballoon: page allocation failure: order:0, mode:0x310da
    Aug 2 18:45:42 localhost kernel: [38623.584823] CPU: 3 PID: 62 Comm: vballoon Tainted: G ------------ T 3.10.0-514.6.1.el7-sxfacc-hcivm-c27a60ea #1
    Aug 2 18:45:42 localhost kernel: [38623.584827] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Aug 2 18:45:42 localhost kernel: [38623.584829] 0000000000000000 ffff880233703c10 ffffffff817fcf59 ffff880233703c98
    Aug 2 18:45:42 localhost kernel: [38623.584832] ffffffff8115c380 00000000ffffffff ffff880233703c38 ffffffffffffffff
    Aug 2 18:45:42 localhost kernel: [38623.584835] 000310da00000000 0000000000000286 0000000000000003 0000000100000004
    Aug 2 18:45:42 localhost kernel: [38623.584838] Call Trace:
    Aug 2 18:45:42 localhost kernel: [38623.602056] [<ffffffff817fcf59>] dump_stack+0x19/0x1b
    Aug 2 18:45:42 localhost kernel: [38623.604510] [<ffffffff8115c380>] warn_alloc_failed+0xf0/0x160
    Aug 2 18:45:42 localhost kernel: [38623.615100] [<ffffffff81160e49>] __alloc_pages_nodemask+0x9a9/0xb60
    Aug 2 18:45:42 localhost kernel: [38623.615134] [<ffffffff811a310a>] alloc_pages_current+0xaa/0x170
    Aug 2 18:45:42 localhost kernel: [38623.615147] [<ffffffff811ce3ef>] balloon_page_enqueue+0x1f/0x90
    Aug 2 18:45:42 localhost kernel: [38623.616079] [<ffffffff81519d64>] balloon+0x164/0x3d0
    Aug 2 18:45:42 localhost kernel: [38623.618249] [<ffffffff810943b0>] ? wake_up_atomic_t+0x30/0x30
    Aug 2 18:45:42 localhost kernel: [38623.622373] [<ffffffff81519c00>] ? update_balloon_stats+0x110/0x110
    Aug 2 18:45:42 localhost kernel: [38623.622376] [<ffffffff81093390>] kthread+0xc0/0xd0
    Aug 2 18:45:42 localhost kernel: [38623.622378] [<ffffffff810932d0>] ? kthread_create_on_node+0x110/0x110
    Aug 2 18:45:42 localhost kernel: [38623.622391] [<ffffffff8180d298>] ret_from_fork+0x58/0x90
    Aug 2 18:45:42 localhost kernel: [38623.622393] [<ffffffff810932d0>] ? kthread_create_on_node+0x110/0x110

    I later reproduced this log myself:

    Aug 8 10:00:44 localhost kernel: [ 5774.504262] virtio_balloon virtio0: Out of puff! Can't get 1 pages
    Aug 8 10:00:46 localhost kernel: [ 5776.965079] sangfor_watchdo[3596]: segfault at 8 ip 000000000042ad48 sp 00007ffdd3fd7b38 error 4 in bash[400000+ab000]
  3. 2017-08-03 02:06 Another balloon inflation

    Host logs:

    [2017-08-03 02:05:55] BEGIN reclaim_vm_mem 1145875170240, os_type=linux
    [2017-08-03 02:06:02] [OK]/sf/sbin/guest_reduce_physical_mem.sh 1145875170240 5955, ret=0
    [2017-08-03 02:06:03] [OK]TOTAL=7910736, RECLAIMABLE=3066064, USED=3379280, HELD=6445344, REPORTDATA_TIME=65194, CURRENT_TIME=65201
    [2017-08-03 02:06:03] END reclaim_vm_mem 1145875170240, ret=0

    On the guest, free memory suddenly jumps up:

    =========2017-08-03 02:06:01===============
    total used free shared buff/cache available
    Mem: 7.4G 2.9G 1.7G 2.9M 2.8G 4.2G
    Swap: 13G 60M 12G
    =========2017-08-03 02:07:01===============
    total used free shared buff/cache available
    Mem: 7.5G 391M 4.5G 592K 2.7G 6.9G
    Swap: 13G 3.7M 13G

    According to blackbox, all docker-related processes had exited.
    The guess is that after memory was squeezed again, processes failed to allocate memory and were killed.

  4. 2017-08-03 02:07 Multiple atools processes start to appear

    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    root 3251 0.3 0.0 22780 2896 ? S Aug02 3:27 /bin/bash /usr/local/SangforVMSTool/sangfor_vm_proxyd
    root 3444 0.3 0.0 22624 2544 ? S Aug02 4:15 \_ /bin/bash /usr/local/SangforVMSTool/sangfor_vm_proxyd_w
    root 8151 0.0 0.0 7824 624 ? S 02:07 0:00 | \_ sleep 1
    root 3498 0.0 0.0 22112 2156 ? S Aug02 0:14 \_ /bin/bash /usr/local/SangforVMSTool/sangfor_vm_proxyd_r
    root 8064 0.0 0.0 7824 620 ? S 02:06 0:00 \_ sleep 1
    root 3252 0.0 0.0 23088 3268 ? S Aug02 0:12 /bin/bash /usr/local/SangforVMSTool/sangfor_module_update
    root 3446 0.6 0.0 22876 3020 ? S Aug02 7:01 \_ /bin/bash /usr/local/SangforVMSTool/sangfor_update_ipc_callback
    root 8113 0.0 0.0 7824 624 ? S 02:06 0:00 | \_ sleep 1
    root 7948 0.0 0.0 7824 620 ? S 02:06 0:00 \_ sleep 5
    root 3254 0.6 0.0 22708 2868 ? S Aug02 7:02 /bin/bash /usr/local/SangforVMSTool/sangfor_sfping
    root 8126 0.0 0.0 7824 620 ? S 02:07 0:00 \_ sleep 1
    root 5716 0.0 0.0 22780 2852 ? S 02:06 0:00 /bin/bash /usr/local/SangforVMSTool/sangfor_module_update
    root 5885 0.3 0.0 22120 2260 ? S 02:06 0:00 \_ /bin/bash /usr/local/SangforVMSTool/sangfor_update_ipc_callback
    root 8143 0.0 0.0 7824 620 ? S 02:07 0:00 | \_ sleep 1
    root 7753 0.0 0.0 7824 624 ? S 02:06 0:00 \_ sleep 5
    root 5807 0.3 0.0 21988 2140 ? S 02:06 0:00 /bin/bash /usr/local/SangforVMSTool/sangfor_sfping
    root 8101 0.0 0.0 7824 620 ? S 02:06 0:00 \_ sleep 1
    root 5864 0.5 0.0 22120 2260 ? S 02:06 0:00 /bin/bash /usr/local/SangforVMSTool/sangfor_update_ipc_callback
    root 8146 0.0 0.0 7824 620 ? S 02:07 0:00 \_ sleep 1
    root 6013 0.0 0.0 21856 1996 ? S 02:06 0:00 /bin/bash /usr/local/SangforVMSTool/sangfor_vm_proxyd
    root 7382 0.0 0.0 7824 624 ? S 02:06 0:00 \_ sleep 10
  5. 2017-08-03 02:45 Logs show that the grep binary is corrupted

    Aug 3 02:45:13 localhost CROND[24320]: (root) CMDOUT (/sf/k8s/etc/cron/minutely/blackbox: line 366: /bin/grep: cannot execute binary file: Exec format error)

    Combined with a source-code look at the multiple-atools phenomenon from point 4: the scripts rely on killing processes of the same name to achieve mutual exclusion, and with grep broken, the same-named processes are never killed, so the scripts get relaunched again and again.

    local ps_list=`ps -Ao pid,command | grep "${procname}\>" | grep -v grep | grep -v sudo | grep -v "\<${pid}\>"`
    if [ ! -z "${ps_list}" ]; then
        echo "${ps_list}" | awk '{print "kill " $1 " 2>/dev/null"}' | sh
    fi

    The kill logic is never reached here.
    That explains the phenomenon in point 4.
    Moreover, grep was not the only broken binary:

    Aug 3 03:10:41 localhost CROND[22055]: (root) CMDOUT (/usr/sbin/savelog: line 132: 1098 Segmentation fault chgrp -- "$group" "$1")
    Aug 3 03:10:41 localhost kernel: [68923.261013] chgrp[1098]: segfault at 1 ip 0000000000409800 sp 00007fffdecce0d8 error 6 in chgrp[400000+e000]
  6. Why did grep break?

    I was not involved in the first occurrence and that environment no longer exists; the second environment has been rebooted, so it is effectively gone too, leaving only the logs.
    The inferred cause is that once the balloon's memory-reclaiming logic starts failing, the guest's memory state becomes corrupted, which is how the in-memory data of read-only files ends up damaged.

    node-6544842425444 ~ # lspci | grep -i balloon
    00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
    node-6544842425444 ~ # lspci -vv -s 00:03.0
    00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
    Subsystem: Red Hat, Inc Virtio memory balloon
    Physical Slot: 3
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 11
    Region 0: I/O ports at f120 [size=32]
    Kernel driver in use: virtio-pci

    This matches the reasoning from the first 4K-zeroed-data incident: the data in the xfs file-system cache was wrong while a fresh direct read was fine, which is exactly what you would expect if the in-memory copy was corrupted and the on-disk data remained intact.

  7. Corroborating by reproduction

    When the 4K-zeroed-data problem occurred earlier, docker printed the following log entry:

    time="2017-06-21T06:19:43.273510485+08:00" level=error msg="Handler for GET /containers/78742eb6600e8046bafcfeeb02cd0d410510ed1609194d58385b67c10e14fa1c/json returned error: No such container: 78742eb6600e8046bafcfeeb02cd0d410510ed1609194d58385b67c10e14fa1c"
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x1 addr=0x0 pc=0x40a5e4]

    Evidently a memory shortage led to the nil-pointer dereference, which suggests in hindsight that the earlier 4K problem was also memory-related.
    Later I verified this myself by repeatedly running:

    /sf/sbin/qmpcmd.pl 6544842425444 "balloon_auto 6341"

    and observed similar logs:

    2017-08-08 11:10:42.812567 I | http: panic serving @: runtime error: invalid memory address or nil pointer dereference
    goroutine 8557 [running]:
    net/http.(*conn).serve.func1(0xc8236eaf00)
    /usr/local/go/src/net/http/server.go:1389 +0xc1

    The two corroborate each other. Unfortunately, whenever I ran this kind of test, the guest system crashed outright rather than merely suffering file corruption.

    At the same time, other logs appeared showing how the balloon was affecting the system:

    Aug 8 11:06:44 localhost kernel: [ 2126.910391] EXT4-fs error (device vda2): htree_dirblock_to_tree:914: inode #1218848: block 4883835: comm find: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
    Aug 8 11:06:44 localhost kernel: [ 2127.051347] EXT4-fs warning (device vda2): dx_probe:719: dx entry: limit != root limit
    Aug 8 11:06:44 localhost kernel: [ 2127.051351] EXT4-fs warning (device vda2): dx_probe:797: Corrupt dir inode 1218885, running e2fsck is recommended.
    Aug 8 11:06:44 localhost kernel: [ 2127.051358] EXT4-fs error (device vda2): ext4_readdir:227: inode #1218885: block 4883970: comm find: path (unknown): bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
    Aug 8 11:06:44 localhost kernel: [ 2127.113701] EXT4-fs error (device vda2): htree_dirblock_to_tree:914: inode #1245958: block 4984406: comm find: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0

Although the evidence does not match the symptoms 100%, I am roughly 80% confident that the problem comes from the memory balloon. A colleague later found that the CentOS 7 virtio_balloon driver gained the migratepage (balloon page migration) feature, which our kernel does not yet support.