7.7 KVM Server crashing

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

7.7 KVM Server crashing

Post by errors301 » 2020/10/14 18:30:08

We have a number of KVM servers, all running 3.10.0-1127.el7.x86_64. They are crashing weekly. Has anyone seen this before?

crash 7.2.3-10.el7
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [284MB]: patching 87167 gdb minimal_symbol values

KERNEL: /usr/lib/debug/usr/lib/modules/3.10.0-1127.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2020-10-13-08:04:30/vmcore [PARTIAL DUMP]
CPUS: 128
DATE: Tue Oct 13 08:03:39 2020
UPTIME: 28 days, 22:16:41
LOAD AVERAGE: 38.73, 12.88, 5.82
TASKS: 3038
NODENAME: XXXXXX
RELEASE: 3.10.0-1127.el7.x86_64
VERSION: #1 SMP Tue Mar 31 23:36:51 UTC 2020
MACHINE: x86_64 (2500 Mhz)
MEMORY: 255.9 GB
PANIC: "kernel BUG at mm/page_alloc.c:1656!"
PID: 83077
COMMAND: "worker"
TASK: ffff99185b9262a0 [THREAD_INFO: ffff991858048000]
CPU: 69
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 83077 TASK: ffff99185b9262a0 CPU: 69 COMMAND: "worker"
#0 [ffff99185804b120] machine_kexec at ffffffff92c66044
#1 [ffff99185804b180] __crash_kexec at ffffffff92d22ee2
#2 [ffff99185804b250] crash_kexec at ffffffff92d22fd0
#3 [ffff99185804b268] oops_end at ffffffff9338a798
#4 [ffff99185804b290] die at ffffffff92c30a7b
#5 [ffff99185804b2c0] do_trap at ffffffff93389ee0
#6 [ffff99185804b310] do_invalid_op at ffffffff92c2d2a4
#7 [ffff99185804b3c0] invalid_op at ffffffff9339622e
[exception RIP: move_freepages+350]
RIP: ffffffff92dc458e RSP: ffff99185804b470 RFLAGS: 00010006
RAX: ffff99388f359000 RBX: ffffdd1ba13f8000 RCX: 0000000000000001
RDX: ffff99388f35a000 RSI: 0000000000000000 RDI: ffff99388f35a000
RBP: ffff99185804b4c0 R8: 000000000204f380 R9: 000000000184ffff
R10: ffffdd1ba13fffc0 R11: ffffdd1bbeba3b40 R12: 0000000000000001
R13: 0000000000000001 R14: ffff99388f35a0f8 R15: ffffdd1ba13fffc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff99185804b4c8] move_freepages_block at ffffffff92dc4603
#9 [ffff99185804b4d8] __rmqueue at ffffffff92dc6024
#10 [ffff99185804b548] get_page_from_freelist at ffffffff92dc874c
#11 [ffff99185804b660] __alloc_pages_nodemask at ffffffff92dc8e76
#12 [ffff99185804b708] alloc_pages_current at ffffffff92e18e18
#13 [ffff99185804b750] new_slab at ffffffff92e270f3
#14 [ffff99185804b790] ___slab_alloc at ffffffff92e2760c
#15 [ffff99185804b868] __slab_alloc at ffffffff9337c884
#16 [ffff99185804b8a8] kmem_cache_alloc at ffffffff92e287cb
#17 [ffff99185804b8e8] alloc_buffer_head at ffffffff92e85701
#18 [ffff99185804b900] alloc_page_buffers at ffffffff92e85d4a
#19 [ffff99185804b940] create_empty_buffers at ffffffff92e866be
#20 [ffff99185804b968] create_page_buffers at ffffffff92e867d7
#21 [ffff99185804b980] __block_write_begin_int at ffffffff92e87e9f
#22 [ffff99185804ba40] __block_write_begin at ffffffff92e88471
#23 [ffff99185804ba50] ext4_da_write_begin at ffffffffc02b9707 [ext4]
#24 [ffff99185804bad8] generic_file_buffered_write at ffffffff92dbdfef
#25 [ffff99185804bb90] __generic_file_aio_write at ffffffff92dc0872
#26 [ffff99185804bc10] generic_file_aio_write at ffffffff92dc0ae9
#27 [ffff99185804bc50] ext4_file_write at ffffffffc02ae5c8 [ext4]
#28 [ffff99185804bd28] do_sync_readv_writev at ffffffff92e4c72b
#29 [ffff99185804be00] do_readv_writev at ffffffff92e4e31e
#30 [ffff99185804bef0] vfs_writev at ffffffff92e4e545
#31 [ffff99185804bf00] sys_pwritev at ffffffff92e4e942
#32 [ffff99185804bf50] tracesys at ffffffff93393166 (via system_call)
RIP: 00007fffde88357b RSP: 00007ffdb872e9f0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 000055555741bdb0 RCX: ffffffffffffffff
RDX: 000000000000000c RSI: 000055555751f738 RDI: 0000000000000019
RBP: 0000555557144470 R8: 0000000000000000 R9: 00000000ffffffff
R10: 00000006ba850000 R11: 0000000000000293 R12: 0000555557141a40
R13: 00005555571444d8 R14: 0000000000000000 R15: 00007ffdb8732700
ORIG_RAX: 0000000000000128 CS: 0033 SS: 002b
crash>


[2499367.685357] ------------[ cut here ]------------
[2499367.685903] kernel BUG at mm/page_alloc.c:1656!
[2499367.686364] invalid opcode: 0000 [#1] SMP
[2499367.686979] Modules linked in: vhost_net vhost macvtap macvlan xt_nat veth xt_CHECKSUM iptable_mangle nf_conntrack_netlink ipt_MASQUERADE nf_nat_masquerade_ipv4 nfnetlink xt_conntrack ipt_REJECT nf_reject_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c tun br_netfilter bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter overlay(T) bonding amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb drm_panel_orientation_quirks sg ptp pps_core joydev i2c_algo_bit dca ipmi_si k10temp i2c_piix4 ipmi_devintf ipmi_msghandler pinctrl_amd i2c_designware_platform
[2499367.690348] i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci libata nvme nvme_core nfit libnvdimm
[2499367.692319] CPU: 69 PID: 83077 Comm: worker Kdump: loaded Tainted: G ------------ T 3.10.0-1127.el7.x86_64 #1
[2499367.693018] Hardware name: Supermicro AS -2124BT-HTR/H12DST-B, BIOS 1.1 01/10/2020
[2499367.693711] task: ffff99185b9262a0 ti: ffff991858048000 task.ti: ffff991858048000
[2499367.694408] RIP: 0010:[<ffffffff92dc458e>] [<ffffffff92dc458e>] move_freepages+0x15e/0x160
[2499367.695121] RSP: 0018:ffff99185804b470 EFLAGS: 00010006
[2499367.695825] RAX: ffff99388f359000 RBX: ffffdd1ba13f8000 RCX: 0000000000000001
[2499367.696537] RDX: ffff99388f35a000 RSI: 0000000000000000 RDI: ffff99388f35a000
[2499367.697252] RBP: ffff99185804b4c0 R08: 000000000204f380 R09: 000000000184ffff
[2499367.697962] R10: ffffdd1ba13fffc0 R11: ffffdd1bbeba3b40 R12: 0000000000000001
[2499367.698676] R13: 0000000000000001 R14: ffff99388f35a0f8 R15: ffffdd1ba13fffc0
[2499367.699390] FS: 00007ffdb8732700(0000) GS:ffff99380eb40000(0000) knlGS:0000000000000000
[2499367.700133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2499367.700875] CR2: 00007fff7ef1e000 CR3: 0000001f79a48000 CR4: 0000000000340fe0
[2499367.701638] Call Trace:
[2499367.702371] [<ffffffff92dc4603>] move_freepages_block+0x73/0x80
[2499367.703111] [<ffffffff92dc6024>] __rmqueue+0x264/0x460
[2499367.703843] [<ffffffff92dc874c>] get_page_from_freelist+0x4dc/0xaa0
[2499367.704581] [<ffffffff92dc8e76>] __alloc_pages_nodemask+0x166/0x450
[2499367.705325] [<ffffffff92e18e18>] alloc_pages_current+0x98/0x110
[2499367.706059] [<ffffffff92e270f3>] new_slab+0x393/0x4e0
[2499367.706789] [<ffffffff92e2760c>] ___slab_alloc+0x3cc/0x520
[2499367.707521] [<ffffffff92e85701>] ? alloc_buffer_head+0x21/0x60
[2499367.708251] [<ffffffff92e85701>] ? alloc_buffer_head+0x21/0x60
[2499367.708987] [<ffffffff9337c884>] __slab_alloc+0x40/0x5c
[2499367.709712] [<ffffffff92e287cb>] kmem_cache_alloc+0x19b/0x1f0
[2499367.710436] [<ffffffff92e85701>] ? alloc_buffer_head+0x21/0x60
[2499367.711155] [<ffffffff92e85701>] alloc_buffer_head+0x21/0x60
[2499367.711899] [<ffffffff92e85d4a>] alloc_page_buffers+0x3a/0xc0
[2499367.712623] [<ffffffff92e866be>] create_empty_buffers+0x1e/0xf0
[2499367.713340] [<ffffffff92e867d7>] create_page_buffers+0x47/0x50
[2499367.714072] [<ffffffff92e87e9f>] __block_write_begin_int+0x8f/0x650
[2499367.714771] [<ffffffff92e287f2>] ? kmem_cache_alloc+0x1c2/0x1f0
[2499367.715477] [<ffffffffc02b2a00>] ? ext4_da_invalidatepage+0x320/0x320 [ext4]
[2499367.716172] [<ffffffffc02b96b9>] ? ext4_da_write_begin+0x119/0x360 [ext4]
[2499367.716865] [<ffffffff92e88471>] __block_write_begin+0x11/0x20
[2499367.717555] [<ffffffffc02b9707>] ext4_da_write_begin+0x167/0x360 [ext4]
[2499367.718224] [<ffffffff92dbdfef>] generic_file_buffered_write+0x10f/0x270
[2499367.718892] [<ffffffff92dc0872>] __generic_file_aio_write+0x1e2/0x400
[2499367.719562] [<ffffffff92dc0ae9>] generic_file_aio_write+0x59/0xa0
[2499367.720221] [<ffffffffc02ae5c8>] ext4_file_write+0x348/0x600 [ext4]
[2499367.720919] [<ffffffff92f8f894>] ? timerqueue_del+0x24/0x70
[2499367.721549] [<ffffffff92cca52f>] ? __remove_hrtimer+0x3f/0xb0
[2499367.722180] [<ffffffff92ccacc8>] ? hrtimer_try_to_cancel+0xd8/0x120
[2499367.722780] [<ffffffff92ccad2a>] ? hrtimer_cancel+0x1a/0x30
[2499367.723353] [<ffffffff92e4c72b>] do_sync_readv_writev+0x7b/0xd0
[2499367.723901] [<ffffffff92e4e31e>] do_readv_writev+0xce/0x260
[2499367.724457] [<ffffffffc02ae280>] ? ext4_write_checks.isra.8+0x150/0x150 [ext4]
[2499367.724975] [<ffffffff92e4c5d0>] ? do_sync_read+0xe0/0xe0
[2499367.725483] [<ffffffff92d5776b>] ? __seccomp_filter+0x5b/0x300
[2499367.725978] [<ffffffff92e4e545>] vfs_writev+0x35/0x60
[2499367.726461] [<ffffffff92e4e942>] SyS_pwritev+0xc2/0xf0
[2499367.726932] [<ffffffff93393166>] tracesys+0xa6/0xcc
[2499367.727393] Code: f5 a0 b4 95 93 e9 26 ff ff ff 0f 1f 00 48 89 d0 48 c1 ea 29 48 c1 e8 36 81 e2 00 18 00 00 48 03 14 c5 a0 b4 95 93 e9 e4 fe ff ff <0f> 0b 66 66 66 66 90 4c 8b 05 54 16 a8 00 48 89 f0 4c 29 c0 48
[2499367.728446] RIP [<ffffffff92dc458e>] move_freepages+0x15e/0x160
[2499367.728941] RSP <ffff99185804b470>

TrevorH
Forum Moderator
Posts: 29695
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: 7.7 KVM Server crashing

Post by TrevorH » 2020/10/14 19:08:49

You're several months out of date. The current kernel is kernel-3.10.0-1127.19.1.el7.x86_64, but reading the rpm changelog with rpm -q --changelog kernel-3.10.0-1127.19.1.el7.x86_64 | less doesn't show me anything that looks hopeful for a fix. RHEL 7.9 was released a couple of weeks ago and is being rebuilt for CentOS 7.9. That has kernel-3.10.0-1160.el7.x86_64, and there is one fix in it mentioning page_alloc that might help, but even that doesn't look very likely. Your crash does look very similar to https://bugs.centos.org/view.php?id=13964, but that was using xfs and is already fixed.
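The changelog check described above can be made less manual by filtering the text for allocator-related keywords instead of paging through it. What follows is only a minimal sketch: the function name `matching_entries`, the keyword list, and the sample changelog lines are all illustrative assumptions and are not taken from the real kernel changelog.

```python
def matching_entries(changelog_text, keywords=("page_alloc", "move_freepages")):
    """Return the changelog lines that mention any of the given keywords."""
    hits = []
    for line in changelog_text.splitlines():
        lowered = line.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            hits.append(line.strip())
    return hits


# Made-up sample text; real input would be the saved output of
# `rpm -q --changelog <kernel package>`.
SAMPLE_CHANGELOG = """\
* Tue Jan 01 2019 Example Packager <packager@example.com> [0.0.0-0]
- [mm] page_alloc: hypothetical entry about free page handling
- [net] hypothetical entry about a network driver
- [fs] ext4: hypothetical entry about delayed allocation
"""

if __name__ == "__main__":
    for entry in matching_entries(SAMPLE_CHANGELOG):
        print(entry)
```

A keyword match only narrows down which entries to read. Whether a given fix is relevant to this BUG still has to be judged by reading the entry itself.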
CentOS 6 will die in November 2020 - migrate sooner rather than later!
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 is dead, do not use it.
Full time Geek, part time moderator. Use the FAQ Luke

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

Re: 7.7 KVM Server crashing

Post by errors301 » 2020/10/14 20:42:51

Good shout. I just spotted that the repo is updating but the mirror isn't! I'll get that kernel patched up.

Don't you think the issue is closer to this one? The thing is, I can hardly remove RAM, and I'm not sure about going for the ELRepo 4.4 kernel.

> https://bugs.centos.org/view.php?id=17369

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

Re: 7.7 KVM Server crashing

Post by errors301 » 2020/10/16 10:45:33

It happened again this morning.

# crash /usr/lib/debug/usr/lib/modules/3.10.0-1127.el7.x86_64/vmlinux 127.0.0.1-2020-10-16-08:16:50/vmcore

crash 7.2.3-10.el7

WARNING: kernel relocated [8MB]: patching 87167 gdb minimal_symbol values

KERNEL: /usr/lib/debug/usr/lib/modules/3.10.0-1127.el7.x86_64/vmlinux
DUMPFILE: 127.0.0.1-2020-10-16-08:16:50/vmcore [PARTIAL DUMP]
CPUS: 128
DATE: Fri Oct 16 08:16:36 2020
UPTIME: 15 days, 21:50:38
LOAD AVERAGE: 35.63, 13.17, 9.05
TASKS: 3036
NODENAME: XXXXXXXX
RELEASE: 3.10.0-1127.el7.x86_64
VERSION: #1 SMP Tue Mar 31 23:36:51 UTC 2020
MACHINE: x86_64 (2499 Mhz)
MEMORY: 255.9 GB
PANIC: "kernel BUG at mm/page_alloc.c:1656!"
PID: 29508
COMMAND: "worker"
TASK: ffff9a02987462a0 [THREAD_INFO: ffff99f4b8c74000]
CPU: 65
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 29508 TASK: ffff9a02987462a0 CPU: 65 COMMAND: "worker"
#0 [ffff99f4b8c77030] machine_kexec at ffffffff81866044
#1 [ffff99f4b8c77090] __crash_kexec at ffffffff81922ee2
#2 [ffff99f4b8c77160] crash_kexec at ffffffff81922fd0
#3 [ffff99f4b8c77178] oops_end at ffffffff81f8a798
#4 [ffff99f4b8c771a0] die at ffffffff81830a7b
#5 [ffff99f4b8c771d0] do_trap at ffffffff81f89ee0
#6 [ffff99f4b8c77220] do_invalid_op at ffffffff8182d2a4
#7 [ffff99f4b8c772d0] invalid_op at ffffffff81f9622e
[exception RIP: move_freepages+350]
RIP: ffffffff819c458e RSP: ffff99f4b8c77380 RFLAGS: 00010006
RAX: ffff9a024f359000 RBX: ffffe20ca13f8000 RCX: 0000000000000001
RDX: ffff9a024f35a000 RSI: 0000000000000000 RDI: ffff9a024f35a000
RBP: ffff99f4b8c773d0 R8: 000000000204f380 R9: 000000000184ffff
R10: ffffe20ca13fffc0 R11: ffffffffffffffff R12: 0000000000000001
R13: 0000000000000001 R14: ffff9a024f35a0f8 R15: ffffe20ca13fffc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff99f4b8c773d8] move_freepages_block at ffffffff819c4603
#9 [ffff99f4b8c773e8] __rmqueue at ffffffff819c6024
#10 [ffff99f4b8c77458] get_page_from_freelist at ffffffff819c874c
#11 [ffff99f4b8c77570] __alloc_pages_slowpath at ffffffff81f7b0c2
#12 [ffff99f4b8c77660] __alloc_pages_nodemask at ffffffff819c9146
#13 [ffff99f4b8c77708] alloc_pages_current at ffffffff81a18e18
#14 [ffff99f4b8c77750] new_slab at ffffffff81a270f3
#15 [ffff99f4b8c77790] ___slab_alloc at ffffffff81a2760c
#16 [ffff99f4b8c77868] __slab_alloc at ffffffff81f7c884
#17 [ffff99f4b8c778a8] kmem_cache_alloc at ffffffff81a287cb
#18 [ffff99f4b8c778e8] alloc_buffer_head at ffffffff81a85701
#19 [ffff99f4b8c77900] alloc_page_buffers at ffffffff81a85d4a
#20 [ffff99f4b8c77940] create_empty_buffers at ffffffff81a866be
#21 [ffff99f4b8c77968] create_page_buffers at ffffffff81a867d7
#22 [ffff99f4b8c77980] __block_write_begin_int at ffffffff81a87e9f
#23 [ffff99f4b8c77a40] __block_write_begin at ffffffff81a88471
#24 [ffff99f4b8c77a50] ext4_da_write_begin at ffffffffc03a4707 [ext4]
#25 [ffff99f4b8c77ad8] generic_file_buffered_write at ffffffff819bdfef
#26 [ffff99f4b8c77b90] __generic_file_aio_write at ffffffff819c0872
#27 [ffff99f4b8c77c10] generic_file_aio_write at ffffffff819c0ae9
#28 [ffff99f4b8c77c50] ext4_file_write at ffffffffc03995c8 [ext4]
#29 [ffff99f4b8c77d28] do_sync_readv_writev at ffffffff81a4c72b
#30 [ffff99f4b8c77e00] do_readv_writev at ffffffff81a4e31e
#31 [ffff99f4b8c77ef0] vfs_writev at ffffffff81a4e545
#32 [ffff99f4b8c77f00] sys_pwritev at ffffffff81a4e942
#33 [ffff99f4b8c77f50] tracesys at ffffffff81f93166 (via system_call)
RIP: 00007fffde88357b RSP: 00007ffec34779f0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000555557a91bd0 RCX: ffffffffffffffff
RDX: 00000000000000fb RSI: 00005555579b7000 RDI: 0000000000000017
RBP: 0000555557144470 R8: 0000000000000000 R9: 00000000ffffffff
R10: 00000005b08f8000 R11: 0000000000000293 R12: 000055555840a930
R13: 00005555571444d8 R14: 0000000000000000 R15: 00007ffec347b700
ORIG_RAX: 0000000000000128 CS: 0033 SS: 002b



[1374695.158571] ------------[ cut here ]------------
[1374695.159333] kernel BUG at mm/page_alloc.c:1656!
[1374695.159920] invalid opcode: 0000 [#1] SMP
[1374695.160488] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache veth vhost_net vhost macvtap macvlan xt_nat xt_CHECKSUM iptable_mangle nf_conntrack_netlink ipt_MASQUERADE nfnetlink nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c tun br_netfilter bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter overlay(T) bonding amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb ptp drm_panel_orientation_quirks pps_core joydev sg i2c_algo_bit dca i2c_piix4 ipmi_si
[1374695.164473] k10temp ipmi_devintf ipmi_msghandler pinctrl_amd i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci libata nvme nvme_core nfit libnvdimm
[1374695.166680] CPU: 65 PID: 29508 Comm: worker Kdump: loaded Tainted: G ------------ T 3.10.0-1127.el7.x86_64 #1
[1374695.167446] Hardware name: Supermicro AS -2124BT-HTR/H12DST-B, BIOS 1.1 01/10/2020
[1374695.168221] task: ffff9a02987462a0 ti: ffff99f4b8c74000 task.ti: ffff99f4b8c74000
[1374695.169001] RIP: 0010:[<ffffffff819c458e>] [<ffffffff819c458e>] move_freepages+0x15e/0x160
[1374695.169795] RSP: 0018:ffff99f4b8c77380 EFLAGS: 00010006
[1374695.170584] RAX: ffff9a024f359000 RBX: ffffe20ca13f8000 RCX: 0000000000000001
[1374695.171383] RDX: ffff9a024f35a000 RSI: 0000000000000000 RDI: ffff9a024f35a000
[1374695.172180] RBP: ffff99f4b8c773d0 R08: 000000000204f380 R09: 000000000184ffff
[1374695.172979] R10: ffffe20ca13fffc0 R11: ffffffffffffffff R12: 0000000000000001
[1374695.173777] R13: 0000000000000001 R14: ffff9a024f35a0f8 R15: ffffe20ca13fffc0
[1374695.174577] FS: 00007ffec347b700(0000) GS:ffff9a01cea40000(0000) knlGS:0000000000000000
[1374695.175387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1374695.176197] CR2: 00007f1d3b0ba008 CR3: 0000001f40ef6000 CR4: 0000000000340fe0
[1374695.177015] Call Trace:
[1374695.177836] [<ffffffff819c4603>] move_freepages_block+0x73/0x80
[1374695.178666] [<ffffffff819c6024>] __rmqueue+0x264/0x460
[1374695.179488] [<ffffffff819c874c>] get_page_from_freelist+0x4dc/0xaa0
[1374695.180315] [<ffffffff81f7b0c2>] __alloc_pages_slowpath+0x2dd/0x729
[1374695.181138] [<ffffffff819c9146>] __alloc_pages_nodemask+0x436/0x450
[1374695.181961] [<ffffffff81a18e18>] alloc_pages_current+0x98/0x110
[1374695.182782] [<ffffffff81a270f3>] new_slab+0x393/0x4e0
[1374695.183599] [<ffffffff81a2760c>] ___slab_alloc+0x3cc/0x520
[1374695.184422] [<ffffffff81a85701>] ? alloc_buffer_head+0x21/0x60
[1374695.185242] [<ffffffff81f7b0c2>] ? __alloc_pages_slowpath+0x2dd/0x729
[1374695.186058] [<ffffffff818e8ae4>] ? find_busiest_group+0x144/0x990
[1374695.186879] [<ffffffff81a85701>] ? alloc_buffer_head+0x21/0x60
[1374695.187678] [<ffffffff81f7c884>] __slab_alloc+0x40/0x5c
[1374695.188482] [<ffffffff81a287cb>] kmem_cache_alloc+0x19b/0x1f0
[1374695.189284] [<ffffffff81a85701>] ? alloc_buffer_head+0x21/0x60
[1374695.190069] [<ffffffff81a85701>] alloc_buffer_head+0x21/0x60
[1374695.190840] [<ffffffff81a3e202>] ? __mem_cgroup_commit_charge+0xe2/0x2f0
[1374695.191619] [<ffffffff81a85d4a>] alloc_page_buffers+0x3a/0xc0
[1374695.192394] [<ffffffff81a866be>] create_empty_buffers+0x1e/0xf0
[1374695.193166] [<ffffffff81a867d7>] create_page_buffers+0x47/0x50
[1374695.193937] [<ffffffff81a87e9f>] __block_write_begin_int+0x8f/0x650
[1374695.194716] [<ffffffff81a287f2>] ? kmem_cache_alloc+0x1c2/0x1f0
[1374695.195499] [<ffffffffc039da00>] ? ext4_da_invalidatepage+0x320/0x320 [ext4]
[1374695.196284] [<ffffffffc03a46b9>] ? ext4_da_write_begin+0x119/0x360 [ext4]
[1374695.197061] [<ffffffff81a88471>] __block_write_begin+0x11/0x20
[1374695.197833] [<ffffffffc03a4707>] ext4_da_write_begin+0x167/0x360 [ext4]
[1374695.198578] [<ffffffff819bdfef>] generic_file_buffered_write+0x10f/0x270
[1374695.199307] [<ffffffff819c0872>] __generic_file_aio_write+0x1e2/0x400
[1374695.200014] [<ffffffff819c0ae9>] generic_file_aio_write+0x59/0xa0
[1374695.200707] [<ffffffffc03995c8>] ext4_file_write+0x348/0x600 [ext4]
[1374695.201380] [<ffffffff81b8f894>] ? timerqueue_del+0x24/0x70
[1374695.202030] [<ffffffff818ca52f>] ? __remove_hrtimer+0x3f/0xb0
[1374695.202658] [<ffffffff818cacc8>] ? hrtimer_try_to_cancel+0xd8/0x120
[1374695.203271] [<ffffffff818cad2a>] ? hrtimer_cancel+0x1a/0x30
[1374695.203858] [<ffffffff81a4c72b>] do_sync_readv_writev+0x7b/0xd0
[1374695.204429] [<ffffffff81a4e31e>] do_readv_writev+0xce/0x260
[1374695.204994] [<ffffffffc0399280>] ? ext4_write_checks.isra.8+0x150/0x150 [ext4]
[1374695.205553] [<ffffffff81a4c5d0>] ? do_sync_read+0xe0/0xe0
[1374695.206103] [<ffffffff8195776b>] ? __seccomp_filter+0x5b/0x300
[1374695.206653] [<ffffffff81a4e545>] vfs_writev+0x35/0x60
[1374695.207194] [<ffffffff81a4e942>] SyS_pwritev+0xc2/0xf0
[1374695.207733] [<ffffffff81f93166>] tracesys+0xa6/0xcc
[1374695.208267] Code: f5 a0 b4 55 82 e9 26 ff ff ff 0f 1f 00 48 89 d0 48 c1 ea 29 48 c1 e8 36 81 e2 00 18 00 00 48 03 14 c5 a0 b4 55 82 e9 e4 fe ff ff <0f> 0b 66 66 66 66 90 4c 8b 05 54 16 a8 00 48 89 f0 4c 29 c0 48
[1374695.209461] RIP [<ffffffff819c458e>] move_freepages+0x15e/0x160
[1374695.210020] RSP <ffff99f4b8c77380>




We have a number of servers, and 50% have been patched to 3.10.0-1127.19.1.el7.x86_64. This crash happened on one still running 3.10.0-1127.el7.x86_64. I'm hopeful but still looking for a resolution.

One thing I have noticed in monitoring is that the network was at 100% receive (1 Gbit) at the time. To me this points at the network buffer issue found here: https://bugs.centos.org/view.php?id=17369
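Before matching the crashes against a bug report, it may help to confirm that the dumps from different hosts really are the same failure. The sketch below pulls a small signature (source file, line, faulting symbol, offset) out of the oops text and compares it across hosts. The log lines are copied from the two dumps posted so far; the host labels `host-a` and `host-b`, the function name `crash_signature`, and the choice of fields are assumptions made for illustration.

```python
import re

# "kernel BUG at mm/page_alloc.c:1656!" -> file and line number
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
# "RIP [<addr>] move_freepages+0x15e/0x160" -> symbol and offset
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+(0x[0-9a-f]+)/0x[0-9a-f]+")


def crash_signature(oops_text):
    """Return (source_file, line, symbol, offset), or None if not found."""
    bug = BUG_RE.search(oops_text)
    rip = RIP_RE.search(oops_text)
    if bug is None or rip is None:
        return None
    return (bug.group(1), int(bug.group(2)), rip.group(1), rip.group(2))


# Lines taken from the two oops logs in this thread; host names are placeholders.
OOPS_BY_HOST = {
    "host-a": "[2499367.685903] kernel BUG at mm/page_alloc.c:1656!\n"
              "[2499367.728446] RIP [<ffffffff92dc458e>] move_freepages+0x15e/0x160\n",
    "host-b": "[1374695.159333] kernel BUG at mm/page_alloc.c:1656!\n"
              "[1374695.209461] RIP [<ffffffff819c458e>] move_freepages+0x15e/0x160\n",
}

if __name__ == "__main__":
    signatures = {host: crash_signature(text) for host, text in OOPS_BY_HOST.items()}
    for host, signature in signatures.items():
        print(host, signature)
    print("same failure:", len(set(signatures.values())) == 1)
```

The absolute addresses differ between the two dumps because the kernel is relocated at boot, which is why the sketch compares the symbol and offset rather than the raw RIP value.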

TrevorH
Forum Moderator
Posts: 29695
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: 7.7 KVM Server crashing

Post by TrevorH » 2020/10/16 18:33:52

Your stacktrace is full of mentions of ext4 which makes it far more likely that it's a filesystem problem than a network one.
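One rough way to put numbers on that observation is to tally the call-trace frames by the module tag printed at the end of each line. This is only a sketch: the function name `frames_by_module` and the `vmlinux` label for untagged (core kernel) frames are assumptions, and the sample is a short excerpt of the trace posted above, not the whole thing. Frames marked with `?` are ones the kernel's stack unwinder flags as unreliable, so they are skipped unless explicitly requested.

```python
import re
from collections import Counter

# Matches e.g. "[<ffffffffc03a4707>] ext4_da_write_begin+0x167/0x360 [ext4]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def frames_by_module(trace_text, include_unreliable=False):
    """Count call-trace frames per module; untagged frames count as 'vmlinux'."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        unreliable = match.group(1) is not None
        if unreliable and not include_unreliable:
            continue
        counts[match.group(3) or "vmlinux"] += 1
    return counts


# Excerpt of the call trace from the 2020-10-16 crash in this thread.
SAMPLE_TRACE = """\
[1374695.177836] [<ffffffff819c4603>] move_freepages_block+0x73/0x80
[1374695.178666] [<ffffffff819c6024>] __rmqueue+0x264/0x460
[1374695.190069] [<ffffffff81a85701>] alloc_buffer_head+0x21/0x60
[1374695.195499] [<ffffffffc039da00>] ? ext4_da_invalidatepage+0x320/0x320 [ext4]
[1374695.197833] [<ffffffffc03a4707>] ext4_da_write_begin+0x167/0x360 [ext4]
[1374695.200707] [<ffffffffc03995c8>] ext4_file_write+0x348/0x600 [ext4]
[1374695.207194] [<ffffffff81a4e545>] vfs_writev+0x35/0x60
"""

if __name__ == "__main__":
    for module, count in frames_by_module(SAMPLE_TRACE).most_common():
        print(module, count)
```

A tally like this only shows which code the crashing task was running through. It does not by itself establish which subsystem is at fault.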

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

Re: 7.7 KVM Server crashing

Post by errors301 » 2020/10/19 15:12:13

It is true that the ext4 FS is mentioned. We had another one today, on a box with the patched kernel. This time our monitoring shows no real network traffic at the time.

The crash dump for today's incident is below. I'm not sure how to mitigate this or investigate further. It seems like the only option is to try another kernel.

# uname -r
3.10.0-1127.19.1.el7.x86_64


[350816.734253] ------------[ cut here ]------------
[350816.734828] kernel BUG at mm/page_alloc.c:1656!
[350816.735304] invalid opcode: 0000 [#1] SMP
[350816.735763] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 tun xt_nat veth xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter overlay(T) bonding amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks joydev sg ipmi_si i2c_piix4 k10temp ipmi_devintf ipmi_msghandler pinctrl_amd i2c_designware_platform i2c_designware_core acpi_cpufreq
[350816.739137] nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci igb libahci libata ptp nvme pps_core nvme_core i2c_algo_bit dca nfit libnvdimm
[350816.741001] CPU: 25 PID: 115405 Comm: worker Kdump: loaded Tainted: G ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[350816.742292] Hardware name: Supermicro AS -2124BT-HTR/H12DST-B, BIOS 1.1 01/10/2020
[350816.743070] task: ffff8deac5270000 ti: ffff8ded562cc000 task.ti: ffff8ded562cc000
[350816.743750] RIP: 0010:[<ffffffff8e5c463e>] [<ffffffff8e5c463e>] move_freepages+0x15e/0x160
[350816.744446] RSP: 0018:ffff8ded562cf470 EFLAGS: 00010006
[350816.745128] RAX: ffff8e0a0f359000 RBX: fffff584013f8000 RCX: 0000000000000001
[350816.745819] RDX: ffff8e0a0f35a000 RSI: 0000000000000000 RDI: ffff8e0a0f35a000
[350816.746525] RBP: ffff8ded562cf4c0 R08: 000000000204f380 R09: 000000000104ffff
[350816.747240] R10: fffff584013fffc0 R11: fffff58437e193c0 R12: 0000000000000001
[350816.747931] R13: 0000000000000003 R14: ffff8e0a0f35a1c8 R15: fffff584013fffc0
[350816.748638] FS: 00007ffeb1ee4700(0000) GS:ffff8e098e840000(0000) knlGS:0000000000000000
[350816.749340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[350816.750116] CR2: 00007fdae3040018 CR3: 0000001f72712000 CR4: 0000000000340fe0
[350816.750940] Call Trace:
[350816.751755] [<ffffffff8e5c46b3>] move_freepages_block+0x73/0x80
[350816.752580] [<ffffffff8e5c60d4>] __rmqueue+0x264/0x460
[350816.753404] [<ffffffff8e5c87fc>] get_page_from_freelist+0x4dc/0xaa0
[350816.754231] [<ffffffff8e5c8f26>] __alloc_pages_nodemask+0x166/0x450
[350816.755056] [<ffffffff8e618ea8>] alloc_pages_current+0x98/0x110
[350816.755886] [<ffffffff8e627183>] new_slab+0x393/0x4e0
[350816.756706] [<ffffffff8e62769c>] ___slab_alloc+0x3cc/0x520
[350816.757525] [<ffffffff8e685751>] ? alloc_buffer_head+0x21/0x60
[350816.758255] [<ffffffff8e685751>] ? alloc_buffer_head+0x21/0x60
[350816.759016] [<ffffffff8eb7c8a4>] __slab_alloc+0x40/0x5c
[350816.759828] [<ffffffff8e62885b>] kmem_cache_alloc+0x19b/0x1f0
[350816.760634] [<ffffffff8e685751>] ? alloc_buffer_head+0x21/0x60
[350816.761437] [<ffffffff8e685751>] alloc_buffer_head+0x21/0x60
[350816.762239] [<ffffffff8e685d9a>] alloc_page_buffers+0x3a/0xc0
[350816.763038] [<ffffffff8e68670e>] create_empty_buffers+0x1e/0xf0
[350816.763841] [<ffffffff8e686827>] create_page_buffers+0x47/0x50
[350816.764631] [<ffffffff8e687eef>] __block_write_begin_int+0x8f/0x650
[350816.765421] [<ffffffff8e628882>] ? kmem_cache_alloc+0x1c2/0x1f0
[350816.766211] [<ffffffffc037ea00>] ? ext4_da_invalidatepage+0x320/0x320 [ext4]
[350816.767001] [<ffffffffc03855e9>] ? ext4_da_write_begin+0x119/0x360 [ext4]
[350816.767790] [<ffffffff8e6884c1>] __block_write_begin+0x11/0x20
[350816.768573] [<ffffffffc0385637>] ext4_da_write_begin+0x167/0x360 [ext4]
[350816.769349] [<ffffffff8e5be09f>] generic_file_buffered_write+0x10f/0x270
[350816.770126] [<ffffffff8e5c0922>] __generic_file_aio_write+0x1e2/0x400
[350816.770888] [<ffffffff8e5c0b99>] generic_file_aio_write+0x59/0xa0
[350816.771635] [<ffffffffc037a5c8>] ext4_file_write+0x348/0x600 [ext4]
[350816.772359] [<ffffffff8e78f994>] ? timerqueue_del+0x24/0x70
[350816.773061] [<ffffffff8e4ca52f>] ? __remove_hrtimer+0x3f/0xb0
[350816.773743] [<ffffffff8e4cacc8>] ? hrtimer_try_to_cancel+0xd8/0x120
[350816.774405] [<ffffffff8e4cad2a>] ? hrtimer_cancel+0x1a/0x30
[350816.775045] [<ffffffff8e64c7db>] do_sync_readv_writev+0x7b/0xd0
[350816.775666] [<ffffffff8e64e3ce>] do_readv_writev+0xce/0x260
[350816.776269] [<ffffffffc037a280>] ? ext4_write_checks.isra.8+0x150/0x150 [ext4]
[350816.776861] [<ffffffff8e64c680>] ? do_sync_read+0xe0/0xe0
[350816.777433] [<ffffffff8e55770b>] ? __seccomp_filter+0x5b/0x300
[350816.778001] [<ffffffff8e64e5f5>] vfs_writev+0x35/0x60
[350816.778553] [<ffffffff8e64e9f2>] SyS_pwritev+0xc2/0xf0
[350816.779211] [<ffffffff8eb93166>] tracesys+0xa6/0xcc
[350816.779740] Code: f5 60 b5 15 8f e9 26 ff ff ff 0f 1f 00 48 89 d0 48 c1 ea 29 48 c1 e8 36 81 e2 00 18 00 00 48 03 14 c5 60 b5 15 8f e9 e4 fe ff ff <0f> 0b 66 66 66 66 90 4c 8b 05 a4 15 a8 00 48 89 f0 4c 29 c0 48
[350816.780937] RIP [<ffffffff8e5c463e>] move_freepages+0x15e/0x160
[350816.781507] RSP <ffff8ded562cf470>


crash 7.2.3-10.el7

WARNING: kernel relocated [212MB]: patching 87187 gdb minimal_symbol values

KERNEL: /usr/lib/debug/usr/lib/modules/3.10.0-1127.19.1.el7.x86_64/vmlinux
DUMPFILE: 127.0.0.1-2020-10-19-13:19:43/vmcore [PARTIAL DUMP]
CPUS: 128
DATE: Mon Oct 19 13:19:10 2020
UPTIME: 4 days, 01:26:40
LOAD AVERAGE: 15.67, 13.69, 17.66
TASKS: 2649
NODENAME: XXXXXXXXXXXXXXXX
RELEASE: 3.10.0-1127.19.1.el7.x86_64
VERSION: #1 SMP Tue Aug 25 17:23:54 UTC 2020
MACHINE: x86_64 (2500 Mhz)
MEMORY: 255.9 GB
PANIC: "kernel BUG at mm/page_alloc.c:1656!"
PID: 115405
COMMAND: "worker"
TASK: ffff8deac5270000 [THREAD_INFO: ffff8ded562cc000]
CPU: 25
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 115405 TASK: ffff8deac5270000 CPU: 25 COMMAND: "worker"
#0 [ffff8ded562cf120] machine_kexec at ffffffff8e466254
#1 [ffff8ded562cf180] __crash_kexec at ffffffff8e522ef2
#2 [ffff8ded562cf250] crash_kexec at ffffffff8e522fe0
#3 [ffff8ded562cf268] oops_end at ffffffff8eb8a798
#4 [ffff8ded562cf290] die at ffffffff8e430a7b
#5 [ffff8ded562cf2c0] do_trap at ffffffff8eb89ee0
#6 [ffff8ded562cf310] do_invalid_op at ffffffff8e42d2a4
#7 [ffff8ded562cf3c0] invalid_op at ffffffff8eb9622e
[exception RIP: move_freepages+350]
RIP: ffffffff8e5c463e RSP: ffff8ded562cf470 RFLAGS: 00010006
RAX: ffff8e0a0f359000 RBX: fffff584013f8000 RCX: 0000000000000001
RDX: ffff8e0a0f35a000 RSI: 0000000000000000 RDI: ffff8e0a0f35a000
RBP: ffff8ded562cf4c0 R8: 000000000204f380 R9: 000000000104ffff
R10: fffff584013fffc0 R11: fffff58437e193c0 R12: 0000000000000001
R13: 0000000000000003 R14: ffff8e0a0f35a1c8 R15: fffff584013fffc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff8ded562cf4c8] move_freepages_block at ffffffff8e5c46b3
#9 [ffff8ded562cf4d8] __rmqueue at ffffffff8e5c60d4
#10 [ffff8ded562cf548] get_page_from_freelist at ffffffff8e5c87fc
#11 [ffff8ded562cf660] __alloc_pages_nodemask at ffffffff8e5c8f26
#12 [ffff8ded562cf708] alloc_pages_current at ffffffff8e618ea8
#13 [ffff8ded562cf750] new_slab at ffffffff8e627183
#14 [ffff8ded562cf790] ___slab_alloc at ffffffff8e62769c
#15 [ffff8ded562cf868] __slab_alloc at ffffffff8eb7c8a4
#16 [ffff8ded562cf8a8] kmem_cache_alloc at ffffffff8e62885b
#17 [ffff8ded562cf8e8] alloc_buffer_head at ffffffff8e685751
#18 [ffff8ded562cf900] alloc_page_buffers at ffffffff8e685d9a
#19 [ffff8ded562cf940] create_empty_buffers at ffffffff8e68670e
#20 [ffff8ded562cf968] create_page_buffers at ffffffff8e686827
#21 [ffff8ded562cf980] __block_write_begin_int at ffffffff8e687eef
#22 [ffff8ded562cfa40] __block_write_begin at ffffffff8e6884c1
#23 [ffff8ded562cfa50] ext4_da_write_begin at ffffffffc0385637 [ext4]
#24 [ffff8ded562cfad8] generic_file_buffered_write at ffffffff8e5be09f
#25 [ffff8ded562cfb90] __generic_file_aio_write at ffffffff8e5c0922
#26 [ffff8ded562cfc10] generic_file_aio_write at ffffffff8e5c0b99
#27 [ffff8ded562cfc50] ext4_file_write at ffffffffc037a5c8 [ext4]
#28 [ffff8ded562cfd28] do_sync_readv_writev at ffffffff8e64c7db
#29 [ffff8ded562cfe00] do_readv_writev at ffffffff8e64e3ce
#30 [ffff8ded562cfef0] vfs_writev at ffffffff8e64e5f5
#31 [ffff8ded562cff00] sys_pwritev at ffffffff8e64e9f2
#32 [ffff8ded562cff50] tracesys at ffffffff8eb93166 (via system_call)
RIP: 00007fffde88357b RSP: 00007ffeb1ee09f0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000555557fc9c20 RCX: ffffffffffffffff
RDX: 0000000000000058 RSI: 0000555558098398 RDI: 0000000000000017
RBP: 0000555557144470 R8: 0000000000000000 R9: 00000000ffffffff
R10: 000000055884f000 R11: 0000000000000293 R12: 0000555557140690
R13: 00005555571444d8 R14: 0000000000000000 R15: 00007ffeb1ee4700
ORIG_RAX: 0000000000000128 CS: 0033 SS: 002b

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

Re: 7.7 KVM Server crashing

Post by errors301 » 2020/10/19 15:13:08

Thanks for your help, TrevorH. Looking at the forums, you are a one-man army! :)

TrevorH
Forum Moderator
Posts: 29695
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: 7.7 KVM Server crashing

Post by TrevorH » 2020/10/19 16:37:32

What I'd do is wait a day or two, hope that 7.9 has hit the 7.8 CR repo by then, and then update to the latest 7.9 kernel (3.10.0-1160*) and see if it helps. If it doesn't, then it looks like time to raise a ticket on bugzilla.redhat.com to report the problem and reference the xfs bug we looked at earlier.

errors301
Posts: 6
Joined: 2020/10/14 18:25:02

Re: 7.7 KVM Server crashing

Post by errors301 » 2020/10/22 09:56:25

I see the new kernel is out now! I'm trying it on one host first, then upgrading 50% of the cluster.

# yum check-update --disablerepo=\* --enablerepo=cr kernel
Loaded plugins: fastestmirror, priorities, versionlock
Loading mirror speeds from cached hostfile
cr | 2.9 kB 00:00:00
cr/7/x86_64/primary_db | 3.5 MB 00:00:00

kernel.x86_64 3.10.0-1160.2.2.el7 cr
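For a staged rollout like this, a small check can flag which hosts are still below the 3.10.0-1160 series. The sketch below compares the numeric parts of `uname -r` style strings. It is a deliberate simplification and not the full RPM version comparison algorithm, and the host names and the helper names `kernel_tuple` and `needs_upgrade` are hypothetical.

```python
import re


def kernel_tuple(release):
    """Turn '3.10.0-1127.19.1.el7.x86_64' into (3, 10, 0, 1127, 19, 1)."""
    numeric = release.split(".el")[0]  # drop the '.el7...' suffix if present
    return tuple(int(part) for part in re.split(r"[.-]", numeric))


def needs_upgrade(release, target="3.10.0-1160"):
    """True if the release sorts below the target kernel series."""
    return kernel_tuple(release) < kernel_tuple(target)


# Release strings as they appear in this thread; host names are placeholders.
HOSTS = {
    "kvm-01": "3.10.0-1127.el7.x86_64",
    "kvm-02": "3.10.0-1127.19.1.el7.x86_64",
    "kvm-03": "3.10.0-1160.2.2.el7.x86_64",
}

if __name__ == "__main__":
    for host, release in sorted(HOSTS.items()):
        status = "needs upgrade" if needs_upgrade(release) else "ok"
        print(host, release, status)
```

Tuple comparison is enough here because every release string in the thread is purely numeric up to the `.el7` suffix. Versions with letters or other tags in them would need a proper RPM-aware comparison.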
