User space process coredumps via systemd-coredump

Post by gooditot » 2023/01/06 04:14:23

Hi all,

On CentOS 7.9, I have configured the system to generate coredumps via systemd-coredump.
For test apps whose coredumps are a few GB in size, this works fine. For larger coredumps (> 60 GB), however, no coredump file is left on the system at the end, even though there is plenty of free disk space.

Here is what has been done:

Configured the kernel to pipe coredumps to systemd-coredump:
$ sysctl -n kernel.core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %e
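To confirm that the kernel is really piping dumps to systemd-coredump (rather than writing them to a file-name template or to some other handler), the pattern can be read back and split. A minimal Python sketch; the function name is made up for illustration, and it relies only on the kernel convention that a pattern starting with '|' names a program to pipe the dump into:

```python
import os

# Same value that 'sysctl -n kernel.core_pattern' prints.
CORE_PATTERN = "/proc/sys/kernel/core_pattern"


def parse_core_pattern(pattern):
    """Return (handler, args) for a piped pattern, or None otherwise.

    A pattern whose first character is '|' is a program the kernel pipes
    the core dump into; the remaining words are that program's arguments.
    """
    pattern = pattern.rstrip("\n")
    if not pattern.startswith("|"):
        return None  # plain file-name template, no handler involved
    words = pattern[1:].split()
    if not words:
        return None  # a bare '|' names no program
    return words[0], words[1:]


if os.path.exists(CORE_PATTERN):
    with open(CORE_PATTERN) as f:
        print(parse_core_pattern(f.read()))
```

For the value shown above, `parse_core_pattern` returns `('/usr/lib/systemd/systemd-coredump', ['%P', '%u', '%g', '%s', '%t', '%e'])`.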

Symlinked /var/lib/systemd/coredump to another directory with enough space to store the coredumps:
$ ls -al /var/lib/systemd/coredump
lrwxrwxrwx 1 root root 24 Dec 23 09:24 /var/lib/systemd/coredump -> /export/content/coredump
$ ls -al /export/content/coredump
total 8
drwxr-xr-x 2 root root 4096 Jan 6 03:16 .
drwxr-xr-x. 21 root root 4096 Dec 23 09:24 ..
$ df -h /export/content/coredump
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 845G 87G 715G 11% /export
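The same checks as the `ls -al` and `df -h` commands above can be scripted, which is handy for comparing free space against an expected dump size before triggering a crash. A sketch assuming Python 3; the helper names and the idea of passing an expected size in bytes are illustrative only:

```python
import os
import shutil


def dump_dir_status(path="/var/lib/systemd/coredump"):
    """Follow the symlink and return (real_path, free_bytes) for the dump directory."""
    real = os.path.realpath(path)          # e.g. /export/content/coredump
    free = shutil.disk_usage(real).free    # space available to non-root users, like df's Avail
    return real, free


def has_room_for(expected_bytes, path="/var/lib/systemd/coredump"):
    """True if a dump of `expected_bytes` would fit in the current free space."""
    return expected_bytes <= dump_dir_status(path)[1]
```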

Edited /etc/security/limits.conf to allow an unlimited core file size for all users:
$ cat /etc/security/limits.conf
...
#<domain> <type> <item> <value>
* - core unlimited
* - nofile 40000
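Entries in /etc/security/limits.conf are applied by pam_limits when a login session is opened, so a process started some other way (for example as a systemd service) does not necessarily get them. The core-size limit that actually applies to a running process can be read from /proc/<pid>/limits. A small Python sketch of that check; the function names are hypothetical:

```python
LABEL = "Max core file size"


def _to_bytes(value):
    # /proc/<pid>/limits prints either a byte count or the word 'unlimited'.
    return None if value == "unlimited" else int(value)


def core_limit_from_text(text):
    """Return (soft, hard) core-size limits in bytes; None means unlimited."""
    for line in text.splitlines():
        if line.startswith(LABEL):
            soft, hard = line[len(LABEL):].split()[:2]
            return _to_bytes(soft), _to_bytes(hard)
    raise ValueError("no '%s' line found" % LABEL)


def core_limit_of_pid(pid):
    """Read the limits of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        return core_limit_from_text(f.read())
```

Calling `core_limit_of_pid()` with the app's PID before killing it shows whether the "unlimited" setting actually reached that process.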

Configured systemd-coredump via /etc/systemd/coredump.conf:
$ cat /etc/systemd/coredump.conf
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=250G
ExternalSizeMax=infinity
#JournalSizeMax=767M
MaxUse=500G
#KeepFree=
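To see which settings are actually uncommented and what they amount to in bytes, the file can be parsed with Python's configparser. A sketch only: it assumes 1024-based K/M/G/T suffixes and treats 'infinity' as "no limit", and it merely mirrors the file; it cannot tell how the installed systemd-coredump version interprets each value. The helper names are made up:

```python
import configparser

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(value):
    """'250G' -> bytes, 'infinity' -> None (no limit); assumes 1024-based suffixes."""
    value = value.strip()
    if value == "infinity":
        return None
    unit = value[-1:].upper()
    if unit in _UNITS:
        return int(value[:-1]) * _UNITS[unit]
    return int(value)


def read_coredump_conf(path="/etc/systemd/coredump.conf"):
    """Return the uncommented [Coredump] keys; lines starting with '#' are skipped."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep 'ProcessSizeMax' rather than 'processsizemax'
    parser.read(path)         # a missing file is silently ignored
    return dict(parser["Coredump"]) if parser.has_section("Coredump") else {}
```

For the file above this yields `Storage`, `Compress`, `ProcessSizeMax`, `ExternalSizeMax` and `MaxUse`, with `ProcessSizeMax` parsed as 250 × 1024³ bytes.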

I also ran 'sudo systemctl daemon-reload' after the above edits.

What I observe:

If the process has a virtual memory size > 60 GB, as reported either by top or by the "total" line of 'sudo pmap <process-id>', e.g.:
$ sudo pmap 24833
24833: /opt/local/bin/app1 ...(app start command options)....
0000000000400000 21748K r-x-- app1
0000000001b3c000 276K r---- app1
0000000001b81000 32K rw--- app1
0000000001b89000 12788K rw--- [ anon ]
00007f7eaef80000 16785920K rw--- [ anon ]
00007f82af880000 29136384K rw--- [ anon ]
00007f89a1f00000 7996416K rw--- [ anon ]
...
00007f907cb80000 42170880K rw--- [ anon ]
00007f9a8ab80000 9519616K rw--- [ anon ]
00007f9ccfd00000 42415104K rw--- [ anon ]
...
00007fa9e34ae000 1808K r-x-- libc-2.17.so
00007fa9e3672000 2044K ----- libc-2.17.so
...
00007fff439d4000 4K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 181228584K

and I kill the process with:
$ sudo kill -s SIGSEGV 24833

I do see a core file being created in /export/content/coredump that keeps growing (I ran 'ls -lh' and 'df' on /export/content/coredump in a loop), and then it disappears, e.g.:

10
total 161G
drwxr-xr-x 2 root root 4.0K Jan 5 03:09 .
drwxr-xr-x. 21 root root 4.0K Dec 23 09:24 ..
-rw-r----- 1 root root 161G Jan 5 03:15 .#core.app1.2147483006.67f39a0339634816bd4273a13898da75.24833.16728881820000003db0edf91876626d
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 845G 248G 554G 31% /export
11
total 170G
drwxr-xr-x 2 root root 4.0K Jan 5 03:09 .
drwxr-xr-x. 21 root root 4.0K Dec 23 09:24 ..
-rw-r----- 1 root root 170G Jan 5 03:16 .#core.app1.2147483006.67f39a0339634816bd4273a13898da75.24833.16728881820000003db0edf91876626d
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 845G 256G 546G 32% /export
12
total 8.0K
drwxr-xr-x 2 root root 4.0K Jan 5 03:16 .
drwxr-xr-x. 21 root root 4.0K Dec 23 09:24 ..
Filesystem Size Used Avail Use% Mounted on
/dev/sda6 845G 87G 715G 11% /export
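One way to put these numbers side by side is to take the "total" line from pmap (in KiB) as a rough yardstick for the uncompressed dump and report each sampled file as a percentage of it. A Python sketch of such a sampling loop; the helper names, the 60-second default interval and the use of pmap's total as the reference size are assumptions for illustration:

```python
import os
import re
import shutil
import time


def pmap_total_kib(pmap_output):
    """Extract N from the final 'total NK' line printed by pmap."""
    m = re.search(r"^\s*total\s+(\d+)K\s*$", pmap_output, re.MULTILINE)
    if m is None:
        raise ValueError("no 'total ...K' line in pmap output")
    return int(m.group(1))


def snapshot(path):
    """One round of the ls/df loop: ({file name: size in bytes}, free bytes)."""
    sizes = {}
    with os.scandir(path) as entries:  # includes dot-files such as '.#core.app1...'
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes, shutil.disk_usage(path).free


def watch(path, expected_kib, interval=60.0, rounds=30):
    """Print each file's size as a percentage of the expected dump size."""
    for i in range(rounds):
        sizes, free = snapshot(path)
        for name, size in sizes.items():
            print(i, name, "%.1f%%" % (100.0 * size / (expected_kib * 1024)))
        print(i, "free: %.0f GiB" % (free / 1024 ** 3))
        time.sleep(interval)
```

For the pmap run above, 181228584K works out to about 172.8 GiB, so the last sample at 170G suggests the temporary file had grown to nearly that size before it vanished.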

Interestingly enough, /var/log/messages does show the coredump being taken, along with a stack trace:

Jan 5 03:16:16 <serverXXXX> systemd-coredump: Process 24833 (app1 of user 2147483006 dumped core.#012#012Stack trace of thread 24833:#012#0 0x00007fa9e35a6e29 syscall (libc.so.6)#012#1 0x00007fa9e3e618e9 _ZNSt28__atomic_futex_unsigned_base19_M_futex_wait_untilEPjjbNSt6chrono8durationIlSt5ratioILl1ELl1EEEENS2_IlS3_ILl1ELl1000000000EEEE (libstdc++.so.6)#012#2 0x00000000004cd5a0 _ZNKSt14__basic_futureIiE13_M_get_resultEv (app1)#012#3 0x000000000048e9bc _ZNSt6futureIiE3getEv (app1)#012#4 0x0000000000486275
...
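The '#012' sequences in that line are rsyslog's octal escapes for control characters (012 is a newline), so the stack trace is easier to read once they are turned back into line breaks. A minimal Python sketch; the function name is hypothetical, and it deliberately handles only #012 and #011 (tab), because frame markers such as '#0' or '#12' in the trace could be mangled by a general octal decoder:

```python
# rsyslog writes control characters as '#' followed by three octal digits.
_ESCAPES = {"#012": "\n", "#011": "\t"}


def unescape_syslog(line):
    """Turn rsyslog's '#012' / '#011' escapes back into newline / tab."""
    for escaped, char in _ESCAPES.items():
        line = line.replace(escaped, char)
    return line
```

Feeding it the systemd-coredump line from /var/log/messages prints one stack frame per line.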

Running `sudo coredumpctl list` also shows an entry for the crash, but without the '*' that indicates the coredump file is present.
The `sudo journalctl` output likewise shows the coredump event, along with the stack traces of each of the process's threads at the time of the dump.
But the coredump file itself is not preserved.

So I suspect that the coredump is fully taken and the stack traces are recorded, and then the coredump file is deleted for some reason.

I have also tried editing /etc/systemd/system.conf to set DefaultLimitCORE=infinity, followed by 'sudo systemctl daemon-reexec':
$ cat /etc/systemd/system.conf
# This file is part of systemd.
#
#....

[Manager]
#LogLevel=info
#LogTarget=journal-or-kmsg
....
#DefaultLimitSTACK=
DefaultLimitCORE=infinity
#DefaultLimitRSS=
...

This does not change the behavior.

I'd appreciate any pointers on how to get the core file to persist for these apps when their virtual memory grows to its typical size of > 60 GB.

Thanks
