Analyzing "kernel: audit: auditd disappeared"s

Support for security such as Firewalls and securing linux
Post Reply
bern
Posts: 4
Joined: 2016/03/23 12:19:45

Analyzing "kernel: audit: auditd disappeared"s

Post by bern » 2016/03/23 12:56:26

Hello everyone, I'm having a problem with the audit system that I don't know very well, so I'ld be grateful if someone could point me to hints how to analyze it.

We have a platform running

Code: Select all

[bern@smtpout01 ~]$ uname -rmi
3.10.0-327.10.1.el7.x86_64 x86_64 x86_64
[bern@smtpout01 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core) 
[bern@smtpout01 ~]$ getenforce
Enforcing
on the hosts and the VMs with widely varying applications on them. A common factor is that all machines are being monitored, which means that there's a root cron job doing

Code: Select all

[root@smtpout01 ~]# grep '^[^#]*sbin' bin/Nagios-audit-State 
KERNELPARAMS=`/sbin/auditctl -s | sed -e 's/ /=/' -e 's/ .*//' | tr '\n' ' '`
AUDITFILES="`LANG=C LC_TIME=C TZ=UTC /sbin/aureport --input-logs -t | grep ' - '`"
EVENTS=`/sbin/aureport --event --summary --interpret --input /var/log/audit/audit.log.1 | \
every five minutes, plus a boatload of Nagios checks that individually SSH in and then go through "sudo" to provide a target-side control over what Nagios is allowed to execute.

The problem is that the running auditd gets disconnected from its input, the kernel, as per the following log lines:

Code: Select all

Mar 23 01:31:55 smtpout01 kernel: audit: *NO* daemon at audit_pid=3240
Mar 23 01:31:55 smtpout01 kernel: audit_log_lost: 186 callbacks suppressed
Mar 23 01:31:55 smtpout01 kernel: audit: audit_lost=962140 audit_rate_limit=0 audit_backlog_limit=320
Mar 23 01:31:55 smtpout01 kernel: audit: auditd disappeared
though the auditd process is, in fact, still present:

Code: Select all

[bern@smtpout01 ~]$ ps auwwwx | grep auditd
root       137  0.0  0.0      0     0 ?        S    Mar04   0:53 [kauditd]
bern       899  0.0  0.0 112648   960 pts/0    R+   12:18   0:00 grep --color=auto auditd
root      3240  0.0  0.0 116744  1572 ?        S<sl Mar22   0:05 /sbin/auditd -n
and was not rotating the log at the time.

Google pointed me at several sources from a couple years ago suggesting that the "disappeared" detection is in fact a bit fickle, but so far I could not find instructions how to verify that my systems do not have some other cause at work, or how often to expect this event to happen at random.

aks
Posts: 3073
Joined: 2014/09/20 11:22:14

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by aks » 2016/03/23 16:42:23

I don't know off the top of my head. DO you have many simultaneous connections to the audit system (perhaps many nagios servers checking at the same time)? Could it be a load/CPU/swapping issue? Could you simplify the audit query that nagios performs?
I take it auditd does recover by itself - do you loose any events?

bern
Posts: 4
Joined: 2016/03/23 12:19:45

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by bern » 2016/03/23 17:24:33

There's only one Nagios and the abovementioned root cron job is the only place where data is extracted from the audit system (to be fed back into Nagios, which is how I become aware of the auditd having been disconnected). For the opposite direction, the SSH logins and "sudo"s that Nagios does certainly do create a certain load on the audit system; as a ballpark figure, /var/log/audit/audit.log typically grows by ~5 lines/second.

However, the machines aren't under productive load yet and I don't see them being strapped for any resources I can think of (as in, avg 3.5% CPU, 2 IOPS on the /var FS, RAM is 600 MB free + 1.5 GB disk cache, ...).

I do not yet have a procedure to repair the problem other than to "kill -TERM" the still-present auditd and "systemctl start auditd" a new one, and "auditctl -s" counts the events occuring between problem and repair as "lost" - though they do get logged into /var/log/messages instead and, thus, technically aren't lost, I suppose.

Code: Select all

[bern@smtpout01 ~]$ sudo auditctl -s
enabled 1
flag 1
pid 0
rate_limit 0
backlog_limit 320
lost 1123105
backlog 0
loginuid_immutable 0 unlocked
[bern@smtpout01 ~]$ ps auwwwx | grep auditd
root       137  0.0  0.0      0     0 ?        S    Mar04   0:53 [kauditd]
bern       899  0.0  0.0 112648   960 pts/0    R+   12:18   0:00 grep --color=auto auditd
root      3240  0.0  0.0 116744  1572 ?        S<sl Mar22   0:05 /sbin/auditd -n
[bern@smtpout01 ~]$ sudo kill -TERM 3240
[bern@smtpout01 ~]$ ps auwwwx | grep auditd
root       137  0.0  0.0      0     0 ?        S    Mar04   0:53 [kauditd]
bern       981  0.0  0.0 112648   960 pts/0    R+   12:18   0:00 grep --color=auto auditd
[bern@smtpout01 ~]$ sudo systemctl start auditd
[bern@smtpout01 ~]$ ps auwwwx | grep auditd
root       137  0.0  0.0      0     0 ?        S    Mar04   0:53 [kauditd]
root      1061  1.0  0.0  51184  1612 ?        S<sl 12:18   0:00 /sbin/auditd -n
bern      1082  0.0  0.0 112648   956 pts/0    R+   12:18   0:00 grep --color=auto auditd
[bern@smtpout01 ~]$ sudo auditctl -s
enabled 1
flag 1
pid 1061
rate_limit 0
backlog_limit 320
lost 1123414
backlog 0
loginuid_immutable 0 unlocked
Two things I forgot to mention in the initial post:
-- The entire Nagios & audit-scanning cron job setup was copied from another platform running CentOS 6, which does not exhibit the problem.
-- Most of the machines have /etc/audit/auditd.conf edited from the CentOS default so as to "num_logs = 99" and, where that doesn't suffice to provide one week of logs, have "max_log_file" raised appropriately (but as I said, the problem doesn't occur at log rotation).

aks
Posts: 3073
Joined: 2014/09/20 11:22:14

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by aks » 2016/03/24 16:43:00

So something is "gumming" up the audit report?
Does it run okay as that cron job user, in the foreground at the command line (you didn't post the whole command line) ?
I'm not aware of any major changes in auditd from 6 to 7.
As another guess, could it be a SELinux problem (try running in Permissive mode)?

bern
Posts: 4
Joined: 2016/03/23 12:19:45

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by bern » 2016/03/24 20:01:23

aks wrote:So something is "gumming" up the audit report?
Define "audit report", as I said, I'm not too versed with the entire auditing system ... I do not have any indication that there's anything wrong besides the kernel (wrongly) assuming that the auditd went away, and redirecting the data away from the auditd (thus /var/log/audit/) and into the normal syslog (specifically, /var/log/messages).

In particular, the cron job does not run into any actual problems running "auditctl" and "aureport", there's just the info that the kernel forgot about the running auditd process in the former's output and no incoming new data to be seen in the latter (because it analyzes the files in /var/log/audit/).
aks wrote:As another guess, could it be a SELinux problem (try running in Permissive mode)?
Neither logfile contains an AVC logged anywhere near the time of the problem ...

[Checks Nagios logs for the multiple VMs that do still run in Permissive mode]
... and I'm afraid that those had their part of the problem, too.

aks
Posts: 3073
Joined: 2014/09/20 11:22:14

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by aks » 2016/03/28 17:49:38

Define "audit report"
man aureport

Code: Select all

I do not have any indication that there's anything wrong besides the kernel (wrongly) assuming that the auditd went away
I'm guessing the program did not respond to the kernel in the time the kernel "thinks" it should respond.
In particular, the cron job does not run into any actual problems running "auditctl" and "aureport"
Okay the script is correct, bit how long does it take?
Neither logfile contains an AVC logged anywhere near the time of the problem
Hopefullly no AVCs at all....

bern
Posts: 4
Joined: 2016/03/23 12:19:45

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by bern » 2016/03/30 16:20:37

aks wrote:
Define "audit report"
man aureport
The only "aureport" commands I use are the two in the cron job. More importantly, "aureport" works only on the files in /var/log/audit/, which is later than the kernel-to-auditd interface in the data flow IIUC ...
aks wrote:
I do not have any indication that there's anything wrong besides the kernel (wrongly) assuming that the auditd went away
I'm guessing the program did not respond to the kernel in the time the kernel "thinks" it should respond.
The VMs have two cores each, so it would be conceivable that the kernel and something else could lock the auditd out. I'ld be astonished to hear that the audit system requires a machine to have 3+ cores, though ...
aks wrote:
In particular, the cron job does not run into any actual problems running "auditctl" and "aureport"
Okay the script is correct, bit how long does it take?

Code: Select all

[root@smtpout01 ~]# time bin/Nagios-audit-State 

real    0m2.464s
user    0m2.203s
sys     0m0.280s
aks wrote:
Neither logfile contains an AVC logged anywhere near the time of the problem
Hopefullly no AVCs at all....
(A couple, from postfix trying to access mechanisms pertaining to local delivery, which we don't want to take place, anyway. But the last VM - of three so far this week - to repeat the problem is in Permissive mode, so ...)

aks
Posts: 3073
Joined: 2014/09/20 11:22:14

Re: Analyzing "kernel: audit: auditd disappeared"s

Post by aks » 2016/03/30 17:15:41

The only "aureport" commands I use are the two in the cron job.
Yes and isn't that where the problem is?
The VMs have two cores each, so it would be conceivable that the kernel and something else could lock the auditd out. I'ld be astonished to hear that the audit system requires a machine to have 3+ cores, though ...
It doesn't I have it quite happily working on a single core VM.
Permissive mode
Permissive mode should still post AVCs, it just won't act upon them.

If you like, post your cron job and I'll setup a nagios system and give it a go, but it would take me a couple of days....

Post Reply