ipa-dnskeysyncd: SERVER_DOWN: {'desc': "Can't contact LDAP server"}


Post by haleakala269 » 2022/12/14 18:25:20

Hi Team,
We're using FreeIPA 4.6.8 on CentOS 7.9 to secure a Hadoop cluster.
However, FreeIPA regularly runs into trouble: we lose slapd, and production jobs then fail.

Here are the logs we have:

/var/log/dirsrv/slapd-DATA-GPS/errors:

[10/Dec/2022:02:07:18.716900616 +0100] - ERR - libdb - BDB2520 /var/lib/dirsrv/slapd-DATA-GPS/db/log.0000009532: log file unreadable: Timer expired
[10/Dec/2022:02:07:24.713406478 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:07:25.712244912 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:07:25.721098652 +0100] - ERR - libdb - BDB0061 PANIC: Timer expired
[10/Dec/2022:02:07:26.695199448 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:07:49.714916490 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:07:51.691315167 +0100] - ERR - NSMMReplicationPlugin - changelog program - _cl5TrimFile - Failed to begin transaction; db error - -30973 BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[10/Dec/2022:02:08:04.697071614 +0100] - CRIT - deadlock_threadmain - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
[10/Dec/2022:02:08:11.716912038 +0100] - ERR - _entryrdn_get_elem - Failed to position cursor at the key: C2: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:13.714014929 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:14.690723999 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:14.704784746 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:14.716276504 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:15.693782836 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:15.704032836 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:15.716094943 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:16.691143768 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:16.698865243 +0100] - ERR - entryrdn_index_read_ext - Failed to close cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:16.712038126 +0100] - ERR - entryrdn_index_read_ext - Failed to make a cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:16.723118384 +0100] - ERR - dn2entry_ext - Failed to get id for uid=admin,cn=users,cn=accounts,dc=data,dc=gps from entryrdn index (-30973)
[10/Dec/2022:02:08:17.695288503 +0100] - ERR - entryrdn_index_read_ext - Failed to make a cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:17.709081100 +0100] - ERR - dn2entry_ext - Failed to get id for uid=admin,cn=users,cn=accounts,dc=data,dc=gps from entryrdn index (-30973)
[10/Dec/2022:02:08:18.691609144 +0100] - ERR - entryrdn_index_read_ext - Failed to make a cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:18.703574454 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:18.718376566 +0100] - ERR - dn2entry_ext - Failed to get id for uid=admin,cn=users,cn=accounts,dc=data,dc=gps from entryrdn index (-30973)
[10/Dec/2022:02:08:19.701335563 +0100] - ERR - entryrdn_index_read_ext - Failed to make a cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:20.689598420 +0100] - ERR - dn2entry_ext - Failed to get id for uid=admin,cn=users,cn=accounts,dc=data,dc=gps from entryrdn index (-30973)
[10/Dec/2022:02:08:20.697340414 +0100] - ERR - NSMMReplicationPlugin - changelog program - _cl5TrimFile - Failed to begin transaction; db error - -30973 BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[10/Dec/2022:02:08:20.713649234 +0100] - ERR - dn2entry_ext - Failed to get id for uid=admin,cn=users,cn=accounts,dc=data,dc=gps from entryrdn index (-30973)
[10/Dec/2022:02:08:21.699048233 +0100] - ERR - entryrdn_index_read_ext - Failed to close cursor: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery(-30973)
[10/Dec/2022:02:08:22.704354449 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:23.689845685 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:23.700780870 +0100] - ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
[10/Dec/2022:02:08:24.713601675 +0100] - CRIT - deadlock_threadmain - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
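
For anyone triaging a similar errors log, here is a minimal Python sketch that reports when the first Berkeley DB message appeared and how often each BDB code occurs. It assumes the "[timestamp] - SEVERITY - message" layout seen in the excerpt above; the function name `summarize` and the `SAMPLE` list are illustrative, not part of FreeIPA or 389-ds.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above: "[timestamp] - SEVERITY - message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] - (?P<sev>[A-Z]+) - (?P<msg>.*)$")
# Berkeley DB message identifiers such as BDB0060 or BDB0087
BDB_RE = re.compile(r"\bBDB\d{4}\b")


def summarize(lines):
    """Return (timestamp of first line with a BDB code, Counter of BDB codes)."""
    first_ts = None
    codes = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        found = BDB_RE.findall(match.group("msg"))
        if found and first_ts is None:
            first_ts = match.group("ts")
        codes.update(found)
    return first_ts, codes


# Four lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "[10/Dec/2022:02:07:18.716900616 +0100] - ERR - libdb - BDB2520 "
    "/var/lib/dirsrv/slapd-DATA-GPS/db/log.0000009532: log file unreadable: Timer expired",
    "[10/Dec/2022:02:07:24.713406478 +0100] - ERR - libdb - BDB0060 PANIC: "
    "fatal region error detected; run recovery",
    "[10/Dec/2022:02:07:25.721098652 +0100] - ERR - libdb - BDB0061 PANIC: Timer expired",
    "[10/Dec/2022:02:08:04.697071614 +0100] - CRIT - deadlock_threadmain - "
    "Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 "
    "(BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)",
]

first, counts = summarize(SAMPLE)
print("first BDB message:", first)
for code, n in sorted(counts.items()):
    print(code, n)
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, since `summarize` accepts any iterable of lines.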


grep -i ipa /var/log/messages | grep "Dec 10 02"

Dec 10 02:00:04 xxxxxx-vkb001 ns-slapd: [10/Dec/2022:02:00:04.911094133 +0100] - INFO - task_export_thread - Beginning export of 'ipaca'
Dec 10 02:00:16 xxxxxx-vkb001 ns-slapd: [10/Dec/2022:02:00:13.733393357 +0100] - INFO - ldbm_back_ldbm2ldif - export ipaca: Processed 167 entries (100%).
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: ipa-dnskeysyncd: ERROR syncrepl_poll: LDAP error ({'desc': "Can't contact LDAP server"})
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: Traceback (most recent call last):
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: File "/usr/libexec/ipa/ipa-dnskeysyncd", line 116, in <module>
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: while ldap_connection.syncrepl_poll(all=1, msgid=ldap_search):
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: File "/usr/lib64/python2.7/site-packages/ldap/syncrepl.py", line 348, in syncrepl_poll
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: add_intermediates=1, add_ctrls=1, all = 0
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 476, in result4
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 99, in _ldap_call
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: result = func(*args,**kwargs)
Dec 10 02:09:25 xxxxxx-vkb001 ipa-dnskeysyncd: SERVER_DOWN: {'desc': "Can't contact LDAP server"}
Dec 10 02:09:25 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service: main process exited, code=exited, status=1/FAILURE
Dec 10 02:09:25 xxxxxx-vkb001 systemd: Unit ipa-dnskeysyncd.service entered failed state.
Dec 10 02:09:25 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service failed.
Dec 10 02:10:25 xxxxxx-vkb001 named-pkcs11[28447]: Failed to get initial credentials (TGT) using principal 'DNS/xxxxxx-vkb001.data.gps' and keytab 'FILE:/etc/named.keytab' (Generic error (see e-text))
Dec 10 02:10:25 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service holdoff time over, scheduling restart.
Dec 10 02:10:25 xxxxxx-vkb001 systemd: Stopped IPA key daemon.
Dec 10 02:10:25 xxxxxx-vkb001 systemd: Started IPA key daemon.
Dec 10 02:10:28 xxxxxx-vkb001 ipa-dnskeysyncd: ipa-dnskeysyncd: CRITICAL Kerberos authentication failed: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (2529638972): Generic error (see e-text)
Dec 10 02:10:28 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service: main process exited, code=exited, status=1/FAILURE
Dec 10 02:10:28 xxxxxx-vkb001 systemd: Unit ipa-dnskeysyncd.service entered failed state.
Dec 10 02:10:28 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service failed.
Dec 10 02:11:25 xxxxxx-vkb001 named-pkcs11[28447]: Failed to get initial credentials (TGT) using principal 'DNS/xxxxxx-vkb001.data.gps' and keytab 'FILE:/etc/named.keytab' (Generic error (see e-text))
Dec 10 02:11:28 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service holdoff time over, scheduling restart.
Dec 10 02:11:28 xxxxxx-vkb001 systemd: Stopped IPA key daemon.
Dec 10 02:11:28 xxxxxx-vkb001 systemd: Started IPA key daemon.
Dec 10 02:11:30 xxxxxx-vkb001 ipa-dnskeysyncd: ipa-dnskeysyncd: CRITICAL Kerberos authentication failed: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (2529638972): Generic error (see e-text)
Dec 10 02:11:30 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service: main process exited, code=exited, status=1/FAILURE
Dec 10 02:11:30 xxxxxx-vkb001 systemd: Unit ipa-dnskeysyncd.service entered failed state.
Dec 10 02:11:30 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service failed.
Dec 10 02:12:25 xxxxxx-vkb001 named-pkcs11[28447]: Failed to get initial credentials (TGT) using principal 'DNS/xxxxxx-vkb001.data.gps' and keytab 'FILE:/etc/named.keytab' (Generic error (see e-text))
Dec 10 02:12:30 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service holdoff time over, scheduling restart.
Dec 10 02:12:30 xxxxxx-vkb001 systemd: Stopped IPA key daemon.
Dec 10 02:12:30 xxxxxx-vkb001 systemd: Started IPA key daemon.

=>
vkb001-slapd :: Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
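
To see the order of events across both logs, a rough Python sketch like the following can merge them into one timeline. It assumes both logs use the same local time zone (the UTC offset and fractional seconds are dropped), that the syslog year is supplied by the caller because /var/log/messages does not record it, and that month names are in English. The names `parse_line` and `timeline` are made up for this example. It only orders events and does not show that one caused another.

```python
import re
from datetime import datetime

# 389-ds errors log: "[10/Dec/2022:02:07:18.716900616 +0100] - ERR - ..."
DS_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})\.\d+ [+-]\d{4}\] - (.*)$")
# /var/log/messages: "Dec 10 02:09:25 host program: ..."
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (.*)$")


def parse_line(line, year):
    """Return (datetime, source, message), or None if neither layout matches."""
    line = line.strip()
    m = DS_RE.match(line)
    if m:
        # Fractional seconds and UTC offset are ignored (assumption: same zone).
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
        return ts, "errors", m.group(2)
    m = SYSLOG_RE.match(line)
    if m:
        # Syslog lines carry no year, so the caller supplies it.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        return ts, "messages", m.group(2)
    return None


def timeline(lines, year):
    """Parse lines from either log and return the events sorted by time."""
    events = [e for e in (parse_line(l, year) for l in lines) if e is not None]
    return sorted(events, key=lambda e: e[0])


# Three lines copied from the logs above, deliberately out of order.
SAMPLE = [
    "Dec 10 02:09:25 xxxxxx-vkb001 systemd: ipa-dnskeysyncd.service: "
    "main process exited, code=exited, status=1/FAILURE",
    "[10/Dec/2022:02:07:18.716900616 +0100] - ERR - libdb - BDB2520 "
    "/var/lib/dirsrv/slapd-DATA-GPS/db/log.0000009532: log file unreadable: Timer expired",
    "Dec 10 02:00:04 xxxxxx-vkb001 ns-slapd: [10/Dec/2022:02:00:04.911094133 +0100] "
    "- INFO - task_export_thread - Beginning export of 'ipaca'",
]

for ts, source, msg in timeline(SAMPLE, 2022):
    print(ts.isoformat(), source, msg[:60])
```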

Have you already encountered this problem?

Thanks in advance for your support.
Regards.
Éric


Post by TrevorH » 2022/12/14 18:57:06

What sort of filesystem does /var/lib/dirsrv/slapd-DATA-GPS/db/log.0000009532 reside on?

Also:
haleakala269 wrote: "we're using FreeIPA 4.6.8 on CentOS 7.9 to secure a Hadoop cluster."
4.6.8 covers many releases. What is the output of rpm -q ipa-common?


Post by haleakala269 » 2022/12/14 20:02:13

Hello,
The /var filesystem is XFS.
The ipa-common package is ipa-common-4.6.8-5.el7.centos.6.noarch.


Post by TrevorH » 2022/12/14 20:18:47

haleakala269 wrote: "ipa-common package is ipa-common-4.6.8-5.el7.centos.6.noarch"
That's more than a year old, so you should probably update. There are about 63 lines in the rpm changelog since that version. None of them specifically mentions the errors you are seeing, but updating is still a good idea.

Your BDB database is corrupt, so you will need to fix that. I do not know IPA, so I have no idea how you should do that.


Post by haleakala269 » 2022/12/14 20:42:59

OK, thanks.
As it's a production cluster, upgrading is not so simple, but we'll plan to do it ASAP.
If anybody could advise on how to run a database recovery, I'd be very interested ;-)


Post by haleakala269 » 2022/12/15 09:14:28

While we wait for the FreeIPA update, does anyone have an idea about the cause of the problem that occurred in production?
And is there any workaround we can apply?
