
pg_restore and disconnection

Posted: 2020/06/03 08:09:23
by thomasp
Hi all,

Here is the context:

Before the CentOS 8 migration:

We have a PostgreSQL database (v10) running on CentOS 7 (aka DB-7).
We have a CentOS 7 machine (aka M-7) that refreshes the database on DB-7.
We refresh with pg_restore without any problem.


Now:

We still have DB-7.
We have a new CentOS 8 machine (aka M-8) to refresh the DB.
M-8 is the same as M-7 except that it runs CentOS 8.

Now the refresh mostly fails and only succeeds from time to time.
When it fails, we get this message:

Code:

pg_restore: [archiver (db)] could not execute query: no connection to the server
This occurs after about 3h30 of running.

We have done the following tests:
- (on M-8) with iptables disabled -> 14 refreshes succeeded without a single failure
- (on M-8) we copied the sysctl conf from M-7 and applied it on M-8 -> refresh failed
- (on M-8) we changed net.ipv4.tcp_limit_output_bytes from 262144 to 1048576 -> refresh failed
- (on M-8) we upgraded the kernel from 4.18 to 5.6.14 -> refresh failed (we restored kernel 4.18)
- (on M-8) with no output filter -> refresh failed
- (on M-8) installed/removed the NetworkManager-config-server package -> refresh failed
- (on M-8) installed/removed NetworkManager -> refresh failed
- (on M-8) with TCP keepalive configured on pg_restore -> refresh failed
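
To make the keepalive test above concrete, here is a rough sketch of what "keepalive configured on pg_restore" could look like, using the libpq connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count. The host name, user, database, dump path and timing values are placeholders for illustration, not the values we actually used:

```python
# Sketch only: host, user, database, dump path and timings are hypothetical.
def build_conninfo(host, dbname, user, idle=60, interval=10, count=6):
    """Build a libpq keyword/value connection string with TCP keepalives on.

    Values are assumed to contain no spaces or quotes, so no escaping is done.
    """
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        "keepalives": 1,                  # enable client-side TCP keepalives
        "keepalives_idle": idle,          # seconds of idle time before the first probe
        "keepalives_interval": interval,  # seconds between probes
        "keepalives_count": count,        # lost probes before the connection is dropped
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def build_command(conninfo, dump_path):
    """Return the pg_restore argument list; --dbname accepts a connection string."""
    return ["pg_restore", "--dbname", conninfo, dump_path]


conninfo = build_conninfo("db-7.example.org", "mydb", "refresh")
print(build_command(conninfo, "/path/to/dump.custom"))
```

The argument list can be handed to subprocess.run as is. These settings only affect the client side of the connection (M-8 here).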

We are convinced it is related to iptables on M-8, but we do not know how :-/

Does anybody have an idea of what is going on?

Thanks in advance

Re: pg_restore and disconnection

Posted: 2020/06/03 09:07:17
by TrevorH
If it runs for 3.5 hours before it goes wrong then I'd guess your database server is crashing. Review your logs and see what's happening to it and fix whatever the problem is.

Re: pg_restore and disconnection

Posted: 2020/06/03 11:29:42
by thomasp
TrevorH wrote:
2020/06/03 09:07:17
If it runs for 3.5 hours before it goes wrong then I'd guess your database server is crashing. Review your logs and see what's happening to it and fix whatever the problem is.
The log on the database simply says:

Code:

LOG:  XX000: could not receive data from client: No route to host
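
As a side note, the text after the last colon looks like the operating system's message for a socket error rather than anything PostgreSQL-specific. A minimal Python sketch, assuming it corresponds to errno EHOSTUNREACH (my assumption, the log does not name the errno), that prints the message on the local system for comparison:

```python
import errno
import os


def describe(code):
    """Return the symbolic name and the OS message for an errno value."""
    return errno.errorcode[code], os.strerror(code)


# Assumption: the log text maps to EHOSTUNREACH on the database host.
name, message = describe(errno.EHOSTUNREACH)
print(name, "->", message)
```

If the printed message matches the log line, the database's receive call was handed a host-unreachable error by the kernel, which would point at the network path rather than at PostgreSQL itself.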