ssh connection issue

Posted: 2020/06/09 17:10:43
by phil.e
I have a user with an odd ssh problem. Her machine was recently upgraded from CentOS 7.6 to 7.7, and the issue started after the upgrade.

When she tries to ssh to a remote site, using "ssh -Y user@remotesite", the connection hangs partway through and never completes. It happens with several different sites she tries to connect to. However, once she reboots, everything works correctly for the rest of the day, and the problem repeats the next morning - can't connect in the morning, reboot, everything works for the rest of the day.

I had her connect with -vvv to see what was happening under the hood. This is what came up just before it hung (the system isn't connected to the internet, so I had to hand jam this in):

debug3: record_hostkey: found key type RSA in file /home/user/.ssh/known_hosts:50
debug3: load_hostkeys: loaded 1 keys from <remote-host>
debug3: hostkeys_foreach: reading file /home/user/.ssh/known_hosts
debug3: record_hostkey: found key type RSA in file /home/user/.ssh/known_hosts:50
debug3: load_hostkeys: loaded 1 keys from <ip address>
debug1: host '<remote-host>' is known and matches the RSA host key
debug1: Found key in /home/user/.ssh/known_hosts:50
debug3: send packet: type 21
debug2: set_newkeys: mode 1
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug2: set_newkeys: mode 0
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS received

The connection hangs right after this point. GSSAPI authentication is disabled in both ssh_config and sshd_config.

In the second log, where the connection succeeds, the client starts checking the user's keys for publickey login right after this point:

debug2: key: /home/user/.ssh/id_rsa (0x452b2450ac20), agent
debug2: key: /home/user/.ssh/id_dsa ((nil))
debug2: key: /home/user/.ssh/id_ecdsa ((nil))
debug2: key: /home/user/.ssh/id_ed25519 ((nil))
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<rsa-sha2-256,rsa-sha2-512>

The user has only an id_rsa key. It gets accepted several lines further down and then the login goes through.
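
For anyone comparing the two traces: a rough way to see what the working connection does right after the point where the failing one stops is to line them up on the last failing line. This is only a sketch. The function name is made up for illustration and the sample text is just the lines quoted above. In this excerpt the first line that follows is the id_rsa key marked "agent", which is at least consistent with the client hanging while it collects keys from ssh-agent, though it does not prove it.

```python
def lines_after_hang(failing_log, working_log, count=3):
    """Return up to `count` lines that the working log has right after the
    last line of the failing log, or [] if that line never appears."""
    failing = [ln.strip() for ln in failing_log.splitlines() if ln.strip()]
    working = [ln.strip() for ln in working_log.splitlines() if ln.strip()]
    if not failing:
        return []
    last = failing[-1]
    # Search from the end, because some debug lines repeat earlier in a trace.
    for i in range(len(working) - 1, -1, -1):
        if working[i] == last:
            return working[i + 1 : i + 1 + count]
    return []


# Excerpts copied from the post above.
failing = """\
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug2: set_newkeys: mode 0
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS received
"""

working = failing + """\
debug2: key: /home/user/.ssh/id_rsa (0x452b2450ac20), agent
debug2: key: /home/user/.ssh/id_dsa ((nil))
debug3: send packet: type 5
"""

for line in lines_after_hang(failing, working):
    print(line)
```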

Not sure if it's relevant or not, but after the upgrade the user had an entry for their own local host in ~/.ssh/known_hosts with the key from the previous install, so I deleted it and ssh-ed to localhost again to get the current key installed in known_hosts.

Home drives are on a remote host, so all of their home directory data stays the same between upgrades.

Sorry for the incompleteness of the logs - the log file is too long to hand jam into this post.

Re: ssh connection issue

Posted: 2020/06/11 08:47:57
by afewgoodman

How about removing the old host key with ssh-keygen -R server_address and then reconnecting so the current key gets added again?


Re: ssh connection issue

Posted: 2020/06/11 17:42:20
by phil.e
Well, the odd thing is that if the user reboots, they don't have any issues using the same key to get in. That would seem to indicate there's nothing wrong with the public key.

Re: ssh connection issue

Posted: 2020/06/12 14:41:39
by phil.e
Here's some additional info that might be relevant.

/var/log/messages is filled with audit events referring to syscall=2 and syscall=87, which are "open" and "unlink" respectively. There are 50k - 60k lines in /var/log/messages.

We have fairly heavy duty auditing requirements that have caused various problems in the past, where we either had to roll back on the audit.rules so that auditing didn't consume so many system resources, or just disable the auditd service altogether.

Is there anything about ssh connections that would trigger an abnormal amount of syscalls to "open" or "unlink"?
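
One way to find out which programs are behind those records is to tally them by syscall number and by the comm= field. The sketch below assumes the records in /var/log/messages carry the usual syscall=N and comm="..." fields of audit SYSCALL events. The function name and the sample lines (including the comm values) are made up for illustration and are not taken from the affected machine.

```python
import re
from collections import Counter

SYSCALL_RE = re.compile(r"\bsyscall=(\d+)\b")
COMM_RE = re.compile(r'\bcomm="([^"]*)"')
# x86_64 syscall numbers mentioned in the post.
NAMES = {"2": "open", "87": "unlink"}


def tally_syscalls(lines):
    """Count audit records per (syscall name or number, comm) pair."""
    counts = Counter()
    for line in lines:
        sc = SYSCALL_RE.search(line)
        if not sc:
            continue  # not an audit SYSCALL record
        comm = COMM_RE.search(line)
        name = NAMES.get(sc.group(1), sc.group(1))
        counts[(name, comm.group(1) if comm else "?")] += 1
    return counts


# Hypothetical sample records, shortened to the fields the sketch reads.
sample = [
    'type=SYSCALL msg=audit(0.0:1): arch=c000003e syscall=2 success=yes comm="example-a"',
    'type=SYSCALL msg=audit(0.0:2): arch=c000003e syscall=2 success=yes comm="example-a"',
    'type=SYSCALL msg=audit(0.0:3): arch=c000003e syscall=87 success=yes comm="example-b"',
    "an unrelated syslog line",
]

for (name, comm), n in tally_syscalls(sample).most_common():
    print(n, name, comm)
```

To run it against the real log, pass an open file object instead of the sample list, for example open("/var/log/messages", errors="replace").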

Re: ssh connection issue

Posted: 2020/06/15 15:27:46
by phil.e
Some more info.
I can ssh into these users' machines and ssh out to anywhere. I can also ssh into these machines, go to root, then "su -" to the user's username and ssh out to the same locations they can't reach.
The difference between my connection and theirs is that they are logged in through a desktop environment (GNOME, KDE, or Xfce), while I'm coming in over a text-only session, as at runlevel 3.
I did a tcpdump while they were trying to set up an ssh session, and you can see traffic going back and forth between the two machines.
Any ideas?

Re: ssh connection issue

Posted: 2020/06/15 18:00:21
by phil.e
A little more info -
If the user switches to virtual console 2 (CTRL-ALT-F2) and does ssh to any host, it appears to work fine.
When they're using a terminal inside the GUI session, it appears to hang, whether or not they're trying to get an X display back. That seems to imply something is going on with the desktop session.

I did some network testing to rule out network issues - they get the same errors whether the remote host is on a remote network or on a local subnet. Even on the same subnet, with no firewalls or routers in between, they still get the same problem.
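
Since the same user can connect from a text console but not from a terminal in the GUI session, one thing worth comparing is the environment in the two places, for example by saving the output of env in each and diffing them. The sketch below does that comparison. The function names are made up, and the sample values (including the SSH_AUTH_SOCK path) are hypothetical, not captured from the affected machines.

```python
def parse_env(text):
    """Parse `env` output into a dict. Continuation lines of multi-line
    values are skipped unless they happen to look like NAME=value."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.isidentifier():
            env[name] = value
    return env


def env_diff(gui_text, tty_text):
    """Return {name: (gui_value, tty_value)} for variables that differ.
    A value of None means the variable is not set on that side."""
    gui, tty = parse_env(gui_text), parse_env(tty_text)
    return {
        name: (gui.get(name), tty.get(name))
        for name in sorted(gui.keys() | tty.keys())
        if gui.get(name) != tty.get(name)
    }


# Hypothetical captures from a GUI terminal and from a text console.
gui_env = "HOME=/home/user\nSSH_AUTH_SOCK=/run/user/1000/keyring/ssh\n"
tty_env = "HOME=/home/user\n"

for name, (gui_value, tty_value) in env_diff(gui_env, tty_env).items():
    print(name, "gui:", gui_value, "tty:", tty_value)
```

If SSH_AUTH_SOCK shows up only on the GUI side, that would point at whatever agent the desktop session starts, but that is a guess until the real environments are compared.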

Re: ssh connection issue

Posted: 2020/06/16 17:08:17
by aks
Does the environment use ssh-agent? Is it running (and working!) when the user fails to ssh? Is it mandatory for your ssh configuration?
How responsive is the (NFS?) host that serves the remote home directories? Is there a difference between when it works and when it doesn't? (It could be a socket being opened in the remotely mounted home directory, which would show up as a file.) I'd also check the window sizes of the packets when the ssh fails.

I'd temporarily make the user's home directory local and see if the problem still happens (although if they log in to multiple machines and expect the same home directory, that would be a problem).

And (as always) check for SELinux AVC denials in audit.log.
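
To check whether the agent is actually answering (and not just running), something along these lines could be run from the user's GUI terminal while ssh is failing. It is a minimal sketch: it connects to the socket named by SSH_AUTH_SOCK, sends the agent protocol's "request identities" message (type 11) and waits for the "identities answer" (type 12) with a timeout. The function names and the 5 second timeout are arbitrary choices for illustration.

```python
import os
import socket
import struct

SSH_AGENTC_REQUEST_IDENTITIES = 11
SSH_AGENT_IDENTITIES_ANSWER = 12


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("agent closed the connection")
        buf += chunk
    return buf


def probe_agent(sock_path=None, timeout=5.0):
    """Return the number of identities the agent reports.
    Raises socket.timeout if the agent does not answer in time."""
    sock_path = sock_path or os.environ.get("SSH_AUTH_SOCK")
    if not sock_path:
        raise RuntimeError("SSH_AUTH_SOCK is not set")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        # Message framing: uint32 length, then one type byte.
        s.sendall(struct.pack(">IB", 1, SSH_AGENTC_REQUEST_IDENTITIES))
        (length,) = struct.unpack(">I", _recv_exact(s, 4))
        body = _recv_exact(s, length)
    if length < 5 or body[0] != SSH_AGENT_IDENTITIES_ANSWER:
        raise RuntimeError("unexpected reply from agent")
    (count,) = struct.unpack(">I", body[1:5])
    return count


if __name__ == "__main__":
    try:
        print("agent answered, identities:", probe_agent())
    except Exception as exc:  # report a timeout, missing socket, etc.
        print("agent probe failed:", repr(exc))
```

A timeout here while ssh is hanging would suggest the agent (or the socket it listens on) is the piece that is stuck. A prompt answer would point elsewhere.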

Re: ssh connection issue

Posted: 2020/06/22 17:21:14
by phil.e
I see an ssh-agent process running but, to tell the truth, I'm not that familiar with ssh-agent. I thought it was maybe just some background ssh helper process or something. I've never tried to configure it for anything. What is it supposed to be doing?
SELinux is disabled in this environment, so that shouldn't be a factor.
It's not really practical to modify their home directory - authentication is through a Windows/Centrify domain. If I change their home directory, it will break a lot of things.
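
For context on the ssh-agent question: the agent keeps unlocked private keys in memory, and the ssh client finds it through the SSH_AUTH_SOCK environment variable. Desktop sessions commonly start one, or a keyring daemon that speaks the same protocol. A quick test that leaves the home directory alone is to run ssh once with SSH_AUTH_SOCK removed from the environment and the key file given explicitly. The sketch below only builds that command and environment. The helper name is made up, the target and key path are the placeholders used earlier in the thread, and whether this avoids the hang is an assumption to be tested.

```python
import os


def ssh_without_agent(target, identity, base_env=None):
    """Build an ssh command line and an environment with the agent socket removed."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("SSH_AUTH_SOCK", None)  # the client will not contact any agent
    # IdentitiesOnly=yes limits the client to the key file given with -i.
    argv = ["ssh", "-vvv", "-o", "IdentitiesOnly=yes", "-i", identity, target]
    return argv, env


argv, env = ssh_without_agent("user@remotesite", os.path.expanduser("~/.ssh/id_rsa"))
print(" ".join(argv))
# To actually run it from the user's GUI terminal:
#   import subprocess; subprocess.run(argv, env=env)
```

If the connection goes through this way from the GUI terminal, the agent started by the desktop session becomes the main suspect. If it still hangs, the agent is probably not the cause.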