
NFS hard mounts vs soft mounts

Posted: 2013/08/22 14:35:58
by talz
Several times over the past few years, I have been using an NFS server when something happened, the client lost its connection to the server, and the entire system froze.

Recently I found out that this is by design. Apparently this is a result of using hard mounts, which are the default in most cases. This is what I understand about how hard mounts and soft mounts work:

a) Hard mounts

Advantages: If the connection is lost due to a minor, temporary problem, and you are OK with all your NFS clients having frozen applications (and possibly entire systems frozen and useless) until the NFS server comes back online, you may not lose any data when the NFS share becomes available again.

Disadvantages: If an application freezes and you can't bring the NFS server back up, your only option appears to be to kill that application, even if it could easily have survived write errors. Also, a simple NFS share where you dump files once in a while, one that is completely unnecessary for the system to function, can freeze the entire system indefinitely if the client loses its connection to the server.

b) Soft mounts

Advantages: They work as expected (for the most part) - if the server fails, the application gets an I/O error, and keeps going.

Disadvantage: According to the nfs man page, and every other source on the internet, this leads to silent data corruption, because applications are told prematurely that a write succeeded when in fact the data is still sitting in a cache, unable to reach the NFS server we just lost the connection to.
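For reference, here is roughly what both variants look like in /etc/fstab (the server name and paths are made up, and the timeo/retrans values are only illustrative):

[code]
# hard mount (the default): the client retries forever, and applications
# block until the server comes back
server:/export/data  /mnt/data  nfs  hard  0 0

# soft mount: the client gives up after retrans retries of timeo tenths
# of a second each, and applications get an I/O error
server:/export/data  /mnt/data  nfs  soft,timeo=30,retrans=3  0 0
[/code]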


What I don't quite understand is this:

1) How can the most widely accepted solution to using NFS (as far as I can tell) be to use NFS hard mounts, and if the server ever dies, kill the application that is frozen?

Example: If I had been working on a gedit document for 30 minutes and wanted to save it when the NFS mount was down, this is what would happen on:

Soft mounts - gedit would get an I/O error and ask you where else you want to save your work

Hard mounts - gedit would freeze indefinitely, forcing you to kill it and lose all your data


2) According to https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html :

"NFS will not use the cache unless explicitly instructed."

So how can silent data corruption be a problem if cache is never used? Is it only a problem if caching for NFS is enabled?
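(For what it's worth, the "explicit instruction" that page seems to be talking about is the fsc mount option, which also needs the cachefilesd daemon set up; the server and path here are made up:)

[code]
# FS-Cache is opt-in per mount; without the fsc option the client
# does not use the local disk cache described on that page
mount -t nfs -o fsc server:/export /mnt/data
[/code]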

And why does Samba not face these problems? Or does it? I've never had the system freeze on me when a Samba share was unavailable. Does Samba simply use soft mounts and risk silent data corruption?

Re: NFS hard mounts vs soft mounts

Posted: 2013/08/23 00:09:23
by simon_matthews
I think that the reason hard mounts are recommended is that this covers the case where the user's home directory is on an NFS server. If the server goes down, then the user can accomplish nothing anyway, so it's better to freeze the applications and wait for the server to come back up than to cause all the applications to fail, with resulting loss of data.

Re: NFS hard mounts vs soft mounts

Posted: 2013/08/23 01:05:39
by talz
[quote]
simon_matthews wrote:
I think that the reason hard mounts are recommended is that this covers the case where the user's home directory is on an NFS server. If the server goes down, then the user can accomplish nothing anyway, so it's better to freeze the applications and wait for the server to come back up than to cause all the applications to fail, with resulting loss of data.[/quote]

Ok - I can see that as being useful. How about cases where the NFS mount is non-essential (not /boot, /home, /usr, or anything else important) - just simple storage for files (documents, music, photos, etc.)? It seems like the best way to go in that case would be soft mounts, if it weren't for all the warnings of doom from every source I've come across.

Is there really no way to prevent soft mounts from causing silent data corruption? From what I understand about how soft mounts behave (the fact that the system prematurely tells applications that the write was successful when it's still in cache), it seems like making sure that caching is disabled when dealing with NFS mounts (which apparently is the default anyway) would solve the problem completely. If a write failed, applications would know, because they would be writing directly to the NFS share and not to cache.

I've tested soft mounts, and they seem to work great. The only issue I have with them is the threat of silent data corruption (as the nfs man page put it). Can anyone confirm that if caching is disabled for NFS, this is no longer a problem? Or confirm that this is not how it works?
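For what it's worth, my mental model is that an application which checks every return code, including fsync() and close(), would at least see the failure on a soft mount. A minimal sketch of what I mean, assuming a made-up path on a soft-mounted share (untested against a real outage):

[code]
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical file on a soft-mounted NFS share */
    const char *path = "/mnt/nfs/data.txt";
    const char buf[] = "important data\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* write() can fail with an I/O error once the soft mount times out */
    if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
        perror("write");
        close(fd);
        return 1;
    }

    /* errors on cached writes are often only reported here, so
       skipping these checks is how failures can go unnoticed */
    if (fsync(fd) < 0) { perror("fsync"); close(fd); return 1; }
    if (close(fd) < 0) { perror("close"); return 1; }

    puts("write confirmed on the server");
    return 0;
}
[/code]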

Re: NFS hard mounts vs soft mounts

Posted: 2013/08/23 13:55:29
by talz
After a bit more reading, I think I might have a slightly better understanding of what's going on. This is what I think is happening:

So basically, while caching is not a problem if it is disabled (still not sure about this part), NFS is designed to fit seamlessly into the filesystem. Most applications don't know, or care, that they are writing to an NFS share rather than a local hard drive. That means that when they write data, they expect it to succeed, and don't always check whether it has. Hard mounts guarantee that a write over NFS will (eventually) succeed; soft mounts do not, since they time out. If this is true, a hard mount is just as likely to cause silent data corruption as a soft mount: the hard mount freezes an application when the connection to the NFS server is lost, you forcefully unmount the NFS share, and the application gets an I/O error and continues anyway.
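(By "forcefully unmount" I mean something like the following on the client; the mount point is made up, and the lazy variant is what I've seen suggested when -f alone doesn't work:)

[code]
# force-unmount a hung NFS share; blocked applications get I/O errors
umount -f /mnt/data

# lazy unmount: detach the share now, clean up once nothing uses it
umount -l /mnt/data
[/code]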

Can anyone confirm any of this?

Re: NFS hard mounts vs soft mounts

Posted: 2014/01/07 09:53:43
by ranl
Hi Talz

After reading this link http://nfs.sourceforge.net/#faq_e4
I think the problem of silent data corruption arises in the following scenario: your application
begins writing a file of, let's say, 10GB, and after transferring 5GB there is a network issue and the soft mount returns an I/O error.

After the NFS server becomes available again, the 10GB file will be corrupted on the NFS server.
You need to make sure that your application can deal with this kind of situation;
if not, the next person who tries to access the file will get a corrupted copy.

This is how I understand it...
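One common way I know of for an application to deal with this is to write to a temporary name first and only rename() it into place after the write has succeeded, so readers never see a half-written file. A rough sketch (my own illustration, not from the FAQ):

[code]
#include <stdio.h>

/* Publish a file only after it has been fully written and fsync()ed.
   rename() is atomic within a filesystem, so other clients see either
   the old file or the complete new one, never a partial copy. */
int publish(const char *tmp_path, const char *final_path)
{
    if (rename(tmp_path, final_path) != 0) {
        perror("rename");
        return -1;
    }
    return 0;
}
[/code]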