Question about bad sectors on mechanically rotating hard drives

alpha754293 · Post by **alpha754293** » 2019/09/14 04:36:22

I have some HGST 6 TB Desktop NAS drives where SMART is reporting uncorrectable/bad sector issues.

So, my question is: if I scan for bad sectors and the OS finds them and marks them as bad, is that managed at the OS level, or is it recorded in whatever management table the drive itself keeps, so that regardless of which OS is installed or reading the drive, it knows not to use those bad sectors?

(e.g. Can I run the scan in Windows and then format as XFS and deploy the drive in CentOS?)

Or do I have to run the scan and mark it in the same OS as the one that's going to be hosting the drive?

I was doing some research about running the scan in CentOS, but I haven't quite decided if I am going to keep using a drive with bad sectors on it in CentOS or if I am going to re-deploy it into a Windows machine.

Thanks.

jlehtone · Post by **jlehtone** » 2019/09/14 08:25:36

If you search for "bad blocks" in https://linux.die.net/man/8/mke2fs you will see that the filesystem is told to avoid certain sectors.
In other words, it is "managed at the OS level".

However, the physical disk has both physical and logical blocks. The OS uses logical blocks. The disk maps logical to physical.

Initially, the disk has a reserve of unmapped physical sectors. When certain errors occur on a logical sector, the drive remaps that logical sector to a new physical sector, and the erroneous physical sector is unmapped, unused, retired. From the OS point of view, the logical sector remains "good".

The "bad blocks" option in the OS is from an era when disks did not remap sectors.

If your SMART reports "uncorrectable", then it probably also shows that the disk has already remapped several sectors.
Either the disk is out of reserve and thus cannot remap any more, or for some other reason it fails to remap the failing sectors.

Paraphrasing Bond and Goldfinger:
'You expect me to work?'
'No, disk, I expect you to die.'

avij · Post by **avij** » 2019/09/14 08:54:45

If you do a dd if=/dev/sdb of=/dev/null conv=sync,noerror the disk will attempt to read all the sectors the disk thinks are okay (obviously replace /dev/sdb with your disk device name). If the disk encounters a sector which it can't read, it will make an internal note of this. This will be shown in smartctl -a output in Current_Pending_Sector. dd will show a read error, but thanks to "conv=sync,noerror" dd will not stop at that read error.
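To check the counter after such a read pass without reading the whole smartctl report by eye, something like the following could be used. It is a minimal sketch: smart_raw is a hypothetical helper name, /dev/sdX is a placeholder, and it assumes the usual smartctl -A attribute table layout where the raw value is the tenth column.

```shell
#!/bin/sh
# Sketch only: smart_raw is a hypothetical helper, not part of smartmontools.

# Read `smartctl -A` output on stdin and print the RAW_VALUE column
# (assumed to be the 10th whitespace-separated field) of the attribute
# whose name is given as the first argument.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example usage (as root; /dev/sdX is a placeholder for your disk):
#   dd if=/dev/sdX of=/dev/null conv=sync,noerror
#   smartctl -A /dev/sdX | smart_raw Current_Pending_Sector
```

A non-zero result after the read pass means the drive has noted sectors it could not read. Any other attribute name from the same table can be passed in the same way.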

Now, how to "fix" this. Note that this process is destructive, i.e. the data you have on the disk will be wiped out (*). If you run dd if=/dev/zero of=/dev/sdb the disk will notice that some of the writes would be made to sectors that are currently unreadable. As jlehtone described above, each such sector will be retired and the write will be made to an alternative location on the physical disk. This is transparent to the operating system. smartctl's Reallocated_Sector_Ct shows the number of sectors that have been reallocated.

If the hard disk runs out of reserve sectors, the write will fail.

Personally, if there are bad sectors on a drive, I would not trust storing my data on such a drive. My experience is that if there are bad sectors, more bad sectors are likely to show up later on.

(*) You can zap a single sector only (thus losing only a tiny block of data) if you know exactly what you are doing. I have done this a few times in the past, but it requires rather intimate knowledge of filesystem internals if you wish to know which file will be affected by clearing a single sector.

alpha754293 · Post by **alpha754293** » 2019/09/14 15:22:45

So, in other words, I can't correct it within one OS and then re-deploy the drive running another OS.

Would that be a fair and correct statement/assumption for me to make?

TrevorH · Post by **TrevorH** » 2019/09/14 15:54:58

If you are seeing bad sectors being reported then the chances are that the drive is already along the way to being junk.

The drive has a pool of spare sectors that it assigns as existing ones become unreadable. It mostly does that transparently so that you never know about it. Sometimes it will attempt to read a sector and it fails. It will then mark that sector as "pending" and it will stay that way until something writes to that sector again, at which point it will assign a new one from the spare pool and carry on. If that's happening frequently then the drive is already on its way out. If you start seeing uncorrectable sectors then the drive has run out of the pool of spares and now it's really time to replace it.

For the pending sector case, you can find out which sector it is and then use dd to write to that specific sector and force the drive to re-assign a spare. Or you can run the badblocks command in non-destructive write mode, which will read all sectors on the drive and then rewrite them with the contents it just read. That will not only force all pending ones to be re-assigned, it will also test every sector to make sure it can be read.
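As a rough illustration of both approaches, here is a sketch that only prints the dd command instead of running it, so it can be inspected first. zero_sector_cmd is a hypothetical helper and /dev/sdX is a placeholder. It assumes a drive with 512-byte logical and 4096-byte physical sectors by default, and rounds down to the enclosing physical sector so the whole physical block is rewritten in one go; check your drive's actual sector sizes before relying on that.

```shell
#!/bin/sh
# Sketch only: zero_sector_cmd is a hypothetical helper that PRINTS a command.
# Arguments: device, failing LBA (in logical sectors),
#            logical sector size (assumed default 512),
#            physical sector size (assumed default 4096).
zero_sector_cmd() {
    dev=$1
    lba=$2
    logical=${3:-512}
    physical=${4:-4096}
    per_block=$(( physical / logical ))   # logical sectors per physical one
    block=$(( lba / per_block ))          # index of the enclosing physical sector
    printf 'dd if=/dev/zero of=%s bs=%s seek=%s count=1 oflag=direct\n' \
        "$dev" "$physical" "$block"
}

# Example usage (destroys the data in that one physical sector):
#   zero_sector_cmd /dev/sdX 1000003      # review the printed command first
#
# Whole-disk alternative, non-destructive read-write mode
# (the device must not be mounted):
#   badblocks -nsv /dev/sdX
```

Printing rather than executing is a deliberate choice here, since a wrong device name or sector number overwrites good data.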

Mostly though, if you see bad blocks appearing, it means the drive is already on its way out.
