Troubleshooting Disk Lifetime¶
An important part of keeping a firewall running reliably is to ensure its storage is in good condition.
If the disk in a firewall fails, the firewall may continue to run in a reduced capacity until the system restarts. Exactly which parts fail depends on the services and packages in use and what roles the firewall is performing. For example, packet filtering may continue to function indefinitely, but the firewall may not be able to update its rules. Certain types of disks, such as SSD and eMMC disks, may fail into a read-only state where disk writes fail or are discarded, but data can still be read.
Checking Disk Health & Lifetime¶
Contrary to popular rumors, the vast majority of modern flash storage found in SSD and eMMC drives is very resilient. Early SSDs were more prone to failure or only supported a comparatively small number of writes, but the technology has improved vastly over the years.
Traditional spinning platter hard drives have their own problems, such as failure of mechanical moving parts, which are harder to predict.
Over its lifetime, a disk will typically encounter failing spots and remap them to spares. This happens on traditional HDDs as well as SSDs. It’s normal to see a small number of these remapped spots over time, but if the pool of spares is almost consumed, the disk should be replaced.
Depending on the hardware it may be possible to query the disk for information about its health. Not all platforms support each method, and disk OEMs track data in different ways. When in doubt, contact the manufacturer of the disk for details.
SSD¶
Many SSDs can report their health through S.M.A.R.T. data as described in S.M.A.R.T. Hard Disk Status as well as perform disk tests. Disks connected using SATA, mSATA, M.2, and other similar methods can typically use the same methods as traditional HDDs, but may have SSD-specific health information in their S.M.A.R.T. data. For example, some SSDs will include a life time estimate or media wear indication level.
NVMe disks require special handling, but can also be queried through S.M.A.R.T., though the data they report may be in a different format from typical SSDs.
eMMC¶
eMMC disks are unique in that they do not support S.M.A.R.T., but hardware which supports the correct revisions of the eMMC specification is capable of reporting health in its own way.
Install MMC Utilities¶
The first step is to install the mmc-utils package from an SSH or console shell prompt:
# pkg install -y mmc-utils; rehash
Note
This package is currently only available on pfSense® Plus software and does not have a GUI component. It must be run from an SSH or console shell prompt.
Check MMC Health Status¶
To check the health of the first MMC disk, run the following command:
# mmc extcsd read /dev/mmcsd0rpmb
If the disk supports reporting its health, it should return output. For additional MMC disks, increase the device number in the command.
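When a device has more than one MMC disk, the check can be repeated in a loop. The following is a minimal sketch, not part of the mmc-utils package: the function name `mmc_health_all` is hypothetical, and it assumes additional disks follow the same `/dev/mmcsdNrpmb` naming as the first disk.

```shell
#!/bin/sh
# Hypothetical helper: print the health fields of every MMC disk found.
# The optional first argument is the directory holding the device nodes.
# It defaults to /dev and is a parameter only for illustration.
mmc_health_all() {
    devdir="${1:-/dev}"
    for dev in "$devdir"/mmcsd*rpmb; do
        # Skip the unexpanded pattern when no MMC disk is present.
        [ -e "$dev" ] || continue
        echo "== $dev =="
        # Assumption: each disk is queried the same way as the first one.
        mmc extcsd read "$dev" | grep -E 'LIFE|EOL'
    done
}

# Usage on the firewall: mmc_health_all
```

The block is written for `/bin/sh`, so save it as a script or run it from an `sh` session rather than pasting it at the default root prompt.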
Interpreting MMC Health Data¶
The primary fields to look at in MMC health are the life time estimations and Pre-EOL estimation.
Note
Not all disks support all of these fields.
# mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x02
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
- Type A:
An estimate for life time of SLC (and pseudo-SLC) eraseblocks in steps of 10%.
- Type B:
An estimate for life time of MLC eraseblocks in steps of 10%.
- Type A and B Values:
The values of the A and B life time estimations are in 10% increments based on the hexadecimal value returned by the disk. This is only an estimate and the value can exceed 100%.
Possible values include:
Value  Meaning
0x00   Not defined
0x01   The disk has used 0%-10% of its estimated life time
0x02   The disk has used 10%-20% of its estimated life time
0x03   The disk has used 20%-30% of its estimated life time
0x04   The disk has used 30%-40% of its estimated life time
0x05   The disk has used 40%-50% of its estimated life time
0x06   The disk has used 50%-60% of its estimated life time
0x07   The disk has used 60%-70% of its estimated life time
0x08   The disk has used 70%-80% of its estimated life time
0x09   The disk has used 80%-90% of its estimated life time
0x0a   The disk has used 90%-100% of its estimated life time
0x0b   The disk has used 100%-110% of its estimated life time
Warning
This is only an estimation. Though useful as a general guideline, it does not necessarily indicate that a disk will fail at a given time.
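The mapping from the hexadecimal value to a percentage range is simple arithmetic: a value of N covers the range from (N-1)*10% to N*10%. As a minimal sketch, the hypothetical helper `life_time_range` below (not part of mmc-utils) performs that conversion for the values listed in the table above.

```shell
#!/bin/sh
# Hypothetical helper: map a life time estimation value such as 0x02
# to the percentage range listed in the table above.
life_time_range() {
    # Shell arithmetic accepts hexadecimal constants with a 0x prefix.
    value=$(( $1 ))
    if [ "$value" -eq 0 ]; then
        echo "Not defined"
    elif [ "$value" -le 11 ]; then
        echo "$(( ($value - 1) * 10 ))%-$(( $value * 10 ))%"
    else
        echo "Outside the documented range"
    fi
}

life_time_range 0x01   # prints 0%-10%
life_time_range 0x0b   # prints 100%-110%
```

Values above 0x0b are not covered by the table, so the sketch reports them as outside the documented range rather than guessing at a meaning.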
- Pre-EOL:
Pre-EOL information is an overall status for the reserved blocks on the disk.
Possible values are:
Value  Severity  Meaning
0x00   -         Not defined
0x01   Normal    The disk has consumed less than 80% of its reserved blocks
0x02   Warning   The disk has consumed more than 80% of its reserved blocks
0x03   Urgent    The disk has consumed more than 90% of its reserved blocks
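For routine checks it can help to turn the raw Pre-EOL value into its severity label automatically. The following is a minimal sketch under the assumption that the output line has the same format as the sample output earlier in this section. The function name `pre_eol_status` is hypothetical and not part of mmc-utils.

```shell
#!/bin/sh
# Hypothetical helper: read `mmc extcsd read` output on standard input and
# print the Pre-EOL value with the severity from the table above.
pre_eol_status() {
    # Extract the hexadecimal value that follows the Pre-EOL field name.
    value=$(sed -n 's/.*\[EXT_CSD_PRE_EOL_INFO]: *\(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1)
    case "$value" in
        0x00) echo "$value Not defined" ;;
        0x01) echo "$value Normal" ;;
        0x02) echo "$value Warning" ;;
        0x03) echo "$value Urgent" ;;
        "")   echo "Pre-EOL field not reported" ;;
        *)    echo "$value Unknown" ;;
    esac
}

# Sample line taken from the output shown earlier in this section.
echo 'eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01' | pre_eol_status   # prints 0x01 Normal
```

On a firewall, the function could be fed directly from the health command, for example `mmc extcsd read /dev/mmcsd0rpmb | pre_eol_status`, when run from an `sh` session.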
HDD¶
Most HDDs can report their health through S.M.A.R.T. data as described in S.M.A.R.T. Hard Disk Status as well as perform disk tests. S.M.A.R.T. may need to be enabled both in the system BIOS and on the disk, though in modern systems these tend to both be enabled by default.
USB Disks¶
Disks connected through USB controllers are unlikely to support health queries, no matter what type of disk they are.
Taking Action¶
If a disk is showing signs that it might be failing, the safest action is to replace the disk. If a device has a built-in disk like an eMMC disk that cannot be replaced, it may be capable of taking an additional drive using another means such as NVMe, M.2, mSATA, or SATA. When in doubt, contact the device OEM for guidance.
If the disk is showing some wear but still has a lot of life left, consider making changes to reduce disk writes to potentially extend its remaining lifetime.