S.M.A.R.T. Hard Disk Status
The firewall can monitor the health of hard drives that support Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.). This mechanism is intended to allow drives to test and track their own performance and reliability, with the ultimate goal of identifying a failing drive before it causes data loss or an outage.
Support for S.M.A.R.T. varies by drive and BIOS, but it is fairly well supported in modern SSDs and hard drives. S.M.A.R.T. may need to be enabled in the BIOS and on the drive.
S.M.A.R.T. is not a perfect indicator of a failed drive: many drives that have failed still pass a S.M.A.R.T. test. Generally speaking, however, if S.M.A.R.T. does find a problem, one exists, so it is useful for identifying disk failures.
The Diagnostics > SMART Status page obtains and displays information from drives, performs or aborts drive tests, and displays drive logs.
In every section of the page, a Device must be selected before choosing an option. The Device is the disk that S.M.A.R.T. will query or test.
If a drive is not listed in the Device list, it either does not support S.M.A.R.T. or it is connected to a controller that is not supported for this purpose. In the case of RAID controllers, the controller itself may offer similar functionality or reporting via controller-specific utilities in the shell.
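From the shell, the `smartctl --scan` command (smartmontools, which appears to produce the output shown on this page) lists the devices that smartctl can open, which can help confirm whether a drive missing from the Device list is visible at all. The following Python sketch extracts the device paths from that output. It assumes each line begins with the device path, optionally followed by `-d TYPE` and a `#` comment; the sample text and device names are hypothetical.

```python
from typing import List


def parse_scan(output: str) -> List[str]:
    """Return the device paths listed in `smartctl --scan` style output."""
    devices = []
    for line in output.splitlines():
        # Discard any trailing "# comment" and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            # The first token is assumed to be the device path.
            devices.append(line.split()[0])
    return devices


# Hypothetical sample output; real device names depend on the hardware.
SAMPLE_SCAN = """\
/dev/ada0 -d atacam # /dev/ada0, ATA device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""

print(parse_scan(SAMPLE_SCAN))  # ['/dev/ada0', '/dev/nvme0']
```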
Viewing Drive Information
To view information about a drive:
Navigate to Diagnostics > SMART Status
Locate the Information panel on the page
Select the Device to view
Select the Information Type
After reviewing the output, click Back to return to the list of options.
The information types are explained in the next subsections.
The Device Information option shows information about the drive itself, including the make, model, serial number, and other technical information about the drive capabilities, connection, and operation.
Model Family:     Intel 53x and Pro 1500/2500 Series SSDs
Device Model:     INTEL SSDSC2BW120A4
Serial Number:    XXXXXXXXXXXX
LU WWN Device Id: 5 5cd2e4 0003ae43c
Firmware Version: DC32
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 9 13:26:15 2022 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
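When collecting this information from a script rather than the GUI, the `Key: value` layout above is straightforward to parse. A minimal Python sketch, assuming the text has already been captured (the sample is an excerpt of the output above); values are kept in lists because a key such as `SMART support is` can appear more than once. The function name is hypothetical.

```python
from typing import Dict, List


def parse_info(output: str) -> Dict[str, List[str]]:
    """Map each 'Key: value' line to a list of values (keys can repeat)."""
    info: Dict[str, List[str]] = {}
    for line in output.splitlines():
        # Split on the first colon only; values such as times contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info.setdefault(key.strip(), []).append(value.strip())
    return info


SAMPLE_INFO = """\
Device Model:     INTEL SSDSC2BW120A4
Firmware Version: DC32
Local Time is:    Wed Feb 9 13:26:15 2022 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
"""

info = parse_info(SAMPLE_INFO)
print(info["Device Model"][0])       # INTEL SSDSC2BW120A4
print(info["SMART support is"][-1])  # Enabled
```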
The Health option gives a brief pass/fail status of the drive.
SMART overall-health self-assessment test result: PASSED
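A scripted check can look for this line and treat anything other than `PASSED` as a reason to investigate. A minimal Python sketch, assuming the output text is already available; the function name is hypothetical. As noted above, a `PASSED` result does not guarantee that the drive is healthy.

```python
def health_passed(output: str) -> bool:
    """Return True only if the overall-health line reports PASSED."""
    marker = "overall-health self-assessment test result:"
    for line in output.splitlines():
        if marker in line:
            # Compare the text after the last colon, e.g. "PASSED".
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    # No such line: some device types word this result differently.
    return False


print(health_passed("SMART overall-health self-assessment test result: PASSED"))  # True
```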
The SMART Capabilities choice gives a report about features and tests the drive supports.
General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 33) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                 ( 2930) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:              (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        ( 1) minutes.
Extended self-test routine
recommended polling time:        ( 48) minutes.
Conveyance self-test routine
recommended polling time:        ( 2) minutes.
SCT capabilities:                (0x0025) SCT Status supported.
                                        SCT Data Table supported.
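The recommended polling times and the offline collection time in this output are the estimates referred to in the self-test descriptions later on this page. A Python sketch that extracts them, assuming output in the format above; the regular expressions use `\s+` between label words so that they also match when a label is wrapped onto two lines. The dictionary keys are hypothetical names.

```python
import re
from typing import Dict

# Each pattern captures the number in parentheses after a label.
_PATTERNS = {
    "offline_seconds": r"Total time to complete\s+Offline\s+data collection:\s*\(\s*(\d+)\)",
    "short_minutes": r"Short self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)",
    "extended_minutes": r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)",
    "conveyance_minutes": r"Conveyance self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)",
}


def parse_test_times(output: str) -> Dict[str, int]:
    times = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, output)
        if match:  # A field is absent when the drive lacks that test.
            times[name] = int(match.group(1))
    return times


SAMPLE_CAPS = """\
Total time to complete Offline
data collection:                ( 2930) seconds.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
"""

print(parse_test_times(SAMPLE_CAPS))
# {'offline_seconds': 2930, 'short_minutes': 1, 'extended_minutes': 48, 'conveyance_minutes': 2}
```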
The SMART Attributes view is the most useful screen in the majority of cases, but it can also be one of the trickiest to interpret. Many values are displayed, and both the set of attributes and their values vary widely by make and model.
The following output is from a 2.5-inch traditional HDD:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       65537
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   136   136   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       96
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   061   061   000    Old_age   Always       -       17502
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       96
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always       -       77869
194 Temperature_Celsius     0x0002   152   152   000    Old_age   Always       -       36 (Min/Max 19/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0
There is a thorough article on Wikipedia for S.M.A.R.T. that includes a guide for interpreting the values. Some values are more obvious than others; for example, the reallocated sector counts should be at or near zero. Others are harder to interpret, such as the Raw Read Error Rate, which should be low on most drives. However, some Seagate and similar drives report what looks like gibberish or a random, very high number in that field, which makes it useless on those disks.
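A Python sketch that splits attribute rows into columns and flags two conditions: a nonzero raw count on a reallocation attribute, and a normalized VALUE at or below a nonzero THRESH. The second check follows the usual S.M.A.R.T. convention for a failing attribute and is stated here as an assumption, not a vendor guarantee. The sketch assumes the column layout shown above; the function names are hypothetical, and the Reallocated_Sector_Ct raw value of 12 in the sample is hypothetical, changed from the real output to trigger a warning.

```python
from typing import List, NamedTuple


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # vendor failure threshold (0 = never fails)
    raw: str     # RAW_VALUE, kept as text because formats vary


def parse_attributes(output: str) -> List[Attribute]:
    attrs = []
    for line in output.splitlines():
        fields = line.split()
        # Rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) >= 10 and fields[0].isdigit():
            attrs.append(Attribute(int(fields[0]), fields[1], int(fields[3]),
                                   int(fields[4]), int(fields[5]),
                                   " ".join(fields[9:])))
    return attrs


def find_warnings(attrs: List[Attribute]) -> List[str]:
    found = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            found.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
        if "Reallocated" in a.name and a.raw.split()[0] != "0":
            found.append(f"{a.name}: raw count {a.raw}")
    return found


SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       65537
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0002   152   152   000    Old_age   Always       -       36 (Min/Max 19/41)
"""

print(find_warnings(parse_attributes(SAMPLE_ATTRS)))
# ['Reallocated_Sector_Ct: raw count 12']
```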
A few of the values are informational, such as the Start/Stop Count, Power Cycle Count, and Power On Hours, which give a sense of the overall age and usage of the drive. A high value isn’t necessarily bad for those, but if the drive is extraordinarily old or has been power cycled a great many times, prepare a plan to replace the disk in the near future. The drive’s Temperature gives an indication of its environment; if the temperature is too high, it can lead to stability issues.
The Load Cycle Count is specific to spinning disks: it indicates the number of times the heads have been parked. Some laptop drives automatically park the heads after a short idle period, but an operating system such as pfSense® software writes to the disk periodically, which brings the heads back out. Head parking only makes sense in a mobile device that moves a lot, where parked heads are less likely to strike the platter; in a server or firewall, it is completely unnecessary. Drives are typically rated for only 100,000-300,000 load cycles over their lifetime, so that limit can be reached quickly if the heads are continually parked and unparked. pfSense software attempts to disable the power management features of hard drives at boot time because otherwise the drive could fail prematurely once this count runs high. The cycling is typically audible as a soft clicking noise.
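The sample HDD above reports a Load_Cycle_Count of 77869 after 17502 power-on hours, roughly 4.45 cycles per hour. A short Python calculation projects how long the remaining rated cycles would last at that rate, assuming the rate stays constant (it will not if head parking is disabled or usage changes) and using the 100,000 and 300,000 figures above as the rating bounds. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def hours_until_rating(load_cycles: int, power_on_hours: int, rated_cycles: int) -> float:
    """Project hours left before load_cycles reaches rated_cycles at the average rate so far."""
    rate = load_cycles / power_on_hours  # cycles per hour
    remaining = max(rated_cycles - load_cycles, 0)
    return remaining / rate


# Values from the sample HDD attribute table above.
for rated in (100_000, 300_000):
    hours = hours_until_rating(77869, 17502, rated)
    print(f"{rated:,} cycles: ~{hours:,.0f} hours (~{hours / HOURS_PER_YEAR:.1f} years) left")
```

Against the lower rating, the sample drive would have under 5,000 hours left at its historical rate, which shows why continual parking and unparking matters on a system that runs around the clock.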
To contrast the above, the following output is from an SSD:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       40524h+33m+23.020s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       81
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       18
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       1
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   036   041   000    Old_age   Always       -       36 (Min/Max 20/41)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       978116
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       65535
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       2
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   073   073   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       978116
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       3326
249 NAND_Writes_1GiB        0x0032   100   100   000    Old_age   Always       -       80031
The metrics for an SSD can be significantly different, as seen above. In particular, SSDs can report an estimate of their remaining lifetime, write totals counted in various units, error rates, write failures, and other SSD-specific values in place of the attributes that do not apply to an SSD.
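The raw write counters above use different units, which makes them hard to compare at a glance. A Python sketch that converts them to TiB using the values from the SSD output, and treats 100 minus the normalized Media_Wearout_Indicator as the share of rated wear used. That last step is an assumption about how this Intel drive scales the attribute; other vendors may differ. The function name is hypothetical.

```python
MIB_PER_TIB = 1024 * 1024
GIB_PER_TIB = 1024


def ssd_usage(host_writes_32mib: int, nand_writes_1gib: int, wearout_normalized: int) -> dict:
    host_tib = host_writes_32mib * 32 / MIB_PER_TIB  # raw count is in 32 MiB units
    nand_tib = nand_writes_1gib / GIB_PER_TIB        # raw count is in 1 GiB units
    return {
        "host_tib": host_tib,
        "nand_tib": nand_tib,
        # NAND writes per host write; above 1 means the drive writes more internally.
        "nand_to_host_ratio": nand_tib / host_tib,
        # Assumption: the normalized wearout value starts at 100 on a new drive.
        "wear_used_percent": 100 - wearout_normalized,
    }


# Values from the sample SSD attribute table above.
usage = ssd_usage(978116, 80031, 73)
print({k: round(v, 2) for k, v in usage.items()})
```

For the sample drive this works out to roughly 29.85 TiB written by the host and 78.16 TiB written to NAND.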
All SMART Information
Selecting All SMART Information shows all of the information above and also includes the drive logs and self-test results.
All SMART and Non-SMART Information
This choice shows all of the information above, plus more information that can be gathered from the drive. This includes alternate formatting for the attribute list with even greater detail about attribute meanings, a list of available SMART data logs, drive temperature details, additional device statistics, and disk event log content.
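The Information Type names closely match the descriptions of smartctl's query options, so the same reports can likely be produced from the shell. The following Python sketch records that correspondence and builds an argument list. The mapping is an assumption based on smartctl's documented options, not a description of how the GUI is implemented, and `/dev/ada0` is a hypothetical device name.

```python
from typing import Dict, List

# Assumed correspondence between Information Type choices and smartctl options.
INFO_FLAGS: Dict[str, List[str]] = {
    "Device Information": ["-i"],
    "Health": ["-H"],
    "SMART Capabilities": ["-c"],
    "SMART Attributes": ["-A"],
    "All SMART Information": ["-a"],
    "All SMART and Non-SMART Information": ["-x"],
}


def info_command(info_type: str, device: str) -> List[str]:
    """Build an argv list; run it with subprocess.run(argv, capture_output=True, text=True)."""
    return ["smartctl", *INFO_FLAGS[info_type], device]


print(info_command("Health", "/dev/ada0"))  # ['smartctl', '-H', '/dev/ada0']
```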
The View Logs section displays the content of various drive logs. These logs contain information and errors, usually related to self-tests and potentially other errors encountered by the disk.
To view drive logs:
Navigate to Diagnostics > SMART Status
Locate the View Logs panel on the page
Select the Device to view
Select the Log Type
There are numerous logs available, but some logs are only present on specific types of devices, and some devices may not support certain logs even if they are the correct type.
- Summary Error Log
The Error log on a drive contains a record of errors encountered during the drive’s operation, such as read errors, uncorrectable errors, CRC errors, and so on. Running an Offline test also causes the drive to record here any additional errors it finds during the test.
- Extended Error Log
Similar to the summary error log but allows for longer and more detailed error messages.
- SMART Self-Test Log
The Self-test logs contain a record of several recent self-tests run on the drive. It shows the type of test, the results of the test, and in the case of tests that were stopped prematurely, it shows the percentage of the test remaining.
If an error is encountered during a test, the logical block address (LBA) of the first error is printed to help determine where on the disk the problem lies.
- Extended Self-Test Log
Similar to the SMART self-test log but allows for longer and more detailed error messages.
- Selective Self-Test Log
Shows the results of recent selective self-tests and the minimum/maximum LBA spans included in each test.
- Log Directory
Prints the contents of the device log directory, which includes a list of logs and their current sizes.
- Device Temperature Log (ATA Only)
The disk temperature information log, read through the SMART Command Transport (SCT). Prints both the current temperature and a temperature history with an ASCII graph.
- Device Statistics (ATA Only)
Values and descriptions of ATA device statistics logged by the drive.
- SATA PHY Events (SATA Only)
Values and descriptions of SATA PHY events logged by the drive.
- SAS PHY Events (SAS Only)
Values and descriptions of SAS PHY events logged by the drive.
- NVMe Log (NVMe Only)
Prints the contents of the NVMe drive log.
- SSD Device Statistics (ATA/SCSI)
Prints either the device statistics or a media percentage used endurance indicator.
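Most of these log types correspond to arguments of smartctl's `-l` option. A Python sketch of that correspondence, assuming the GUI entries map onto these smartctl log names (not confirmed here); the NVMe Log is left out because its exact mapping is not given above, and `/dev/ada0` is a hypothetical device name.

```python
from typing import List

# Assumed mapping from View Logs entries to smartctl "-l" log names.
LOG_OPTIONS = {
    "Summary Error Log": "error",
    "Extended Error Log": "xerror",
    "SMART Self-Test Log": "selftest",
    "Extended Self-Test Log": "xselftest",
    "Selective Self-Test Log": "selective",
    "Log Directory": "directory",
    "Device Temperature Log": "scttemp",
    "Device Statistics": "devstat",
    "SATA PHY Events": "sataphy",
    "SAS PHY Events": "sasphy",
    "SSD Device Statistics": "ssd",
}


def log_command(log_type: str, device: str) -> List[str]:
    """Build an argv list such as ['smartctl', '-l', 'selftest', '/dev/ada0']."""
    return ["smartctl", "-l", LOG_OPTIONS[log_type], device]


print(log_command("SMART Self-Test Log", "/dev/ada0"))
```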
To perform a test on a drive:
Navigate to Diagnostics > SMART Status
Locate the Perform Self-Tests panel on the page
Select the Device to test
Select the Test Type
The types of tests are described in the following subsections.
An Offline test is so named because the drive performs it while otherwise idle. The test can slow access to the drive while it runs, and if there is a lot of disk activity, the drive may postpone the test until the disk is idle again. Because of this variability, the exact duration is hard to predict; an estimate for a given disk is shown in the S.M.A.R.T. Capabilities. An offline test also causes the drive to update several S.M.A.R.T. attributes with its results, so after running the test and checking the results, review the S.M.A.R.T. Attributes again as well as the Error log.
The Short test usually completes in under ten minutes and checks the drive’s mechanics and read performance. A more accurate estimate of the test duration for a given drive is shown in the S.M.A.R.T. Capabilities. To see the results of this test, view the Self-test Logs. It can be run at any time and does not typically impact performance.
The Long test is similar to the Short test but more thorough. Its duration depends on the size of the disk and is much longer than that of the Short test. A more accurate estimate of the test duration for a given drive is shown in the S.M.A.R.T. Capabilities. As with the Short test, the results end up in the Self-test Logs.
The Conveyance test is not supported by all drives. Its primary purpose is to test the drive after it has been physically relocated, to determine whether any components were damaged by the move. In most cases it only takes a few minutes to complete. To determine whether a drive supports a conveyance test, refer to the S.M.A.R.T. Capabilities output.
Canceling Active Tests
To cancel an active test on a drive:
Navigate to Diagnostics > SMART Status
Locate the Abort panel on the page
Select the Device currently running a test
Any active tests on the drive will be stopped.
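From the shell, smartctl's `-X` option aborts a running self-test. Whether a test is running can be read from the Self-test execution status byte in the Capabilities output (`( 33)` in the example earlier on this page): per the ATA convention, the upper four bits hold a status code and, while a test is in progress (code 15), the lower four bits give the remaining work in tens of percent. A Python sketch decoding a few common codes; the function names and device name are hypothetical.

```python
from typing import List

# A subset of self-test status codes (upper four bits of the status byte).
STATUS_TEXT = {
    0x0: "completed without error, or no self-test has been run",
    0x1: "aborted by the host",
    0x2: "interrupted by the host with a hard or soft reset",
    0xF: "in progress",
}


def abort_command(device: str) -> List[str]:
    """Build the argv list for aborting a running self-test."""
    return ["smartctl", "-X", device]


def decode_self_test_status(byte: int) -> str:
    status, low = byte >> 4, byte & 0x0F
    text = STATUS_TEXT.get(status, "other code: error, failure, or reserved (check the Self-test Log)")
    if status == 0xF:
        # Lower nibble = remaining work in 10% steps while in progress.
        text += f", {low * 10}% remaining"
    return text


print(decode_self_test_status(33))    # interrupted by the host with a hard or soft reset
print(decode_self_test_status(0xF3))  # in progress, 30% remaining
```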