S.M.A.R.T. Hard Disk Status

The firewall can monitor the health of hard drives that support Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.). This mechanism is intended to allow drives to test and track their own performance and reliability, with the ultimate goal of identifying a failing drive before it loses data or causes an outage.

Support for S.M.A.R.T. varies by drive and BIOS, but it is fairly well supported in modern SSDs and hard drives. S.M.A.R.T. may need to be enabled in the BIOS and on the drive.

Note

S.M.A.R.T. is not a perfect method of detecting a failing drive. Many drives that have failed still pass a S.M.A.R.T. test, but generally speaking, if S.M.A.R.T. does report a problem, one exists, so it is still useful for identifying disk failures.

The Diagnostics > SMART Status page obtains and displays information from drives, performs or aborts drive tests, and displays drive logs.

In every section of the page, a Device must be selected before choosing an option. This Device is the disk that S.M.A.R.T. queries or tests.

Warning

If a drive is not listed in the Device list, it either does not support S.M.A.R.T. or it is connected to a controller that is not supported for this purpose. In the case of RAID controllers, the controller itself may offer similar functionality or reporting via controller-specific utilities in the shell.

Viewing Drive Information

To view information about a drive:

  • Navigate to Diagnostics > SMART Status

  • Locate the Information panel on the page

  • Select the Device to view

  • Select the Information Type

  • Click View

After reviewing the output, click Back to return to the list of options.

The information types are explained in the next subsections.

Device Information

The Device Information option shows information about the drive itself, including the make, model, serial number, and other technical information about the drive capabilities, connection, and operation.

Model Family:     Intel 53x and Pro 1500/2500 Series SSDs
Device Model:     INTEL SSDSC2BW120A4
Serial Number:    XXXXXXXXXXXX
LU WWN Device Id: 5 5cd2e4 0003ae43c
Firmware Version: DC32
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb  9 13:26:15 2022 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Device Health

The Health option gives a brief pass/fail status of the drive.

SMART overall-health self-assessment test result: PASSED
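A monitoring script can check the same verdict. The following Python sketch assumes the output contains a line worded like the one above; other drive types may word their health status differently, so the sketch treats a missing line as unknown rather than as healthy.

```python
import re

# Assumes smartctl-style wording, as in the Health output above. Output that
# words the verdict differently reaches the ValueError branch instead.
HEALTH_RE = re.compile(r"self-assessment test result:\s*(\S+)")


def health_passed(output: str) -> bool:
    """Return True only if the overall-health result is exactly PASSED."""
    match = HEALTH_RE.search(output)
    if match is None:
        # No verdict found: report "unknown" rather than guessing "healthy".
        raise ValueError("no overall-health result found in output")
    return match.group(1) == "PASSED"


sample = "SMART overall-health self-assessment test result: PASSED"
print(health_passed(sample))
```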

SMART Capabilities

The SMART Capabilities choice gives a report about features and tests the drive supports.

General SMART Values:
Offline data collection status:  (0x05)      Offline data collection activity
                                     was aborted by an interrupting command from host.
                                     Auto Offline Data Collection: Disabled.
Self-test execution status:      (  33)      The self-test routine was interrupted
                                     by the host with a hard or soft reset.
Total time to complete Offline
data collection:             ( 2930) seconds.
Offline data collection
capabilities:                         (0x7f) SMART execute Offline immediate.
                                     Auto Offline data collection on/off support.
                                     Abort Offline collection upon new
                                     command.
                                     Offline surface scan supported.
                                     Self-test supported.
                                     Conveyance Self-test supported.
                                     Selective Self-test supported.
SMART capabilities:            (0x0003)      Saves SMART data before entering
                                     power-saving mode.
                                     Supports SMART auto save timer.
Error logging capability:        (0x01)      Error logging supported.
                                     General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  48) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x0025) SCT Status supported.
                                     SCT Data Table supported.
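The time estimates in this output can also be read programmatically, for example to decide how long to wait before checking the self-test log. This Python sketch parses an excerpt of the output above and assumes each estimate uses the same two-line layout: a label line, then a line ending in "( N) seconds." or "( N) minutes.". It rounds every estimate up to whole minutes.

```python
import re

# Excerpt copied from the SMART Capabilities output above.
CAPABILITIES = """\
Total time to complete Offline
data collection:             ( 2930) seconds.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  48) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
"""

# Label line, newline, second line ending in "( N) seconds." or "( N) minutes."
ESTIMATE_RE = re.compile(
    r"^([^\n(]+?)\s*\n[^\n(]*\(\s*(\d+)\)\s*(seconds|minutes)\.", re.MULTILINE
)


def time_estimates_minutes(text: str) -> dict[str, int]:
    """Map each routine label to its estimated duration in whole minutes."""
    estimates = {}
    for label, value, unit in ESTIMATE_RE.findall(text):
        seconds = int(value) * (60 if unit == "minutes" else 1)
        estimates[label.strip()] = -(-seconds // 60)  # ceiling division
    return estimates


for label, minutes in time_estimates_minutes(CAPABILITIES).items():
    print(f"{label}: ~{minutes} min")
```

For this drive, the offline data collection estimate of 2930 seconds works out to about 49 minutes.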

SMART Attributes

The SMART Attributes view is the most useful screen in the majority of cases, but it can also be one of the trickiest to interpret. Many values are displayed, and both the set of attributes and their meanings vary widely by make and model.

The following output is from a 2.5-inch traditional HDD:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       65537
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   136   136   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       96
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   061   061   000    Old_age   Always       -       17502
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       96
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always       -       77869
194 Temperature_Celsius     0x0002   152   152   000    Old_age   Always       -       36 (Min/Max 19/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

There is a thorough Wikipedia article on S.M.A.R.T. that includes a guide for interpreting the values. Some values are easier to interpret than others; for example, the counts of reallocated sectors should be at or near zero. Others are harder, such as the Raw Read Error Rate: on most drives it should be low, but some Seagate and similar drives report gibberish or an apparently random high number in that field, which makes it useless on those disks.
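As a rough illustration of those rules of thumb, the following Python sketch parses rows in the layout shown above and flags two things: any attribute whose normalized VALUE has dropped to or below a nonzero THRESH (the condition smartctl marks in the WHEN_FAILED column), and nonzero raw counts on a short, hypothetical watch list of counters that should stay at or near zero. Raw_Read_Error_Rate is intentionally not on the list, since its raw value is meaningless on some drives. The last sample row is hypothetical (its raw count was changed to 8) and is not real data.

```python
from dataclasses import dataclass

# Columns, as in the output above:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE


@dataclass
class Attribute:
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str  # kept as text: raw values may look like "36 (Min/Max 19/41)"


def parse_attributes(text: str) -> list[Attribute]:
    attrs = []
    for line in text.splitlines():
        # At most 9 splits, so a RAW_VALUE containing spaces stays in one field.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and non-attribute lines
        attrs.append(Attribute(int(fields[0]), fields[1], int(fields[3]),
                               int(fields[4]), int(fields[5]), fields[9]))
    return attrs


# Hypothetical watch list of counters expected to be at or near zero.
ZERO_EXPECTED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
                 "Offline_Uncorrectable"}


def concerns(attrs: list[Attribute]) -> list[str]:
    found = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            found.append(f"{a.name}: VALUE {a.value} <= THRESH {a.thresh}")
        raw_first = a.raw.split()[0]
        if a.name in ZERO_EXPECTED and raw_first.isdigit() and int(raw_first) > 0:
            found.append(f"{a.name}: raw count {raw_first}")
    return found


# Rows from the HDD output above; the last row is a hypothetical failing value.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   062    Pre-fail  Always       -       65537
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   152   152   000    Old_age   Always       -       36 (Min/Max 19/41)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
"""

for message in concerns(parse_attributes(SAMPLE)):
    print(message)
```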

A few of the values are informational, such as the Start/Stop Count, Power Cycle Count, and Power On Hours, which give a sense of the overall age and usage of the drive. A high value is not necessarily bad for those, but if the drive is extraordinarily old or has been power cycled a great many times, have a plan ready to replace the disk in the near future. The drive's Temperature gives an indication of its environment; a temperature that is too high can lead to stability issues.
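The Power On Hours raw value is easier to judge when converted to years. The Python sketch below assumes the raw value is either a plain hour count, as on the HDD above, or the combined form some drives use, such as 40524h+33m+23.020s on the SSD later on this page. It uses an average year of 365.25 days; drives that encode this attribute in other units are not handled.

```python
import re

HOURS_PER_YEAR = 24 * 365.25  # average year, including leap days

# Matches combined raw values such as "40524h+33m+23.020s".
COMBINED_RE = re.compile(r"^(\d+)h\+(\d+)m\+([\d.]+)s$")


def raw_power_on_hours(raw: str) -> float:
    """Convert a Power_On_Hours raw value to hours."""
    raw = raw.strip()
    match = COMBINED_RE.match(raw)
    if match:
        hours, minutes, seconds = match.groups()
        return int(hours) + int(minutes) / 60 + float(seconds) / 3600
    return float(int(raw))  # plain hour count


def power_on_years(raw: str) -> float:
    return raw_power_on_hours(raw) / HOURS_PER_YEAR


print(f"{power_on_years('17502'):.1f} years")
print(f"{power_on_years('40524h+33m+23.020s'):.1f} years")
```

For the two drives on this page, that is about 2.0 and 4.6 years of powered-on time respectively.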

The Load Cycle Count is a special value for spinning disks, since it indicates the number of times the heads have been parked. Some laptop drives automatically park the heads after a short idle period, but an OS like pfSense® software writes to the disk periodically, which brings the heads out again. Head parking only makes sense in a mobile device that moves a lot, where parked heads are less likely to strike the platter; in a server or firewall, it is completely unnecessary. Drives are typically rated for only 100,000 to 300,000 load cycles over their lifetime, so continually parking and unparking the heads runs through that count quickly. pfSense software attempts to disable the power management features of hard drives at boot time because otherwise the drive could fail prematurely after running this count up high. This cycling is typically audible as a soft clicking noise from the drive.
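Those ratings make it possible to estimate how long a drive can keep parking its heads at its current rate. This Python sketch assumes the average rate seen so far (raw Load_Cycle_Count divided by raw Power_On_Hours) stays constant, and it takes the rated cycle count as an input because the real figure comes from the drive's datasheet; 100,000 and 300,000 below are simply the ends of the range quoted above.

```python
def hours_until_rating(load_cycles: int, power_on_hours: float,
                       rated_cycles: int) -> float:
    """Estimate powered-on hours left before rated_cycles is reached.

    Assumes the average load-cycle rate observed so far stays the same.
    """
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    rate = load_cycles / power_on_hours  # load cycles per powered-on hour
    remaining = max(rated_cycles - load_cycles, 0)
    if remaining == 0:
        return 0.0
    if rate == 0:
        return float("inf")  # heads never parked: no projected limit
    return remaining / rate


# Raw values from the HDD output above: 77869 load cycles in 17502 hours.
for rating in (100_000, 300_000):
    hours = hours_until_rating(77_869, 17_502, rating)
    print(f"{rating:,} cycles: ~{hours:,.0f} h (~{hours / 24:,.0f} days)")
```

At roughly 4.4 load cycles per hour, the HDD above would reach 100,000 cycles in about 4,974 more powered-on hours (about 207 days), or 300,000 cycles in about 49,927 hours.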

To contrast the above, the following output is from an SSD:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       40524h+33m+23.020s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       81
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       18
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       1
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   036   041   000    Old_age   Always       -       36 (Min/Max 20/41)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       978116
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       65535
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       2
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   073   073   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       978116
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       3326
249 NAND_Writes_1GiB        0x0032   100   100   000    Old_age   Always       -       80031

The metrics for an SSD can be significantly different, as seen above. In particular, SSDs can give an estimate of their remaining lifetime, write totals in various units, error rates, write failures, and other SSD-specific values in place of the values that do not apply to an SSD.
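Several of these raw values are counts in fixed units, so a little arithmetic makes them readable. The Python sketch below converts the raw Host_Writes_32MiB and NAND_Writes_1GiB values shown above into GiB and computes write amplification (data written to flash divided by data written by the host). The units come from the attribute names; reading the normalized Media_Wearout_Indicator as counting down from 100 is an assumption about vendor behavior, so confirm it against the drive vendor's documentation.

```python
MIB_PER_GIB = 1024


def host_writes_gib(raw_32mib_units: int) -> float:
    """Host_Writes_32MiB counts 32 MiB units; convert to GiB."""
    return raw_32mib_units * 32 / MIB_PER_GIB


def write_amplification(nand_gib: float, host_gib: float) -> float:
    """GiB written to flash per GiB written by the host."""
    return nand_gib / host_gib


def wear_used_percent(normalized: int) -> int:
    """Assumes the normalized wear value starts at 100 and counts down."""
    return 100 - normalized


host_gib = host_writes_gib(978_116)  # raw value of attribute 241 above
nand_gib = 80_031                    # raw value of NAND_Writes_1GiB (249)
print(f"host writes: {host_gib:,.1f} GiB ({host_gib / 1024:.2f} TiB)")
print(f"write amplification: {write_amplification(nand_gib, host_gib):.2f}")
print(f"rated wear used: ~{wear_used_percent(73)}%")  # Media_Wearout_Indicator VALUE 073
```

For this SSD that works out to about 29.85 TiB written by the host, a write amplification of roughly 2.6, and, under the stated assumption, about 27% of rated wear used.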

All SMART Information

Selecting All SMART Information shows all of the information above and also includes the drive logs and self-test results.

All SMART and Non-SMART Information

This choice shows all of the information above, plus more information that can be gathered from the drive. This includes alternate formatting for the attribute list with even greater detail about attribute meanings, a list of available SMART data logs, drive temperature details, additional device statistics, and disk event log content.
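The Device Information output on this page references the smartctl database, which suggests the page is a front end to the smartctl utility. Assuming that, the Python sketch below maps each information type to the smartctl option documented for the equivalent report. It only builds the command line; the device name /dev/ada0 is a hypothetical example, and the exact command the page runs may differ.

```python
import shlex

# Hypothetical mapping, assuming the page wraps smartctl. The values are
# smartctl's documented options for the matching reports.
INFO_OPTIONS = {
    "Device Information": "-i",
    "Health": "-H",
    "SMART Capabilities": "-c",
    "SMART Attributes": "-A",
    "All SMART Information": "-a",
    "All SMART and Non-SMART Information": "-x",
}


def info_command(info_type: str, device: str) -> list[str]:
    """Build (but do not run) the smartctl command for an information type."""
    try:
        option = INFO_OPTIONS[info_type]
    except KeyError:
        raise ValueError(f"unknown information type: {info_type!r}") from None
    return ["smartctl", option, device]


print(shlex.join(info_command("SMART Attributes", "/dev/ada0")))
```

On a system where smartctl is installed, the resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the report.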

Drive Logs

The View Logs section displays the content of various drive logs. These logs contain informational entries and errors, usually related to self-tests, along with other errors encountered by the disk.

To view drive logs:

  • Navigate to Diagnostics > SMART Status

  • Locate the View Logs panel on the page

  • Select the Device to view

  • Select the Log Type

  • Click View

There are numerous logs available, but some logs are only present on specific types of devices, and some devices may not support certain logs even if they are the correct type.

Summary Error Log:

The Error log on a drive contains a record of errors encountered during the drive's operation, such as read errors, uncorrectable errors, CRC errors, and so on. Errors found during an Offline test are also recorded here.

Extended Error Log:

Similar to the summary error log but allows for longer and more detailed error messages.

SMART Self-Test Log:

The self-test log contains a record of recent self-tests run on the drive. It shows the type of each test, its result, and, for tests that were stopped prematurely, the percentage of the test remaining.

If an error is encountered during a test, the logical block address (LBA) of the first error is printed to help determine where on the disk the problem lies.
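To check self-test results from a script, the log can be parsed line by line. The Python sketch below assumes smartctl's usual self-test log layout: a "# N" entry number, the test description, the status, the percentage remaining, the lifetime hours, and the LBA of the first error (or "-" when there is none). The two sample entries are hypothetical and only illustrate that layout.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed columns: Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
ENTRY_RE = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


@dataclass
class SelfTestEntry:
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_log(text: str) -> list[SelfTestEntry]:
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue  # header or unrelated line
        num, desc, status, remaining, hours, lba = m.groups()
        entries.append(SelfTestEntry(
            int(num), desc, status, int(remaining), int(hours),
            None if lba == "-" else int(lba)))
    return entries


# Hypothetical log entries illustrating the assumed layout.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     17490         1234567
# 2  Short offline       Completed without error       00%     17480         -
"""

for e in parse_selftest_log(SAMPLE_LOG):
    if e.first_error_lba is not None:
        print(f"test #{e.num} ({e.description}): {e.status} at LBA {e.first_error_lba}")
```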

Extended Self-Test Log:

Similar to the SMART self-test log but allows for longer and more detailed error messages.

Selective Self-Test Log:

Shows the results of recent selective self-tests and the LBA ranges (minimum and maximum LBA) included in each test.

Log Directory:

Prints the contents of the device log directory, which includes a list of logs and their current sizes.

Device Temperature Log (ATA Only):

The disk temperature log from the SMART Command Transport (SCT). Prints both the current temperature and a temperature history as an ASCII graph.

Device Statistics (ATA Only):

Values and descriptions of ATA device statistics logged by the drive.

SATA PHY Events (SATA Only):

Values and descriptions of SATA PHY events logged by the drive.

SAS PHY Events (SAS Only):

Values and descriptions of SAS PHY events logged by the drive.

NVMe Log (NVMe Only):

Prints the contents of the NVMe drive log.

SSD Device Statistics (ATA/SCSI):

Prints either the solid state device statistics (ATA) or a media "percentage used" endurance indicator (SCSI).
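The Device Information output on this page references the smartctl database, so these logs most likely come from smartctl, where each log type corresponds to a name passed to its -l option. The Python sketch below records that assumed correspondence for the log types listed here and builds the command line only. NVMe Log is left out because its smartctl option is not covered by this sketch, and /dev/ada0 is a hypothetical device name.

```python
# Assumed mapping from the log types above to smartctl "-l" log names.
LOG_NAMES = {
    "Summary Error Log": "error",
    "Extended Error Log": "xerror",
    "SMART Self-Test Log": "selftest",
    "Extended Self-Test Log": "xselftest",
    "Selective Self-Test Log": "selective",
    "Log Directory": "directory",
    "Device Temperature Log": "scttemp",
    "Device Statistics": "devstat",
    "SATA PHY Events": "sataphy",
    "SAS PHY Events": "sasphy",
    "SSD Device Statistics": "ssd",
}


def log_command(log_type: str, device: str) -> list[str]:
    """Build (but do not run) a smartctl command that prints one log."""
    if log_type not in LOG_NAMES:
        raise ValueError(f"unsupported log type: {log_type!r}")
    return ["smartctl", "-l", LOG_NAMES[log_type], device]


print(" ".join(log_command("SATA PHY Events", "/dev/ada0")))
```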

Drive Self-tests

To perform a test on a drive:

  • Navigate to Diagnostics > SMART Status

  • Locate the Perform Self-Tests panel on the page

  • Select the Device to test

  • Select the Test Type

  • Click Test

The types of tests are described in the following subsections.

Offline

An Offline test is so named because it runs while the disk is idle. It can slow access to the drive while it is running, and if there is a lot of disk activity, the drive may postpone the test until the disk becomes idle again. Because of this variability, the exact time the test takes is hard to predict; an estimate for a given disk is shown in the S.M.A.R.T. Capabilities output as the total time to complete offline data collection. An offline test also causes the drive to update several of the S.M.A.R.T. attributes with its results. After running a test and checking the results, review the S.M.A.R.T. Attributes again as well as the Error log.

Short

The Short test usually takes under ten minutes and checks the drive's mechanics and read performance. A more accurate estimate of how long the test will take on a specific drive is shown in the S.M.A.R.T. Capabilities output. To see the results of this test, view the Self-test Logs. It can be run at any time and does not typically impact performance.

Long

The Long test is similar to the Short test but more thorough. The time it takes depends on the size of the disk, but it is much longer than the Short test. A more accurate estimate of how long the test will take on a specific drive is shown in the S.M.A.R.T. Capabilities output. As with the Short test, the results appear in the Self-test Logs.

Conveyance

This test is not supported by all drives. Its primary purpose is to test the drive after it has been physically relocated to determine if any components have been damaged by the move. In most cases it only takes a few minutes to complete. To determine if a drive supports a conveyance test, refer to the S.M.A.R.T. Capabilities output.
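If the page is a front end to smartctl (the Device Information output references the smartctl database), each test type corresponds to an argument of smartctl's -t option. The Python sketch below builds that command and, before requesting a Conveyance test, checks the S.M.A.R.T. Capabilities text for the "Conveyance Self-test supported." line seen in the example output. The device name is hypothetical, and the command is only built, not run.

```python
# Assumed mapping from the test types above to smartctl "-t" arguments.
TEST_ARGS = {"Offline": "offline", "Short": "short",
             "Long": "long", "Conveyance": "conveyance"}


def supports_conveyance(capabilities_text: str) -> bool:
    """Look for the capability line shown in the SMART Capabilities output."""
    return "Conveyance Self-test supported." in capabilities_text


def selftest_command(test_type: str, device: str,
                     capabilities_text: str = "") -> list[str]:
    """Build (but do not run) the smartctl command to start a self-test."""
    if test_type not in TEST_ARGS:
        raise ValueError(f"unknown test type: {test_type!r}")
    if test_type == "Conveyance" and not supports_conveyance(capabilities_text):
        raise ValueError("drive does not report conveyance self-test support")
    return ["smartctl", "-t", TEST_ARGS[test_type], device]


print(" ".join(selftest_command("Short", "/dev/ada0")))
```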

Canceling Active Tests

To cancel an active test on a drive:

  • Navigate to Diagnostics > SMART Status

  • Locate the Abort panel on the page

  • Select the Device currently running a test

  • Click Abort

Any active tests on the drive will be stopped.
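Before aborting, a script can check whether a test is actually running. The Python sketch below reads the Self-test execution status value from S.M.A.R.T. Capabilities output. It assumes the ATA layout of that status byte: the upper four bits hold the status code (15 means a test is in progress, and 2 means the test was interrupted by a host reset, which matches the (33) example earlier on this page), and the lower four bits give the work remaining in tens of percent. It also assumes the Abort button corresponds to smartctl's -X option; the device name is hypothetical.

```python
import re

STATUS_RE = re.compile(r"Self-test execution status:\s*\(\s*(\d+)\)")


def self_test_progress(capabilities_text: str) -> tuple[bool, int]:
    """Return (in_progress, percent_remaining) from the execution status byte."""
    m = STATUS_RE.search(capabilities_text)
    if m is None:
        raise ValueError("no self-test execution status found")
    status = int(m.group(1))
    in_progress = (status >> 4) == 0xF  # upper nibble 15: test running
    remaining = (status & 0x0F) * 10 if in_progress else 0
    return in_progress, remaining


def abort_command(device: str) -> list[str]:
    """Build (but do not run) the smartctl command that aborts a self-test."""
    return ["smartctl", "-X", device]


running, remaining = self_test_progress(
    "Self-test execution status:      (  33)      The self-test routine was interrupted")
if running:
    print(" ".join(abort_command("/dev/ada0")), f"({remaining}% remaining)")
else:
    print("no self-test in progress")
```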