S.M.A.R.T. Hard Disk Status¶
The firewall can monitor the health of hard drives that support Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.). This mechanism is intended to allow drives to test and track their own performance and reliability, with the ultimate goal of identifying a failing drive before it suffers data loss or causes an outage.
Support for S.M.A.R.T. varies by drive and BIOS, but it is fairly well supported in modern ATA hard drives and SSDs. S.M.A.R.T. may need to be enabled in the BIOS and on the drive.
Note
S.M.A.R.T. is not a perfect metric of locating a failed drive; Many drives that have failed still pass a S.M.A.R.T. test, but generally speaking if S.M.A.R.T. does locate a problem, one does exist, so it is useful to identify disk failures.
The Diagnostics > SMART Status page obtains and displays information from drives, performs or aborts drive tests, and displays drive logs.
In every section of the page, a Device must be selected before choosing an option. This Device is the disk to be tested by S.M.A.R.T.
Warning
If a drive is not listed in the Device list, it either does not support S.M.A.R.T. or it is connected to a controller that is not supported for this purpose. In the case of RAID controllers, the controller itself may offer similar functionality or reporting via controller-specific utilities in the shell.
Viewing Drive Information¶
To view information about a drive:
Navigate to Diagnostics > SMART Status
Locate the Information panel on the page
Select the Device to view
Select the Info Type
Click
View
After reviewing the output, click Back to return to the list of
options.
The information types are explained in the next subsections.
Info¶
The Info option shows information about the drive itself, including the make, model, serial number, and other technical information about the drive’s capabilities, connection, and operation.:
Model Family: Hitachi Travelstar 5K500.B
Device Model: Hitachi HTS545050B9A300
Serial Number: 090630PB4400XXXXXXXX
LU WWN Device Id: 5 000cca 597XXXXXX
Firmware Version: PB4OC64G
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Fri Oct 7 16:31:20 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Health¶
The Health option gives a brief pass/fail status of the drive:
SMART overall-health self-assessment test result: PASSED
SMART Capabilities¶
The SMART Capabilities choice gives a report about features and tests the drive supports, as in this output:
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 645) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 158) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
Attributes¶
The Attributes view is the most useful screen in the majoriy of cases, but it can also be one of the trickiest to interpret. There are several values displayed but the number and values vary widely by make and model.
The following output is from a laptop size traditional HDD:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 099 099 062 Pre-fail Always - 65537
2 Throughput_Performance 0x0005 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 136 136 033 Pre-fail Always - 2
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 96
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 061 061 000 Old_age Always - 17502
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 96
191 G-Sense_Error_Rate 0x000a 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 37
193 Load_Cycle_Count 0x0012 093 093 000 Old_age Always - 77869
194 Temperature_Celsius 0x0002 152 152 000 Old_age Always - 36 (Min/Max 19/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
223 Load_Retry_Count 0x000a 100 100 000 Old_age Always - 0
There is a thorough article on Wikipedia for S.M.A.R.T. that includes a guide for interpreting the values. Some values are more obvious than others, for example the counts for reallocated sectors should be at or near zero. Others can be harder such as the Raw Read Error Rate, which on most drives should be low, but there are Seagate and similar drives that output gibberish or a random high number in that field that makes it useless on those disks.
A few of the values are informational, such as the Start/Stop Count, Power Cycle Count, and Power On Hours which give a sense of the overall age and usage for the drive. A high value isn’t necessarily bad for those, but if the drive is extraordinarily old, or has been power cycled a great many times, then have a plan prepared to replace the disk in the near future. The drive’s Temperature can give an indication of its environment, and if the temperature is too high, it can lead to stability issues.
The Load Cycle Count is a special value for spinning disks, since it indicates the number of times the heads have been parked. Some laptop drives will automatically park the heads after a short time, but an OS like pfSense® will want to write periodically, which brings the heads out again. The head parking only makes sense in a mobile device that moves a lot so the heads have less chance of impacting the platter; In a server/firewall situation, it’s completely unnecessary. Drives are only capable of 100,000-300,000 load cycles in their lifetime, which means the count gets run through quickly if the heads are continually parked and unparked. pfSense attempts to disable the power management features of hard drives at boot time because otherwise the drive could fail prematurely after running this count up high. This cycling happening is typically audible on drives as a soft clicking noise.
To contrast the above, the following output is from an SSD:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 4198h+12m+31.560s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 219
170 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 184
183 SATA_Downshift_Count 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 050 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 053 065 000 Old_age Always - 53 (Min/Max 19/65)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 184
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 21422
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 65535
227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 17
228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 65535
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 21422
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 4635
249 NAND_Writes_1GiB 0x0013 100 100 000 Pre-fail Always - 595
The metrics shown for an SSD can be significantly different, as seen above. In particular, SSDs can give an estimate of their remaining lifetime, writes of various sizes, errors rates, write failures, and other SSD-specific values in place of the other values that do not apply to an SSD.
All¶
Selecting All will, as the name implies, show all of the information above and also includes the drive logs and self-test results.
Drive Self-tests¶
To perform a test on a drive:
Navigate to Diagnostics > SMART Status
Locate the Perform Self-Tests panel on the page
Select the Device to test
Select the Test Type
Click
Test
The types of tests are described in the following subsections.
Offline¶
An Offline test is called so because it is done while the disk is idle. This test can make accessing the drive slow while it is happening, but if there is a lot of disk activity, the drive may delay the test until the disk becomes idle again. Because of this variability, the exact time the test takes is hard to predict. An estimate of the time to complete an offline test for a given disk is shown in the S.M.A.R.T. Capabilities. An offline test will also cause the drive to update several of the S.M.A.R.T. attributes to indicate the results. After running a test and checking the results, review the S.M.A.R.T. Attributes again as well as the Error log.
Short¶
The Short test takes around ten minutes and checks the drive’s mechanics and reading performance. A more accurate estimate of the length the test will take on a drive can be seen in the S.M.A.R.T. Capabilities. To see the results of this test, view the Self-test Logs. It can be run at any time and it does not typically impact performance.
Long¶
The Long test is similar to the Short test but is more thorough. The time taken by the test depends on the size of the disk, but it is much longer than the short test on its own. A more accurate estimate of the length the test will take on a drive can be seen in the S.M.A.R.T. Capabilities. As with the short test, the results end up in the Self-test Logs.
Conveyance¶
This test is not supported by all drives. Its primary purpose is to test the drive after it has been physically relocated to determine if any components have been damaged by the move. In most cases it only takes a few minutes to complete. To determine if a drive supports a conveyance test, refer to the S.M.A.R.T. Capabilities output.
Canceling Active Tests¶
To cancel an active test on a drive:
Navigate to Diagnostics > SMART Status
Locate the Abort panel on the page
Select the Device currently running a test that must be canceled
Click
Abort
Any active tests on the drive will be stopped.
Drive Logs¶
The Drive Logs contain information and errors, usually related to self-tests and potentially other errors encountered.
To view drive logs:
Navigate to Diagnostics > SMART Status
Locate the View Logs panel on the page
Select the Device to view
Select the Log Type
Click
View
Error Log¶
The Error log on a drive contains a record of errors encountered during the drive’s operation, such as read errors, uncorrectable errors, CRC errors, and so on. Running an Offline test will also make the drive print more errors here if they are found during the test.
Self-test Logs¶
The Self-test logs contain a record of several recent self-tests run on the drive. It shows the type of test, the results of the test, and in the case of tests that were stopped prematurely, it shows the percentage of the test remaining.
If an error is encountered during a test, the first logical block address (LBA) is printed to help determine where in the disk the problem lies.