Troubleshooting Multiple Disks

If a system has multiple disks and pfSense software has been installed on more than one of them, the installations may conflict in one or more ways. For example, this can happen if an older disk was left in place after adding a new disk of a different type and reinstalling to the new disk.

In these situations the best practice is to remove the unused disk, but that is not always possible, for example when the original installation used an embedded disk such as eMMC. If the disk cannot be removed, the next best solution is to clear the metadata from the unused disk.

A common conflict occurs when both disks use the same ZFS label. In that case it is unpredictable which ZFS pool the OS will use, and the choice may change depending on the boot order. Another conflict occurs when the OS boots the kernel from one disk but mounts the root filesystem from the other, so the installed OS appears up-to-date but is running an outdated kernel.

Identify the Disk

To clear the metadata safely, first identify the unused disk. This may take some investigation. Disks are typically listed in the full boot log at /var/log/dmesg.boot and in the output of sysctl kern.disks, as well as in the output of other OS commands such as geom disk list, gpart list, and geom -t. In some cases it’s clear which is which, such as when using an add-on SSD instead of eMMC, where the eMMC disk is named mmcsdX and the SSD is nvdX or adaX.
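
As a rough aid for reading the kern.disks list, the following sketch labels each device name by its driver prefix. It assumes a POSIX sh session (the function syntax does not work in tcsh), and classify_disks is a hypothetical helper name, not part of pfSense software or FreeBSD. The prefix mapping covers only common FreeBSD disk drivers and is a hint about the hardware type, not proof of which disk is in use.

```shell
# classify_disks: hypothetical helper. Prints each disk name given as an
# argument together with a guessed type based on its driver prefix.
classify_disks() {
    for d in "$@"; do
        case "$d" in
            mmcsd*boot*) kind="eMMC-boot-area" ;;  # e.g. mmcsd0boot0
            mmcsd*)      kind="eMMC/SD" ;;
            nvd*|nda*)   kind="NVMe" ;;
            ada*)        kind="SATA" ;;
            da*)         kind="USB/SCSI" ;;
            *)           kind="unknown" ;;
        esac
        printf '%s %s\n' "$d" "$kind"
    done
}

# Usage on a live system (not executed here):
#   classify_disks $(sysctl -n kern.disks)
```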

It’s also possible that the drive that loaded the kernel at boot time is different from the drive mounted as the root of the filesystem (/). Once booted, it’s not possible to determine which drive loaded the kernel, but it is possible to determine which drive holds the root filesystem.

Note

If the unused disk cannot be definitively identified, take a backup, clear the data from all disks, and then reinstall.

When a system has multiple disks, odds are high that the disk holding the live root filesystem is the intended disk and whichever disk is not used for the root filesystem is the one that should be wiped to avoid conflicts.

UFS

On UFS systems, look at the first column (Filesystem) in the output of df /:

  • If it’s a disk device (e.g. /dev/ada0s2a), then note it and move on. The filesystem device name will include a slice or partition identifier at the end (e.g. s2a in the previous example) but it should be possible to match the disk name against the list in sysctl kern.disks.

    In this case, the root filesystem is located on ada0s2a, which is the first filesystem in the second slice of the disk ada0.

  • If it contains a label (e.g. /dev/diskid/, /dev/gpt/, or /dev/ufs/), look at the output of glabel status and match the start of the filesystem label with the Name column and find the disk device in the corresponding Components column.

    $ df /
    Filesystem                   1K-blocks    Used   Avail Capacity  Mounted on
    /dev/diskid/DISK-9D1CEC59s2a   7353532 1664148 5101104    25%    /
    $ glabel status
                    Name  Status  Components
    diskid/DISK-9D1CEC59     N/A  ada0
    

    In this case, the root filesystem is located on /dev/diskid/DISK-9D1CEC59s2a which is the first filesystem in the second slice of disk ID DISK-9D1CEC59, which corresponds to the disk ada0.

    Other label types may not always include a slice or partition identifier.
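
The two cases above can be sketched as a pair of small helpers. This is a minimal sketch assuming a POSIX sh session. The names strip_partition and disk_for_label are hypothetical, and the suffix pattern is a heuristic that only recognizes endings such as s2a, s1, or p4, so a label that happens to end in a similar string could be trimmed incorrectly. Always confirm the result against sysctl kern.disks.

```shell
# strip_partition: hypothetical helper. Drops a leading /dev/ and a trailing
# slice or partition suffix (s2a, s1, p4, ...) from a device name.
strip_partition() {
    printf '%s\n' "$1" | sed -E -e 's|^/dev/||' -e 's/(s[0-9]+[a-h]?|p[0-9]+)$//'
}

# disk_for_label: hypothetical helper. Reads "glabel status" output on stdin
# and prints the Components entry of every row whose Name is a prefix of the
# filesystem given as the first argument.
disk_for_label() {
    awk -v fs="${1#/dev/}" 'NR > 1 && index(fs, $1) == 1 { print $3 }'
}

# Usage on a live system (not executed here):
#   strip_partition /dev/ada0s2a
#   glabel status | disk_for_label /dev/diskid/DISK-9D1CEC59s2a
```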

ZFS

For systems using ZFS, check the output of zpool status and look at the disk names in the output:

$ zpool status
  pool: pfSense
 state: ONLINE
  scan: scrub repaired 0B in 00:00:17 with 0 errors on Wed Feb 22 11:03:52 2023
config:

     NAME        STATE     READ WRITE CKSUM
     pfSense     ONLINE       0     0     0
       nvd0p4    ONLINE       0     0     0

errors: No known data errors

In this output, the ZFS pool is located on nvd0p4 which is the fourth partition on the disk nvd0.
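
To pull only the device names out of that output, a filter along the following lines may help. It is a sketch assuming a POSIX sh session and the zpool status layout shown above. The name zfs_pool_devices is hypothetical, and the filter skips the pool name itself plus grouping rows such as mirror-0 or raidz1-0.

```shell
# zfs_pool_devices: hypothetical helper. Reads "zpool status" output on stdin
# and prints the leaf device names listed in the config section.
zfs_pool_devices() {
    awk '
        $1 == "pool:"       { pool = $2; next }   # remember the pool name
        $1 == "NAME"        { in_cfg = 1; next }  # device table starts here
        in_cfg && NF == 0   { in_cfg = 0; next }  # blank line ends the table
        in_cfg && $1 != pool && $1 !~ /^(mirror|raidz|spare|logs|cache)/ { print $1 }
    '
}

# Usage on a live system (not executed here):
#   zpool status | zfs_pool_devices
```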

Using the Geom Tree

If there is any doubt about the devices in question based on the filesystem device, run geom -t. That command outputs a tree-style view of all disks and their components, such as slices and partitions. This makes it relatively simple to find the disk that contains a given ID, partition, or slice with minimal searching through command output:

$ geom -t
Geom                                 Class      Provider
nvd0                                 DISK       nvd0
  nvd0                               PART       nvd0p1
    nvd0p1                           LABEL      gpt/efiboot0
      msdosfs.gpt/efiboot0           VFS
      gpt/efiboot0                   DEV
    nvd0p1                           DEV
  nvd0                               PART       nvd0p2
    nvd0p2                           LABEL      gpt/gptboot0
      gpt/gptboot0                   DEV
    nvd0p2                           DEV
  nvd0                               PART       nvd0p3
    swap                             SWAP
    nvd0p3                           DEV
  nvd0                               PART       nvd0p4
    nvd0p4                           DEV
    zfs::vdev                        ZFS::VDEV
  nvd0                               DEV
mmcsd0                               DISK       mmcsd0
  mmcsd0                             DEV
  mmcsd0                             LABEL      diskid/DISK-9D1CEC59
    diskid/DISK-9D1CEC59             DEV
    diskid/DISK-9D1CEC59             PART       diskid/DISK-9D1CEC59s1
      diskid/DISK-9D1CEC59s1         DEV
      msdosfs.diskid/DISK-9D1CEC59s1 VFS
    diskid/DISK-9D1CEC59             PART       diskid/DISK-9D1CEC59s2
      diskid/DISK-9D1CEC59s2         DEV
      diskid/DISK-9D1CEC59s2         PART       diskid/DISK-9D1CEC59s2a
        diskid/DISK-9D1CEC59s2a      DEV
        ffs.diskid/DISK-9D1CEC59s2a  VFS
mmcsd0boot0                          DISK       mmcsd0boot0
  mmcsd0boot0                        DEV
mmcsd0boot1                          DISK       mmcsd0boot1
  mmcsd0boot1                        DEV
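
For long trees, the lookup can be sketched as a filter that remembers the most recent top-level DISK row and prints it when the wanted name appears. This assumes a POSIX sh session and the column layout shown above, and disk_for_provider is a hypothetical helper name.

```shell
# disk_for_provider: hypothetical helper. Reads "geom -t" output on stdin and
# prints the top-level disk under which the given name first appears.
disk_for_provider() {
    awk -v target="$1" '
        # Unindented rows of class DISK start a new tree.
        /^[^ ]/ && $2 == "DISK" { disk = $1 }
        # Match the name in either the Geom or the Provider column.
        { for (i = 1; i <= NF; i++) if ($i == target) { print disk; exit } }
    '
}

# Usage on a live system (not executed here):
#   geom -t | disk_for_provider diskid/DISK-9D1CEC59s2a
```

With the tree above, looking up diskid/DISK-9D1CEC59s2a prints mmcsd0 and looking up nvd0p4 prints nvd0.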

Clear the Disk

In these examples the unused disk is mmcsd0.

The commands in these examples must be run from a console or SSH shell prompt. Do not attempt to execute these commands from the GUI. The best practice is to run them from the console and to have installation media on hand in case a reinstall is necessary.

Tip

If any of the commands generate an error, boot pfSense software installation media and perform the commands from a shell launched through the installer menu. When booted from install media, the disks in the device will not be mounted and can be safely cleared. For ARM devices, boot the recovery installer and use Ctrl-Z to suspend the recovery process and reach a shell prompt to run the commands.

Wipe Metadata

The quickest and easiest way to wipe a disk is to clear its metadata.

The following commands clear the disk partition metadata, ZFS metadata, and also wipe the start of the disk to clear the partition table and other data at the beginning of the disk. Depending on the situation it may only be necessary to clear the ZFS metadata but it’s safer to clear it all.

### Stop a legacy style GEOM mirror and clear its metadata from all disks
### Mirror name may vary, check "gmirror status" output.
# gmirror destroy -f pfSenseMirror

### Clear the ZFS label (exact partition may vary)
# zpool labelclear -f /dev/mmcsd0p4

### Clear the partition metadata
# gpart destroy -F mmcsd0

### Wipe the first 1MB of the disk
# dd if=/dev/zero of=/dev/mmcsd0 bs=1M count=1 status=progress

Note

Alternatively, skip the first two commands and omit count=1 from the dd command to wipe the entire target disk from start to end.
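
Because the disk name appears in several of these commands, one cautious approach is to generate the command list from a single argument and review it before running anything. The sketch below only prints text and never touches a disk. It assumes a POSIX sh session, wipe_plan is a hypothetical helper name, and the partition suffix (p4 in this example) must be confirmed for the actual system. The gmirror step is left out because the mirror name varies.

```shell
# wipe_plan: hypothetical helper. Prints (does NOT run) the metadata wipe
# commands for a disk such as mmcsd0 and a ZFS partition suffix such as p4.
wipe_plan() {
    disk=$1
    part=$2
    if [ -z "$disk" ] || [ -z "$part" ]; then
        echo "usage: wipe_plan DISK PARTITION_SUFFIX" >&2
        return 1
    fi
    printf 'zpool labelclear -f /dev/%s%s\n' "$disk" "$part"
    printf 'gpart destroy -F %s\n' "$disk"
    printf 'dd if=/dev/zero of=/dev/%s bs=1M count=1 status=progress\n' "$disk"
}

# Usage (prints the three commands for review):
#   wipe_plan mmcsd0 p4
```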

Wipe Start and End of Disk

Another tactic is to wipe only the start and end of the disk. However, this approach is more complicated because it involves calculations based on the sector size and the number of sectors on the disk:

### Wipe the first 1MB of the disk
# dd if=/dev/zero of=/dev/mmcsd0 bs=1M count=1 status=progress

### Wipe the last 1MB of the disk
# dd bs=`diskinfo mmcsd0 | awk '{print $2}'` \
     if=/dev/zero \
     of=/dev/mmcsd0 \
     count=`diskinfo mmcsd0 | awk '{print ((1024 * 1024) / $2)}'` \
     seek=`diskinfo mmcsd0 | awk '{print $4 - ((1024 * 1024) / $2)}'` \
     status=progress

Note

Be sure to replace every instance of the target disk in each command, as the disk is referenced numerous times to obtain the necessary calculation numbers.
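
The arithmetic behind the second dd command can be checked separately before touching a disk. The sketch below assumes a POSIX sh session and takes the sector size and sector count as plain numbers (fields 2 and 4 of the diskinfo output, as used in the command above). The name tail_wipe_args is hypothetical, and the function only prints dd operands.

```shell
# tail_wipe_args: hypothetical helper. Given a sector size in bytes and the
# total number of sectors, prints the dd operands that cover the last 1 MiB.
tail_wipe_args() {
    sectorsize=$1
    sectors=$2
    count=$(( 1024 * 1024 / sectorsize ))   # sectors in 1 MiB
    seek=$(( sectors - count ))             # first sector of the last 1 MiB
    printf 'bs=%s count=%s seek=%s\n' "$sectorsize" "$count" "$seek"
}

# Usage on a live system (not executed here):
#   tail_wipe_args $(diskinfo mmcsd0 | awk '{print $2, $4}')
```

For a made-up disk with 512-byte sectors and 15269888 sectors, tail_wipe_args 512 15269888 prints bs=512 count=2048 seek=15267840.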