March 9, 2017

Monitor SATA and SSD Health with SMART

Smartmontools can help you continually monitor your drives and predict imminent failure.

Smartmontools helps you keep an eye on the health of your hard disks and SSDs. SMART is the Self-Monitoring, Analysis and Reporting Technology built into modern drives, and smartmontools reads the SMART data. It's not 100 percent accurate at predicting imminent drive failure, so keep current backups, as you always should.

Whatever Linux you use, the package name is probably smartmontools. The main command that you will use is smartctl. Install it and then query basic information about one of your drives:

$ sudo smartctl -i /dev/sda

This should be uneventful, as it prints basic information about your drive, including model number, serial number, firmware version, size, sector size, and whether SMART is enabled. But I got this little surprise:

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!
See the following web pages for details:
http://knowledge.seagate.com/articles/en_US/FAQ/223571en
http://www.smartmontools.org/wiki/SamsungF4EGBadBlocks

Just what everyone needs, an ambiguous warning that you may have just wrecked your hard drive. The Seagate article is enlightening:

"If the host issues an identify command during an action of writing data in NCQ, the data's writing can be destabilized, and can lead to data loss."

I was being sarcastic when I said it was enlightening. What does that even mean? You can download a firmware patch from that page, but it's an .exe file, which only runs in Windows. There are ways to extract an image from an .exe file that you can use in Linux, but it makes me tired and exasperated even thinking about it. Check out Flashing BIOS from Linux in the Arch Linux Wiki to learn more about forcing .exe files to be usable on real operating systems.

So, getting back to the warning. The Smartmontools Wiki page offers actual information:

"Problem: If the system writes to this disk and smartctl -a (5.40) is used at the same time, write errors are reported and bad blocks appear on the disk."

This is an issue with the disk firmware and not smartmontools, and it applies to hdparm as well. Chances are it's not an issue anymore:

"Update: According to Samsung Support, HD204UI drives manufactured December 2010 or later include the firmware patch...The warning will also be printed when the patch is already installed!"

So how do you know if the patch is installed? Check the label on your hard drive, which should have the date of manufacture. If it's an older drive then you must decide if you want to apply the patch just to keep smartmontools happy.

One More Time

Using computers involves a lot of detours. Let's get back on track and look at the basic information that smartctl prints, using a non-Samsung drive:

$ sudo smartctl -i /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    Z240S0F3
LU WWN Device Id: 5 000c50 05080924c
Firmware Version: CC26
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  6 10:56:00 2017 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

This is a nice bundle of useful information, containing everything but the date of manufacture. Sometimes, you can visit the manufacturer's site and use the serial number to learn if the drive is still under warranty, and decode the date information. If SMART support is not enabled then enable it:

$ sudo smartctl -s on /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

smartctl -s off [device] disables it.
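
If you want to script this step, the labelled lines that smartctl -i prints are easy to pick out. The following is a minimal sketch and not part of smartmontools: smart_field and smart_is_enabled are hypothetical helper names, and both assume the "Label: value" layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helpers; both read `smartctl -i` output on standard input.

# smart_field LABEL: print the value after the first line starting "LABEL:".
smart_field() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }'
}

# smart_is_enabled: succeed only if a "SMART support is: Enabled" line exists.
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}
```

With those functions defined in your shell, something like the following should print the serial number, and enable SMART only when it is not already on:

$ sudo smartctl -i /dev/sdb | smart_field "Serial Number"
$ sudo smartctl -i /dev/sdb | smart_is_enabled || sudo smartctl -s on /dev/sdb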

Want to see a complete data dump? Use the -x option:

$ sudo smartctl -x /dev/sdb

Health Check

Let's run a quick health check:

$ sudo smartctl -H /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Hurrah! It passed. Use -Hc to see more details. Now, just for fun, check the logfile for errors:

$ sudo smartctl -l error /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

Well, this sure is turning into a boring exercise. Which is fine with me. Some forms of boredom are good. What do you do if there are errors? Consult the Smartmontools FAQ to learn which ones are significant.
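
The health check and error log also show up in smartctl's exit status, which the smartctl man page describes as a bit mask. Here is a minimal sketch for decoding it in a script. The function name decode_smartctl_status is hypothetical, and the messages are my paraphrases of the man page, so check them against the version installed on your system.

```shell
#!/bin/sh
# decode_smartctl_status STATUS: print one line for each bit set in the
# exit status of smartctl. Hypothetical helper; messages are paraphrased
# from the smartctl man page.
decode_smartctl_status() {
    status=$1
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed, or a checksum error occurred" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes are at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test each bit in turn, starting with bit 0.
        if [ $(( ($status >> $bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}
```

A healthy drive should return 0 and print nothing:

$ sudo smartctl -H -l error /dev/sdb > /dev/null; decode_smartctl_status $?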

Running Self-Tests

You can run a short or a long self-test with sudo smartctl -t short /dev/sdb or sudo smartctl -t long /dev/sdb. smartctl tells you how long the test will take, but you won't get any notification when it's finished. Check the results by reading the log:

$ sudo smartctl -l selftest /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      5357
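
Because smartctl does not notify you when a test finishes, a script can read the log once the estimated run time has passed. This is a minimal sketch that assumes the log layout shown above, where entry # 1 is the most recent test. latest_selftest and selftest_passed are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; both read `smartctl -l selftest` output on stdin.

# latest_selftest: print the most recent log entry (the line numbered "# 1").
latest_selftest() {
    grep -E '^# *1[[:space:]]' | head -n 1
}

# selftest_passed: succeed only if the most recent entry completed cleanly.
selftest_passed() {
    latest_selftest | grep -q 'Completed without error'
}
```

With those functions defined in your shell, you could check the last result like this:

$ sudo smartctl -l selftest /dev/sdb | selftest_passed && echo "last self-test OK"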

smartd Daemon and Notifications

You can run smartd, the SMART daemon, to continually monitor your drives and email you about possible trouble. On Debian, Ubuntu, and their derivatives, edit /etc/default/smartmontools to launch smartd automatically at startup, and edit /etc/smartd.conf to configure monitoring and notifications. Most other distros also install /etc/smartd.conf; use your distro's own method of launching smartd at boot.
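
As a starting point, an /etc/smartd.conf entry might look like the sketch below. The device name, schedule, and mail recipient are assumptions for illustration, and email delivery assumes that local mail already works on your system.

```
# Monitor /dev/sdb with the default set of checks (-a),
# run a short self-test every day at 02:00 and a long self-test
# every Saturday at 03:00, and email warnings to root.
/dev/sdb -a -s (S/../.././02|L/../../6/03) -m root
```

Adding -M test to that line should make smartd send a single test email when it starts, which is a convenient way to confirm that notifications actually reach you. See the smartd.conf man page for the full list of directives.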
