February 11, 2013

Query on Linux libsas/libata releases/updates




We have a driver that plugs into the SCSI upper layers through the libsas/libata drivers. With updated libsas/libata libraries, such as those in RHEL 6.3, medium error handling works fine. On kernels with older libsas/libata libraries, a medium error (details below) crashes the system.

My questions are:
- Is there a recommended way to release our driver together with these updated libraries?
- If not, is there an easy way for customers to update only these components instead of the entire kernel?

Thanks in advance for your help!


Details of the issue:
1. If a target/drive reports a medium error and the IO is aborted, libata runs into problems in its error-handling path and the system eventually crashes.
a. This is very consistent with the SUSE11SP2 (3.0.13) kernel.
b. The very same issue occurs with Debian 6.0.3 through 6.0.6.
c. With RHEL6.3 everything works fine, since the libsas/libata changes were back-ported from the 3.4 kernels to the RHEL6.3 kernel (2.6.32-279).
Sequence of events on a medium error:
- The drive reports a medium error for an IO, either Read or Write FPDMA (NCQ command).
- The firmware raises an NCQ event.
- The firmware holds the IO, expects an RLE (Read Log Ext), and puts the drive into the error state.
- The driver issues the RLE internally because we don’t have the IO context.
- The FW/drive processes the RLE.
- The driver receives the RLE response.
- The driver issues Abort ALL (as per the SATA spec).
- The FW releases all IOs by completing them as IO Aborted.
- The driver completes these to the midlayer.
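
To make the ordering of the steps above easier to follow, here is a minimal sketch in C that models them as a linear sequence. It is only an illustration, not our driver code and not any libsas/libata API: the enum values, next_step(), abort_all(), and run_recovery() are all hypothetical names, and the 32-bit bitmap of outstanding NCQ tags is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step names that mirror the list above, one per bullet. */
enum recovery_step {
    STEP_MEDIUM_ERROR,        /* drive reports medium error on an NCQ read/write */
    STEP_FW_NCQ_EVENT,        /* firmware raises an NCQ event */
    STEP_FW_HOLDS_IO,         /* firmware holds the IO, drive enters error state */
    STEP_DRIVER_ISSUES_RLE,   /* driver issues RLE internally */
    STEP_RLE_PROCESSED,       /* FW/drive processes the RLE */
    STEP_RLE_RESPONSE,        /* driver receives the RLE response */
    STEP_ABORT_ALL,           /* driver issues Abort ALL */
    STEP_IOS_ABORTED,         /* FW completes held IOs as IO Aborted */
    STEP_COMPLETE_TO_MIDLAYER,/* driver completes the IOs to the midlayer */
    STEP_DONE                 /* terminal state of this sketch */
};

/* Advance to the following step; STEP_DONE stays where it is. */
enum recovery_step next_step(enum recovery_step s)
{
    assert(s <= STEP_DONE);
    return s == STEP_DONE ? STEP_DONE : (enum recovery_step)(s + 1);
}

/*
 * Model of "Abort ALL": every set bit in the (assumed) bitmap of
 * outstanding NCQ tags is cleared, as if that IO had been completed
 * as aborted. Returns the number of IOs released.
 */
unsigned int abort_all(uint32_t *outstanding_tags)
{
    unsigned int released = 0;

    while (*outstanding_tags) {
        *outstanding_tags &= *outstanding_tags - 1; /* clear lowest set bit */
        released++;
    }
    return released;
}

/* Walk the whole sequence once and return how many IOs were released. */
unsigned int run_recovery(uint32_t *outstanding_tags)
{
    unsigned int released = 0;
    enum recovery_step s = STEP_MEDIUM_ERROR;

    while (s != STEP_DONE) {
        if (s == STEP_ABORT_ALL)
            released = abort_all(outstanding_tags);
        s = next_step(s);
    }
    return released;
}
```

For example, calling run_recovery() with a bitmap of 0xD (tags 0, 2, and 3 outstanding) returns 3 and leaves the bitmap at 0, meaning all held IOs were handed back as aborted.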

In the successful case, the sequence continues:
- The driver receives an RLE, but the driver now fakes it.
- The driver then receives Hard-Resetting Link.
- Domain revalidation
- Rediscovery
- IOs are successfully restarted.

In the failure case, the sequence continues:
- The system hangs.

