Fixing the Linux Graphics Kernel for True DisplayPort Compliance, Or: How to Upstream a Patch
If you’ve ever hooked up a Linux computer to a DisplayPort monitor and encountered only a flickering or blank screen, we’ve got good news for you. A graphics kernel developer at Intel's Open Source Technology Center has solved the problem with a patch that will go into Linux 4.12. Manasi Navare’s patch modifies Atomic Kernel Mode Setting (KMS) technology to gracefully drop down to a lower resolution to display the image.
“Someone had to fix this problem, so I said okay, I have the knowledge and I have the community to help me,” said Navare at Embedded Linux Conference.
To hear Navare tell it, the hard part was not so much developing the fix as fully understanding the inner workings of DisplayPort (DP) compliance, the Linux graphics stack, and Atomic KMS. The task of upstreaming the patch was perhaps even more challenging, calling upon the right mix of persuasion, persistence, charm, and flexibility. At the end of the talk, Navare offered some tips for anyone pushing a patch upstream toward an eventual kernel merge.
Negotiating with DisplayPort
Navare started by explaining how a computer (the DP source) negotiates with a display (the DP sink) to enable the desired resolution and other properties. When you connect the cable, the sink sends a signal that informs the source about the maximum lane counts and link rates the sink supports. The source then initiates a DPCD (DisplayPort Configuration Data) read over the sink’s AUX channel, performs a calibration routine, and then launches a handshaking process called DP link training. This configures the main link, which can run over one, two, or four lanes, each combination offering a different channel capacity.
“The first phase is clock recovery, sending out a known training pattern sequence on the main link,” said Navare. “The receiver extracts the clock information to find out if it can work at that link rate. Next comes channel equalization, where the receiver tries to lock onto the symbols on each lane. If both phases are successful, the link is ready, and the sink is set to receive the data at a specific lane count and link rate.”
Despite all these steps, the link training can still result in a blank or flickering display. “The link training could fail because you haven’t tested the physical capability of the cable until the very end of the process,” said Navare. “There is no way to send this information back to userspace because the commit phase was never expected to fail. It’s a dead end.”
To find a solution, Navare needed to test DP compliance. She used a Unigraf DPR-120 device, which has been certified by VESA. The device sits between the source and sink and requests specific data or video packets to be sent to the DP monitor. “It maps those values onto the AUX channel and monitors all the transactions on the display cables,” said Navare. “It compares that data to the reference values, and if they match, the device is compliant.”
Navare also needed to improve her understanding of the complex Linux graphics stack. The base level consists of an Intel Integrated Graphics Device layer -- a hardware layer for rendering the display and doing graphics acceleration. “On top of this sits the Linux kernel with the i915 Intel graphics driver, which knows how to configure the hardware according to userspace commands,” explained Navare.
At a higher layer within the same Linux kernel subsystem is the DRM (Direct Rendering Manager), which implements the part of the kernel that is common to different hardware specific drivers. “The DRM exposes the APIs to userspace, which sends information down to the hardware to request a specific display for rendering,” said Navare.
She also dug further into KMS, which, among other things, scans out the RGB pixel data in the plane buffers using the cathode ray tube controller (CRTC), which decides whether to generate DVI, HDMI, or DP signals.
“The CRTC generates the bitstream according to the video timings and sends the data to the encoder, which modifies the bitstream and generates the appropriate signals based on the connector type,” said Navare. “Then it goes to the connector and lights up the display.”
Once into the project, Navare realized her solution would need to support the new Atomic KMS version of KMS, which works in two steps. “When you connect the source with the sink, userspace creates a list of parameters that it wants to change on the hardware, and sends this to the kernel using a DRM_IOCTL_MODE_ATOMIC call. The first step is the atomic check phase, where the kernel builds the state structures for the different DRM mode objects: the planes, CRTCs, and connectors. It validates the mode requested by userspace, such as 4K, to see if the display is capable of it.”
If successful, the process advances to the next stage -- atomic commit -- which sends the data to the hardware. “The expectation is that it will succeed because it has already been validated,” said Navare.
Yet even with Atomic KMS, you can still end up with a blank screen. Navare determined that the problem happened within Atomic KMS between the check and commit stages, where link training occurred.
Navare’s solution was to introduce a new property for the connector called link status. “If a commit fails, the kernel now tags the connector’s link-status property as BAD,” she explained. “It sends a hotplug (HPD) event back to userspace, which requests another modeset, but at a lower resolution. The kernel repeats the check and commit, and retrains the link at a lower rate.”
If the test passes, the link status switches to GOOD, and the display works, although at a lower resolution. “Atomic check is never supposed to fail, but link training is the exception because it depends on the physical cable,” said Navare. “The link might fail after a successful modeset because something can go wrong with the cable between initial hookup and test. This patch provides a way for the kernel to send that notification back to userspace. You have to go back to userspace because you have to repeat the process of setting the clock and rate, which you can’t do at the point of failure.”
A few tips on upstreaming
Navare added the new link status connector property to the DRM layer as part of an upstream i915 driver patch, and submitted it to her manager at Intel. “I said, ‘It’s working now. What can I work on next?’ He replied: ‘Have you sent it upstream?’”
Navare submitted the patch to the public mailing list for the graphics driver, thereby beginning a journey that took almost a year. “It took a long time to convince the community that this would fix the problem,” said Navare. “You get constant feedback and review comments. I think I submitted 15 or 20 revisions before it was accepted. But you keep on submitting patch revisions until you get the ‘reviewed by’ and that’s the day you go party, right?”
Not exactly. The patch then gets merged into an internal DRM tree, where much more testing transpires. It finally gets merged into the main DRM tree, where it’s sorted into the drm-fixes or drm-next branch.
“Linus [Torvalds] pulls the patches from this DRM tree on a weekly basis and announces his release candidates,” said Navare. “It goes through the cycle of release candidates for a long time until it’s stable, and it finally becomes part of the next Linux release.”
Torvalds finally approved the patch for merging, and the champagne cork popped.
Navare also offered some general tips for the upstreaming process, which she calls Linus’s Rules. The first rule is “No regressions,” that is, no GPU hangs or blank screens. “If you submit a patch, it should not break something else in the driver, or else the review cycle can get really aggressive,” said Navare. “I had to leverage the community’s knowledge about other parts of the graphics driver.”
The second rule is “Never blame userspace; it’s always the kernel’s fault.” In other words, “If the hardware doesn’t work as expected, then the kernel developer is the one to blame,” she added.
The problem here is that kernel patches require changes in userspace drivers, which leads to “a chicken and egg situation,” said Navare. “It’s hard to upstream kernel changes without testing userspace… You can’t merge the kernel patches until you’ve tested the userspace, but you can’t merge userspace because the kernel changes have not yet landed. It’s very complicated.”
To prove her solution would not break userspace, Navare spent a lot of time interacting with the userspace community, involving them in testing and submitting patches.
Another rule is that “Feedback is always constructive.” In other words, “don’t take it as criticism, and don’t take it personally,” she said. “I got reviews that said: ‘This sucks. It’s going to break link training, which is very fragile -- don’t touch that part of the driver.’ It was frustrating, but it really helped. You have to ask them why they think it’s going to break the code, and how they would fix it.”
The final rule is persistence. “You just have to keep pinging the maintainers and bugging them on IRC,” said Navare. “You will see the finish line, so don’t give up.”
Connect with the Linux community at Open Source Summit North America on September 11-13. Linux.com readers can register now with the discount code, LINUXRD5, for 5% off the all-access attendee registration price. Register now to save over $300!