How People Collaborate on Linux Kernel Mailing Lists
Linux is one of the largest and most successful open source projects in history. According to a 2016 report from The Linux Foundation, more than 13,500 developers from more than 1,300 companies have contributed to the Linux kernel since tracking began 11 years ago.
At Open Source Summit in Los Angeles, Dawn Foster, a part-time consultant at The Scale Factory and a PhD student at the University of Greenwich in London, will share her research into how these many developers and contributors collaborate on the Linux kernel mailing lists, including network visualizations of mailing list interactions between contributors.
Foster has more than 20 years of experience in business and technology, with expertise in open source software, community building and management, market research, and more. She says that while there is quite a bit of data about the people and companies who commit Linux kernel code, there isn't much data about how people work together on the mailing lists where they decide what patches will be accepted.
In this interview, Foster — who is passionate about bringing people together through a combination of online communities and real-world events — shares some insight about her research and her upcoming talk. In her presentation at OS Summit, you can expect to learn more about the people, their employers, and other data that impacts participation on the Linux kernel mailing lists.
Linux.com: What value/benefit do you see in learning how these developers work together?
Foster: When I worked in the Open Source Technology Center at Intel, we had quite a few kernel developers on the team, and I was always interested in how they worked closely together with other people from a wide variety of companies, including our competitors.
One of the interesting things about the Linux kernel is that the vast majority of people who contribute to the kernel are employed by companies to do this work. However, most of the academic research on open source software assumes that participants are volunteers. Based on my experience, I know that this assumption isn't valid for many projects, and I wanted to use my time as a PhD student to do research that takes corporate involvement in open source projects into account. The hope is that after I finish my PhD and go back to work in the technology sector, I will have encouraged more researchers to include the employer relationship in their open source research.
However, looking at this employer relationship is a bit tricky. When I talk to kernel developers, they almost always say that it's the contributions that are important, not your employer. As a part of my PhD research, I interviewed 16 kernel developers and most of them said that they really don't pay much attention to where someone works. So, from a more practical standpoint, I wanted to test this idea out a bit and explore whether where you work, along with several other factors, influences how people work together in the kernel.
Linux.com: Don’t threads on mailing lists provide enough data?
Foster: The focus of my research is on collaboration, and looking at which people tend to work together on the kernel. With the Linux kernel, discussions about patches happen on various mailing lists, so it's really the best place to look if you want to understand how people are working together.
The hard part is that there are well over 100 unique mailing lists for various subsystems, so the mistake I see some researchers making is picking just LKML (the main Linux Kernel Mailing List) to study. However, this ignores the fact that most of the work and the collaboration on patches happens on subsystem mailing lists, not LKML. Since I'm interested in how people work together and because people tend to work together in subsystems, I've been picking a few subsystems for my research, USB for example.
This doesn't mean that I'm ignoring the source code. One of the things that I include in my research as something that might influence how people work together is whether someone is an active contributor (has recently made code commits) or is a maintainer for some part of the code.
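As a rough illustration of what mailing list data can offer, the sketch below counts who replies to whom in a small set of messages, which is the raw material for a network view of list interactions. It is not Foster's method, as the interview does not describe her tooling. The dictionary keys ("id", "from", "in_reply_to") and the sample messages are invented for this example.

```python
from collections import Counter
from email.utils import parseaddr


def build_reply_edges(messages):
    """Count who replies to whom in a list of parsed mail headers.

    Each message is a dict with the hypothetical keys "id", "from", and
    "in_reply_to" (None for a message that starts a thread).
    """
    # Map each message ID to the lowercased address of its author.
    author_by_id = {}
    for msg in messages:
        _, address = parseaddr(msg["from"])
        author_by_id[msg["id"]] = address.lower()

    edges = Counter()
    for msg in messages:
        parent = msg.get("in_reply_to")
        if parent not in author_by_id:
            continue  # thread starter, or parent is not in this sample
        replier = author_by_id[msg["id"]]
        original_author = author_by_id[parent]
        if replier != original_author:  # ignore self-replies
            edges[(replier, original_author)] += 1
    return edges


# Invented sample data, not real list traffic.
sample = [
    {"id": "1", "from": "Ann <ann@example.com>", "in_reply_to": None},
    {"id": "2", "from": "Bob <bob@example.org>", "in_reply_to": "1"},
    {"id": "3", "from": "Ann <ann@example.com>", "in_reply_to": "2"},
    {"id": "4", "from": "Bob <bob@example.org>", "in_reply_to": "1"},
]

edges = build_reply_edges(sample)
for (replier, author), count in sorted(edges.items()):
    print(f"{replier} -> {author}: {count}")
```

Each (replier, author) pair can be treated as a weighted edge in a graph. Attributes such as employer, maintainer status, or recent commit activity would have to be attached to each person from other sources.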
Linux.com: The discussions seen on mailing lists are purely about that patch/project. Is it possible that non-public discussions within companies or groups of developers play a much more important role in how these people work together?
Foster: The reality is that there is no way to get perfect data when trying to understand how humans work with each other. It would never be practical to get access to the content of every hallway discussion, conference, meeting, private chat or internal email. So, we work with what we have, which is mailing list data. The kernel developers that I talked to said that all of the discussions should happen on the mailing lists, but most of them also admitted to having informal discussions with friends and coworkers about patches, especially when they wanted advice. It's also quite common for new contributors to get initial feedback and coaching outside of the public mailing lists while they are developing a patch, and some people collaborate internally with other employees when working on patches.
As I mentioned earlier, most of the discussions don't really happen on LKML, but on the various mailing lists for each subsystem. Since the patches eventually end up on one of the subsystem mailing lists where they can be discussed, I can learn about the formal collaboration that happens around these patches. All research is based on assumptions and limitations. I know that there are certainly some informal discussions that aren't included in my analysis, which is a limitation, but my assumption is that the important discussions happen on the mailing list. The idea is that enough of the important discussions happen on the subsystem mailing lists to make it a feasible way to look at how people work together within that subsystem.
Linux.com: Who will benefit from this data? Will it help developers or companies and organizations become more efficient in ensuring their participation?
Foster: In my experience working with open source communities, the people within those communities tend to be interested in learning more about how people work together. It's easy for people to get caught up in doing their bits of work, and it can be interesting to see the project from a different perspective. I also suspect that some developers will be interested in attending just to see if I'm "getting it right," and I welcome those people to my talk. I'd rather know if I'm making mistakes or bad assumptions now, while I still have time to fix them before I finish my dissertation in the next six months.
I also think that it will help companies and organizations, especially ones who are newer to the Linux kernel, understand how people work together. However, I think that even experienced companies might have something to learn about some of the nuances behind how people collaborate on the kernel.
Linux.com: Who should attend your talk and why?
Foster: The talk is intended for Linux kernel developers and other people who are interested in better understanding how kernel developers collaborate on the mailing lists. In addition to kernel developers, people from companies who are a bit newer to kernel development should find it interesting, and I would expect people who are interested in community and metrics to enjoy the talk.