How Google Uses and Contributes to Open Source
Engineer Marc Merlin has been working at Google since 2001 but has been involved with Linux since 1993, in its very early days. Since then, open source adoption has dramatically increased, but a new challenge is emerging: Not many companies care about the license side of open source, Merlin stated in his talk “How Google Uses and Contributes to Open Source” at LinuxCon and ContainerCon North America.
“There are companies and people who just take the software and say, ‘I didn't have to pay for it. I can do anything I want. The license file is a big blob of text. I'm not going to read that,’” Merlin said.
“If you get software that you didn't have to pay for and you get a license with it that gives you a few things that you should be doing, then you should be respecting the license,” Merlin said.
Respect the license
In some cases, companies aren’t aware that their products contain open source code, or they’re not informed about how to comply with open source licenses. Although some companies are well informed about the licenses, they choose not to care for many reasons: They move to the next product, they are too small to be sued, or they are in a country where they can do whatever they want.
Regardless of the reasons companies don’t comply, license compliance is a complicated, but extremely important, issue. Google didn’t always get it right when it came to open source management, Merlin said, but the company’s practices provide lessons that can benefit any company that needs to improve its open source compliance.
Evolution of open source at Google
Merlin said that he has moved many times within Google, because he gets bored. What he ends up doing is helping teams make sure that they do the right thing when it comes to using and contributing to open source.
Back in its early days, around 1998, Google was a small company. It was using open source just like any other small company. While Google was abiding by licenses, it was not giving back much, for several reasons. “Some of it was just run fast and make sure that we have money next month to pay everyone's salary,” said Merlin. Another factor was that Google was using very old versions of Linux. They had it frozen, and whatever changes they were making were irrelevant to the current version of Linux, which was far ahead of what they were using.
In 2004, Chris DiBona joined Google to run the Open Source Program Office. “The whole point of the office is to help the company do the right thing, help engineers with their questions, and to make sure everyone plays by the rules,” said Merlin.
How you develop affects contributions
Google was working on so many different projects back then that it was iterating fast, and it still is.
“When you go fast, you just patch the source and move on to the next problem because things are on fire. It was difficult to get the patches part of it and contribute them,” said Merlin. There was no Git back then; those were CVS days, which made it even harder to keep track of source code.
In addition to using really old code, another problem was that most of what Google was working on internally was not designed to be open source at that time. There are very large internal libraries on which other projects depend. That made it hard to open source new things because you can’t just take one piece of it and open source it.
Go open source from the beginning
Google changed that by writing a lot of things from the ground up as open source or to be open source ready. That was a good lesson that they learned, and that’s a problem many companies face when they want to open source their stuff but can’t because the code was not designed to be open source from the beginning.
Even if Google can’t open source certain code, they found a way to bring that work to the public. “We wrote papers talking about the magic algorithm that we used. We can't give you the code for the reason I just explained, but we're giving you the way they work so you can rewrite them,” said Merlin. Google has published hundreds of such papers, and people are using them to create projects based on those ideas.
Due to all these efforts, Google is today among the top 10 contributors to the Linux kernel. Google has hired many kernel developers and open source leaders, such as Andrew Morton, Theodore Ts’o, and Jeremy Allison, who don’t do any work for Google; they work directly on Linux or their own projects. Google also has a team of kernel developers who write code for Google’s own needs.
Open source projects
Chrome and Android may be the biggest open source projects at Google, but there are several more open source projects. Google also encourages people to contribute to open source on their own time. Although many employer agreements claim ownership of code written by employees when it is in the same line of business, even on their own time, Google has a process where employees can get ownership of code they wrote on their own time and hardware.
Merlin said that Google has set up a process whereby the company can disclaim that code and give it back to that employee, or Google can just approve and release that without having to go through the company. Google is also encouraging employees to use their Google email so that their contributions are more visible.
Google has internal bonus programs for employees to encourage them when they do something good, and people thought the company should do something similar for open source.
“We now have an internal program where we can nominate open source projects that may be one or two people -- not a huge group -- and could benefit from having a bit of extra hardware or extra money,” Merlin said.
Open source licenses
Licenses play a critical role when companies like Google choose to contribute to projects. Just because a project is open source doesn't mean Google employees can freely contribute to it.
“There are open source licenses that are problematic,” said Merlin. “There are some licenses that are probably not legal licenses at all, at least in some countries.” In such cases, employees are asked not to work on that as employees. “If they insist on working on that project, we give them the right. We disclaim all their work and say ‘that’s not Google work, this is you as a person’. And then they do what they want with that project.”
There are cases where projects don’t have any licenses. “Sometimes we have to contact the project owners and tell them that their project doesn’t have a license. We'd like to contribute to it, but until you do, we can’t.” That helps those projects fix licenses and allows Google employees to contribute to such projects.
Patch submission is a tricky area, too. If the code is on GitHub, then that’s easy, but patch submission through email needs extra work. “We like to know what’s the project. What license that project is, we have a quick look at it and we'll just tell them if it’s OK or not OK to contribute to that project,” said Merlin.
Other ways Google gives back
Code contribution is not the only way Google is giving back. The company sponsors many open source and Linux-related events. It also sponsors organizations like the Free Software Foundation and Software Freedom Conservancy and projects like OpenSSL.
At the same time, Google also gives back by creating platforms when needed; Code@Google is a good example. When Google saw that SourceForge was becoming questionable, they set up code.google.com to give a platform to developers until GitHub popped up and became the new de facto platform for open source development.
Now virtually all of Google’s open source code is on GitHub, except for Android. “The Android distribution is so big and it gets released in big chunks. So, when it gets released, everyone wants to sync that,” Merlin said. “It’s so huge that if we put it on GitHub, it would completely kill GitHub. We use our own mirrors for that, to help out.”
Google also runs programs that expose students to open source and tries to encourage more women to get into the field.
Then, there is Google Summer of Code for students. “Instead of going to work for McDonald's in the summer to make money, which would be a waste of their brain power, we basically take submissions for them to work on open source projects,” said Merlin. “They get a mentor. If they do well, they get a stipend to pay for their time and allow them to work and pay their college bills, whatever that summer job money was going to be for.”
Dealing with licenses
Companies have to be extremely careful when using open source. Different projects use different licenses, and you need to be in compliance with them. To ensure that people don’t make mistakes, “we have an internal policy that you cannot use any open source at Google, unless it goes in a separate, third party hierarchy, and each of them has directories with the license of each project,” Merlin said. “So, when you include your library, you know what license it is and you can easily check that. You can’t use open source if you do not follow that process. You cannot take code and paste it.”
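A policy like this lends itself to an automated check. The following is a minimal sketch, not Google's actual tooling: it assumes a hypothetical layout in which each project lives in its own directory under `third_party/` and carries a license file with one of a few conventional names. The directory name, the file names, and the function names are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumed file names that count as a license file; not taken from the talk.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "COPYING")


def find_unlicensed(third_party_root):
    """Return the names of project directories that lack a license file."""
    missing = []
    for project in sorted(Path(third_party_root).iterdir()):
        if not project.is_dir():
            continue  # stray files at the top level are ignored here
        if not any((project / name).is_file() for name in LICENSE_NAMES):
            missing.append(project.name)
    return missing


def demo():
    """Build a throwaway tree with one compliant and one non-compliant project."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "third_party"
        (root / "libfoo").mkdir(parents=True)
        (root / "libfoo" / "LICENSE").write_text("Apache License 2.0\n")
        (root / "libbar").mkdir()  # hypothetical project with no license file
        return find_unlicensed(root)
```

Run as part of a build or code review step, a check of this kind could flag a newly imported library before anyone depends on it.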
Things become complicated when you have projects that you ship. In the case of open source, you need to list the projects that you use and their licenses. In the case of BSD and MIT, you need to list the name and the copyright of the person you got that project from. That can be a bit annoying, he said, as “you actually have to go and find every project, scan their license, find their name or the copyright line, and credit them. Otherwise, we're not compliant with the BSD or MIT license.”
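The crediting step Merlin describes can be partly automated. The sketch below is an illustration under stated assumptions, not a compliance tool: it assumes each project's license text is already available as a string and that copyright lines begin with the word "Copyright" followed by "(c)", "©", or a four-digit year. Real license files vary, so the results would still need human review. The function names are hypothetical.

```python
import re

# Matches lines such as "Copyright (c) 2015 Jane Doe" or "Copyright 2015 Jane Doe".
# Requiring "(c)", "©", or a year avoids matching wrapped license sentences
# that merely start with the word "copyright".
COPYRIGHT_RE = re.compile(
    r"^[ \t]*Copyright\s+(?:\(c\)|©|\d{4}).*$",
    re.IGNORECASE | re.MULTILINE,
)


def copyright_lines(license_text):
    """Return the copyright lines found in one license text."""
    return [m.group(0).strip() for m in COPYRIGHT_RE.finditer(license_text)]


def build_notices(projects):
    """Build a plain-text credits list from a {project name: license text} dict."""
    entries = []
    for name in sorted(projects):
        holders = copyright_lines(projects[name]) or ["(no copyright line found)"]
        entries.append(name + "\n" + "\n".join("  " + h for h in holders))
    return "\n\n".join(entries)
```

Projects for which no copyright line is found are listed explicitly rather than dropped, so a reviewer can see which ones still need to be checked by hand.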
Google has to be extremely careful when they buy a company. They may not have their code in compliance. Merlin said sometimes Google actually passes on companies because they're totally not compliant, and to bring them into compliance is not going to be possible.
What license does Google prefer?
Google uses Apache 2 by default. “The main reason is that it offers a patent license to the user. If you get code from us, under Apache 2, that means you get a grant to any patent that applies to the code that you just got from us,” said Merlin. That’s not true of GPLv2, though GPLv3 now does add a patent grant. However, GPLv3 has added a clause that prevents device manufacturers from restricting the reflashing of devices, which is not an ideal solution for some companies.
Merlin mentioned that Google can’t contribute to WTFPL-licensed projects. Sometimes projects won’t change the license but are willing to dual license, and although that’s not ideal, it’s good enough for contribution. However, he warned that dual licenses can also be trouble and suggested avoiding them. “Once you receive a patch, it’s unclear under which license that patch is received or if it's under both licenses.”
Talking about even freer licenses, Merlin said that while CC Zero is a proper license, it requires you to give up all your rights to the code to which you just contributed.
What about CLAs?
Contributor License Agreements (CLAs) themselves don’t mean anything; they are just legal agreements. Some are very bad legal agreements that you should never sign, and others are completely safe. The problem is that you have to review CLAs on a case-by-case basis.
Google uses the Apache Software Foundation ICLA, without modifying it or putting anything special in it. CLAs ensure that companies like Google “can re-license your code under a different open source [license] if we need to. Sometimes we need to merge with other projects and that's what the CLA allows us to do,” said Merlin.
Google employees are not allowed to sign a CLA for another company before it has been reviewed. Merlin said that most projects would be better off using the Apache CLA. He shared an example of a project that he once contributed to: the project decided to re-license its code and had to contact every single person who had ever contributed a patch. Those people had to print a form, sign it, and fax it back within 10 days. It was a huge amount of wasted effort for them. The Apache CLA takes care of that and makes your project future proof.
Merlin also warned against using AGPL, as no one knows where it stops; it’s just way too broad, he said.
Merlin’s description of Google’s open source story highlighted how meticulous the process needs to be to get open source right. The best part is that all these changes and motivations came from within the company. Through their efforts, they became one of the top contributors to Linux and created two dominant Linux-based operating systems in the consumer space. That’s a good open source story. Google it!