This article shares some of the tips and lessons I learned on the way to my first successfully merged pull request to an open source project, with examples from my personal experience.
Background: planting the seeds for open-source collaboration
Collaboration is the fulcrum of Holberton School, the full-stack software engineering program in San Francisco where I received my software training. We have learned to manage our software systems and code bases on the Ubuntu distribution of Linux, and we use many other open-source technologies such as MySQL, Docker, VirtualBox, HAProxy, Emacs, GCC, git, and more. With this foundation, as we neared the end of our first 9 months of training, our leaders challenged us to begin contributing to the open source community; I decided to take on this challenge.
“You have to start somewhere”
Unfortunately, most new engineers face many challenges when trying to contribute to the open source software community. In no particular order, these are some of the challenges that I faced in my experience:
1. Contributing Guidelines: Documentation, Style, Methodology:
After the excitement of your first pull request, you may have been disappointed to see an automated message like this one from the Linux repository on GitHub:
“Thanks for your contribution to the Linux kernel! Linux kernel development happens on mailing lists, rather than on GitHub… So that your change can become part of Linux, please email it to us as a patch. Sending patches isn’t quite as simple as sending a pull request, but fortunately it is a well documented process.”
Or, you may have been immediately redirected by a personal message from Linus Torvalds himself.
When you make a pull request without reading the Contributing Guidelines.
Spending hours learning how to make a pull request properly, and how to document both the code and the pull request, can be overwhelming for many. It was overwhelming for me, even though I am certain that learning such practices would improve my own collaborative, engineering, and development skills.
2. Advanced and large code bases:
Entering the world of professional engineering as a student or new engineer means having to learn unfamiliar code. Before you can contribute meaningfully to a code base, you may spend hours studying immense systems and advanced functions that you have never seen. This becomes a time constraint, since most beginners must spend a lot of time learning and understanding the code first.
3. Competition from other contributors:
There are many people eager to contribute to open source projects, who jump at opportunities to help fix a repository’s listed “issues” or make other minor improvements by refactoring code or fixing grammar mistakes in the documentation. A healthy attitude is to not feel threatened by such collaborators or view them as competition. However, when you are a student seeking your first opportunity to contribute to open source projects, such people quickly take much of the work that a beginner would be able to help with. In my case, these other contributing engineers stirred my competitive instincts, and so after a few ‘losses’ or missed opportunities, I did indeed feel like a defeated loser. Much like Stack Overflow, where questions can be answered within seconds, many open source projects’ issues had someone working to resolve them before I could even understand what the issue was.
Later, at DockerCon 2017, I learned that many top companies pay engineers whose only job is to contribute to major open-source projects (see the image below: Commits of committers per hour). This discouraged me from trying to contribute to major projects in the 1 – 2 hours of free time that I had per day, when the professionals were spending 8 hours per day doing the same thing. The study Paid vs. Volunteer Work in Open Source, by Riehle, Riemer, Kolassa, and Schmidt, shows that about 50% of all work contributed to the open source software projects in the study occurred Monday to Friday, between 9am and 5pm. This does not prove, but does suggest, that these contributions were part of some engineers’ employment.
Commits of committers per hour (to open source projects). Source: http://dirkriehle.com/wp-content/uploads/2013/08/paid-v8-final-web.pdf
4. Ghost Repositories
Indeed, my first pull request to an open source project was to a repository that had seen no activity for the previous 4-5 months. I saw that the pull request history for the repository had many contributors and successfully merged pull requests, and so I was motivated to contribute to the repository. I was a little concerned that the most recent pull requests had not been merged to master, but I ignored this concern because I assumed that the changes had been rejected for failing the CI (continuous integration) tests. After I passed the CI tests in my forked repository, I made the pull request. However, my pull request failed the CI tests in the parent repository, which is when I realized that the current master build, due to the most recent pull request merged to master in April 2017, contained code that caused the CI tests to fail. I opened an issue reporting the problem, alongside my pull request, which fixed it and updated other code. Months later, neither my pull request nor my issue had been reviewed or commented on. I also noticed that the author had shown no activity on the repository since April 2017, nor had the master branch been updated. Perhaps the author is very busy or going through tough times, so there may be a decent explanation. Nevertheless, since I interpreted my first contribution as being entirely ignored by every contributor to the repository, I felt unimportant and defeated.
Last Merged Pull Request (failed CI tests, 4 months old).
Always a lesson, never a failure
All of the above difficulties refined my skill at manually parsing GitHub repositories, so that I now focus on these main priorities:
Ensure that the repository is open to pull requests from new users.
Ensure that you can handle the contributing guidelines for methodology and documentation.
Ensure the repository has been recently active in comments and that it has merged pull requests; or if not, at least that the most recent pull request has activity or reasons why it was not merged.
Ensure that you understand the code related to the issues and/or some other part of the code base.
Ensure that you understand at least one of the current issues.
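One rough way to check the "recently active" item from a local clone is to look at the age of the newest commit. The sketch below is only an illustration: the function name days_since_last_commit is hypothetical, and the demo runs against a throwaway repository created on the spot rather than a real project. It measures commit activity only, so comments and pull request discussion still need to be checked on GitHub itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# days_since_last_commit REPO_DIR  (hypothetical helper)
# Prints the whole days elapsed since the newest commit on the checked-out branch.
days_since_last_commit() {
    local last_commit now
    last_commit="$(git -C "$1" log -1 --format=%ct)"   # committer date in Unix seconds
    now="$(date +%s)"
    echo $(( (now - last_commit) / 86400 ))
}

# Demo on a throwaway repository with one fresh commit.
repo="$(mktemp -d)"
git -C "$repo" init --quiet
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" commit --quiet --allow-empty -m "Initial commit"

idle_days="$(days_since_last_commit "$repo")"
echo "days since last commit: $idle_days"
```

On a real clone, you would call the function with the path to that clone; a result in the hundreds is a hint that you may be looking at a ghost repository.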
Once you decide to make a contribution
Know how to make a pull request to a repository that you are not an official contributor to (hint: fork the repo).
Ensure the contribution idea provides value (perhaps base it off of an issue).
Start with a small change to evaluate how the repository’s maintainers respond to your pull request.
Keep a clean commit history for your pull request.
$ git commit --amend
This command opens your editor with the most recent commit message.
Change the message to what you want it to be, then save/quit. Any other changes that you have staged will be folded into the most recent commit.
It should be executed after $ git add ….
If the commit that you overwrite is already pushed to a branch on the origin repository, then you will have to run $ git push --force origin [YOUR BRANCH].
Note: if you are making a force push, be very careful, because you will overwrite the history of the branch you are pushing to. Be sure to make backups if you are unsure of what you are doing.
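To make the amend step concrete, here is a minimal sketch that runs entirely in a throwaway repository. The file name, commit messages, and demo identity are made up for illustration, and -m is passed to --amend so that no editor opens while the script runs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# First attempt: a typo in the message and a forgotten line in the file.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notse"

# Stage the forgotten change, then rewrite the last commit instead of adding a new one.
echo "forgotten line" >> notes.txt
git add notes.txt
git commit --quiet --amend -m "Add notes"

commit_count="$(git rev-list --count HEAD)"
last_message="$(git log -1 --format=%s)"
echo "commits: $commit_count, message: $last_message"
```

The history still contains a single commit, now with the corrected message and both lines of the file. If the original commit had already been pushed, $ git push --force-with-lease origin [YOUR BRANCH] is a more cautious alternative to --force, since it refuses to overwrite remote work that you have not fetched.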
$ git rebase --interactive HEAD~[NUM]
This command opens up an interactive editor with the most recent ‘NUM’ commits.
Next to each commit is the word “pick”. To fold a commit into the one before it, replace “pick” with “squash”. This combines the commit’s changes with the preceding commit (the one listed on the line above it) and lets you merge their messages. Because changes are folded into the preceding commit, you cannot squash the first commit in the list.
Then, save/quit, which will open another interactive editor.
In this editor, you can keep the combined commit messages or write a single message to replace all of the squashed messages.
Then save/quit again.
Finally, run $ git push --force origin [YOUR BRANCH] to update your branch on the origin repository with the rewritten history.
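The squash steps above can be tried safely in a scratch repository. The sketch below scripts the two editor sessions so that it runs unattended: GIT_SEQUENCE_EDITOR edits the list of commits (here with sed, assuming GNU sed's -i option), and GIT_EDITOR=true accepts the combined commit message unchanged. The file name and commit messages are invented for the demo; in normal use you would make these edits by hand in your editor.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# One base commit, then three small work-in-progress commits.
git commit --quiet --allow-empty -m "Initial commit"
for n in 1 2 3; do
    echo "change $n" >> feature.txt
    git add feature.txt
    git commit --quiet -m "Work in progress $n"
done

# Keep "pick" on the first listed commit and change the later ones to "squash".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase --quiet --interactive HEAD~3

commit_count="$(git rev-list --count HEAD)"
line_count="$(wc -l < feature.txt)"
echo "commits: $commit_count, lines in feature.txt: $line_count"
```

After the rebase, the three work-in-progress commits have become one commit on top of the initial commit, and feature.txt still contains all three changes.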
Keep in mind all of the maintainers’ requests, both in their review of your pull request and in the repository’s contribution guide (CONTRIBUTING[.md or .txt]).
Test your code! Use the continuous integration tests. This might mean that you need to enable a GitHub CI integration on the fork that you created to make your contribution. Test locally and in different environments.
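The fork-based workflow hinted at in the checklist above can be sketched as follows. To keep the example runnable without a network connection, two local bare repositories stand in for the project's repository and your fork of it; on GitHub you would create the fork with the Fork button and use the real URLs instead. The repository, branch, and file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

work_dir="$(mktemp -d)"
cd "$work_dir"

# Setup only: local stand-ins for the hosted repositories.
git init --quiet seed
git -C seed config user.name "Maintainer"
git -C seed config user.email "maintainer@example.com"
echo "Projcet README" > seed/README
git -C seed add README
git -C seed commit --quiet -m "Add README"
git clone --quiet --bare seed "$work_dir/upstream.git"                     # the project's repository
git clone --quiet --bare "$work_dir/upstream.git" "$work_dir/fork.git"     # your fork of it

# The contributor workflow.
git clone --quiet "$work_dir/fork.git" my-clone    # clone YOUR fork; it becomes "origin"
cd my-clone
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add upstream "$work_dir/upstream.git"   # also track the original project
git fetch --quiet upstream                         # pick up the maintainers' latest work

git checkout --quiet -b fix-readme-typo            # one topic branch per contribution
echo "Project README" > README
git commit --quiet -am "Fix typo in README"
git push --quiet origin fix-readme-typo            # publish the branch to your fork

pushed_subject="$(git --git-dir="$work_dir/fork.git" log -1 --format=%s fix-readme-typo)"
echo "pushed to fork: $pushed_subject"
```

With the branch published on your fork, the pull request itself is opened from the fork's page on GitHub, targeting the original repository.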
My First Merged PR to Google open source
I first discovered Google’s Open Source Community through a mentor of our program, Nicolas Thiébaud, currently a software engineer at Google. Next, I browsed projects in Google’s open source GitHub organization that were written in languages I was familiar with, such as Python. I found a neat project called ci_edit, which is a text editor with a command line interface. After browsing the repository, I confirmed that I could handle all of the points in my lessons-learned checklist. I realized that this text editor had to be launched by typing the absolute path to the executable in the installation directory, unless it was manually added to the execution PATH. I did consider creating a Debian package, but decided not to because I had never done this before and I thought that it would take more time. So I decided that my first contribution would be to create a bash installation script that would execute bash commands to:
1. Create or update an installation package
2. Add a symbolic link to the executable in PATH.
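To illustrate the general idea behind these two steps, here is a minimal sketch. It is not the script that was merged into ci_edit: the install_app function, the demo_editor program, and the directory layout are all hypothetical, and everything runs inside a temporary sandbox so that no system directories are touched and no sudo is needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# install_app SRC_DIR INSTALL_DIR BIN_DIR NAME  (hypothetical helper)
# Copies SRC_DIR to INSTALL_DIR, replacing any previous copy, then links
# BIN_DIR/NAME to the executable INSTALL_DIR/NAME.
install_app() {
    local src_dir="$1" install_dir="$2" bin_dir="$3" name="$4"
    rm -rf "${install_dir:?}"                  # remove an older installation, if any
    mkdir -p "$install_dir" "$bin_dir"
    cp -R "$src_dir"/. "$install_dir"/
    chmod +x "$install_dir/$name"
    ln -sf "$install_dir/$name" "$bin_dir/$name"
}

# Demo: a tiny stand-in program installed into sandbox directories.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/src"
printf '#!/bin/sh\necho "hello from demo_editor"\n' > "$sandbox/src/demo_editor"

install_app "$sandbox/src" "$sandbox/opt/demo_editor" "$sandbox/bin" demo_editor

# With the bin directory on PATH, the program runs by name alone.
PATH="$sandbox/bin:$PATH"
demo_output="$(demo_editor)"
echo "$demo_output"
```

A real installer would typically target system locations that are already on PATH and would need to handle permissions and user confirmation, which this sketch leaves out.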
More of the functionality of the contributions and my explanations can be viewed in the comments of the pull requests:
Mistakes and Follow up
Hopefully, when you make your first pull request to the open source community, you will be working with a great team. Thankfully, the people I worked with were very helpful, and so I learned a lot from the suggestions of the repository’s maintainer. Some of the biggest tips were to be more verbose in my documentation, to add better safeguards against user error, and to add some better functionality for the user. I did incorporate all these changes; however, there were 2 major bugs (discovered so far) in the first pull request even after it was merged. One of the bugs was due to the approach that the maintainer took in one of the lines of code, and the other was because of an error on my part. Had I spent more time on local testing, I would have found this error. Thankfully, another contributor to the repository found the bug almost immediately and made us aware of it. My second pull request fixed these bugs and updated some of the documentation.
More Lessons Learned
When you make your first pull request, be sure to adequately test your code locally in various environments, incorporate any continuous integration tests, and if possible have a peer review your code. In my case, I believe that my haste to make a contribution before anyone else did, and my concern that I would put in too much effort without anyone accepting my update, were the main factors that led me to skip some of the local tests that I should have run. Also, if you don’t think the repository is maintained by helpful or currently active people, find a repository that has great collaborators. In my case, Dave Schuyler, @dschuyler, had a very collaborative and supportive way of working with me. He gave suggestions for updating my code, and always interacted respectfully about the problems he saw in my code. Additionally, he actively communicated with me throughout the github.com review process.
Now go forth and maintain some open source code!