The Beginner’s Guide to Contributing to Open Source Projects

November 21, 2018

The practice of building and maintaining open source software works because people from all over the world, of all abilities and backgrounds, form communities to support the projects they care about. It’s not difficult to support the open source projects you use every day, and the efforts you make will have tangible effects on the quality of that software.

Most open source projects don’t have a dedicated staff to support them. Instead, developers and users from around the world work on them, often in their spare time. For many programmers, though, the thought of contributing to open source projects seems too difficult and time-consuming. They think that you have to be a programming genius blessed with unlimited free time to make a meaningful contribution.

That’s simply not true. Successful open source projects thrive on a wide variety of contributions from people with all levels of coding skills and commitment.

Home Assistant: The Python Approach to Home Automation

Eric Brown

November 20, 2018

A number of home automation platforms support Python as an extension, but if you’re a real Python fiend, you’ll probably want Home Assistant, which places the programming language front and center. Paulus Schoutsen created Home Assistant in 2013 “as a simple script to turn on the lights when the sun was setting,” as he told attendees of his 2016 Embedded Linux Conference and Open IoT conference presentation. (You can watch the complete video below.)

Schoutsen, who works as a senior software engineer for AppFolio in San Diego, has attracted 20 active contributors to the project. Home Assistant is now fairly mature, with updates every two weeks and support for more than 240 different smart devices and services. The open source (MIT license) software runs on anything that can run Python 3, from desktop PCs to a Raspberry Pi, and counts thousands of users around the world.

Like most automation systems, Home Assistant offers mobile and desktop browser clients to control smart home devices from afar. It differs from most commercial offerings, however, in that it has no hub appliance, which means there are no built-in radios. You can add the precisely those radios you want, however, using USB sticks. There’s also no cloud component, but Schoutsen argues that any functionality you might sacrifice because of this is more than matched by better security, privacy, and resiliency.

“There is no dependency on a cloud provider,” said Schoutsen. “Even when the Internet goes down, the home doesn’t shut down, and your very private data stays in your home.”

Schoutsen did not offer much of a promo in his presentation, but quickly set to work explaining how the platform works. Since Home Assistant is not radically different from other IoT frameworks — one reason why it interfaces easily with platforms ranging from Nest to Arduino to Kodi — the presentation is a useful introduction to IoT concepts.

To get a better sense of Home Assistant’s strengths, I recently asked Schoutsen for his elevator pitch. He highlighted the free, open source nature of the software, as well as the privacy and security of a local solution. He also noted the ease of setup and discovery, and the strength of the underlying Python language.

Easy Extensions

“Python makes it very easy to extend the system,” Schoutsen told me. “As a dynamic language it allows a flexibility that Java developers can only dream off. It is very easy to test out and prototype new pieces on an existing installation without breaking things permanently. With the recent introduction of MicroPython, which runs on embedded systems as Arduino and ESP8266, we can offer a single language for all levels of IoT: from sensors to automation to integration with third-party services.”

In Schoutsen’s ELC 2016 presentation, he described how Home Assistant is an event-driven program that incorporates a state machine that keeps track of “entities” — all the selected devices and people you want to track. Each entity has an identifier, a state condition, and attributes. The latter describes more about the state, such as the color and intensity of the light on a Philips Hue smart bulb.

To integrate a Philips Hue into the system, for example, you would need to use a light “component,” which is aware of the bulb and how to read its state (off or on). Home Assistant offers components for every supported device or service, as well as easy access to component groups such as lights, thermostats, switches, and garage doors. Setup is eased with a network discovery component that scans the network and, if you have a supported device, sets it up automatically.

The software is further equipped with a service registry, which provides services over the event bus. “We can register the turn-on command for a light, and have it send an email or SMS,” said Schoutsen. “A timer can send a time change event every second, and a component can ask to be notified at a particular time, or in intervals. Based on time change events, it will trigger the callback of the components.”

Each component writes its state to the state machine, emitting a state change event to the event bus. “The light component would register its turn on service inside the service registry so that anyone could fire an event to the event bus to turn on the light,” said Schoutsen.

You can easily integrate a light component with a motion detector component using an automation component. This would listen to the motion detector events, and fire a “turn light on” event to the event bus, which in turn would be forwarded to the service registry. The registry would then check to see that the light component can handle the event. “Automation components can listen for events, observe certain attribute states or triggers, and act on them,” explained Schoutsen.

Another component type handles presence detection. “The platform can check the router to see which phones are connected in order to see who is home,” said Schoutsen. “Other components are responsible for recording event and state history, or for entity organization — grouping multiple entities and summarizing their state.” Components are available for integrating third party services, such as MQTT or IFTTT, and other components export data to external databases and analysis tools.

Schoutsen went on to explain concepts such as a “platform” layer that sits above the entity components. Each platform integrates an “abstract base class,” which “acts as the glue between the real device and the one represented in Home Assistant,” said Schoutsen. Later, he ran through a code example for a basic switch and explored the use of trigger zones for geofencing.

As Schoutsen says, Home Assistant is “gaining a lot of traction.” Check out the complete video to see what happens when Python meets IoT.

https://www.youtube.com/watch?v=4-6rTwKl6ww

The State of the Octoverse: Top Programming Languages of 2018

GitHub

November 20, 2018

At the core of every technology on GitHub is a programming language. In this year’s Octoverse report, we published a brief analysis of which ones were best represented or trending on GitHub. In this post, we’ll take a deeper dive into why—and where—top programming languages are popular.

There are dozens of ways to measure the popularity of a programming language. In our report, we used the number of unique contributors to public and private repositories tagged with the appropriate primary language. We also used the number of repositories created and tagged with the appropriate primary language.

Top programming languages by repositories created, 2008-2018

Top repositories, year over year, by number of repositories created

Introducing the Non-Code Contributor’s Guide

Kubernetes Blog

November 20, 2018

It was May 2018 in Copenhagen, and the Kubernetes community was enjoying the contributor summit at KubeCon/CloudNativeCon, complete with the first run of the New Contributor Workshop. As a time of tremendous collaboration between contributors, the topics covered ranged from signing the CLA to deep technical conversations. Along with the vast exchange of information and ideas, however, came continued scrutiny of the topics at hand to ensure that the community was being as inclusive and accommodating as possible. Over that spring week, some of the pieces under the microscope included the many themes being covered, and how they were being presented, but also the overarching characteristics of the people contributing and the skill sets involved. From the discussions and analysis that followed grew the idea that the community was not benefiting as much as it could from the many people who wanted to contribute, but whose strengths were in areas other than writing code.

This all led to an effort called the Non-Code Contributor’s Guide.

Now, it’s important to note that Kubernetes is rare, if not unique, in the open source world, in that it was defined very early on as both a project and a community. While the project itself is focused on the codebase, it is the community of people driving it forward that makes the project successful. The community works together with an explicit set of community values, guiding the day-to-day behavior of contributors whether on GitHub, Slack, Discourse, or sitting together over tea or coffee.

By having a community that values people first, and explicitly values a diversity of people, the Kubernetes project is building a product to serve people with diverse needs. The different backgrounds of the contributors bring different approaches to the problem solving, with different methods of collaboration, and all those different viewpoints ultimately create a better project.

The Non-Code Contributor’s Guide aims to make it easy for anyone to contribute to the Kubernetes project in a way that makes sense for them. This can be in many forms, technical and non-technical, based on the person’s knowledge of the project and their available time. Most individuals are not developers, and most of the world’s developers are not paid to fully work on open source projects. Based on this we have started an ever-growing list of possible ways to contribute to the Kubernetes project in a Non-Code way!

Get Involved

Some of the ways that you can contribute to the Kubernetes community without writing a single line of code include:

Community education, answering questions on Discuss, StackOverflow, and Slack
Outward facing community work such as hosting meetups and events
Writing project documentation
Writing operational manuals, helping users understand how to run Kubernetes
Helping deliver Kubernetes, as a part of the release team
Project, program, and product management
And many more!

The guide to get started with Kubernetes project contribution is documented on Github, and as the Non-Code Contributors Guide is a part of that Kubernetes Contributors Guide, it can be found here. As stated earlier, this list is not exhaustive and will continue to be a work in progress.

To date, the typical Non-Code contributions fall into the following categories:

Roles that are based on skill sets other than “software developer”
Non-Code contributions in primarily code-based roles
“Post-Code” roles, that are not code-based, but require knowledge of either the code base or management of the code base

If you, dear reader, have any additional ideas for a Non-Code way to contribute, whether or not it fits in an existing category, the team will always appreciate if you could help us expand the list.

If a contribution of the Non-Code nature appeals to you, please read the Non-Code Contributions document, and then check the Contributor Role Board to see if there are any open positions where your expertise could be best used! If there are no listed open positions that match your skill set, drop on by the #sig-contribex channel on Slack, and we’ll point you in the right direction.

We hope to see you contributing to the Kubernetes community soon!

This article originally appeared on the Kubernetes Blog.

Learn Node.js, Unit 3: A tour of Node.js

IBM developerWorks

November 20, 2018

Node is often described as “JavaScript on the server”, but that doesn’t quite do it justice. In fact, any description of Node.js I can offer will be unfairly reductionist, so let me start with the one provided by the Node team:

“Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine.” (Source)

That’s a fine description, but it kinda needs a picture, doesn’t it? If you look on the Node.js website, you’ll notice there are no high-level diagrams of the Node.js architecture. Yet, if you search for “Node.js architecture diagram” there are approximately 178 billion different diagrams that attempt to paint an overall picture of Node (I’ll refer to Node.js as Node from now on). After looking at a few of them, I just didn’t see one that fit with the way I’ve structured the material in this course, so I came up with this:

Node Architecture

Figure 1. The Node.js architecture stack

What Is Machine Learning? We Drew You Another Flowchart

MIT Technology Review

November 20, 2018

The vast majority of the AI advancements and applications you hear about refer to a category of algorithms known as machine learning. (For more background on AI, check out our first flowchart here.)

Machine-learning algorithms use statistics to find patterns in massive* amounts of data. And data, here, encompasses a lot of things—numbers, words, images, clicks, what have you. If it can be digitally stored, it can be fed into a machine-learning algorithm.

Machine learning is the process that powers many of the services we use today—recommendation systems like those on Netflix, YouTube, and Spotify; search engines like Google and Baidu; social-media feeds like Facebook and Twitter; voice assistants like Siri and Alexa. The list goes on.

Practical Networking for Linux Admins: TCP/IP

Linux.com Editorial Staff

November 19, 2018

Get to know networking basics with this tutorial from our archives.

Linux grew up with a networking stack as part of its core, and networking is one of its strongest features. Let’s take a practical look at some of the TCP/IP fundamentals we use every day.

It’s IP Address

I have a peeve. OK, more than one. But for this article just one, and that is using “IP” as a shortcut for “IP address”. They are not the same. IP = Internet Protocol. You’re not managing Internet Protocols, you’re managing Internet Protocol addresses. If you’re creating, managing, and deleting Internet Protocols, then you are an uber guru doing something entirely different.

Yes, OSI Model is Relevant

TCP is short for Transmission Control Protocol. TCP/IP is shorthand for describing the Internet Protocol Suite, which contains multiple networking protocols. You’re familiar with the Open Systems Interconnection (OSI) model, which categorizes networking into seven layers:

7. Application layer
6. Presentation layer
5. Session layer
4. Transport layer
3. Network layer
2. Data link layer
1. Physical layer

The application layer includes the network protocols you use every day: SSH, TLS/SSL, HTTP, IMAP, SMTP, DNS, DHCP, streaming media protocols, and tons more.

TCP operates in the transport layer, along with its friend UDP, the User Datagram Protocol. TCP is more complex; it performs error-checking, and it tries very hard to deliver your packets. There is a lot of back-and-forth communication with TCP as it transmits and verifies transmission, and when packets get lost it resends them. UDP is simpler and has less overhead. It sends out datagrams once, and UDP neither knows nor cares if they reach their destination.

TCP is for ensuring that data is transferred completely and in order. If a file transfers with even one byte missing it’s no good. UDP is good for lightweight stateless transfers such NTP and DNS queries, and is efficient for streaming media. If your music or video has a blip or two it doesn’t render the whole stream unusable.

The physical layer refers to your networking hardware: Ethernet and wi-fi interfaces, cabling, switches, whatever gadgets it takes to move your bits and the electricity to operate them.

Ports and Sockets

Linux admins and users have to know about ports and sockets. A network socket is the combination of an IP address and port number. Remember back in the early days of Ubuntu, when the default installation did not include a firewall? No ports were open in the default installation, so there were no entry points for an attacker. “Opening a port” means starting a service, such as an HTTP, IMAP, or SSH server. Then the service opens a listening port to wait for incoming connections. “Opening a port” isn’t quite accurate because it’s really referring to a socket. You can see these with the netstat command. This example displays only listening sockets and the names of their services:

$ sudo netstat -plnt 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address State  PID/Program name
tcp        0      0 127.0.0.1:3306    0.0.0.0:*       LISTEN 1583/mysqld     
tcp        0      0 127.0.0.1:5901    0.0.0.0:*       LISTEN 13951/qemu-system-x  
tcp        0      0 192.168.122.1:53  0.0.0.0:*       LISTEN 2101/dnsmasq
tcp        0      0 192.168.122.1:80  0.0.0.0:*       LISTEN 2001/apache2
tcp        0      0 192.168.122.1:443 0.0.0.0:*       LISTEN 2013/apache2
tcp        0      0 0.0.0.0:22        0.0.0.0:*       LISTEN 1200/sshd            
tcp6       0      0 :::80             :::*            LISTEN 2057/apache2    
tcp6       0      0 :::22             :::*            LISTEN 1200/sshd            
tcp6       0      0 :::443            :::*            LISTEN 2057/apache2

This shows that MariaDB (whose executable is mysqld) is listening only on localhost at port 3306, so it does not accept outside connections. Dnsmasq is listening on 192.168.122.1 at port 53, so it is accepting external requests. SSH is wide open for connections on any network interface. As you can see, you have control over exactly what network interfaces, ports, and addresses your services accept connections on.

Apache is listening on two IPv4 and two IPv6 ports, 80 and 443. Port 80 is the standard unencrypted HTTP port, and 443 is for encrypted TLS/SSL sessions. The foreign IPv6 address of :::* is the same as 0.0.0.0:* for IPv4. Those are wildcards accepting all requests from all ports and IP addresses. If there are certain addresses or address ranges you do not want to accept connections from, you can block them with firewall rules.

A network socket is a TCP/IP endpoint, and a TCP/IP connection needs two endpoints. A socket represents a single endpoint, and as our netstat example shows a single service can manage multiple endpoints at one time. A single IP address or network interface can manage multiple connections.

The example also shows the difference between a service and a process. apache2 is the service name, and it is running four processes. sshd is one service with one process listening on two different sockets.

Unix Sockets

Networking is so deeply embedded in Linux that its Unix domain sockets (also called inter-process communications, or IPC) behave like TCP/IP networking. Unix domain sockets are endpoints between processes in your Linux operating system, and they operate only inside the Linux kernel. You can see these with netstat:

$ netstat -lx     
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     988      /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     29730    /run/user/1000/systemd/private
unix  2      [ ACC ]     SEQPACKET  LISTENING     357      /run/udev/control
unix  2      [ ACC ]     STREAM     LISTENING     27233    /run/user/1000/keyring/control

It’s rather fascinating how they operate. The SOCK_STREAM socket type behaves like TCP with reliable delivery, and SOCK_DGRAM is similar to UDP, unordered and unreliable, but fast and low-overhead. You’ve heard how everything in Unix is a file? Instead of networking protocols and IP addresses and ports, Unix domain sockets use special files, which you can see in the above example. They have inodes, metadata, and permissions just like the regular files we use every day.

If you want to dig more deeply there are a lot of excellent books. Or, you might start with man tcp and man 2 socket. Next week, we’ll look at network configurations, and whatever happened to IPv6?

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.

Open Source 2018: It Was the Best of Times, It Was the Worst of Times

Ben Golub

November 19, 2018

Recently, IBM announced that it would be acquiring Red Hat for $34 billion, a more-than-60-percent premium over Red Hat’s market cap, and a nearly 12x multiple on revenues. In many ways, this was a clear sign that 2018 was the year commercial open source has arrived, if there was ever previously a question about it before.

Indeed, the Red Hat transaction is just the latest in a long line of multi-billion dollar outcomes this year. To date, more than $50 billion dollars have been exchanged in open source IPOs and mergers and acquisitions (M&A); and all of the M&A deals are considered “mega deals” — those valued over $5 billion.

IBM acquired Red Hat for $34 billion
Hortonworks’ $5.2 billion merger with Cloudera
Elasticsearch IPO – $4+billion
Pivotal IPO – $3.9 billion
Mulesoft acquired by Salesforce – $6.5 Billion

If you’re a current open source software (OSS) shareholder, it may feel like the best of times. However, If you’re an OSS user or emerging open source project or company, you might be feeling more ambivalent.

On the positive side, the fact that there have been such good financial outcomes should come as encouragement to the many still-private and outstanding open-source businesses (e.g., Confluent, Docker, HashiCorp, InfluxDB). And, we can certainly hope that this round of exits will encourage more investors to bet on OSS, enabling OSS to continue to be a prime driver of innovation.

However, not all of the news is rosy.

First, since many of these exits were in the form of M&A, we’ve actually lost some prime examples of independent OSS companies. For many years, there was a concern that Red Hat was the only example of a public open source company. Earlier this year, it seemed likely that the total would grow to 7 (Red Hat, Hortonworks, Cloudera, Elasticsearch, Pivotal, Mulesoft, and MongoDB). Assuming the announced M&As close as expected, the number of public open source companies is back down to four, and the combined market cap of public open source companies is much less than it was at the start of the year.

We Need to Go Deeper

I think it’s critical that we view these open source outcomes in the context of another unavoidable story — the growth in cloud computing.

Many of the open source companies involved share an overlooked common denominator: they’ve made most of their money through on-premise businesses. This probably comes as a surprise, as we regularly hear about cloud-related milestones, like the one that states that more than 80% of server workloads are in the cloud, that open source drives ⅔ or more of cloud revenues, and that the cloud computing market is expected to reach $300 billion by 2021.

By contrast, the total revenues of all of the open source companies listed above was less than $7B. And, almost all of the open source companies listed above have taken well over $200 million in investment each to build out direct sales and support to appropriately sell to the large, on premises enterprise market.

yRPFSfntUxV0-LzXJSZJDUuMjBJP_v6jIbOg4MQW

Open Source Driving Revenue, But for Whom?

The most common way that open source is used in the cloud is as a loss-leader to sell infrastructure. The largest cloud companies all offer free or near-free open source services that drive consumption of compute, networking, and storage.

To be clear, this is perfectly legal, and many of the cloud companies have contributed generously in both code and time to open source. However, the fact that it is difficult for OSS companies to monetize their own products with a hosted offering means that they are shut off from one of the most important and sustainable paths to scaling. Perhaps most importantly, OSS companies that are independent are largely closed off from the fastest growing segment of the computing market. Since there are only a handful of companies worldwide with the scale and capital to operate traditional public clouds (indeed, Amazon, Google, Microsoft, and Alibaba are among the largest companies on the planet), and those companies already control a disproportionate share of traffic, data, capital and talent, how can we ensure that investment, monetization, and innovation continue to flow in open source? And, how can open source companies sustainably grow.

For some OSS companies, the answer is M&A. For others, the cloud monetization/competition question has led them to adopt controversial and more restrictive licensing policies, such as Redis Lab’s adoption of the Commons Clause and MongoDB’s Server Side License.

But there may be a different answer to cloud monetization. Namely, create a different kind of cloud, one based on decentralized infrastructure.

Rather than spending billions to build out data centers, decentralized infrastructure approaches (like Storj, SONM, and others), provide incentives for people around the world to contribute spare computing, storage or network capacity. For example, by fairly and transparently allowing storage node operators to share in the revenue generated (i.e., by compensating supply), Storj was able to rapidly grow to a network of 150,000 nodes in 180 countries with over 150 PB of capacity–equivalent to several large data centers. Similarly, rather than spending hundreds of millions on traditional sales and marketing, we believe there is a way to fairly and transparently compensate those who bring demand to the network, so we have programmatically designed our network so that open source companies whose projects send users our way can get fairly and transparently compensated proportional to the storage and network usage they generate. We are actively working to encourage other decentralized networks to do the same, and believe this is the future of open cloud computing

This isn’t charity. Decentralized networks have strong economic incentives to compensate OSS as the primary driver of cloud demand. But, more importantly, we think that this can help drive a virtuous circle of investment, growth, monetization, and innovation. Done correctly, this will ensure that the best of times lay ahead!

Ben Golub is the former CEO of Docker and interim CEO at Storj Labs.

Watch the Open Source Summit keynote presentation from Ben Golub and Shawn Wilkinson to learn more about open source and the decentralized web.

Beyond Passwords: 2FA, U2F and Google Advanced Protection

Troy Hunt

November 19, 2018

Last week I wrote a couple of different pieces on passwords, firstly about why we’re going to be stuck with them for a long time yet and then secondly, about how we all bear some responsibility for making good password choices. A few people took some of the points I made in those posts as being contentious, although on reflection I suspect it was more a case of lamenting that we shouldn’t be in a position where we’re still dependent on passwords and people needing to understand good password management practices in order for them to work properly.

This week, I wanted to focus on going beyond passwords and talk about 2FA. Per the title, not just any old 2FA but U2F and in particular, Google’s Advanced Protection Program. This post will be partly about 2FA in general, but also specifically about Google’s program because of the masses of people dependent on them for Gmail. Your email address is the skeleton key to your life (not just “online” life) so protecting that is absolutely paramount.

Let’s start with defining some terms because they tend to be used a little interchangeably. Before I do that, a caveat: every single time I see discussion on what these terms mean, it descends into arguments about the true meaning and mechanics of each. Let’s not get bogged down in that and instead focus on the practical implications of each.

How to Install a Device Driver on Linux

OpenSource.com

November 19, 2018

…most default Linux drivers are open source and integrated into the system, which makes installing any drivers that are not included quite complicated, even though most hardware devices can be automatically detected.

To learn more about how Linux drivers work, I recommend reading An Introduction to Device Drivers in the book Linux Device Drivers.

Two approaches to finding drivers

1. User interfaces

If you are new to Linux and coming from the Windows or MacOS world, you’ll be glad to know that Linux offers ways to see whether a driver is available through wizard-like programs. Ubuntu offers the Additional Drivers option. Other Linux distributions provide helper programs, like Package Manager for GNOME, that you can check for available drivers.

2. Command line

What if you can’t find a driver through your nice user interface application? Or you only have access through the shell with no graphic interface whatsoever? Maybe you’ve even decided to expand your skills by using a console. You have two options: