Kubernetes and Docker Swarm are both popular, well-known container orchestration platforms. You don’t need an orchestrator to run a container, but orchestrators keep your containers healthy and add enough value that you need to know about them.
This blog post introduces the need for an orchestrator, then outlines the operational differences between the two platforms.
What has orchestration done for you lately?
Even if you are not using Kubernetes or Swarm for your internal projects, that doesn’t mean you’re not benefitting from their use. For instance, ADP, which provides iHCM and Payroll services in the USA, uses Docker’s EE product (which is based around Swarm) to run some of its key systems.
By Gabriel Krisman Bertazi, Software Engineer at Collabora.
This blog post is based on the talk I gave at the Open Source Summit North America 2017 in Los Angeles. Let me start by thanking my employer Collabora, for sponsoring my trip to LA.
Last time I wrote about Performance Assessment, I discussed how an apparently naive code snippet can hide major performance drawbacks. In that example, the issue was caused by the randomness of the conditional branch direction, triggered by our unsorted vector, which really confused the Branch Predictor inside the processor.
An important thing to mention before we start is that performance issues arise in many forms and may have several root causes. While this series has focused on processor corner cases, those are in fact a tiny sample of how things can go wrong for performance. Many other factors matter, particularly well-thought-out algorithms and good hardware. Without a well-crafted algorithm, no compiler optimization or quick hack can improve the situation.
In this post, I will show one more example of how easy it is to disrupt performance of a modern CPU, and also run a quick discussion on why performance matters – as well as present a few cases where it shouldn’t matter.
If you have any questions, feel free to start a discussion below in the Comments section and I will do my best to follow up on your questions.
CPU Complexity is continuously rising
Every year, new generations of CPUs and GPUs hit the market carrying an ever-increasing count of transistors inside their enclosures, as shown by the graph below depicting the famous Moore’s law. While the metric is not perfect in itself, it is a fair indication of the steady growth of complexity inside our integrated circuits.
While all these mechanisms are tailored to provide good performance for common programming and data patterns, there are always cases where an oblivious programmer hits the corner cases of such mechanisms, and ends up writing code that not only fails to benefit from them, but executes far worse than if there were no optimization mechanism at all.
As a general rule, compilers are increasingly great at detecting and modifying code to benefit from the CPU architecture, but there will always be cases where they won’t be able to detect bad patterns and modify the code. In those cases, there is no replacement for a capable programmer who understands how the machine is designed, and who can adjust the algorithm to benefit from its design.
When does performance really matter?
The first reaction of an inexperienced developer, after learning about some of the architectural issues that affect performance, might be to start profiling everything they can get their hands on, to squeeze the absolute maximum out of their expensive new hardware. This approach is not only misleading, but an actual waste of time.
In a city that experiences traffic jams every day, there is little point in buying a faster car instead of taking the public bus: in both scenarios, you are going to be stuck in traffic for hours instead of arriving at your destination earlier. The same happens with your programs. Consider an interactive program that performs a task in the background while waiting for user input: there is little point in trying to gain a few cycles by optimizing that task, since the entire system is still limited by the human input, which will always be much, much slower than the machine. In a similar sense, there is little point in trying to speed up the boot time of a machine that almost never reboots, since that cost will be paid only rarely, when a restart is required.
In a very similar sense, the speed-up you gain by recompiling every single program on your computer with the most aggressive compiler optimizations for your machine, as some people like to do, is completely irrelevant, considering that the machine will spend most of its time idle, waiting for the next user input.
What actually makes a difference, and should be the target of any optimization work, are cases where the workload is so intensive that gaining a few extra cycles very often results in a real increase in the computing done in the long run. This requires, first of all, that the code being optimized is actually on the critical path of performance – that is, that part of the code is what is holding the rest of the system back. If that is not the case, the gain will be minimal and the effort will be wasted.
Moving back to the reboot example: in a virtualization environment, where new VMs or containers need to be spawned very fast and very often to respond to new service requests, it makes a lot of sense to optimize reboot time. In that case, every microsecond saved at boot matters to reduce the overall response time of the system.
The corollary of Amdahl’s law states just that. It argues that there is little sense in aggressively optimizing a part of the program that executes only a few times, very quickly, instead of optimizing the part that occupies the largest share of the execution time. In other (famous) words: a 10% gain in code that runs 90% of the time saves as much overall time as a 90% speed-up in code that runs only 10% of the time – and no optimization of that rarely executed code can ever save more than the 10% it occupies.
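Stated as a formula (a standard form of the law, added here for reference rather than taken from the talk): if p is the fraction of total execution time affected by an optimization and s is the local speed-up of that part, the overall speed-up is

S_{\text{overall}} = \frac{1}{(1 - p) + p/s}

Plugging in the numbers above: making code that runs 10% of the time infinitely fast (p = 0.1, s → ∞) yields at best S ≈ 1.11, an 11% gain overall, while a modest 25% speed-up of code that runs 90% of the time (p = 0.9, s = 1.25) already yields S ≈ 1.22.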
This container storage package from Red Hat is built atop its Gluster Storage technology and integrated with the OpenShift platform.
“The key piece we’re trying to solve with container-native storage is for storage to become invisible eventually. We want developers to have enough control over storage where they’re not waiting for storage admins to carve out storage for their applications. They’re able to request and provision storage dynamically and automatically,” said Irshad Raihan, Red Hat senior manager of product marketing.
Software is always changing, but hardware not so much. This two-part tour introduces networking hardware, from traditional switches and routers to smartphones and wireless hotspots.
Local Area Network
The traditional local area network is connected with an Ethernet switch and Cat cables. The basic components of an Ethernetwork are network interface cards (NICs), cables, and switches. NICs and switches have little status lights that tell you if there is a connection, and the speed of the connection. Each computer needs an NIC, which connects to a switch via an Ethernet cable. Figure 1 shows a simple LAN: two computers connected via a switch, and a wireless access point routed into the wired LAN.
Figure 1: A simple LAN.
Installing cable is a bit of work, and you lose portability, but wired Ethernet has some advantages. It is immune to the types of interference that mess up wireless networks (microwave ovens, cordless phones, wireless speakers, physical barriers), and it is immune to wireless snooping. Even in this glorious year 2017 of the new millennium there are still Linux distributions, and devices like IP surveillance cameras and set-top boxes, that require a wired network connection for the initial setup, even if they also support wi-fi. Any device that has one of those little physical factory-reset switches that you poke with a paperclip has a hard-coded wired Ethernet address.
With Linux you can easily manage multiple NICs. My Internet is mobile broadband, so my machines are connected to the Internet through a wireless hotspot, and directly to each other on the separate wired Ethernetwork for fast local communications. My workstations have easy wi-fi thanks to USB wireless interfaces (figure 2).
Figure 2: USB wireless interfaces.
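As a quick illustration of how multiple NICs look from the command line (the interface names enp3s0 and wlan0 are placeholders; yours will differ), you can inspect and configure them with the iproute2 tools:

# List every network interface and its link state
ip link show

# Give the wired NIC a static address on the local Ethernetwork
sudo ip addr add 192.168.10.2/24 dev enp3s0
sudo ip link set enp3s0 up

# The wireless interface keeps its connection to the hotspot for Internet access
ip addr show wlan0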
Switches come in “dumb” and managed versions. Dumb switches are dead simple: just plug in, and you’re done. Managed switches are configurable and offer features like power over Ethernet (PoE), controllable port speeds, virtual LANs (VLANs), disable/enable ports, quality of service, and security features.
Ethernet switches send traffic only where it needs to go, between the hosts that are communicating with each other. If you remember the olden days of Ethernet hubs, you’ll remember that hubs broadcast all traffic to all hosts, and each host had to sort out which packets were meant for it. That is why one old definition of a LAN is a collision domain: hubs generated that much uncontrolled traffic. It also made it easy to snoop on every host connected to the hub. A nice feature on a managed switch is a snooping port – also called a monitoring port, a promiscuous port, or a mirroring port – which allows you to monitor all traffic passing through the switch.
Quick Ethernet cheat sheet:
Ethernet hardware supports data transfer speeds of 10, 100, 1000, and 10,000 megabits per second.
These convert to 1.25, 12.5, 125, and 1,250 megabytes per second.
Real-world speeds are half to two-thirds of these values.
Network bandwidth is limited by the slowest link, such as a slow hard drive, slow network interface, feeble CPU, congested router, or boggy software.
Most computers have built-in Ethernet interfaces.
Gigabit (1000 Mb/s) USB Ethernet interfaces are dirt cheap, currently around $25, and require USB 3.0.
Ethernet is backwards-compatible, so gigabit devices also support slower speeds.
A single user may not see much benefit from 10 Gigabit Ethernet, but multiple users will. You could use a 10 GigE link as your LAN backbone, and use slower hardware to connect your subnets and individual hosts.
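To see what speed a particular link actually negotiated, ethtool will tell you (eth0 is a placeholder interface name):

# Report negotiated speed, duplex, and link status for a wired interface
sudo ethtool eth0 | grep -E 'Speed|Duplex|Link detected'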
What is Bandwidth?
“Bandwidth” is shorthand for several things: throughput, latency, error rate, and jitter. Analogies are tricky, but we can illustrate these with a water pipe. The diameter of the pipe limits the total throughput: the larger the pipe, the more water it can deliver. Latency is how long you have to wait for the water to start coming out. Jitter measures how smoothly, or how erratically, the water is delivered.
I can’t think of a water analogy for error rate; in computer networking that is how many of your data packets are corrupted. Data transfers require that all packets arrive undamaged because a single bad packet can break an entire data file transfer. The TCP protocol guarantees packet delivery and re-sends corrupted and missing packets, so a high error rate results in slower delivery.
Having a lot of bandwidth doesn’t guarantee that you will enjoy smooth network performance. Netflix, for one example, requires a minimum of only 1.5 Mb/s. High latency, jitter, and error rates are annoying for data transfers, but they are deadly for streaming media. This is why you can have an Internet account rated at 20-30 Mb/s and still have poor-quality video conferencing, music, and movies.
Ethernet Cables
Ethernet cables are rated in Cats, short for category: Cat 5, 6, 7, and 8. Cat 5 was deprecated in 2001, and it’s unlikely you’ll see it for sale anymore. Cat 5e and 6 support 10/100/1000 Mb/s. Cat 6a and 7 are for 10 Gb/s. (You also have the option of optical fiber cabling for 10 Gb/s, though it is more expensive than copper Cat 6a/7 cables.) Cat cables contain four balanced-signal twisted pairs of wires, and each individual wire is made of either several copper strands twisted together or one solid copper wire. Stranded-wire cables are flexible; solid-core wires are stiffer and have less transmission loss.
Plenum cables are designed for permanent installations inside the plenum spaces of buildings: dropped ceilings, inside walls, and underneath floors. Plenum cables are wrapped in special plastics that meet fire safety standards. They cost more than non-plenum cables, but don’t cheap out, because duh, do I have to explain why? Plenum cables should be solid-core rather than stranded.
Patch cables are stranded for flexibility. Traditionally, “patch” meant a short cable for connecting computers to wall outlets, switches to routers, and for patch panels, though patch cables can be as long as you need, up to about 300 feet for Cat 5e, 6, and 6a. For longer runs you’ll need repeaters.
Come back next week for part 2, where we will learn how to connect networks, and some cool hacks for mobile broadband.
Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.
Imagine this scenario: You invested in Ansible, you wrote plenty of Ansible roles and playbooks that you use to manage your infrastructure, and you are thinking about investing in containers. What should you do? Start writing container image definitions via shell scripts and Dockerfiles? That doesn’t sound right.
Some people from the Ansible development team asked this question and realized that those same Ansible roles and playbooks that people wrote and use daily can also be used to produce container images. But not just that—they can be used to manage the complete lifecycle of containerized projects. From these ideas, the Ansible Container project was born.
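The typical workflow is a small set of CLI commands; the following is a rough sketch (subcommand details vary between Ansible Container releases):

# Create the project scaffolding, including container.yml
ansible-container init

# Build the container images by running your roles inside a builder container
ansible-container build

# Start the built containers locally for testing
ansible-container run

# Push the images to a registry when you are ready to ship
ansible-container push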
In our consultancy work, we often see companies tagging production images in an ad-hoc manner. Taking a look at their registry, we find a list of images like:
acmecorp/foo:latest
acmecorp/foo:v1.0.0-beta
acmecorp/foo:v0.3.0
acmecorp/foo:v0.2.1
acmecorp/foo:v0.2.0
acmecorp/foo:v0.1.0
and so on.
There is nothing wrong with using semantic versioning for your software, but using it as the only strategy for tagging your images often results in a manual, error-prone process (how do you teach your CI/CD pipeline when to upgrade your versions?).
I’m going to explain an easy yet robust method for tagging your images. Spoiler alert: use the commit hash as the image tag.
Suppose the HEAD of our Git repository has the hash ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43. We can manually build an image with this hash like so:
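A sketch of the commands (assuming the Dockerfile sits at the repository root):

# Build and tag the image with the full commit hash
docker build -t acmecorp/foo:ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43 .

# In a CI/CD pipeline, let Git supply the hash instead of typing it out
docker build -t acmecorp/foo:$(git rev-parse HEAD) .

# Push the tagged image to the registry
docker push acmecorp/foo:$(git rev-parse HEAD)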
As we come into this year’s Node.js Interactive conference, it’s a good time to reflect on the State of Node.js, and by any reasonable measure the state of Node.js is very strong. Every day there are more than 8.8 million Node instances online, a number that has grown by 800,000 in the last nine months alone. Every week there are more than 3 billion downloads of npm packages. The number of Node.js contributors has grown from 1,100 last year to more than 1,500 today. To date there have been a total of 444 releases, and we have 39,672 stars on GitHub. This is an enviable position for any technology and a testament to the value of Node.js and the dedication of the Node.js community.
Growth of Node.js From a User Perspective
We see incredible success with Node.js for front-end, back-end, and full-stack development. In this year’s Node.js User Survey we got incredible feedback and gained increased understanding of how Node.js is being used. We know that the biggest use case for Node.js is back-end development, but users are also developing cross-platform and desktop applications, enabling IoT, and even powering security apps. This week we are launching our annual survey again to identify trends and track our progress. I highly encourage you to take the survey and share your insights with the rest of the community.
As Linux distributions embrace container-based operations and microservices, they come across new file-system-related challenges. Linux vendors, including Red Hat, SUSE, and Canonical, are major players in the container space. In addition to their traditional OSes, these companies have also built container-as-a-service platforms to handle containerized workloads and microservices. Following in the footsteps of CoreOS’s Container Linux, Red Hat has created Project Atomic, Canonical came out with Ubuntu Core, and SUSE released SUSE CaaS Platform and Kubic.
Namespaces, Dedup, Scheduling
“One of the biggest challenges that the containers ecosystem faces is that file systems are not currently namespace aware,” said Ben Breard, Red Hat senior technology product manager for Linux containers. Though there are several concepts to create a namespace of sorts with existing file systems, this current limitation creates challenges, particularly around security and usability, for things like user namespaces.
At its Oracle OpenWorld conference this week in San Francisco, the company announced its new Blockchain Cloud Service. The distributed ledger cloud platform aims to help enterprises make various transactions more secure, using blockchain technology. The new service — which is fully managed by Oracle — is part of Oracle’s Cloud Platform.
“We’re introducing blockchain, both as a platform-as-a-service and as a way to do secure transactions: intercompany accounting transactions, procurement transactions, and loyalty programs that span multiple providers in a loyalty network, using blockchain to handle the secure hyperledger,” said Thomas Kurian, president of product development at Oracle, at this week’s conference.
Today’s tech market is chock full of both talent and demand, a seemingly perfect combination. However, sealing the deal on a new gig is a little trickier as employers increasingly seek cloud expertise from their existing and new employees.
This year’s Open Source Jobs Survey and Report is a fantastic resource that provides an overview of the trends and motivations of both employers and professionals.
Cloud expertise grabs the top spot across the board
One of the most glaring findings is the incredibly high demand for cloud expertise. Hiring managers cited “Cloud Technologies” as the skill they’re looking for most (70 percent). Even when looking specifically at open source skills, “Cloud/Virtualization” is the most sought after at 60 percent. Further, when asked what areas of expertise most affected hiring decisions, hiring managers kept the same tune and cited “Cloud” at 60 percent.
George McFerran, EVP Product & Marketing, Dice
It’s no secret that cloud expertise is an absolute necessity for any tech professional seeking career growth, and it’s not just employers who see the benefit of a cloud-based skillset – open source professionals also see cloud expertise as the most in-demand skill (47 percent) and predict that cloud technologies will gain the most importance this year (69 percent).
While enterprises consider their move to the cloud, many are falling headlong into the cold, hard facts: there are not enough cloud services professionals working in the market today. Salaries are shooting up, but the worker pipeline looks a little thin. This is the time for tech pros to expand their skillsets and careers simultaneously.
Get certified and consider specializing
Amazon Web Services (AWS) certification is a must. Although some employers dismiss this certification, the truth is, companies use it all of the time to select those they want to interview and hire.
Professionals should also learn a range of skills, including cloud migration, application integration, automation, data analytics and security.
More specifically, cloud-based IoT is a thriving area of development but experts are few and far between. Most IoT systems are on AWS, Microsoft or Google public clouds, so having the skillset across platforms is a win-win.
There’s also the option to focus on new architectures such as containers and microservices; these are far less known and will set professionals apart from the crowd.
Brag about it
It’s clear that as cloud services have swept across the enterprise computing market, career opportunities have followed. To ensure that your cloud-based skills are known, refresh your resume and update your profile on Dice.com so you’re matched with an employer who will be glad to have found someone with the most in-demand skills in today’s tech market.
The full 2017 Open Source Jobs Report is available to download now.