
Big Data Ingestion: Flume, Kafka, and NiFi

Flume, Kafka, and NiFi offer great performance, can be scaled horizontally, and have a plug-in architecture where functionality can be extended through custom components.

When building big data pipelines, we need to think about how to ingest the volume, variety, and velocity of data showing up at the gates of what would typically be a Hadoop ecosystem. Preliminary considerations such as scalability, reliability, adaptability, and cost in terms of development time will all come into play when deciding which tools to adopt to meet our requirements. In this article, we’ll focus briefly on three Apache ingestion tools: Flume, Kafka, and NiFi. All three products offer great performance, can be scaled horizontally, and provide a plug-in architecture where functionality can be extended through custom components.

Read more at DZone

Docker Leads OCI Release of V1.0 Runtime and Image Format Specifications

Today marks an important milestone for the Open Container Initiative (OCI) with the release of the OCI v1.0 runtime and image specifications – a journey that Docker has been central in driving and navigating over the last two years. It has been our goal to provide low-level standards as building blocks for the community, customers and the broader industry. To understand the significance of this milestone, let’s take a look at the history of Docker’s growth and progress in developing industry-standard container technologies.

The History of Docker Runtime and Image Donations to the OCI

Docker’s image format and container runtime quickly emerged as the de facto standard following its release as an open source project in 2013. We recognized the importance of turning it over to a neutral governance body to fuel innovation and prevent fragmentation in the industry. Working together with a broad group of container technologists and industry leaders, the Open Container Project was formed to create a set of container standards and was launched under the auspices of the Linux Foundation in June 2015 at DockerCon. It became the Open Container Initiative (OCI) as the project evolved that Summer.

Read more at Docker blog

How Microsoft Deployed Kubernetes to Speed Testing of SQL Server 2017 on Linux

When the Microsoft SQL Server team started working on supporting Linux for SQL Server 2017, their entire test infrastructure was, naturally enough, on Windows Server (using virtual machines deployed on Azure). Instead of simply replicating that environment for Linux, they used Azure Container Service to produce a fully automated test system that packs seven times as many instances into the same number of VMs and runs at least twice as fast.

“We have hundreds of thousands of tests that go along with SQL Server, and we decided the way we would test SQL Server on Linux was to adopt our own story,” SQL program manager Tony Petrossian told The New Stack. “We automated the entire build process and the publishing of the various containers with different versions and flavors. Our entire test infrastructure became containerized and is deployed in ACS.”

Read more at The New Stack

Condensing Your Infrastructure with System Containers

When most people hear the word containers, they probably think of Docker containers, which are application containers. But there are other kinds of containers; for example, system containers like LXC/LXD. Stéphane Graber, technical lead for LXD at Canonical Ltd., will be delivering two talks at the upcoming Open Source Summit NA in September, “GPU, USB, NICs and Other Physical Devices in Your Containers” and “Condensing Your Infrastructure Using System Containers,” discussing containers in detail.

In this OS Summit preview, we talked with Graber to understand the difference between system and application containers as well as how to work with physical devices in containers.

Linux.com: What are system containers, and how are they different from virtual machines?

Stéphane Graber: The end result of using system containers or a virtual machine is pretty similar. You get to run multiple operating systems on a single machine.

The VM approach is to virtualize everything. You get virtualized hardware and a virtualized firmware (BIOS/UEFI) which then boots a full system starting from bootloader, to kernel, and then userspace. This allows you to run just about anything that a physical machine would be able to boot but comes with quite a bit of overhead for anything that is virtualized and needs hypervisor involvement.

System containers, on the other hand, do not come with any virtualized hardware or firmware. Instead, they rely on your existing operating system’s kernel and so avoid all of the virtualization overhead. As the kernel is shared between host and guest, this does, however, restrict you to Linux guests and is also incompatible with some workloads that expect kernel modifications.

A shared kernel also means much easier monitoring and management, as the host can see every process that’s running in its containers and how much CPU and RAM each of those individual tasks is using, and it will let you trace or kill any of them.

Linux.com: What are the scenarios where someone would need system containers instead of, say, a VM? Can you provide some real use cases where companies are using system containers?

Graber: System containers are amazing for high-density environments or environments where you have a lot of idle workloads. A host that could run a couple hundred idle virtual machines would typically be able to run several thousand idle system containers.

That’s because idle system containers are treated as just a set of idle processes by the Linux kernel and so don’t get scheduled unless they have something to do. Network interrupts and similar events are all handled by the kernel and don’t cause the processes to be scheduled until an actual request comes their way.

Another use case for system containers is access to specialized hardware. With virtual machines, you can use PCI passthrough to move a specific piece of hardware to a virtual machine. This, however, prevents you from seeing it on the host, and you can’t share it with other virtual machines.

Because system containers run on the same kernel as the host, device passthrough is done at the character/block device level, making concurrent access from multiple containers possible so long as the kernel driver supports it. LXD, for example, makes it trivial to pass GPUs, USB devices, NICs, filesystem paths, and character/block devices into your containers.
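
To give a feel for what that looks like in practice, here is a minimal sketch using the lxc command line; the container name c1, the device names, the USB vendor ID, and the paths are placeholders, and the exact device keys can vary between LXD releases:

    # share the host GPU with the container
    lxc config device add c1 gpu0 gpu

    # pass through a USB device, matched here by vendor ID
    lxc config device add c1 usb0 usb vendorid=046d

    # hand a physical NIC over to the container
    lxc config device add c1 eth1 nic nictype=physical parent=eth1

    # bind-mount a host directory into the container
    lxc config device add c1 data disk source=/srv/data path=/mnt/data

In LXD’s model these map to character/block device nodes and bind mounts inside the container rather than PCI passthrough, which is what allows several containers to share the same hardware where the driver permits it.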

Linux.com: How are system containers different from app containers like Docker/rkt?

Graber: System containers will run a full, usually unmodified, Linux distribution. That means you can SSH into such a container; you can install packages, apply updates, use your existing management tools, etc. They behave exactly like a normal Linux server would and make it easy to move your existing workloads from physical or virtual machines over to system containers.

Application containers are usually based around a single process or service, with the idea that you will deploy many such single-service containers and connect them together to run your application.

That stateless, microservice approach is great if you are developing a new application from scratch as you can package every bit of it as separate images and then scale your infrastructure up or down at a per-service level.

So, in general, existing workloads are a great fit for system containers, while application containers are a good technology to use when developing something from scratch.

The two also aren’t incompatible. We support running Docker inside of LXD containers. This is done thanks to the ability to nest containers without any significant overhead.
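
As a rough sketch of that setup, assuming an Ubuntu image and an LXD version that supports the security.nesting option (the container name docker-host is made up for this example):

    # launch a system container with nesting enabled
    lxc launch ubuntu:16.04 docker-host -c security.nesting=true

    # install Docker inside it and run an application container
    lxc exec docker-host -- apt-get update
    lxc exec docker-host -- apt-get install -y docker.io
    lxc exec docker-host -- docker run hello-world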

Linux.com: When you say condensing your infrastructure what exactly do you mean? Can you provide a use case?

Graber: It’s pretty common for companies to have a number of single-purpose servers, maybe running the company PBX system, server room environment monitoring system, network serial console, etc.

All of those use specialized hardware, usually through PCI cards, serial devices, or USB devices. The associated software also usually depends on a specific, often outdated, version of the operating system.

System containers are a great fit there as you can move those workloads to containers and then just pass the different devices they need. The end result is one server with all the specialized hardware inside it, running a current, supported Linux distribution with all the specialized software running in their individual containers.

The other case for condensing your infrastructure would be to move your Linux virtual machines over to LXD containers, keeping the virtual machines for running other operating systems and for those few cases where you want an extra layer of security.

Linux.com: Unlike VMs, how do system containers deal with physical devices?

Graber: System containers see physical devices as UNIX character or block devices (/dev/*). So the driver itself sits in the host kernel with only the resulting userspace interface being exposed to the container.

Linux.com: What are the benefits or disadvantages of system containers over VMs in the context of devices?

Graber: With system containers, if a device isn’t supported by the host kernel, the container won’t be able to interact with it. On the other hand, it also means that you can now share supported devices with multiple containers. This is especially useful for GPUs.

With virtual machines, you can pass entire devices through PCI or USB passthrough with the driver for them running in the virtual machine. The host doesn’t have to know what the device is or load any driver. However, because a given PCI or USB device can only be attached to a single virtual machine, you will either need a lot more hardware or constantly change your configuration to move it between virtual machines.
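
To make the sharing difference concrete: on the LXD side, the same gpu device can be attached to several containers at once (c1 and c2 are hypothetical container names), while a PCI device passed through to a VM stays bound exclusively to that VM until you reconfigure it.

    # both containers can use the host GPU concurrently,
    # as long as the host kernel driver supports it
    lxc config device add c1 gpu0 gpu
    lxc config device add c2 gpu0 gpu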

You can see the full schedule for Open Source Summit here and save $150 through July 30. Linux.com readers save an additional $47 with discount code LINUXRD5. Register now!

3 Reasons to Attend Open Source Summit in L.A.

Open Source Summit (formerly LinuxCon + Container Con) is almost here. It’s undoubtedly the biggest Linux show in North America that brings open source projects together under the same roof. With the rebranding of LinuxCon as the Open Source Summit, it has further widened its reach and includes several co-hosted events.

Three big reasons to attend this year include: Celebrities, Collaboration, and Community. Here, we share what some past attendees had to say about the event.

Celebrities

This year, actor and online entrepreneur Joseph Gordon-Levitt will be delivering a keynote. Gordon-Levitt founded an online production company called hitRECord that makes art collaboratively with more than half a million artists of all kinds, and he will be speaking on the evolution of the Internet as a collaborative medium.  

The open source world, however, has its own lineup of stars who will be speaking at the event, including Linus Torvalds, Greg Kroah-Hartman, Zeynep Tufekci, Dan Lyons, Jono Bacon, and more!

Had a great time at the conference, got to meet some of the best and brightest in the Linux and Cloud industry! – William Roper, Hewlett-Packard

Collaboration

Open Source Summit is known for being a bridge between open source approaches and the world that’s now opening up to open source technologies. It’s a perfect platform for collaboration between both partners and competitors, and it creates a unique environment for communication and commitment to open source.

Collaboration is what makes great feats of technological and social progress possible. LinuxCon is where the industry’s brightest and most prolific collaborators go to become even better collaborators. – Alex Ng, Senior Software Engineer, Microsoft

LinuxCon provides a unique opportunity to learn about a range of OSS projects/technologies, meet with developers and vendors, make important contacts, and have fun at the social events. I highly recommend LinuxCon (and other LF events) for anyone wanting to expand their understanding of the people, culture, and machinery behind Linux and OSS. – Alex Luccisano, Cisco Systems

Community

Open Source Summit is more about people than technology. It’s the only place where you will see so much richness when it comes to community participation. You will see members from so many different communities including OpenStack, kernel, Docker, networking, database, cloud… you get the idea.

I only go to one conference a year, and it’s LinuxCon. I never miss it.  It has a little bit of every technology, and a wide variety of people to network with. – Troy Dawson, Senior Software Engineer, Red Hat

A worthwhile event with good content and speakers. Although a first timer at the event, I felt welcome. The event staff was friendly and helpful. The women’s t-shirts and open source lunch helped make the event more welcoming and accepting.  – Carol Willing, Willing Consulting

LinuxCon was a great conference with a mix of different sessions from educating kids with puppet shows using open source to Google talking about their upgrading process of thousands of machines and how they did it. There seems to be sessions that would interest anyone across the board. – Bill Mounsey

Open Source Summit creates a very family-friendly environment for attendees to bring their kids. As a journalist I have been attending the Open Source Summit annually since 2009, and it’s the only tech event where I bring my entire family. In 2015, I met Torvalds again and told him that my son was big enough now to run around. He said he knew and pulled out his phone to show me a photo of my son chasing Tux the penguin around the venue the year before.

The author’s son with Tux the Penguin at a past event.

Check out the full schedule for Open Source Summit here, and save $150 on registration through July 30. Linux.com readers save an additional $47 with discount code LINUXRD5. Register now!

The Open Container Initiative Launches Version 1.0 of Its Container Specs

It took a while, but the Open Container Initiative (OCI) today announced the launch of the 1.0 versions of both its container runtime and image specs for software containers. The two-year-old open source foundation was established by Docker and other leaders in the container ecosystem to be the guardian of exactly these specifications, which are basically the industry standards for container formats and runtimes.

Docker kicked off much of the work on these specs when it donated the codebase of its container runtime to the OCI. Over time, the technical community added a spec for the container image format to the project as well. Today, the OCI has over 40 members, including virtually every major tech company that plays in the cloud space (think AWS, Cisco, Facebook, Google, Huawei, IBM, Intel, Microsoft, Oracle, Red Hat, and VMware) as well as a number of container-focused startups like Rancher and Wercker.

Read more at TechCrunch

 

If You Were On a Desert Island, Which License Would You Take With You?

First, we need to ask ourselves why we should bother choosing a license.

Are you:

  • presenting your software to the public?
  • representing your software in a way that leads others to believe they can copy it or build on it?

Then, yes, you should choose a license. Be fair to your visitors and back up the appearance of permission by expressly giving permission.

Copyright law defaults to author control of copying, modification, and distribution, so others need the author’s permission to copy, modify, or distribute. If you would like others to be free to copy your software and possibly build on it, then you should choose a license. Open source licenses give the permissions needed to get that default copyright impediment out of the way.

Read more at OpenSource.com

Cyber Security Comes Down to Culture, Say Dutch Security Experts

IT security can no longer be seen as just a technical matter. People, education and management matter too, but culture is the overarching and binding element, says a security executive at a Dutch bank.

IT is pervasive and, for modern life, seemingly inescapable. But on closer inspection, IT is quite young and even immature. Right now, the argument can be made that IT is a teenager: it thinks it knows it all and blindly assumes the world revolves around it, but it’s actually inexperienced, obstinate, and not really talkative or sharing. And yet it’s unavoidable.

Read more at Computer Weekly

How to Make a Strategic, Value-Driven Business Case for Your DevOps Initiative

Did you ever hear the story about the organisation that continued to meet its customers’ expectations and then went under? No? Me neither.

DevOps is a way of enabling enterprises to keep pace with change and continue to meet customer expectations and deliver business value. The business case for DevOps is utterly compelling; all the more reason why any DevOps initiative needs to be articulated in a way that means it will get approved.

There are many articles and blogs out there that describe the ‘business benefits’ of DevOps, but they are typically written from a largely technological view of those benefits. However, the business case sits at the intersection of strategy and operations. It should lay out a tangible way forward for a project or programme to achieve one or more strategic objectives. DevOps business cases that focus on tactical (technical) achievements rather than on attaining strategic enterprise objectives will not fare as well as those that focus on business or customer value. The technical achievements of a DevOps project or programme are, of course, important, but as we deconstruct and rebuild the business case in subsequent paragraphs, you will see the value of focusing on strategic enterprise objectives and delivering value to the business and customer.

In this article, I will give the business view of the DevOps business case, as someone who has been involved in writing, reviewing, and approving business cases.

Read more at Contino

Testing or Monitoring? MTBF or MTTR? Make your choice!

What is more important: testing or monitoring? Should you optimize for mean time between failures (MTBF) or mean time to repair (MTTR)?

Your team is torn by the choice.

Half of your teammates vote for a fully automated test suite, the other half for having good monitoring in production.

You have the decisive vote. What will be your choice?

Give yourself a minute to decide.

Test Suite

If you choose a test suite, you are aiming to not have any bugs in production.

You will have your unit and acceptance tests to know that you are building the right thing, your integration and system tests to check that components can talk to each other, your performance and soak tests to know that you have enough capacity, and your resilience tests to make sure your system can cope with failure.

You will be full of confidence during development.

But then you deploy to production, and then what?

Read more at DanLebrero