ODPi Webinar on Taking Apache Hadoop Enterprise-Wide: Your Questions Answered

ODPi recently hosted a webinar with John Mertic, Director of ODPi at the Linux Foundation, and Tamara Dull, Director of Emerging Technologies at SAS Institute, to discuss ODPi's recent whitepaper, 2017 Preview: The Year of Enterprise-wide Production Hadoop, and to explore DataOps at scale and the considerations businesses must weigh as they move Apache Hadoop and Big Data out of proofs of concept (POCs) and into enterprise-wide production and hybrid deployments.

Watch Replay

Download Slides

The webinar delved into why hybrid deployments are the norm for businesses moving Apache Hadoop and Big Data out of proofs of concept (POCs) and into enterprise-wide production. Mertic and Dull walked through several important considerations these businesses must address and outlined the step-change in DataOps requirements that comes with taking Hadoop into enterprise-wide production. The webinar also discussed why deployment and management techniques that work in limited production may not scale when you go enterprise-wide.

The webinar was so interactive, with polls and questions throughout, that we unfortunately did not get to every question. So we sat down with Dull and Mertic to answer the remaining ones here.

What are the top 3 considerations for data projects?

TD: I have four, not three, considerations if you want your big data project to be successful:

  • Strategy: Does this project address/solve a real business problem? Does it tie back to a corporate strategy? If the answer is “no” to either question, try again.

  • Integration: At the infrastructure level, how are your big data technologies going to fit into your current environment? This is where ODPi comes in. And at the data level, how are you going to integrate your data – structured, semi-structured, and unstructured? You must figure this out before you go production with Hadoop.

  • Privacy/Security: This is part of the data governance discussion. I highlight it because as we move into this newer Internet of Things era, if privacy and security are not included in the design phase of whatever you’re building (product or service), you can count on this one biting you in the butt later on.

  • People/Skills: Do you have the right people on your data team? Today’s big data team is an evolution of your existing BI or COE team that requires new, additional skills.

Why is Hadoop a good data strategy solution?

TD: Hadoop allows you to collect, store, and process any and all kinds of data at a fraction of the cost and time of more traditional solutions. If you want to give all your data a voice at the table, then Hadoop makes it possible.

Why are some companies looking to de-emphasize Hadoop?

TD: See the “top 3 considerations” question. If any of these considerations are missed, your Hadoop project will be at risk. No company wants to emphasize risky projects.

How will a stable, standardized Hadoop benefit other big data projects like Spark?

JM: By helping grow innovation in those projects. It's a commonly seen side effect: stabilizing the commodity core areas of a technology stack (look at Linux for a great example) enables R&D efforts to focus on higher levels in the stack.

How are Enterprise Hadoop challenges different across verticals (healthcare, telco, banking, etc.)?

JM: Each vertical has very specific industry regulations and practices in how data is used and classified. This makes efforts around data governance that much more crucial – sharable templates and guidelines streamline data usage and enable focus on insight and discovery.

Is Hadoop adoption different around the world (i.e., EU, APAC, South America, etc.)?

JM: Each geo definitely has unique adoption patterns depending on local business culture, the maturity of the technology sector, and how technology is adopted and implemented in the region. For example, we see China as a huge area of Hadoop growth that looks to adopt more full-stack solutions as the next step from the EDW days. The EU tends to lag a bit behind in data analytics in general, implementing technology in a more deliberate way, while North American companies tend to implement technologies first and then look at how to connect them to business problems.

What recent movements/impact in this space are you most excited about?

TD: We’ve been talking about “data-driven decision making” for decades. We now have the technologies to make it a reality – much quicker and without breaking the bank.

Where do you see the state of big data environments two years from now?

TD: Big data environments will be more stable and standardized. There will be less technical discussion about the infrastructure – i.e., the data plumbing – and more business discussion about analyzing the data and figuring out how to make or save money with it.

What impact does AR have on IoT and Big Data?

TD: I see this the other way around: Big data and IoT are fueling AR. Because of big data and IoT, AR can provide us with more context and a richer experience no matter where we are. 

Can you recommend a resource that explains the Hadoop ecosystem? People in this space seem to assume knowledge of the different open source project names and what they do, and they explain one component in terms of the others. For me, it has been very difficult to figure out, e.g., “Spark is like Storm except in-memory and less focused on streaming.”

TD: This is a very good question. What makes it more challenging is that the Hadoop ecosystem is growing and evolving, so you can count on today’s popular projects getting bumped to the side as newer projects come into play. I often refer to The Hadoop Ecosystem Table to understand the bigger picture and then drill down from there if I want to understand more.

We invite you to get involved with ODPi and learn more by visiting the ODPi website at https://www.odpi.org/.

We hope to see you again at an upcoming Linux Foundation webinar. Visit Linux.com to view the upcoming webinar schedule: https://www.linux.com/blog/upcoming-free-webinars-linux-foundation

Building IPv6 Firewalls: IPv6 Security Myths

We’ve been trundling along nicely in IPv6, and now it is time to keep my promise to teach some iptables rules for IPv6. In this two-part series, we’ll start by examining some common IPv6 security myths. Every time I teach firewalls I have to start with debunking myths because there are a lot of persistent weird ideas about the so-called built-in IPv6 security. In part 2 next week, you will have a nice pile of example rules to use.

Security yeah, no

You might recall the optimistic claims back in the early IPv6 days of all manner of built-in security that would cure the flaws in IPv4, and we would all live happily ever after. As usual, ’tisn’t exactly so. Let’s take a look at a few of these.

IPsec is built into IPv6, rather than added on as in IPv4. This is true, but it's not particularly significant. IPsec (IP Security) is a set of network protocols for encrypting and authenticating network traffic. IPsec operates at the Network layer. Other encryption protocols that we use every day, such as TLS/SSL and SSH, operate higher up, in the Transport layer, and are application-specific.

IPsec operates similarly to TLS/SSL and SSH with encryption key exchanges, authentication headers, payload encryption, and complete packet encryption in encrypted tunnels. It works pretty much the same in IPv6 and IPv4 networks; patching code isn’t like sewing patches on clothing, with visible lumps and seams. IPv6 is approaching 20 years old, so whether certain features are built-in or bolted-on isn’t relevant anyway.

The promise of IPsec is automatic end-to-end security protecting all traffic over an IP network. However, implementing and managing it is so challenging we’re still relying on our old favorites like OpenVPN, which uses TLS/SSL, and SSH to create encrypted tunnels.
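As a concrete sketch of the kind of encrypted tunnel we still reach for in practice (the host name and ports here are purely illustrative), a single SSH command forwards a local port through an encrypted channel to a remote service:

# Forward local port 8443 through an encrypted SSH tunnel to port 443
# on the remote host (host name and ports are illustrative):
$ ssh -L 8443:localhost:443 admin@gateway.example.com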

IPsec in IPv6 is mandatory. No. The original specification required that all IPv6 devices support IPsec. This was changed in 2011 by RFC 6434, Section 11, from MUST to SHOULD. In any case, having it available is not the same as using it.

IPsec in IPv6 is better than in IPv4. Nah. Pretty much the same.

NAT = Security. No no no no no no, and NO. NAT is not and never has been about security. It is an ingenious hack that has extended the lifespan of IPv4 many years beyond its expiration date. The little bit of obfuscation provided by address masquerading doesn’t provide any meaningful protection, and it adds considerable complexity by requiring applications and protocols to be NAT-aware. It requires a stateful firewall which must inspect all traffic, keep track of which packets go to your internal hosts, and rewrite multiple private internal addresses to a single external address. It gets in the way of IPsec, geolocation, DNSSEC, and many other security applications. It creates a single point of failure at your external gateway and provides an easy target for a Denial of Service (DoS) attack. NAT has its merits, but security is not one of them.
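To underline how little machinery NAT actually is, here is a minimal sketch of IPv4 masquerading with iptables (the interface name is illustrative). One rule rewrites outbound source addresses, and nothing in it authenticates or encrypts a single byte:

# Rewrite outbound source addresses to the gateway's external address;
# this obscures topology but protects nothing (eth0 is illustrative):
$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE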

Source routing is built-in. This is true; whether it is desirable is debatable. Source routing allows the sender to control forwarding, instead of leaving it up to whatever routers the packets travel through, which is usually Open Shortest Path First (OSPF). Source routing is sometimes useful for load balancing, and managing virtual private networks (VPNs); again, whether it is an original feature or added later isn’t meaningful.

Source routing presents a number of security problems. You can use it to probe networks and gain information and bypass security devices. Routing Header Type 0 (RH0) is an IPv6 extension header for enabling source routing. It has been deprecated because it enables a clever DoS attack called amplification, which is bouncing packets between two routers until they are overloaded and their bandwidth exhausted.
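As a small preview of next week's rules, a single ip6tables rule using the rt extension-header match drops RH0 packets outright; treat it as a sketch, since kernels that implement RFC 5095 already ignore RH0:

# Drop any packet carrying a Type 0 Routing Header (deprecated by RFC 5095):
$ ip6tables -A INPUT -m rt --rt-type 0 -j DROP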

IPv6 networks are protected by their huge size. Some people have the idea that because the IPv6 address space is so large this provides a defense against network scanning. Sorry but noooo. Hardware is cheap and powerful, and even when we have literally quintillions of potential addresses to use (an IPv6 /64 network segment is 18.4 quintillion addresses) we tend to organize our networks in predictable clumps.
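For the record, that quintillions figure is simply the size of the 64-bit interface-identifier space: 2^64 = 18,446,744,073,709,551,616 addresses in every /64.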

The difficulties of foiling malicious network scanning are compounded by the fact that certain communications are required for computer networks to operate. The problem of controlling access is beyond the abilities of any protocol to manage for us. Read Network Reconnaissance in IPv6 Networks for a lot of interesting information on scanning IPv6 networks, which attacks require local access and which don’t, and some ways to mitigate hostile scans.

Multitudes of Attack Vectors

Attacks on our networks come from all manner of sources: social engineering, carelessness, spam, phishing, operating system vulnerabilities, application vulnerabilities, ad networks, tracking and data collection, snooping by service providers… going all tunnel vision on an innocent networking protocol misses almost everything.

Come back next week for some nice example IPv6 firewall rules.

You might want to review the previous installments in our meandering IPv6 series.

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.

Why Kubernetes, OpenStack and OPNFV Are a Match Made in Heaven

Chris Price, open source strategist for SDN, cloud and NFV at Ericsson, says there’s plenty of love in the air between Kubernetes, OpenStack and OPNFV.

“Kubernetes provides us with a very simple way of very quickly onboarding workloads and that’s something that we want in the network, that’s something we want in our platform,” Price said, speaking about what he called “a match made in heaven” between Kubernetes, OpenStack and NFV at the recent OPNFV Summit.

Price believes the Euphrates release is the right time to integrate more tightly with Kubernetes and OpenStack, finding ways of making the capabilities from each available to the NFV environment and community.

Read more at SuperUser

Using Prototypes to Explore Project Ideas, Part 1

Imagine that you work for an agency that helps clients navigate the early stages of product design and project planning.

No matter what problem space you are working in, the first step is always to get ideas out of a client’s head and into the world as quickly as you possibly can. Conversations and wireframes can be useful for finding a starting point, but exploratory programming soon follows because words and pictures alone can only take you so far.

By getting working software into the mix early in the process, product design becomes an interactive collaboration. Fast feedback loops allow stumbling blocks to be quickly identified and dealt with before they can burn up too much time and energy in the later (and more expensive) stages of development.

Read more at Practicing Developer

The Biggest Shift in Supercomputing Since GPU Acceleration

If you followed what was underway at the International Supercomputing Conference (ISC) this week, you will already know this shift is deep learning. Just two years ago, we were fitting this into the broader HPC picture from separate hardware and algorithmic points of view. Today, we are convinced it will cause a fundamental rethink of how the largest supercomputers are built and how the simulations they host are executed. After all, the pressures on efficiency, performance, scalability, and programmability are mounting—and relatively little in the way of new thinking has been able to penetrate those challenges.

The early applications of deep learning as an approximation approach to HPC—taking experimental or supercomputer simulation data and using it to train a neural network, then turning that network around in inference mode to replace or augment a traditional simulation—are incredibly promising. This work in using the traditional HPC simulation as the basis for training is happening fast and broadly, which means a major shift is coming to HPC applications and hardware far quicker than some centers may be ready for. What is potentially at stake, at least for some application areas, is far-reaching. Overall compute resource usage goes down compared to traditional simulations, which drives efficiency, and in some cases, accuracy is improved. Ultimately, by allowing the simulation to become the training set, the exascale-capable resources can be used to scale a more informed simulation, or simply be used as the hardware base for a massively scalable neural network.

Read more at The Next Platform

New GitHub Features Focus on Open Source Community

GitHub is adding new features and improvements to help build and grow open source communities. According to the organization, open source thrives on teamwork, and members need to be able to easily contribute and give back. The new features are centered around contributing, open source licensing, blocking, and privacy.

To make open source licensing easier, the organization has introduced a new license picker that provides an overview of the license, the full text, and the ability to customize fields.

Read more at SDTimes

Azure Container Instances: No Kubernetes Required

Microsoft has introduced a new container service, Azure Container Instances (ACI), that is intended to provide a more lightweight and granular way to run containerized applications than its Azure Container Service (ACS).

ACI runs individual containers that you can configure with specific amounts of virtual CPU and memory, and that are billed by the second. Containers can be pulled from various sources – Docker Hub, the Azure Container Registry, or a private repository – and deployed from the CLI or by way of an Azure template.
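As a rough illustration of that granularity (the resource group and names below are hypothetical), a single billed-by-the-second instance with explicit CPU and memory can be created from the Azure CLI:

# Create a container instance with explicit CPU and memory
# (resource group and container names are hypothetical):
$ az container create --resource-group demo-rg --name demo-app \
    --image nginx --cpu 1 --memory 1.5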

Read more at InfoWorld

Open Source Mentoring: Your Path to Immortality

Rich Bowen is omnipresent at any Open Source conference. He wears many hats. He has been doing Open Source for 20+ years, and has worked on dozens of different projects during that time. He’s a board member of the Apache Software Foundation, and is active on the Apache HTTPd project. He works at Red Hat, where he’s a community manager on the OpenStack and CentOS projects.

At Open Source Summit North America, Bowen will be delivering a talk titled “Mentoring: Your Path to Immortality.” We talked to Bowen to know more about the secret of immortality and successful open source projects.

Linux.com: What was the inspiration behind your talk?

Rich Bowen: My involvement in open source is 100 percent the result of people who mentored me, encouraged me to participate, and cheered me on as I worked. In recent years, as I have lost steam on some of these projects, I've turned my attention to encouraging younger people to step in and fill my space. This has been every bit as rewarding as participating myself, and I wanted to share some of this joy.

Linux.com: Have you seen projects that died because their creators left?

Bowen: Oh, sure. Dozens of them. And many of them were projects that had a passionate user community, but no active developers. I tend to think of these projects as not really open source. It’s not enough to have your code public, or even under an open source license. You have to actually have a collaborative community in order for your project to be truly open and sustainable.

Linux.com: When we talk about immortality of a project and changing leadership, there can be many factors — documentation, adapting processes, sustainability. What do you think are some of the factors that ensure immortality?

Bowen: Come to my talk and find out! Seriously, the most important thing — the thing that I want people to take away from my talk — is that you be willing to step out of your comfort zone and ask someone to help out. Be willing to relinquish control, and let someone else do something that you could probably do better. Or, maybe you couldn't. There's only one way to find out.

Linux.com: Can you give an example of some of the projects that followed the model and have never faced issues with changing guard?

Bowen: I would have to point to the Apache Web server. The project is 23 years old, and there’s only one person involved now who was involved at the beginning. The rest of the people working on it come and go, based on their interests and availability. The culture of handing out commit rights to all interested parties has been sustained over the years, and all the people on the project are treated as equals.

Other interesting examples include projects like Linux, Perl, or Python, which have very strong project leaders who, while they remain the public face of the project, in reality, delegate a lot of their power to the community. These projects all have strong cultures of mentors reaching out to new contributors and helping them with their first contributions.

Linux.com: How important are people and processes in the open source world or is it all about technology?

Bowen: We have a saying at Apache: Community > Code. 

Obviously, our communities are based around code, but it’s the community, not the code, that the Apache board looks at when it evaluates whether a project is running in a sustainable way.

I would assert that open source is all about people — people who happen to like technology. The open source mindset, and everything that I talk about in my presentation, are equally applicable to any discipline where people create in a collaborative way — academia is one obvious example, but there are lots of other places like government, business coalitions, music, and so on.

Check out the full schedule for Open Source Summit here and save $150 on registration through July 30.  Linux.com readers save an additional $47 with discount code LINUXRD5. Register now!

Activities for All at OS Summit in Los Angeles: Mentoring, Yoga, Puppies, and More!

Open Source Summit North America is less than two months away! Join 2,000+ open source professionals Sept. 11-14 in Los Angeles, CA, for exciting keynotes and technical talks covering all things Linux, cloud, containers, networking, emerging open source technologies, and more.

Register now!

With your registration, you also get access to many special events throughout the four-day conference. Special events include:

  • New Speed Networking Workshop: Looking to grow your technical skills, get more involved in an open source community, or make a job change? This networking and mentoring session taking place Monday, Sept. 11 is for you!

  • New Recruiting Program: Considering a career move or a job change? This year we are making it easier than ever for attendees to connect with companies looking for new candidates.

  • Evening Events: Join fellow attendees for conversation, collaboration and fun at numerous evening events including the attendee reception at Paramount Studios featuring studio tours, live music, and dinner from LA favorites In-N-Out, Coolhaus, Pink’s and more!

  • Women in Open Source Lunch: All women attendees are invited to connect at this networking lunch, sponsored by Intel, on Monday, Sept. 11.

  • Dan Lyons Book Signing: Attendees will have the opportunity to meet author Dan Lyons on Tuesday, Sept. 12. The first 100 attendees will receive a free signed copy of his book Disrupted: My Misadventure in the Start-Up Bubble.

  • Thursday Summits & Tutorials: Plan to stay on September 14, to attend the Diversity Empowerment Summit (Hosted by HPE & Intel), Networking Mini-Summit or deep-dive tutorials – all included in your OSS NA registration!

  • New Executive Track: Full details coming soon on this special event, hosted by IBM, taking place Tuesday, Sept. 12.

  • Morning Activities for Attendees: Morning meditation, a 5K fun run, and a downtown Los Angeles sightseeing bus tour.

Check back for updates on even more activities, including our Attendee Partner Program, Kids Day (an opportunity for kids to learn Scratch programming, in partnership with LA Makerspace), and Puppy Pawlooza (enjoy playtime with shelter dogs thanks to our partnership with LA Animal Rescue).

Linux.com readers receive an additional $47 off with code LINUXRD5. Register now »

How to Integrate Containers in OpenStack

One of the key features of the OpenStack platform is the ability to run applications, and quickly scale them, using containers.  

Containers are ready-to-run applications because they come packed with the entire stack of services required to run them.

OpenStack is an ideal platform for containers because it provides all of the resources and services for containers to run in a distributed, massively scalable cloud infrastructure. You can easily run containers on top of Nova, because Nova includes everything needed to run instances in a cloud. Project Zun offers a further development of this approach.

In more complex environments, container orchestration is often required. Using container orchestration makes managing many containers in data center environments easier. Kubernetes has become the preferred solution for container orchestration. Container orchestration in OpenStack is implemented using project Magnum.
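As a hedged sketch of what that looks like with the Magnum client of this era (the template, image, keypair, and network names are illustrative, and flags vary by release), you first define a cluster template and then build a Kubernetes cluster from it:

# Define a Kubernetes cluster template, then build a two-node cluster
# from it (image, keypair, and network names are illustrative):
$ magnum cluster-template-create --name k8s-template \
    --image fedora-atomic --keypair mykey \
    --external-network public --coe kubernetes
$ magnum cluster-create --name k8s-cluster \
    --cluster-template k8s-template --node-count 2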

In current OpenStack releases, there are no fewer than three solutions for running containers:

  • Directly on top of Nova

  • Using container orchestration in project Magnum

  • Using project Zun

In this tutorial, I’ll show you how to run containers in OpenStack using the Nova driver with Docker.

What is Docker?

Multiple solutions are available for running containers on cloud infrastructure. Currently, Docker is the most widely used container solution. It offers everything needed to run containers in a corporate environment and is backed by Docker Inc. for support.

Docker has many advantages. Its containers are portable as images and can be assembled from an application's source code. File-system-level changes can also easily be managed. And Docker can collect the STDIN and STDOUT of processes running in a container, which allows for interactive management of containers.

The Nova driver embeds an HTTP client which talks with the Docker internal REST API through a UNIX socket. The HTTP API is used to control containers and fetch information about them.
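You can inspect that same channel yourself. As a quick sketch using the Docker Engine REST API, curl can query the socket directly:

# List running containers via the Docker Engine REST API over its
# UNIX socket (the same channel the Nova driver uses):
$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json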

The driver fetches images from OpenStack’s Glance service and loads them into the Docker file system. From Docker, container images may be placed in Glance to make them available to OpenStack.

Enabling Docker in OpenStack

Now that you have a general sense of how containers work in OpenStack, let’s talk about how you can enable containers using the Nova driver for Docker. The OpenStack Wiki has a detailed explanation of how to configure any OpenStack installation to enable Docker. You can also use your distribution’s deployment mechanism to deploy Docker.

When you do this, the Docker driver will be added to the nova.conf file. And the Docker container format will be added to glance.conf.
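For reference, the relevant settings look roughly like the following, based on the nova-docker driver documentation of the time; exact option names may vary by release:

# In nova.conf, select the Docker virt driver:
compute_driver = novadocker.virt.docker.DockerDriver

# In glance-api.conf, add docker to the supported container formats:
container_formats = ami,ari,aki,bare,ovf,docker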

Once it’s enabled, Docker images can be added to the Glance repository using the docker save CLI command. The commands below show how to pull a Docker image, save it to the local machine with docker save, and then create a Glance image from it using the docker container format.

$ docker pull samalba/hipache
$ docker save samalba/hipache | glance image-create --is-public=True \
    --container-format=docker --disk-format=raw --name samalba/hipache

Booting from a Docker image

Finally, once Docker is enabled in Nova, you can boot an OpenStack instance from a Docker image. Just add the image to the Glance repository, and then you’ll be able to boot from it. This works like booting any other instance in a Nova environment.

$ nova boot --image "samalba/hipache" --flavor m1.tiny test

After booting, you’ll see the Docker instance in the OpenStack environment using either nova list or docker ps.

Conclusion

In this short tutorial series on OpenStack, we’ve covered how to install a distribution, get an instance up and running, and enable containers in just a few hours.

Read the other articles in the series:

How to Install OpenStack in Less Than an Hour

Get an OpenStack Instance Up and Running in 40 Minutes or Less

Interested in learning more OpenStack fundamentals?  Check out the self-paced, online Essentials of OpenStack Administration course from The Linux Foundation Training. The course is excellent preparation for the Certified OpenStack Administrator exam.