
Unlike Oil and Water, Legacy and Cloud Mix Well

For all the hype about moving applications to the cloud and making legacy apps “cloud-native,” those of us in IT have a poorly-kept secret: legacy systems are alive and well – and they’re not going anywhere anytime soon. Though the cloud promises the cost savings and scalability that businesses are eager to adopt, many organizations are not yet ready to let go of existing applications that required massive investments and have become essential to their workflows.

The process of rewriting these often mission-critical apps for the cloud typically ends up being lengthy and expensive, with unexpected problems that vary from company to company. Some of the challenges an organization will face when rewriting applications include:

1. Latency Issues…

Read more at InsideHPC

Getting Started with GitHub

GitHub is an online platform for code hosting, version control and collaboration among people working on a common project. Projects can be managed from anywhere through the platform: hosting and reviewing code, managing projects and building software with other developers around the world. GitHub supports both open source and private projects.

Features for team projects include GitHub Flow and GitHub Pages. GitHub Flow makes it easy for teams with regular deployments to manage their workflow. GitHub Pages, on the other hand, provides a place for showcasing open source projects, displaying resumes and hosting blogs, among other uses.

Individual projects can also be handled easily with GitHub, as it provides the essential tools for managing a project. It also makes it easier to share one’s project with the world.

Read more at LinuxandUbuntu

Xen Hypervisor Patched for Privilege Escalation and Information Leak Flaws

The Xen Project has fixed five new vulnerabilities in the widely used Xen virtualization hypervisor. The flaws could allow attackers to break out of virtual machines and access sensitive information from host systems.

According to an analysis by the security team of Qubes OS, an operating system that relies on Xen for its security model, most of the vulnerabilities stem from the mechanism that’s used to share memory between domains. Under Xen, the host system and the virtual machines (guests) run in separate security domains.

The most severe vulnerability is located in the memory management code for paravirtualized (PV) VMs and allows for a guest to escalate its privilege to that of the host,…

Read more at The New Stack

Want to be a Software Industry Influencer? Get Involved in Open Source

SD Times recently recognized The Linux Foundation among the top innovators and leaders in software development in its annual SD Times 100 list.

The LF was honored to be named a top Influencer, along with ten other industry heavyweights including Apple, Facebook, GitHub, Google, IBM, Intel, Microsoft, Netflix, Red Hat, and Slack.

Does this list look familiar? It should. Each of the companies on the influencers list makes significant contributions to the open source community (bonus points for those who know that most are also members of The Linux Foundation).

Open source has long been a de facto standard for development and the companies on the influencers list pioneered this approach with their own products and services. At the same time, they have led the IT revolution in massively scalable cloud computing, AI, social networking, and many other innovations, and continue to do so. This is not a coincidence.

Read more at The Linux Foundation

See Session Highlights for Upcoming OS Summit and Embedded Linux Conference in Prague

Check out the newly released conference schedules for Open Source Summit Europe and the co-located Embedded Linux Conference Europe, taking place simultaneously October 23-26 in Prague, Czech Republic. This year’s lineup features more than 200 sessions presented by experts from Comcast, Docker, Red Hat, Siemens AG, Amazon, and more.

Open Source Summit Europe combines LinuxCon, ContainerCon, and CloudOpen conferences with the all new Open Community Conference and Diversity Empowerment Summit and is the premier open source technical conference in Europe, gathering 2,000 developers, admins, and community leadership professionals to collaborate, share information and learn about the latest in open technologies.

The co-located Embedded Linux Conference Europe — now in its 12th year — is the place to collaborate with peers on all aspects of embedded Linux, from the hardware to user space development.

In addition to previously announced keynote speakers, more than 200 educational sessions are on offer at Open Source Summit and Embedded Linux Conference.

Session highlights at Open Source Summit Europe include:

  • Love What You Do, Everyday! – Zaheda Bhorat, Amazon Web Services

  • The Rise of Open Source in the Manufacturing Industry – Steffan Evers, Bosch Software Innovations GmbH

  • DIY Open-Source Data Lakes and You – Ashley Hathaway, Stitch Data

  • Detecting Performance Regressions In The Linux Kernel – Jan Kara, SUSE

  • Highway to Helm: Deploying Kubernetes Native Applications – Michelle Noorali, Microsoft

  • Deploying and Scaling Microservices with Docker and Kubernetes – Jérôme Petazzoni, Docker

  • printk() – The Most Useful Tool is Now Showing its Age – Steven Rostedt, VMware

  • Every Day Opportunities for Inclusion and Collaboration – Nithya Ruff, Comcast

  • Beyond Your Code: Building A Successful Project Community – Ruth Suehle, Red Hat

  • Multi-repo, Multi-node Gating at Massive Scale – Monty Taylor, Red Hat

Session highlights at Embedded Linux Conference Europe include:

  • KEYNOTE: Jan Kiszka, Senior Key Expert, Siemens AG

  • Continuous Integration: Jenkins, libvirt and Real Hardware – Anna-Maria Gleixner, Linutronix GmbH

  • Linux-based RTOS Platform for Constructing Self-Driving Vehicles – Jim Huang, South Star Xelerator (SSX)

  • Orchestrated Android-Style System Upgrades for Embedded Linux – Diego Rondini, Kynetics

The complete Open Source Summit schedule can be viewed here, and the schedule for Embedded Linux Conference can be viewed here.

Registration is discounted to $800 through August 27, and academic and hobbyist rates are also available. Applications are also being accepted for diversity and needs-based scholarships. Linux.com readers receive an additional $40 off with code OSSEULDC20. Register Now!

DeepSPADE (alias DeepSmokey): A Machine-Learning System That Collects Spam from the Internet

This blog is about a deep learning system I’ve created, called DeepSPADE (alias DeepSmokey) and how it’s being used to build better Internet communities.

To begin, what is DeepSPADE, and what does it do?

DeepSPADE stands for Deep Spam Detection, and the basic point is for machine learning to do a Natural Language Classification task to differentiate between spam and non-spam posts on public community forums.

One such website is Stack Exchange (SE), a network of over 169 different web forums for everything ranging from programming, to artificial intelligence, to personal finance, to Linux, and much more!

Stack Overflow (SO), a community forum within SE that’s dedicated to general programming, is the world’s most popular forum site for coders. With over 14,500,000 questions asked during the seven years it’s been up, and 6,500,000 of those questions answered, you can see how popular it truly is.

However, like any public website, Stack Overflow is cluttered with garbage. While most members of this community are legitimately interested in sharing their knowledge or getting help from others, there are some who seek to spam the website. In fact, there are more than 30 spam posts every day on SO, on average.

To combat this, the SmokeDetector system was designed and developed by a group of programmers called Charcoal SE. SmokeDetector uses a large set of regular expressions (RegEx) to find spam messages based on their content.
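To make the rule-based approach concrete, here is a minimal sketch of content-based spam matching with regular expressions. The patterns and the function names `matched_rules` and `looks_like_spam` are hypothetical and chosen only for illustration; they are not SmokeDetector's actual rules or API.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; not SmokeDetector's real rules.
SPAM_PATTERNS = [
    re.compile(r"\bbuy\s+now\b", re.IGNORECASE),
    re.compile(r"\bweight\s+loss\s+pills?\b", re.IGNORECASE),
]


def matched_rules(post: str) -> List[str]:
    """Return the source text of every pattern that fires on the post."""
    return [p.pattern for p in SPAM_PATTERNS if p.search(post)]


def looks_like_spam(post: str) -> bool:
    """Flag a post as soon as at least one rule matches."""
    return bool(matched_rules(post))


spam_flag = looks_like_spam("BUY NOW: best weight loss pills online!")
ham_flag = looks_like_spam("How do I reverse a linked list in C?")
```

A rule set like this is easy to audit, but every new spam wording needs a new hand-written pattern, which is the gap a learned classifier aims to close.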

Once I, a big supporter of Machine Learning, found out they used RegEx for their spam classification, I immediately shouted “Why not Deep Learning?!?” This idea was welcomed by the Charcoal Community; in fact, the reason they hadn’t incorporated it earlier was that they didn’t have anybody who worked with machine learning. I joined the Charcoal Community and began developing DeepSPADE to contribute towards their mission.

The DeepSPADE Model

DeepSPADE uses a combination of Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to run this classification task. The word-vectors it uses to actually understand the natural language that it’s given are word2vec vectors trained before the actual model’s training starts. However, during model training, the vectors are fine-tuned to achieve optimal performance.

The Neural Network (NN) is designed in Keras with a TensorFlow (TF) back end (TF provides significant performance gains over Theano), and Figure 1 shows a very long diagram of the model itself:

[Figure 1: Diagram of the DeepSPADE model]

As you can see, the model I’ve designed is very deep. In fact, not only is it deep, it’s a parallel model.

Let’s start off with a question that a lot of people have: Why are you using CNNs and GRUs? Why not just either one of those layers?

The answer lies deep within the actual working of these two layers. Let’s break them down:

CNNs understand patterns in data that aren’t time-bound. This means that the CNN doesn’t look at the natural language in any specific order; it treats the text as an array of data without order. This is helpful if there is a very specific word that we know is almost always related to spam or non-spam.

GRUs (or RNNs – Recurrent Neural Networks – in general) understand patterns in data that are very specifically arranged in a time series. This means that the RNN understands the order of words, and this is helpful because some words may convey entirely different concepts depending on where they appear.
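The difference between the two layer types can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions, not the DeepSPADE code: the one-dimensional "embeddings" are made up, the convolution has width 1 followed by global max pooling (wider filters do see some local order), and a plain tanh recurrence stands in for a GRU.

```python
import math

# Toy one-dimensional "embeddings"; the values are invented for illustration.
EMBED = {"not": -1.0, "very": 0.5, "good": 1.0}


def conv_max_pool(tokens, weight=1.0):
    """Width-1 convolution followed by global max pooling.

    Each position is scored on its own and only the largest score survives,
    so the result does not depend on word order.
    """
    return max(weight * EMBED[t] for t in tokens)


def recurrent_state(tokens, w_in=1.0, w_rec=0.5):
    """Plain tanh recurrence, a simplified stand-in for a GRU.

    The state after each word depends on the state before it, so the same
    words in a different order generally end in a different final state.
    """
    h = 0.0
    for t in tokens:
        h = math.tanh(w_in * EMBED[t] + w_rec * h)
    return h


forward = ["not", "very", "good"]
backward = ["good", "very", "not"]

same_for_cnn = conv_max_pool(forward) == conv_max_pool(backward)
gap_for_rnn = abs(recurrent_state(forward) - recurrent_state(backward))
```

Here `same_for_cnn` is true because pooling discards position, while `gap_for_rnn` is clearly non-zero because the recurrence carries earlier words forward into later steps.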

When these two layers are combined in a specific way to highlight their advantages, the real magic happens!

In fact, to explain why the combination is so powerful, take a look at the following “evolution” of the accuracy of the DeepSPADE system on 16,000 testing rows:

  • 65% – Baseline accuracy with Convolutional Neural Networks

  • 69% – With deeper Convolutional Neural Networks

  • 75% – With introduction of higher quantity & quality of data

  • 79% – With small improvements to model

  • 85% – With LSTMs introduced along with CNN model (no parallelism)

  • 89% – With higher embedding size, deeper CNN and LSTM

  • 96% – With GRUs instead of LSTMs, more Dropout, more Pooling, and higher embedding size

  • 98.76% – With Parallel model & higher embedding size

The answer, again, lies in how the CNN itself works: It has a very strong ability to filter out noise and look at the signal of some content – plus, it trains and runs inference much faster than an RNN.

So, the three Conv1D+Dropout+MaxPool groups in the beginning act as filters. They create many representations of the data with different angles of the data portrayed in each. They also work to decrease the size of the data while preserving the signal.

After that, the result of those groups splits into two different parts:

  • It goes into a Conv1D+Flatten+Dense.

  • It goes into a group of 3 GRU+Dropout, and then a Flatten+Dense.

Why the parallelism? Because again, the two networks look for different types of patterns. While the GRU finds patterns that depend on order, the CNN finds patterns “in general”.

Once the opinion of both Neural Nets is collected, the opinions are concatenated and fed through another Dense layer, which understands patterns and relationships as to when each Neural Network’s results or opinions are more important. It does this dynamic weighting and feeds into another Dense layer, which gives the output of the model.
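The merge step described above can be sketched as a tiny forward pass in plain Python. Everything numeric here is an assumption for illustration: the branch outputs, the layer sizes, the weights and the choice of ReLU and sigmoid activations are invented, and the real model is built in Keras with learned weights.

```python
import math


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weighted sum per output unit, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combine(cnn_out, gru_out, merge_w, merge_b, out_w, out_b):
    """Concatenate both branch outputs, mix them, and emit a spam probability."""
    merged = cnn_out + gru_out  # list concatenation, like a concatenate layer
    hidden = dense(merged, merge_w, merge_b, relu)  # weighs the two branches
    return dense(hidden, out_w, out_b, sigmoid)[0]  # single output unit


# Made-up branch outputs and weights, for illustration only.
cnn_out = [0.9, 0.1]
gru_out = [0.2, 0.8]
merge_w = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
merge_b = [0.0, 0.0]
out_w = [[1.0, -1.0]]
out_b = [0.0]

spam_probability = combine(cnn_out, gru_out, merge_w, merge_b, out_w, out_b)
```

Because the merging layer sees both branch outputs side by side, its weights decide how much each branch contributes to the final score. In training those weights would be learned rather than set by hand.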

Finally, this system can now be added to SmokeDetector, and its automatic weighting systems can begin incorporating the results of Deep Learning!

Plus, this system is trained, tested, and used entirely on Linux servers! Of course, Linux is an amazing platform for such software, because the hardware constraints are practically nil, and because most great development software is supported primarily on Linux (TensorFlow, Theano, MXNet, Chainer, CUDA, etc.).

I love open source software – doesn’t everyone? And, although this project isn’t open source just yet, there is a great surprise awaiting all of you soon!

Tanmay Bakshi, 13, is an Algorithm-ist & Cognitive Developer, Author and TEDx Speaker. He will be presenting a keynote talk called “Open-Sourced Inspiration – The Present and Future of Tech and AI” at Open Source Summit in Los Angeles. He will also present a BoF session discussing DeepSPADE.

Check out the full schedule for Open Source Summit here. Linux.com readers save on registration with discount code LINUXRD5. Register now!

Creating Better Disaster Recovery Plans

Five questions for Tanya Reilly: How service interdependencies make recovery harder and why it’s a good idea to deliberately and preemptively manage dependencies.

I recently asked Tanya Reilly, Site Reliability Engineer at Google, to share her thoughts on how to make better disaster recovery plans. Tanya is presenting a session titled Have you tried turning it off and turning it on again? at the O’Reilly Velocity Conference, taking place Oct. 1-4 in New York.

1. What are the most common mistakes people make when planning their backup systems strategy?

The classic line is “you don’t need a backup strategy, you need a restore strategy.” If you have backups, but you haven’t tested restoring them, you don’t really have backups. Testing doesn’t just mean knowing you can get the data back; it means knowing how to put it back into the database, how to handle incremental changes, how to reinstall the whole thing if you need to. It means being sure that your recovery path doesn’t rely on some system that could be lost at the same time as the data.
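As a small-scale illustration of testing the restore path rather than just the backup, the following sketch uses Python's built-in `sqlite3` module. The `orders` table, the file name and the helper names `back_up` and `restore_drill` are hypothetical; a real drill would also need to cover incremental changes and reinstalling the surrounding system.

```python
import os
import sqlite3
import tempfile


def back_up(source: sqlite3.Connection, path: str) -> None:
    """Write a full copy of the live database to a backup file."""
    target = sqlite3.connect(path)
    try:
        source.backup(target)
    finally:
        target.close()


def restore_drill(backup_path: str, table: str, expected_rows: int) -> bool:
    """Restore the backup into a scratch database and check the data is usable.

    The table name is interpolated into SQL, so it must come from trusted code.
    """
    backup = sqlite3.connect(backup_path)
    scratch = sqlite3.connect(":memory:")
    try:
        backup.backup(scratch)  # the actual restore step
        (count,) = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        (status,) = scratch.execute("PRAGMA integrity_check").fetchone()
        return count == expected_rows and status == "ok"
    finally:
        backup.close()
        scratch.close()


# Build a small "live" database to back up.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("cable",), ("fan",)])
live.commit()

with tempfile.TemporaryDirectory() as tmp:
    backup_file = os.path.join(tmp, "orders.bak")
    back_up(live, backup_file)
    drill_passed = restore_drill(backup_file, "orders", expected_rows=3)
    mismatch_caught = not restore_drill(backup_file, "orders", expected_rows=4)
live.close()
```

The drill restores into a separate scratch database, so it exercises the same steps a real recovery would need without touching the live data.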

Read more at O’Reilly

Your Serverless Raspberry Pi Cluster with Docker

This blog post will show you how to create your own Serverless Raspberry Pi cluster with Docker and the OpenFaaS framework. People often ask me what they should do with their cluster and this application is perfect for the credit-card sized device – want more compute power? Scale by adding more RPis.

“Serverless” is a design pattern for event-driven architectures. Just as “bridge”, “facade”, “factory” and “cloud” are abstract concepts, so is “serverless”. …

We’ll be using OpenFaaS which lets you turn any single host or cluster into a back-end to run serverless functions. Any binary, script or programming language that can be deployed with Docker will work on OpenFaaS and you can choose on a scale between speed and flexibility. The good news is a UI and metrics are also built-in.

Read more at Alex Ellis Blog

Trending Developer Skills, Based on My Analysis of “Ask HN: Who’s Hiring?”

A few years ago, I became curious about identifying emerging technologies and predicting them. So I created Hacker News Hiring Trends, or HN Hiring Trends for short. Hacker News is one of the most popular discussion boards for programmers. It is also one of the best places to discover new technologies. Every month Hacker News hosts a thread called “Ask HN: Who is Hiring?” Users post job opportunities from their companies on this thread.

The fact that these job opportunities are posted monthly and that most are from start-ups (new technologies are usually created or used in start-ups) makes this the ideal environment to capture data that can be used to discover trends.
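One simple way to turn monthly job posts into a trend is to measure what share of each month's posts mention a given technology. The sketch below assumes the posts have already been collected; the sample posts, company names, term list and the function `mention_share` are all hypothetical, and the author's actual method may differ.

```python
import re

# Hypothetical sample of job posts keyed by month; real data would come from
# the monthly hiring threads.
POSTS_BY_MONTH = {
    "2017-07": [
        "Acme | Backend engineer | Python, Postgres, AWS",
        "Initech | Frontend | React, TypeScript",
    ],
    "2017-08": [
        "Globex | Full stack | React, Python, Kubernetes",
        "Hooli | Data engineer | Python, Spark",
        "Umbrella | Mobile | React Native, Swift",
    ],
}

TERMS = ["python", "react", "kubernetes"]


def mention_share(posts, term):
    """Fraction of posts mentioning the term as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return sum(1 for post in posts if pattern.search(post)) / len(posts)


# One row per month, one share per tracked term.
trend = {
    month: {term: mention_share(posts, term) for term in TERMS}
    for month, posts in sorted(POSTS_BY_MONTH.items())
}
```

Using a share rather than a raw count keeps months comparable even when the number of posts in a thread varies.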

Let’s dig into the latest trends.

Read more at freeCodeCamp

Spyware Backdoor Prompts Google to Pull 500 Apps with >100m Downloads

At least 500 apps collectively downloaded more than 100 million times from Google’s official Play Market contained a secret backdoor that allowed developers to install a range of spyware at any time, researchers said Monday.

The apps contained a software development kit called Igexin, which makes it easier for apps to connect to ad networks and deliver ads that are targeted to the specific interests of end users. Once an app using a malicious version of Igexin was installed on a phone, the developer kit could update the app to include spyware at any time, with no warning.

Read more at Ars Technica