
How to Use find in Linux

In a recent Opensource.com article, Lewis Cowles introduced the find command.

find is one of the more powerful and flexible command-line programs in the daily toolbox, so it’s worth spending a little more time on it.

At a minimum, find takes a path to find things. For example:

find /

will find (and print) every file on the system. And since everything is a file, you will get a lot of output to sort through. This probably doesn’t help you find what you’re looking for. You can change the path argument to narrow things down a bit, but it’s still not really any more helpful than using the ls command. So you need to think about what you’re trying to locate.

Perhaps you want to find all the JPEG files in your home directory. The -name argument allows you to restrict your results to files that match the given pattern.
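For example, to list all the JPEG files under your home directory (this uses GNU find; -iname makes the match case-insensitive, so both photo.jpg and PHOTO.JPG are found):

find ~ -iname "*.jpg"

Tests can also be combined. To find JPEG files modified within the last week that are larger than 1MB:

find ~ -iname "*.jpg" -mtime -7 -size +1M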

Read more at OpenSource.com

Turing Test 2

In 1950, Alan Turing wrote a paper entitled “Computing Machinery and Intelligence.” He proposed a test in which a human attempts to distinguish between a human and a computer by exchanging text messages with each of them. If the human is unable to distinguish between the two, the computer is said to have passed the “Turing Test.”…

Much has been written about the increasingly sophisticated ability of computer programs to pass the CAPTCHA tests or a variation in which the program sends the image to a human on the Internet who is given some benefit or payment for solving the problem, which is then relayed by the imitating program to the computer program running the CAPTCHA test. This is not merely an amusing game. As computer programs have grown capable of more sophisticated behavior, they are being used to emulate humans to fool less-sophisticated programs into treating computer-generated actions as if they originate from a human. This is an important practical problem because failure to make this distinction may mean malicious programs can register millions of fake identities on an email system for purposes of sending phishing email messages or making comments on social media Web pages.

Read more at Communications of the ACM

Cloud Computing in Focus: Serverless, Microservices, KubeCon + CloudNativeCon, and More

Cloud computing concepts can seem as nebulous as clouds themselves, but in April, we published several cloud-related articles to help clarify a few underlying ideas and look at some specific implementations.

This month, Swapnil Bhartiya tackled the subject of serverless computing with There’s a Server in Every Serverless Platform. According to a recent whitepaper from the Cloud Native Computing Foundation (CNCF) Serverless Working Group, “serverless computing refers to the concept of building and running applications that do not require server management.” However, as Bhartiya explains, there are still servers involved.

Also, with the rise of containers, many companies have started to break monoliths into microservices. In Microservices Explained, Bhartiya described how this approach offers a way to break down complex applications and allow components to evolve independently. He talked with Docker’s Patrick Chanezon, who said, “The idea is that you are building your application as a set of loosely coupled services that can be updated and scaled separately under the container infrastructure.”

In other cloud news, KubeCon + CloudNativeCon Europe is happening this week in Copenhagen, Denmark, and those in attendance can look forward to three days of talks, co-located events, and collaboration focused on cloud-native computing. Not everyone can be there in person, though, so Linux.com has been running a series of articles to preview a few of the featured conference talks.   

The following articles will give you a taste of the event and help you learn the latest about containers, cloud, Kubernetes, and more.

Put Wind into your Deployments with Kubernetes and Helm by Eldad Assis

Kubernetes is known for the ease with which you can spin up a cluster, deploy your applications to it, and scale it to your needs. This article shows how easy it can be to run and test your code in a production-like environment.
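As a rough sketch of that workflow (assuming Helm 2 and a chart directory named my-app whose chart creates a deployment of the same name; the names here are illustrative):

# Deploy the chart to the cluster
helm install ./my-app --name my-app-test

# Scale up for a production-like test
kubectl scale deployment my-app-test --replicas=3

# Tear everything down when the test run is finished
helm delete --purge my-app-test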

Extending the Kubernetes Cluster API by Henrik Schmidt

The Cluster API is a new working group under the umbrella of sig-cluster-lifecycle that aims to enable you to create clusters and machines via a simple, declarative API. The working group is in the very early stages of defining all API types, but Henrik Schmidt has more details in this article.
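The types are still being worked out, but the declarative idea looks roughly like this hypothetical sketch, in which a worker machine is requested the same way you would apply any other Kubernetes resource (the API group, kind, and fields shown are illustrative and may change as the working group settles the design):

cat <<EOF | kubectl apply -f -
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: worker-01
spec:
  providerConfig:    # provider-specific details such as instance size or image
    value:
      machineType: n1-standard-2
EOF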

CRI: The Second Boom of Container Runtimes by CNCF

Harry (Lei) Zhang and Xu Wang will present “CRI: The Second Boom of Container Runtimes” this week at KubeCon + CloudNativeCon Europe. In this article, Zhang provides some background on CRI, container runtimes, KataContainers, and how they all fit together.

Extending Kubernetes API for Complex Stateful Applications using Operator by Anil Kumar

Kubernetes 1.5 includes the new StatefulSet API object, which gives you a set of resources to deal with stateful containers, such as volumes and stable network IDs. Learn more from Couchbase’s Anil Kumar.
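As a minimal sketch of what that looks like (the names and sizes are illustrative, a headless Service called db is assumed to exist to provide the stable network IDs, and the API group was apps/v1beta1 when StatefulSet landed in Kubernetes 1.5):

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db        # headless Service that provides stable network IDs
  replicas: 3
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: couchbase    # illustrative image
        volumeMounts:
        - name: data
          mountPath: /opt/data
  volumeClaimTemplates:     # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
EOF

Each replica gets a stable name (db-0, db-1, db-2) and its own volume, both of which survive rescheduling.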

Fluent Bit: Flexible Logging in Kubernetes by Eduardo Silva

Logging in containerized environments involves new challenges that need to be addressed. In this article, Treasure Data’s Eduardo Silva describes the current status of the Fluentd ecosystem and looks at improvements in the new Fluent Bit v0.13 release that will be of interest to Kubernetes users.

For more information, read the CNCF Serverless Working Group’s whitepaper. And, check out the whole schedule of events at KubeCon + CloudNativeCon, happening May 2-4 in Copenhagen, Denmark.

The Open Source Roots of Machine Learning

The concept of machine learning, which is a subset of artificial intelligence, has been around for some time. Ali Ghodsi, an adjunct professor at UC Berkeley, describes it as “an advanced statistical technique to make predictions on a massive amount of data.” Ghodsi has been influential in the areas of Big Data and distributed systems, and in machine learning projects including Apache Spark, Apache Hadoop, and Apache Mesos. Here, he shares insight on these projects, various use cases, and the future of machine learning.

Building a business around AI

There are some commonalities among these three projects that have been influenced by Ghodsi’s research. All have successful companies and business models built around them — Databricks for Apache Spark, Mesosphere for Apache Mesos, and Hortonworks in the case of Apache Hadoop.

“Around 2008-2009, people were building interesting and exciting use cases around artificial intelligence and machine learning. If you could use modern machines — hardware like GPUs — to scale your data and combine Big Data with AI, you could get fantastic results,” said Ghodsi. “But very few companies were successful at that. Everyone else had a hard time succeeding in such efforts. That’s why all these open source projects started.”

Open source theme

Another common theme across these projects and companies is open source. “We wanted to democratize that kind of technology. So, we created them as open source projects, with communities around them to enable contributions,” Ghodsi said.

These companies continue to be the leading contributors to their respective projects. For example, Databricks is one of the major contributors to the Apache Spark project, but it’s not the sole contributor. A true open source project won’t succeed if there isn’t a diverse community around it. “If you want millions of developers around the world to use a new API that you’ve built, it better be open sourced. If it’s not open source, it will be challenging to get people to adopt it,” said Ghodsi.

A bit about machine learning

Machine learning replaces manual, repeatable processes. Such systems existed previously, but they used different models to achieve automation. “A lot of previous systems were rule-based: if this happens, then do that,” said Ghodsi. “But if you want to moderate billions of chat messages in real time, you can’t do that manually with people sitting around and monitoring everything.”

“You can’t do that with rule-based techniques either. The problem with rule-based techniques is that there is always a way to game them and go around them. The best way of doing it is to have a real-time machine learning engine that can be trained,” he said.

Ghodsi provided an example of a “very large company” that serves billions of people with its free chat application. Most of its users are teenagers. The company is using artificial intelligence, machine learning, and natural language processing to automatically detect any foul language or any activity that’s alarming.

The chat messages go through machine learning tools and are labeled accordingly. Over time, the machine learning algorithm starts seeing patterns in age, timing, length of messages, and so on. It finds those patterns instead of a person setting rules, and it can’t be gamed, because it continues to evolve. The biggest flaw of the traditional rule-based system is that once someone figures out a way around the rules, it takes time to create and deploy a new set of rules. It’s a cat-and-mouse game. Machine learning overcomes that problem and becomes a very powerful tool in such cases.

Another use case for machine learning is credit card fraud. “Every time you swipe a credit card, machine learning can help detect if it was a fraudulent swipe. It can detect anomalies in real time. Another use case is the security of corporate networks. You have billions of packets coming into your corporate network; how would you know one of them is an attack? Machine learning enables you to do that effectively in real time,” said Ghodsi.

Machine learning helping machine learning

Machine learning is also being used to help make the IT infrastructure stack intelligent, secure, and efficient. Any IT stack produces a massive amount of logs that get stored somewhere. Customers pay for storage, but no one actually looks at those logs unless something breaks. Machine learning helps mine these logs and improve the overall stack.

Databricks is using machine learning to improve Apache Spark itself, looking at error messages hitting customers, increases in latency, and so on. “If you look at the RPC messages that are sent to a Big Data cluster running Apache Spark, how do you detect if it’s getting slower? Even if there is a one millisecond delay, it will have a big impact on the performance of Spark itself. You can mine those logs that you have collected from the whole Spark computation and then use machine learning to actually optimize the stack itself,” said Ghodsi.

Mixing machine learning with cloud

Cloud, whether public or private, has become an integral part of modern IT infrastructure. Databricks users want to take advantage of the cloud. According to Ghodsi, a lot of Databricks users are using Azure Storage, Cosmos DB, and SQL Data Warehouse, and they wanted better integration with Databricks. “We wanted to remove friction and enable these companies to democratize artificial intelligence,” said Ghodsi.

Databricks has partnered with Microsoft to bring Azure to its customers. As a result of the two companies’ combined efforts, users get a tightly integrated solution. Ghodsi emphasized that integration is critical because it’s very challenging for enterprise users to build these AI applications. Machine learning doesn’t just happen. You don’t just write the code, and it’s done. You need to go back to the data. You need to combine it with other data sources that are coming from various places. You need to iterate back and forth.

Ghodsi provided an example from the healthcare industry. He mentioned a healthcare company that is using natural language processing on medical records, analyzing them and building phenotype databases. Say a patient has type 2 diabetes, and the company also has the patient’s sequenced genome. The company uses machine learning to combine these two data sources to find out which genes are associated with type 2 diabetes, and it can use that knowledge to develop better drugs. It is dealing with a large amount of data that needs to run on a secure, compliant, and scalable cloud, so it needs to leverage the cloud with Apache Spark capabilities.

It’s future-proof

Machine learning is going to play a massive role in the coming years. “Machine learning is going to create many new jobs,” said Ghodsi. “Putting my UC Berkeley hat on, I see tremendous interest from the students who want to study machine learning. We are going to see a generation of data scientists that doesn’t yet exist. They will think of things that you and I are not smart enough to think of. The next generation will come up with even better technologies and ideas.”

Implementing Advanced Scheduling Techniques with Kubernetes

One of the advantages of using an advanced container orchestration tool like Kubernetes is the flexible scheduler. It gives users a wide range of options for assigning pods to worker nodes that satisfy particular conditions, not just based on a node’s available resources. In order to explain how Kubernetes makes decisions about placing pods on the correct hosts, we can look at the following simplified diagram of a Kubernetes master and a few of its components…

The master API (kube-apiserver) is an instrument that provides read/write access to the cluster’s desired and current state. Components like the scheduler can use the master API to retrieve current state information, apply some logic and calculations, and update the API with new information about the desired state (like specifying to which node a new pod will be scheduled, or which pod should be moved to another node).  In addition, cluster users and administrators can update the cluster state or view it through the Kubernetes dashboard, which is a UI that provides access to the API. CI/CD pipelines can also create new resources or modify existing ones using the API.
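For example, one of the simpler conditions a user can express is node affinity, which restricts a pod to nodes carrying a particular label. A minimal sketch (the pod name and the disktype=ssd label are illustrative):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ssd-only-pod
spec:
  affinity:
    nodeAffinity:
      # Hard constraint: only schedule onto nodes labeled disktype=ssd
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
  containers:
  - name: app
    image: nginx
EOF

The scheduler reads this desired state from the master API and will only bind the pod to a node that carries the label (for instance, one labeled with kubectl label nodes <node-name> disktype=ssd).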

Read more at The New Stack

Achieve Resilient Cloud Applications Through Managed DNS

As the pace and complexity of application development and delivery have increased, the role of DNS has changed considerably. Originally, DNS was a simple on-premises location service for matching hostnames to the correct IP addresses. As applications have moved from local data centers to the public cloud, the role of DNS has expanded to that of a sophisticated director, controlling global and site load balancing and traffic steering, and providing intelligent responses to user requests.

Modern hybrid applications typically utilize public cloud components, including content delivery networks and cloud storage. All of these components need to communicate seamlessly despite any connectivity issues—meaning resiliency is critical. Historically, applications were developed in a self-contained localized data center, making connectivity issues smaller in scale and therefore easier to solve. 

Read more at O’Reilly

Manual Work Is a Bug: Always Be Automating

Let me tell you about two systems administrators I know. Both were overloaded, busy IT engineers. Both had many repetitive tasks to do. Both wanted to automate these tasks. After observing these two people for a year, I noticed that one made a lot of progress, while the other one didn’t. It wasn’t a matter of skill—both were very good software engineers. The difference was their approach, or mindset.

I’d say that the successful one had a mindset of always thinking in terms of moving toward the goal of a better automated system. … 

The successful engineer realizes that the earlier he starts collaborating, the sooner others can contribute. Together they can create a culture of documentation that spreads throughout the team. Thus, every project is collaborative and has a “stone soup” feeling, as all are invited to bring their skills and insights. The more people who embody this culture, the more success it has.

This culture can be summarized in two sentences: (1) Every manual action must have a dual purpose of completing a task and improving the system. (2) Manual work should not be tolerated unless it generates an artifact or improves an existing one.

Read more at ACM Queue

Pop OS 18.04 Bursts onto the Linux Scene

Meet the Linux distribution pushing hard to design an efficient and creative environment for users….

Where Linux excels is in the fields of computer science, engineering, and DevOps – this is where our customers live. It’s important for us to make sure we create the most productive computer environment for them to be efficient, free, and creative. During the first Pop!_OS release, we addressed the most common pain points we heard from customers with the Linux desktop:

  • Reducing the time it takes to set up a productive environment.
  • Removing bloatware.
  • Keeping drivers and software up to date.
  • Providing a fast app center that works well.

All of these items were fixed in the first Pop!_OS release. It was also important that Pop!_OS provide a pleasant experience for non-System76 customers.

Read more at TechRadar

Linux Mint 19 “Tara” Won’t Collect or Send Any of Your Personal or System Data

Now that Canonical has released the Ubuntu 18.04 LTS (Bionic Beaver) operating system, on which Linux Mint 19 “Tara” will be based, it’s time for the Linux Mint team to finalize their releases. There’s still no fixed release date for Linux Mint 19 “Tara” or LMDE (Linux Mint Debian Edition) 3, but Clement Lefebvre said they will arrive soon.

Another interesting thing about Linux Mint 19 “Tara” is that it won’t collect or send any personal or system data. Clement Lefebvre confirmed today that the operating system will not include the “ubuntu-report” tool that Canonical implemented in Ubuntu 18.04 LTS (Bionic Beaver) to allow users to optionally send their data.

Read more at Softpedia

How Open Source Is Powering the Modern Mainframe

When I mention the word “mainframe” to someone, the natural response is colored by a view of an architecture of days gone by — perhaps even invoking a memory of the Epcot Spaceship Earth ride. This is the heritage of mainframe, but it is certainly not its present state.

From the days of the System/360 in the mid-1960s through to the modern mainframe of the z14, the systems have been designed along four guiding principles: security, availability, performance, and scalability. This is exactly why mainframes are entrenched in the industries where those principles are top-level requirements — think banking, insurance, healthcare, transportation, government, and retail. You can’t go a single day without being impacted by a mainframe — whether that’s getting a paycheck, shopping in a store, going to the doctor, or taking a trip.

What often surprises people is how massive open source is on the mainframe. Ninety percent of mainframe customers leverage Linux on their mainframe, with broad support across all the top Linux distributions along with a growing number of community distributions. Key open source applications such as MongoDB, Hyperledger, Docker, and PostgreSQL thrive on the architecture and are actively used in production. And DevOps culture is strong on mainframe, with tools such as Chef, Kubernetes, and OpenStack used for managing mainframe infrastructure alongside cloud and distributed systems.

Learn more

You can learn more about open source and mainframe, both the history along with the current and future states of open source on mainframe, in our upcoming presentation. Join us May 15 at 1:00pm ET for a session led by Open Mainframe Project members Steven Dickens of IBM, Len Santalucia of Vicom Infinity, and Mike Riggs of The Supreme Court of Virginia.

In the meantime, check out our podcast series “I Am A Mainframer” on both iTunes and Stitcher to learn more about the people who work with mainframe and what they see the future of mainframe to be.

This article originally appeared at The Linux Foundation.