
Why Open Source Matters to Alibaba

Alibaba has more than 150 open source projects and is a long-time contributor to many others. Wei Cao, a Senior Staff Engineer at Alibaba, says that sharing knowledge and receiving feedback from the community helps Alibaba refine its projects. We spoke with Wei Cao — who heads the Alibaba Cloud Database Department and leads R&D of Alibaba's RDS and POLARDB products — to learn more about the company's open source focus and about some of the database-related projects they contribute to.

Linux.com: Why is open source so important for Alibaba?

Wei Cao: At present, Alibaba has more than 150 open source projects. We work on open source projects with the aim of contributing to the industry and solving real-life problems, and we share our experiences with other open source enthusiasts.

Wei Cao, Senior Staff Engineer at Alibaba

Alibaba and Alibaba Cloud have long contributed to other open source projects and have fostered a culture that encourages our teams to contribute voluntarily, whether by sharing experiences or by helping others solve problems. Sharing with and contributing to the community is in the DNA of Alibaba's culture.

When we first started to use open source projects like MySQL, Redis, and PostgreSQL, we received a lot of help from the community. Now we would like to give back to those same communities by sharing our accumulated knowledge, and to receive feedback from the community so that we can refine our projects.

We believe this truly represents the essence of open source development, where everyone can build on each other's knowledge. We are dedicated to making our technology inclusive by continuously contributing bug fixes and patch optimizations to different open source projects.

Linux.com: What kind of culture within Alibaba encourages its developers to use and contribute to open source projects?

Wei Cao: Alibaba has always had a culture of integrity, partnership, sharing, and mutual assistance. At the same time, we believe that the more people participate in the community, the more the industry advances, and the more we benefit as well. Our staff members therefore pay close attention to open source projects in the community. They keep using open source projects, accumulating experience, feeding it back into those projects, and jointly promoting the development of the industry.

Linux.com: Can you tell us what kind of open source projects you are using in your company?

Wei Cao: Our database products use many open source projects, such as MySQL, Redis, and PostgreSQL. Our teams have added features and optimized performance for various use cases; for example, we have built compression for IoT workloads and security improvements for the financial industry.

Linux.com: Can you tell us about the open source projects that you have created?

Wei Cao: We will be releasing a new open source project, called Mongo-Shake, at the LC3 conference. Mongo-Shake is a general-purpose platform for services built on MongoDB's oplog.

It reads the oplog (operation log) of a MongoDB cluster to replicate MongoDB data, and it can then implement specific requirements on top of the operation logs, which enable many scenario-based applications.

Through the operation logs, we provide log data subscription and consumption (PUB/SUB) functions, which can be flexibly connected to different scenarios (such as log subscription, data center synchronization, and asynchronous cache invalidation) through an SDK, Kafka, MetaQ, and so on. Cluster data synchronization is the core application scenario: synchronization is achieved by fetching oplogs and replaying them. Its application scenarios include:

  • Asynchronous replication of MongoDB data between clusters, eliminating the cost of double writes.

  • Mirror backup of MongoDB cluster data (not supported in this open source version).

  • Offline log analysis.

  • Log subscription.

  • Cache synchronization.

  • Cache updates driven by log analysis: the results show which cache entries can be evicted and which can be preloaded.

  • Log-based monitoring.

Linux.com: Can you tell us about the major open source projects you contribute to?

Wei Cao: We have contributed to many database-related open source projects. In addition, we have released open source projects of our own, like AliSQL and ApsaraCache, which are widely used at Alibaba.

AliSQL: AliSQL is a MySQL branch developed by the Alibaba Cloud database team; it serves Alibaba's business and Alibaba Cloud's RDS. AliSQL is verified against many Alibaba workloads and is widely used within Alibaba Cloud. The latest AliSQL also merges many useful enhancements from other branches, such as Percona, MariaDB, and WebScaleSQL, and contains many patches drawn from Alibaba's experience.

AliSQL adds substantial feature and performance enhancements on top of MySQL. It carries more than 300 patches; we have added many monitoring indicators and features, and optimized it for different use cases.

In general test cases, AliSQL shows a 70% performance improvement over the official MySQL version, according to the R&D team's sysbench benchmarks. In comparison with MySQL, AliSQL offers:

  • Better support for TokuDB, more monitoring and performance optimization.

  • CPU time statistics for SQL queries.

  • Sequence support.

  • Dynamic column addition.

  • ThreadPool support.

  • Many bug fixes and performance improvements.

Michael "Monty" Widenius, the founder of MySQL/MariaDB, has praised Alibaba for open sourcing AliSQL. We received a lot of help from the open source community in the early development of AliSQL.

Open sourcing AliSQL is the best contribution we have made to this community so far, and we hope to continue our open source journey in the future. Full cooperation with the open source community can make the MySQL/MariaDB ecosystem more robust.

ApsaraCache: ApsaraCache is based on Redis 4.0, with additional features and performance enhancements. In contrast to Redis, ApsaraCache's performance is largely independent of data size; it depends on the workload instead. It also performs better in cases such as short connections, full memory recovery, and time-consuming command execution.

Multi-protocol support

ApsaraCache supports both the Redis and Memcached protocols, with no client code changes needed. Because ApsaraCache supports the Memcached protocol, users can persist data in Memcached mode just as they would with Redis.

Reusing the Redis architecture, we have developed new Memcached-mode features such as persistence, disaster tolerance, backup and recovery, slow log auditing, and statistics.

Ready for production

ApsaraCache has proven very stable and efficient over four years of refinement and tens of thousands of deployments tested in production environments.

The major improvements in ApsaraCache are:

  • Strengthened disaster tolerance: the kernel synchronization mechanism is refactored to avoid the full resynchronization that the native kernel triggers when replication is interrupted under weak network conditions.

  • Compatibility with the Memcached protocol, including support for dual Memcached replicas, offering a more reliable Memcached service.

  • In short-connection scenarios, a 30% performance increase compared with the vanilla version.

  • A hot-upgrade function that can complete the hot update of an instance within 3 ms, sparing users the disruption of frequent kernel upgrades.

  • Strengthened AOF handling, solving host stability problems caused by frequent AOF rewrites.

  • A health detection mechanism.

This article was sponsored by Alibaba and written by Linux.com.

How to Check Disk Space on Linux from the Command Line

Quick question: How much space do you have left on your drives? A little or a lot? Follow up question: Do you know how to find out? If you happen to use a GUI desktop (e.g., GNOME, KDE, Mate, Pantheon, etc.), the task is probably pretty simple. But what if you’re looking at a headless server, with no GUI? Do you need to install tools for the task? The answer is a resounding no. All the necessary bits are already in place to help you find out exactly how much space remains on your drives. In fact, you have two very easy-to-use options at the ready.

In this article, I’ll demonstrate these tools. I’ll be using Elementary OS, which also includes a GUI option, but we’re going to limit ourselves to the command line. The good news is these command-line tools are readily available for every Linux distribution. On my testing system, there are a number of attached drives (both internal and external). The commands used are agnostic to where a drive is plugged in; they only care that the drive is mounted and visible to the operating system.

With that said, let’s take a look at the tools.

df

The df command is the tool I first used to discover drive space on Linux, way back in the 1990s. It’s very simple in both usage and reporting. To this day, df is my go-to command for this task. This command has a few switches but, for basic reporting, you really only need one. That command is df -H. The -H switch is for human-readable format. The output of df -H will report how much space is used, available, percentage used, and the mount point of every disk attached to your system (Figure 1).

Figure 1: The output of df -H on my Elementary OS system.

What if your list of drives is exceedingly long and you just want to view the space used on a single drive? With df, that is possible. Let’s take a look at how much space has been used up on our primary drive, located at /dev/sda1. To do that, issue the command:

df -H /dev/sda1

The output will be limited to that one drive (Figure 2).

Figure 2: How much space is on one particular drive?
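Note that df also accepts an ordinary path rather than a device node, which is handy when you don't know which device backs a directory. A quick sketch (the mount points reported will of course differ on your system):

```shell
# Report the filesystem that holds the current directory;
# df resolves the path to its backing device automatically.
df -H .
```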

You can also limit the reported fields shown in the df output. Available fields are:

  • source — the file system source

  • size — total number of blocks

  • used — space used on a drive

  • avail — space available on a drive

  • pcent — percentage of the total size that is used

  • target — mount point of a drive

Let’s display the output of all our drives, showing only the size, used, and avail (or availability) fields. The command for this would be:

df -H --output=size,used,avail

The output of this command is quite easy to read (Figure 3).

Figure 3: Specifying what output to display for our drives.

The only caveat here is that we don’t know the source of the output, so we’d want to include source like so:

df -H --output=source,size,used,avail

Now the output makes more sense (Figure 4).

Figure 4: We now know the source of our disk usage.
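The --output list also makes df easy to feed into other tools. As a sketch (GNU df and sort assumed), here is one way to rank filesystems by used space:

```shell
# Rank mounted filesystems by used space, largest first.
# tail -n +2 drops df's header line; sort -rh understands the
# human-readable suffixes (K, M, G) that -H produces.
df -H --output=used,source | tail -n +2 | sort -rh | head -n 5
```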

du

Our next command is du. As you might expect, that stands for disk usage. The du command is quite different from the df command, in that it reports on directories and not drives. Because of this, you'll want to know the names of directories to be checked. Let's say I have a directory containing virtual machine files on my machine. That directory is /media/jack/HALEY/VIRTUALBOX. If I want to find out how much space is used by that particular directory, I'd issue the command:

du -h /media/jack/HALEY/VIRTUALBOX

The output of the above command will display the size of every file in the directory (Figure 5).

Figure 5: The output of the du command on a specific directory.

So far, this command isn’t all that helpful. What if we want to know the total usage of a particular directory? Fortunately, du can handle that task. On the same directory, the command would be:

du -sh /media/jack/HALEY/VIRTUALBOX/

Now we know how much total space the files are using up in that directory (Figure 6).

Figure 6: My virtual machine files are using 559GB of space.

You can also use this command to see how much space is being used on all child directories of a parent, like so:

du -h /media/jack/HALEY

The output of this command (Figure 7) is a good way to find out what subdirectories are hogging up space on a drive.

Figure 7: How much space are my subdirectories using?

The du command is also a great tool to use in order to see a list of directories that are using the most disk space on your system. The way to do this is by piping the output of du to two other commands: sort and head. The command to find out the top 10 directories eating space on a drive would look something like this:

du -a /media/jack | sort -n -r | head -n 10

The output would list out those directories, from largest to least offender (Figure 8).

Figure 8: Our top ten directories using up space on a drive.
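The pipeline above sorts raw block counts numerically; if you prefer human-readable sizes, GNU sort's -h flag can order suffixed values like 1.1M and 8.0K. Here's a self-contained sketch against a throwaway directory tree (the tree is created just for the demo, so it runs anywhere):

```shell
# Build a tiny throwaway tree so the example is self-contained.
parent=$(mktemp -d)
mkdir -p "$parent/big" "$parent/small"
head -c 1048576 /dev/zero > "$parent/big/file"   # ~1 MB file
head -c 1024    /dev/zero > "$parent/small/file" # ~1 KB file

# Human-readable sizes, sorted largest first.
top=$(du -h "$parent"/* | sort -rh | head -n 10)
echo "$top"

rm -rf "$parent"
```

On a real system you would point du at a directory you care about, such as your home directory, instead of the temporary tree.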

Not as hard as you thought

Finding out how much space is being used on your Linux-attached drives is quite simple. As long as your drives are mounted to the Linux system, both df and du will do an outstanding job of reporting the necessary information. With df you can quickly see an overview of how much space is used on a disk and with du you can discover how much space is being used by specific directories. These two tools in combination should be considered must-know for every Linux administrator.

And, in case you missed it, I recently showed how to determine your memory usage on Linux. Together, these tips will go a long way toward helping you successfully manage your Linux servers.

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.

Bloomberg Eschews Vendors For Direct Kubernetes Involvement

Financial information behemoth Bloomberg is a big fan of Kubernetes, and is using it for everything from serving up Bloomberg.com to complex data processing pipelines.

Rather than use a managed Kubernetes service or employ an outsourced provider, Bloomberg has chosen to invest in deep Kubernetes expertise and keep the skills in-house. Like many enterprise organizations, Bloomberg originally went looking for an off-the-shelf approach before settling on the decision to get involved more deeply with the open source project directly.

“We started looking at Kubernetes a little over two years ago,” said Steven Bower, Data and Infrastructure Lead at Bloomberg. … “It’s a great execution environment for data science,” says Bower. “The real Aha! moment for us was when we realized that not only does it have all these great base primitives like pods and replica sets, but you can also define your own primitives and custom controllers that use them.”

Read more at Forbes

Best Free Linux Firewalls of 2018

A firewall is an important aspect of computer security these days, and most modern routers have one built in which, while helpful, can be difficult to configure. Fortunately, there are also distributions (distros) of the free operating system Linux which have been specifically designed to function as firewalls.

These will generally have much more advanced features than those found on a router, and allow you to have far greater control over keeping your personal or business network safe.

In this article, we’re going to evaluate six of the most popular free firewall distros. We have tried to emphasise both power and ease of use when considering these offerings and their relative merits. If you want to see all the firewall distros available out there, feel free to visit the DistroWatch website for a comprehensive list. 

Read more at TechRadar

GitHub: Changes to EU Copyright Law Could Derail Open Source Distribution

A proposed European law would mandate that content providers utilize some kind of content filter to make sure rights holders get their royalties. But for a public open source code repository, such a contraption could be a nuisance, or it could be catastrophic.

The E.U. Parliament’s Legal Affairs Committee voted 14-9-2 Wednesday, Brussels time, to approve the latest draft of a directive to impose sweeping changes to the continent’s copyright protections. Ostensibly, the purpose of this Parliamentary Directive would be to ensure the accessibility of all forms of content to “cultural heritage institutions” (mainly libraries and museums). Tucked into that draft is a mandate for a method by which artists and rights holders can negotiate, perhaps electronically, for and receive royalties from the distribution of their work.

But despite a flurry of proposed amendments (some of which may not have been fully circulated among members prior to being voted down, according to one member’s objections), the Directive as it stands may fail to distinguish between a multimedia site like YouTube or Spotify, and a source code repository like GitHub or GitLab.

Read more at ZDNet

Zapcc High-Speed C++ Compiler Now Open Source

Zapcc, a caching C++ compiler built for speed, has gone open source.

Ceemple Software, Zapcc’s builder, claims the compiler offers dramatic improvements in both incremental and full builds compared to building with Clang 4.0 and Clang 5.0. Based on heavily modified code from the Clang compiler project, Zapcc uses an in-memory compilation cache in a client-server architecture. All compilation information is remembered between runs.

Read more at InfoWorld

6 Open Source AI Tools to Know

In open source, no matter how original your own idea seems, it is always wise to see if someone else has already executed the concept. For organizations and individuals interested in leveraging the growing power of artificial intelligence (AI), many of the best tools are not only free and open source, but, in many cases, have already been hardened and tested.

At leading companies and non-profit organizations, AI is a huge priority, and many of these companies and organizations are open sourcing valuable tools. Here is a sampling of free, open source AI tools available to anyone.

Acumos. Acumos AI is a platform and open source framework that makes it easy to build, share, and deploy AI apps. It standardizes the infrastructure stack and components required to run an out-of-the-box general AI environment. This frees data scientists and model trainers to focus on their core competencies rather than endlessly customizing, modeling, and training an AI implementation.

Acumos is part of the LF Deep Learning Foundation, an organization within The Linux Foundation that supports open source innovation in artificial intelligence, machine learning, and deep learning. The goal is to make these critical new technologies available to developers and data scientists, including those who may have limited experience with deep learning and AI. The LF Deep Learning Foundation just recently approved a project lifecycle and contribution process and is now accepting proposals for the contribution of projects.

Facebook’s Framework. Facebook has open sourced its central machine learning system designed for artificial intelligence tasks at large scale, and a series of other AI technologies. The tools are part of a proven platform in use at the company. Facebook has also open sourced a framework for deep learning and AI called Caffe2.

Speaking of Caffe. Yahoo also released its key AI software under an open source license. The CaffeOnSpark tool is based on deep learning, a branch of artificial intelligence particularly useful in helping machines recognize human speech or the contents of a photo or video. Similarly, IBM’s machine learning program known as SystemML is freely available to share and modify through the Apache Software Foundation.

Google’s Tools. Google spent years developing its TensorFlow software framework to support its AI software and other predictive and analytics programs. TensorFlow is the engine behind several Google tools you may already use, including Google Photos and the speech recognition found in the Google app.

Two AIY kits open sourced by Google let individuals easily get hands-on with artificial intelligence. Focused on computer vision and voice assistants, the two kits come as small self-assembly cardboard boxes with all the components needed for use. The kits are currently available at Target in the United States, and are based on the open source Raspberry Pi platform — more evidence of how much is happening at the intersection of open source and AI.

H2O.ai. I previously covered H2O.ai, which has carved out a niche in the machine learning and artificial intelligence arena because its primary tools are free and open source.  You can get the main H2O platform and Sparkling Water, which works with Apache Spark, simply by downloading them. These tools operate under the Apache 2.0 license, one of the most flexible open source licenses available, and you can even run them on clusters powered by Amazon Web Services (AWS) and others for just a few hundred dollars.

Microsoft Onboard. “Our goal is to democratize AI to empower every person and every organization to achieve more,” Microsoft CEO Satya Nadella has said. With that in mind, Microsoft is continuing to iterate its Microsoft Cognitive Toolkit. It’s an open source software framework that competes with tools such as TensorFlow and Caffe. Cognitive Toolkit works with both Windows and Linux on 64-bit platforms.

“Cognitive Toolkit enables enterprise-ready, production-grade AI by allowing users to create, train, and evaluate their own neural networks that can then scale efficiently across multiple GPUs and multiple machines on massive data sets,” reports the Cognitive Toolkit Team.

Learn more about AI in this new ebook from The Linux Foundation. Open Source AI: Projects, Insights, and Trends by Ibrahim Haddad surveys 16 popular open source AI projects – looking in depth at their histories, codebases, and GitHub contributions. Download the free ebook now.

Heather Kirksey on Integrating Networking and Cloud Native

As highlighted in the recent Open Source Jobs Report, cloud and networking skills are in high demand. And, if you want to hear about the latest networking developments, there is no one better to talk with than Heather Kirksey, VP, Community and Ecosystem Development, Networking at The Linux Foundation. Kirksey was the Director of OPNFV before the recent consolidation of several networking-related projects under the new LF Networking umbrella, and I spoke with her to learn more about LF Networking (LFN) and how the initiative is working closely with cloud native technologies.

Kirksey explained the reasoning behind the move and expansion of her role. “At OPNFV, we were focused on integration and end-to-end testing across the LFN projects. We had interaction with all of those communities. At the same time, we were separate legal entities, and things like that created more barriers to collaboration. Now, it’s easy to look at them more strategically as a portfolio to facilitate member engagement and deliver solutions to service providers.”

Read more at The Linux Foundation

Blockchain Beyond the Hype: What is the Strategic Business Value?

Companies can determine whether they should invest in blockchain by focusing on specific use cases and their market position.

Speculation on the value of blockchain is rife, with Bitcoin—the first and most infamous application of blockchain—grabbing headlines for its rocketing price and volatility. That the focus of blockchain is wrapped up with Bitcoin is not surprising given that its market value surged from less than $20 billion to more than $200 billion over the course of 2017. Yet Bitcoin is only the first application of blockchain technology that has captured the attention of government and industry.

Blockchain was a priority topic at Davos; a World Economic Forum survey suggested that 10 percent of global GDP will be stored on blockchain by 2027. Multiple governments have published reports on the potential implications of blockchain, and the past two years alone have seen more than half a million new publications on and 3.7 million Google search results for blockchain.

Read more at McKinsey

Linux and Open-Source Jobs Are in More Demand Than Ever

Do you want a tech job? Then, it’s time to move away from Windows and head toward Linux and open source. According to The Linux Foundation and Dice‘s 2018 Open Source Jobs Report, 87 percent of hiring managers are having trouble finding open-source talent, while hiring open-source talent is now a priority for 83 percent of employers.

“Open source technology talent is in high demand, as Linux and other open source software dominates software development,” said Linux Foundation’s executive director, Jim Zemlin, in a statement. “I am encouraged that companies are recognizing more and more each day that open-source technology is the way to advance their businesses. The Linux Foundation, our members, and the open source ecosystem are focused on ensuring training and certification opportunities are highly accessible to everyone who wants to seek them out.”

Read more at ZDNet