Thank you for your interest in the recorded sessions from LinuxCon + ContainerCon North America 2016! View more than 40 sessions from the event below.
Luis Camacho Caballero is working on a project to preserve endangered South American languages by porting them to computational systems through automatic speech recognition using Linux-based systems. He was one of 14 aspiring IT professionals to receive a 2016 Linux Foundation Training (LiFT) scholarship, announced last month.
Luis, who is from Peru, has been using Linux since 1998, and appreciates that it is built and maintained by a large number of individuals working together to increase knowledge. Through his language preservation project, he hopes to have the first language, Quechua, the language of his grandparents, completed by the end of 2017, and then plans to expand to other Amazonian languages.
Luis Camacho Caballero has started a project to preserve endangered South American languages through automatic speech recognition using Linux-based systems.
Linux.com: Can you tell me more about Quechua, the language of your parents and grandparents?
Luis Camacho Caballero: Quechua was the lingua franca of the South American Andes between the 5th and 16th centuries. It’s strongly associated with Inca culture (1300 AD – 1550 AD) but is clearly older than that. It is still alive and used by about 8 million people distributed among Ecuador, Perú and Bolivia. However, it’s at risk of extinction because, in practice, the only language supported by the government is Spanish. Don’t misunderstand: there is, of course, a national agency for heritage preservation, but it hasn’t gained momentum yet. The process of substitution is running faster and stronger than the preservation initiatives.
It’s a shame that I speak just a bit. You can get a taste of Quechua in these funny clips: 1, 2 and 3 and even hear some famous songs here: Heaven, The Way You Make Me Feel (below), and a bonus track.
Linux.com: What is your process for recording and digitizing the language?
Luis: It’s a hard process. Basically, it is composed of two parts: building a text/voice corpus and the language processing itself.
In regard to the first part, the challenges are 1) linking both corpora to get an exact match between voice and text and 2) making the corpora more useful through part-of-speech tagging, or POS-tagging, in which information about each word’s part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
For the automatic speech recognition (ASR) part itself, we are testing artificial intelligence algorithms, looking for the one that best matches the features of the Quechua language.
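As a rough illustration of the two corpus challenges Luis describes (aligning voice with text, and POS-tagging), here is a minimal Python sketch. It is not the project's actual tooling: the `AlignedToken` record, the helper functions, the tag names and the timings are all hypothetical, and the tokens are English placeholders rather than real Quechua data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlignedToken:
    """One word of a transcript, tagged and linked to its audio span (hypothetical schema)."""
    text: str       # the written word
    pos: str        # part-of-speech tag, e.g. NOUN or VERB
    start_s: float  # where the word starts in the recording, in seconds
    end_s: float    # where the word ends in the recording, in seconds


def check_alignment(tokens: List[AlignedToken]) -> bool:
    """Return True if every token has a positive duration and no two tokens overlap."""
    prev_end = 0.0
    for tok in tokens:
        if tok.start_s < prev_end or tok.end_s <= tok.start_s:
            return False
        prev_end = tok.end_s
    return True


def tag_counts(tokens: List[AlignedToken]) -> Dict[str, int]:
    """Count how often each part-of-speech tag appears."""
    counts: Dict[str, int] = {}
    for tok in tokens:
        counts[tok.pos] = counts.get(tok.pos, 0) + 1
    return counts


# Placeholder utterance: English words and invented timings, for illustration only.
utterance = [
    AlignedToken("the", "DET", 0.00, 0.12),
    AlignedToken("dog", "NOUN", 0.12, 0.45),
    AlignedToken("barks", "VERB", 0.45, 0.90),
]

if __name__ == "__main__":
    print("alignment ok:", check_alignment(utterance))
    print("tag counts:", tag_counts(utterance))
```

Keeping the text, the tag and the audio span in a single record is one simple way to hold both corpora together; a real project would more likely rely on established annotation formats and forced-alignment tools.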
Linux.com: How did you get involved in this work?
Luis: Ever since I was first exposed to English ASR, maybe six years ago, I knew that I had to do ASR for Quechua. It’s my contribution to preserving my heritage.
Linux.com: Is this a hobby, or a job for you?
Luis: Nowadays I am with PUCP. I wrote a proposal and, fortunately, it was funded by the Peruvian Science Foundation, so I have resources for developing this project until Christmas 2017. Part of my job is networking with all the stakeholders and looking for more funds until we reach a complete ASR system, one at the same level as well-supported languages like English.
Linux.com: How do you plan to use your LiFT scholarship?
Luis: Linux is a wonderful platform; almost all language computational portability technology is developed on Linux. I haven’t decided yet which course fits my current needs for Linux support.
Linux.com: How will the scholarship help you?
Luis: I think the scholarship will help me in at least two ways: 1) getting in touch with the most renowned expert Linux trainers and 2) gaining valuable knowledge that would otherwise be expensive or inaccessible.
And now we have been informed by Mark Filion about some other interesting patches contributed by Collabora’s developers to the upcoming Linux 4.8 kernel. These patches promise to add huge performance improvements to emulated NVMe devices.
According to Collabora’s Helen Fornazier, it would appear that it’s currently possible to attach a local SSD (Solid State Drive) to a virtual machine in Google Compute Engine (GCE) via an NVMe interface, but you won’t get a good number of IOPS (Input/Output Operations Per Second).
To achieve that, one needs to instantiate a virtual machine using the nvme-backports-debian-7-wheezy image, which is based on the Debian GNU/Linux 7 “Wheezy” operating system. It would also appear that this will only work with the Debian 7 Linux OS, as other distributions available in Google Compute Engine won’t support so many IOPS.
Custom NVMe command allows for up to four times more IOPS
Helen Fornazier has discovered, and tested, that Google’s Virtual Machine Monitor now includes a custom NVMe command that could increase the number of IOPS by up to five times. “This is from what I’ve tested so far, but it seems to be possible to get up to 5 times faster according to the original commit message; check the Technical Details sessions to see how this is possible,” says Fornazier in a blog post.
However, it looks like this command has to be present in the kernel you use, and there’s no such support in the mainline Linux kernel. And this is where Collabora’s developers will make their contributions noticed, as they have made the patch available online for anyone interested in using it.
Unfortunately, since this is an unofficial (third-party) NVMe command, it will not land in the mainline Linux kernel anytime soon. But Collabora will try its best to help the NVMe workgroup implement an official extension to standardize it, as this brings considerable performance increases to emulated NVMe devices.
There’s little doubt that cloud computing will play an important role in data science for the foreseeable future. The flexible, scalable, on-demand computing power available is an important resource, and as a result, there’s a lot of competition between the providers of this service. Two of the biggest players in the space are Amazon Web Services (AWS) and Google Cloud Platform (GCP).
This article includes a short comparison of distributed Spark workloads in AWS and GCP—both in terms of setup time and operating cost. We ran this experiment with our students at The Data Incubator, a big data training organization that helps companies hire top-notch data scientists and train their employees on the latest data science skills.
Trend Micro researchers have discovered a stealthy new rootkit family named after Pokemon character Umbreon which could allow hackers to remotely control targeted devices.
The rootkit has been designed to target Linux systems – including those running on Intel and ARM chips – meaning it could be used to access embedded computing devices, wrote senior threat researcher Fernando Mercês.
It appears to have been written specifically for three platforms – x86, x86-64 and ARM (Raspberry Pi) – and is highly portable, having been written in pure C apart from some additional tools in Python and Bash.
Throughout my software engineering career, I’ve struggled with and against jargon. Intellectually, I understand jargon as a set of specialized terms meant to facilitate smooth and precise communication, particularly in a professional context. It binds groups together: it’s the secret handshake, the side-long wink, the showing that yes, you’re in the club too, you belong. Experientially? I know the ways jargon can keep you out as you feel along, grasping for knowledge in the dark.
Linux Containers (LXC) [1] and Docker [2], as well as software-defined network (SDN) solutions [3], make extensive use of Linux namespaces, which allow you to define and use multiple virtual instances of the resources of a host and kernel. At this time, Linux namespaces include Cgroup, IPC, Network, Mount, PID, User, and UTS.
Network namespaces have been in the admin’s toolkit, ready for production, since kernel 2.6.24. In container solutions, network namespaces allow individual containers exclusive access to virtual network resources, and each container can be assigned a separate network stack. However, the use of network namespaces also makes great sense independent of containers.
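To make the namespace types above a little more concrete, here is a small Python sketch that lists the namespaces a process belongs to by reading the symbolic links under /proc/<pid>/ns. It assumes a Linux system with /proc mounted; the helper name `list_namespaces` is made up for this illustration, and on systems without that directory it simply returns an empty result.

```python
import os
from typing import Dict


def list_namespaces(pid: str = "self") -> Dict[str, str]:
    """Map each namespace entry of a process to its link target.

    On Linux, targets look like 'net:[4026531992]'; two processes share a
    namespace when their targets for that entry are identical.
    """
    ns_dir = f"/proc/{pid}/ns"
    result: Dict[str, str] = {}
    if not os.path.isdir(ns_dir):
        # Not Linux, /proc not mounted, or no such process.
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry not readable with the current privileges; skip it.
            continue
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Comparing the `net` entry of two processes is a quick way to check whether they see the same network stack, which is the property the container solutions mentioned above rely on.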
In my previous article, What is an open source program office? And why do you need one?, I introduced the idea of an open source program office (OSPO) and discussed what they do, why a company would want to create one, and how to optimize them. In that article I focused on technology vendors and for a good reason—they were the first to embrace open source program offices strategically.
From IBM and Intel to Oracle and even Microsoft, open source program offices were all the rage at technology companies from 1999 through about 2005.
Linux is a free and open source operating system. However, Linux (and other open source operating systems) can use and load device drivers without publicly available source code. These are vendor-compiled binary drivers without any source code, known as binary blobs. Die-hard open source fans and the Free Software Foundation (FSF) recommend completely removing all proprietary components, including blobs.
Top 5 Reasons to Avoid Binary Blobs
Modification & distribution – Binary blobs cannot be improved or fixed by open source developers. You cannot distribute modified versions.
Reliability – Vendors can stop supporting binary blobs at any time by abandoning driver maintenance.
Auditing – Binary blobs cannot be audited for security and bugs. You are forced to trust vendors not to put backdoors and spyware into the blob.
Bugs – Binary blobs hide many bugs. Also, they can motivate people to buy new hardware.