
Python vs. R: The Battle for Data Scientist Mind Share

The difference between Python and R is largely philosophical. One is a full-service language developed by Unix scripters that happened to be adopted by stat heads, big data junkies, and social scientists. The other is a tool for data analysis designed and built by stat heads, big data junkies, and social scientists.

The crowds couldn’t be more similar, but the approach is very different. One is a general-purpose tool with many libraries that can help. The other is built specifically for big data analysis.

Which should you choose? Following is a head-to-head comparison to make your decision easier.

Read more at InfoWorld

How to Configure Autofs (AutoMount) in Linux

Autofs, also referred to as Automount, is a nice feature in Linux used to mount filesystems automatically on a user’s demand. There are two ways to mount a filesystem in Linux: /etc/fstab and Autofs. /etc/fstab mounts filesystems automatically when the system boots up, and Autofs serves the same purpose.

Difference Between /etc/fstab and Autofs (AutoMount)

You might be thinking that if both do the same thing, why use Autofs (Automount) instead of /etc/fstab, and what is the difference between the two? Here I am going to explain the exact difference between /etc/fstab and Autofs.

As we know, /etc/fstab is used for permanent mounting of filesystems, but it is practical only when a small number of mount points are listed in it. If you work in a large organisation with many mount points in your /etc/fstab file, your overall system performance can suffer.

But Autofs mounts filesystems on a user’s demand. By default, the mount points configured in Autofs remain unmounted until a user accesses them; once a user tries to access a mount point, it is mounted automatically, and if the mount point is not used for some time, it is automatically unmounted again.
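To make this concrete, here is a minimal sketch of an Autofs map for an on-demand NFS mount; the directory names, server, export path, and timeout are all hypothetical:

```shell
# /etc/auto.master -- hand the /data directory over to Autofs,
# unmounting entries after 60 idle seconds (timeout is an assumption)
/data  /etc/auto.data  --timeout=60

# /etc/auto.data -- mount "share" under /data from an NFS server
# (server name and export path are placeholders for illustration)
share  -fstype=nfs,rw  nfsserver.example.com:/export/share
```

After restarting the autofs service, simply accessing /data/share triggers the mount, and it is unmounted again after the idle timeout.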

Read Full Post

Take Our Cloud Providers Survey and Enter to Win a Raspberry Pi

Some of today’s most dynamic and innovative free and open source software (FOSS) projects boast significant investment and involvement by well-known cloud service and solution providers. We are launching a survey to better understand the perception of these solution providers by people engaging in open source communities.

In both enterprise and tech, FOSS adoption and deployment rates today reach 78%, with 65% of companies also contributing to FOSS projects, according to The Future of Open Source survey (2016 and 2015). Leading edge, innovative organizations make even greater investments in open source, fielding software stacks comprised of over 90% FOSS (Gartner).

Such increasingly visible participation and application of substantial corporate resources have been key drivers of the success of open source software. However, some companies still face challenges:

  • Companies may consume large amounts of code but have little to no participation in the critical projects they leverage, leading to larger issues when they later want to influence the code

  • Companies hire FOSS project maintainers without establishing a strategy or larger commitment to open source (resulting in issues retaining FOSS developers long-term)

  • Companies can make compliance mistakes around adhering to the terms of the FOSS licenses.

The experiences open source community members have with different companies can and do impact perception of those companies among FOSS developers and other community participants. If companies want to be trustworthy participants in FOSS projects, they need to invest to build the appropriate strategies, engaging through participation and building license compliance into their processes.

Corporate Open Source Survey

The Linux Foundation has been commissioned to survey FOSS developers and users about their opinions, perceptions, and experiences with 5 top cloud solution and service providers that engage in and use open source software.

By completing this survey, you will be eligible for a drawing for one of ten Raspberry Pi 3 starter kits, complete with case, cables, power supply, and other accessories. By opting in for a follow-on interview, you will be entered in a drawing for one of five additional kits. The survey will remain open for the next three weeks, until 12 a.m. EST on April 28, 2017.

Take the Survey Now

Toward Better Collaboration

Enterprise adoption of open source starts with consumption and over time can advance stepwise to project participation and contribution. Engagement in FOSS may begin with individual employees, then advance to more strategic corporate participation. If open source software becomes a key dependency for a product or service, there may also be interest in making technical and marketing investments. With feedback from survey participants, we can better understand perception of cloud solution providers’ participation in FOSS communities and help understand how companies leveraging FOSS can work toward better open source citizenship.

Drawing Rules

  • At the end of two weeks, The Linux Foundation (LF) will randomly choose ten (10) respondents to receive a Raspberry Pi 3 starter kit (“prize”). Survey participants who opt-in for follow-on interviews will be eligible for a second drawing of five (5) additional Raspberry Pi 3 kits.

  • A participant is eligible to win only one prize in this drawing; after winning a first prize, they will not be entered into any additional prize drawings for this promotion. You must be 18 years or older to participate. Employees, vendors, and contractors of The Linux Foundation and their families are not eligible, but LF project participants and employees of member companies are encouraged to complete the survey and enter the drawing.

  • To enter the drawing, you need only complete the contact info (name, email, etc.). Completing the contact info will constitute an “entry”. Any participant submitting multiple entries may be disqualified without notice. The Linux Foundation reserves the right to disqualify any participant if, for any reason, inaccurate or incomplete information is suspected.

  • There is no cash equivalent and no person other than the winning person may take delivery of the prize(s). The prize may not be exchanged for cash.

  • Participation in the drawing is open until 12 a.m. EST on May 5, 2017. Any participant completing the survey after the deadline will not be entered into the drawing. The survey may remain open beyond the drawing deadline.

  • Entries will be pooled together and a winner will be randomly selected. The winner will be notified and contacted directly via e-mail, and the winner’s name, city, and state of residence may be posted on our respective social media/marketing outlets (Linux.com, Twitter, Facebook, Google+, etc.). Winners have 30 days to respond to our contact or a new drawing for the prize will be made.

How Google’s Borg Inspired the Modern Datacenter

In part one of this series, What Is Kubernetes?, and in part two, Why Choose Kubernetes to Manage Containerized Applications?, we learned what Kubernetes does, its architecture, and how it compares to similar container orchestrators. Now we’ll learn how Kubernetes descended from the secret Google Borg project.

The Borg Heritage

Kubernetes is distinguished from similar container orchestration systems, such as Apache Mesos and Docker Swarm, by its Google heritage. Kubernetes was inspired by Borg, the very advanced internal datacenter management system used by Google for a decade. Nearly all of Google’s services run in containers, both internal and external services such as Gmail, Google search, Google Maps, MapReduce, Google File System, and Google Compute Engine. Think of Borg as the giant brain that manages Google’s datacenters as a single pool of resources to fuel Google’s giant fleet of services, and manages them so efficiently it saves Google the cost of an entire datacenter.

Google still owns Borg, and is building the next version, codenamed Omega, which is not nearly as cool a name as Borg. Kubernetes was founded as an open source project in 2014, and several of Borg’s top contributors also work on Kubernetes. Google donated Kubernetes to the Cloud Native Computing Foundation, which is hosted at the Linux Foundation and supported by a number of big companies (including Google, Cisco, Docker, IBM, and Intel). The idea is to create a reference architecture for cloud technologies that anyone can use.

Borg was a closely held secret until 2015, when Google published the “Large-scale cluster management at Google with Borg” paper. It reveals a lot of fascinating details and bold claims, such as “We are not sure where the ultimate scalability limit to Borg’s centralized architecture will come from; so far, every time we have approached a limit, we’ve managed to eliminate it.”

Figure 1: How Borg has inspired current datacenter systems.

Kubernetes Lineage

Figure 1 shows how Borg has inspired current datacenter systems, and the underlying technologies used in container runtimes today.

  • Google contributed cgroups to the Linux kernel in 2007, which limits the resources used by collections of processes.
  • cgroups and Linux namespaces are at the heart of containers today, including Docker.
  • Mesos was inspired by discussions with Google when Borg was still a secret.
  • The Cloud Foundry Foundation embraces The Twelve-Factor Application principles. These principles provide guidance for building Web applications that scale easily, are deployed in the cloud, and have an automated build system. Borg and Kubernetes both address these principles.

To sum up, the modern datacenter owes a lot to Google, and Kubernetes is built on over a decade of research and production use in one of the most demanding environments on Earth.

Download the sample chapter now.

Kubernetes Fundamentals

Tracing the User Space and Operating System Interactions

Like the bug that no one can solve, many issues occur at the interface between the user application and the operating system. But even in the open source world, understanding what is happening at these interfaces is not always easy. In this article, we review some of the tools for tracing the calls made among the kernel, libraries, and user applications.

Written by Gabriel Krisman Bertazi, Software Engineer at Collabora.

Tracing System Calls with strace

strace traces both directions of the interaction between the kernel and the evaluated application, namely, it traces when an application executes a system call, and when the operating system sends a signal to the process.

In its simplest form, strace runs the application passed on the command line and prints one line for each interaction with the kernel that occurred: which syscall was invoked, with which parameters, and what value was returned. It can also attach to a running process, which allows you to reduce the amount of clutter by starting the tracer just before triggering the actions that concern you.

strace uses the ptrace interface in the kernel, so it doesn’t require recompiling the application. This makes strace an ideal tool to start reverse-engineering an application to understand how it works, even if you don’t have the source code. But it is also helpful as a debugging or test/validation tool, for instance to confirm the return value of a specific system call without going through the effort of creating a custom tool.

As usual, strace is available for major distros. In Debian, you can install it with:

apt install strace

After installing, you are ready to trace the trivial hello-world application, which just prints “Hello World\n” to stdout and exits.

[krisman@dilma /tmp]$ strace ./hello-world
execve("./hello-world", ["./hello-world"], [/* 65 vars */]) = 0
brk(NULL)                               = 0x55b45a6c7000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT
mmap(NULL, 12288, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38e0a3b000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=160345, ...}) = 0
mmap(NULL, 160345, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f38e0a13000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1685264, ...}) = 0
mmap(NULL, 3791264, PROT_READ|PROT_EXEC,
     MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38e047e000
mprotect(0x7f38e0613000, 2093056, PROT_NONE) = 0
mmap(0x7f38e0812000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE
     |MAP_FIXED|MAP_DENYWRITE, 3, 0x194000) = 0x7f38e0812000
mmap(0x7f38e0818000, 14752, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f38e0818000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38e0a11000
arch_prctl(ARCH_SET_FS, 0x7f38e0a11700) = 0
mprotect(0x7f38e0812000, 16384, PROT_READ) = 0
mprotect(0x55b459c2d000, 4096, PROT_READ) = 0
mprotect(0x7f38e0a3e000, 4096, PROT_READ) = 0
munmap(0x7f38e0a13000, 160345)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
brk(NULL)                               = 0x55b45a6c7000
brk(0x55b45a6e8000)                     = 0x55b45a6e8000
write(1, "Hello World\n", 12Hello World
)           = 12
exit_group(0)                           = ?
+++ exited with 0 +++

Above is the output of strace mixed with the actual output of the hello-world application. strace writes to stderr so to avoid mixing, we could have redirected the stderr output to a file.

In this example, each line corresponds to a system call executed by the hello-world application, starting from the execve call that kickstarted the program. In each line, strace prints the system call name, followed by its parameters in parentheses and the value returned by its execution. Close to the end of the output, you will find the write() call, which actually printed the “Hello World” string. In the example above, it got wrapped with the actual output, which also went to the console.

If we look separately at the line that traces the write and clean up the wrapped part, we have:

write(1, "Hello World\n", 12)           = 12

We can compare this output to the signature of the write() syscall (man 2 write), and see that it tried to write the string passed in argument 2, which has a size of 12 characters (argument 3), to the file descriptor 1 (argument 1), which is the file descriptor for stdout. The value returned by the kernel was 12, indicating that the string was fully written.
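That count is easy to double-check from the shell: “Hello World” is 11 characters, and the \n adds one more byte:

```shell
# printf expands \n to a single newline byte, so the string
# handed to write(2) is 12 bytes long
printf 'Hello World\n' | wc -c    # → 12
```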

But what about the other lines? As I mentioned above, the hello-world program only prints a string and exits, so what are all the other lines printed by strace? It turns out that even the simplest C program doesn’t actually start executing from the main() function. In fact, to set up the usual C environment we are used to, a bunch of low-level code needs to execute beforehand to initialize the C library, resolve link-time dependencies, and reserve memory regions from the kernel; only then is control handed to the main() function. This code, which is spread across libgcc and the C library, is responsible for the other system calls that we saw in the strace output. Similarly, when the main() function returns, execution re-enters that code, which calls the exit_group() syscall to end the execution. That syscall never returns, so there is no return value shown in the last line of the trace.

As I mentioned before, we can also attach strace to a live application, by specifying the PID in the command line. For instance, I can snoop on my IRC messages by watching my IRC client’s communication with the kernel:

[krisman@dilma weechat]$ strace -p2270 -e sendto
strace: Process 2270 attached
sendto(52, "PRIVMSG NickServ :Hello GNU World!!!"..., 38, 0, NULL, 0) = 38

In the example above, I ran strace on my existing weechat instance, which had the PID 2270, and used the -e option to filter only for the sendto() syscall. In this case, I had prior knowledge of which syscall to look for, but strace also allows you to trace a group of related syscalls, for instance, all the network-related system calls or all the syscalls that operate on file names.
 

Interactively catching system calls with GDB

If you need a more interactive approach to tracing system calls and signals for any reason, GDB is also up to the task. Since version 7.0, GDB supports the ‘catch syscall’ command, which receives a syscall name or number and sets a breakpoint when entering and leaving the system call. This allows you to thoroughly inspect registers and data structures that are going to be consumed by the kernel before they are submitted, as well as the kernel output after executing the syscall.

For completeness, you can install the gdb package in Debian with:

apt install gdb

For catching system calls with GDB, you don’t really need the program’s debug information installed but, as usual, life gets a bit easier when you have it. So, either install it from your distro, or build your software with the usual -g3 -O0 flags.

[krisman@dilma work]$ gdb /bin/ls -silent
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) catch syscall open
Catchpoint 1 (syscall 'open' [2])
(gdb) c
The program is not being run.
(gdb) r
Starting program: /bin/ls

Catchpoint 1 (call to syscall open), 0x7ffff7df22e7 in open64()
    at ../sysdeps/unix/syscall-template.S:84
(gdb) c
Continuing.

Catchpoint 1 (returned from syscall open), 0x7ffff7df22e7 in open64()
    at ../sysdeps/unix/syscall-template.S:84
(gdb)

In the example above, GDB was started to debug ls. In this case, we lack debug symbols for ls, but it doesn’t matter much. The first GDB command issued above adds a breakpoint right before the trap instruction that enters open(), allowing the user to verify its arguments. If you then type continue, the syscall will be executed and once again a breakpoint will be hit on the syscall exit, allowing you to inspect the returned data.

The latest GDB still can’t decode the signature of syscalls, which makes the decoding of arguments and return values a bit harder, since you have to cast and de-reference structures explicitly. This feature has been on my wish-list for a while, but it is still not implemented.

The syntax for catching syscalls in GDB accepts either the name or the number of the syscall:

(gdb) catch syscall fork
(gdb) catch syscall 57

Be aware that syscall numbers change among architectures. Nowadays, most common architectures are already mapped by GDB syscall files, so you are likely safe using the syscall name, unless you are tracing a very new, or a custom system call.

Like strace, GDB also supports catching groups of related syscalls. For that, you can use the ‘group:’ syntax, or ‘g:’ for short:

(gdb) catch syscall group:process
Catchpoint 7 (syscalls 'clone' [56] 'fork' [57] 'vfork' [58]
'execve' [59] 'exit' [60] 'wait4' [61] 'arch_prctl' [158]
'exit_group' [231] 'waitid' [247] 'unshare' [272])

Similarly, you can use the ‘catch signal’ command to break at the moment a signal is sent to the application:

(gdb) catch signal SIGUSR1
Catchpoint 8 (signal SIGUSR1)
(gdb) c
Continuing.

Catchpoint 8 (signal SIGUSR1), 0x7ffff78c6d2b in __getdents (fd=3,
    buf=buf@entry=0x55555577acd0 "7fb", nbytes=32768)
    at ../sysdeps/unix/sysv/linux/getdents.c:96
...
(gdb)

This stops the execution right at the moment the signal reaches the process, which can be quite useful for understanding hard race conditions.

GDB can control the way a signal is delivered to the application. For instance, you can use the command ‘handle signal’ to stop the program from receiving the signal at all, or configure GDB to print a message and stop the execution when a signal is delivered.

The command ‘handle signal’ also tells you the current configuration for that signal:

(gdb) handle SIGUSR1
Signal     Stop    Print   Pass to program Description
SIGUSR1    Yes     Yes     Yes             User defined signal 1
(gdb)

In the example above, which shows the default configuration, GDB is set to stop the execution and print a message when SIGUSR1 arrives, and to forward the signal to the inferior process.
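Outside GDB, the basic mechanics of signal delivery are easy to reproduce in plain shell, which can help when experimenting with ‘catch signal’; the handler message here is arbitrary:

```shell
# a subshell installs a SIGUSR1 handler, then signals itself;
# bash runs the pending trap before the subshell exits
out=$(bash -c 'trap "echo caught SIGUSR1" USR1; kill -USR1 $$')
echo "$out"    # → caught SIGUSR1
```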
 

Catching library calls with ltrace

Like strace, ltrace executes the program passed as a parameter or, alternatively, attaches to an already running process; but instead of system calls, it traces the library calls issued by the application, translating the parameters and return values.

As usual, ltrace is also available in Debian, provided by the package with the same name:

apt install ltrace

In the example below, I traced a demonstration program, which executed several common calls to the C library, like printing to stdout, sorting a vector, saving data to a file and allocating memory. ltrace printed each of these calls in sequence, with the arguments used and the returned value.

Since we track the memory allocation and free() calls, it is easy to imagine how we could script something up to match those calls and create something like a poor-man’s memory leak detector 🙂

[krisman@dilma demo]$ ltrace ./demo 1> /dev/null
malloc(200000)                                             = 0x7f27e481e010
time(0)                                                    = 1490934890
srand(0x58dddc6a, 0x31000, 0x7f27e481e010, 1)              = 0
rand(0x7f27e4653600, 0x7ffed9b4f59c, 0x15760, 0x7f27e465311c) = 0x27d07053
qsort(0x7f27e481e010, 50000, 4, 0x55d82b4d6930)            = <void>
snprintf("The sum is 6543", 50, "The sum is %d", 6543)     = 15
printf("%s. Also writing to file\n", "The sum is 6543")   = 38
fopen("/tmp/testfile", "w+")                               = 0x55d82ce1c020
fwrite("The sum is 6543", 50, 1, 0x55d82ce1c020)           = 1
fclose(0x55d82ce1c020)                                     = 0
+++ exited (status 0) +++
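As a sketch of that poor-man’s leak detector idea: remember every address returned by malloc, cross off addresses later passed to free, and report what remains. The ltrace lines below are shortened, hypothetical samples:

```shell
# shortened, hypothetical ltrace output used as sample input
trace='malloc(200000) = 0x7f27e481e010
malloc(64) = 0x7f27e4820000
free(0x7f27e481e010) = <void>'

# collect malloc return values, drop any address handed to free,
# and print the addresses that were never freed
leaks=$(printf '%s\n' "$trace" | awk '
    /^malloc/ { alloc[$NF] = 1 }
    /^free\(/ { addr = substr($1, 6, length($1) - 6); delete alloc[addr] }
    END       { for (a in alloc) print a }')
echo "$leaks"    # → 0x7f27e4820000
```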

ltrace also doesn’t require recompiling your program or compiling with debug information, so it is a very useful tool for cases where you don’t have access to the source code. In fact, strace and ltrace are very similar tools in usage and features; the difference between them lies in the level of the stack where the data is collected. While strace investigates system calls issued to the underlying kernel, ltrace tracks library calls.

GDB is also capable of tracing requests going to libraries, as it allows you to step through the library code in the same way you usually do with applications. Even in the GDB examples above, the breakpoint triggered inside the C library syscall wrappers, so instead of continuing the execution, one could just step over the next lines of code. Therefore, if you want an interactive approach, GDB is also a great option here.
 

Wrapping up

In this article, we reviewed three tools that can be used to trace an application’s interactions with the operating system at different levels of the stack. strace and GDB’s catch syscall/signal commands operate on the kernel-application interface, tracing the system calls that flow from the application to the kernel and the signals that flow in the opposite direction, while ltrace operates at a higher level of the stack, tracing requests flowing from the application to libraries.

These tools can give you a much better understanding of what is going on at these interfaces in real time, allowing you to understand, for instance, why exactly that well-crafted request fails to execute. Since they don’t require code recompilation, and usually won’t even require debug symbols, these tools are also a great asset when reverse-engineering a binary to understand how it works, helping hackers and enthusiasts write their own versions of proprietary tools.

Whether you use strace/ltrace or GDB is completely up to you, and there will always be use cases where one tool is better than the other. The goal should not be necessarily mastering every tool out there, but to have basic knowledge that they exist and what they can do, so you can know what to look for when debugging a real-world problem.

How the Open Source Model Will Soar Above the Rest

Defining a project is more than just discussing the results of the deliverable. For a project manager, this definition is about learning how to balance a series of interrelated elements. When it comes to the process of creation, the project manager has to manage the dependencies and the project’s critical chain. The project manager also has to communicate effectively with the various stakeholders’ personalities and the dynamic differences between Waterfall and Agile development methods.

Schedule, resources, and scope

One way to define a project is to discuss the work along three interconnected dimensions—schedule, resources, and scope. The interconnected nature of these dimensions is considered a natural rule of projects. Violating the balance is nearly the same thing as defying gravity.

Read more at Opensource.com

Cloud: The Greatest Business Metamorphosis in a Generation Needs Developers

We are at the beginning of what is arguably the greatest business metamorphosis in a generation. As more organizations become essentially software companies, they need developers to write the cloud apps that will enable them to thrive as they evolve.

As a developer, you’re at the forefront of this transformation, determining how to integrate cloud-based applications and infrastructure into your business. You are changing the way companies interact and engage with your users, their community, and their customers. You are driving the fundamental shift in how organizations build a new way of doing business.

Read more at The New Stack

The Root Cause of Input-Based Security Vulnerabilities – Don’t Fear the Grammar

Input-based attacks like Buffer Overflows, Cross-Site Scripting (XSS), and XXE are common in today’s software. And they do not go away. But why is that? Shouldn’t one assume that existing frameworks handle input correctly, and free developers from struggling with correctly implementing input handling over and over again? Sadly, the answer is no.

In this post I wrap up some ideas from Language Security (LangSec), which offer a general solution to this problem, and provide some tools to fix it.

Read more at Dev.to

5 Reasons Node.js Rules for Complex Integrations

With JavaScript, JSON, REST, NPM, and an ever-increasing supply of modules, Node.js should be your first choice for integration.

Because software solutions rarely operate in a vacuum, integration is a necessary fact of life for many developers. Sometimes it’s easy. Anyone who has integrated an application into Slack, for example, will have been treated to an incredibly smooth experience. In many cases it’s as simple as filling in a form (a URL or two, an authentication key) and hitting the Submit button. That’s plain awesome.

Read more at InfoWorld

The Future Of Ubuntu Linux Desktop — What’s Next?

Short Bytes: After announcing that Ubuntu 18.04 LTS will ship with GNOME as the default desktop environment, Canonical founder Mark Shuttleworth has shared some details regarding Ubuntu’s future. In a Google+ post, he made clear that Canonical will be investing in Ubuntu GNOME with a motive to deliver an all-GNOME experience. One should also note that despite the demise of Unity 8, Snaps and Ubuntu Core are here to stay.

In his announcement post, Shuttleworth acknowledged that his convergence vision was wrong and that the open source community perceived it as fragmentation. But what’s next for the world’s most popular open-source desktop operating system, i.e., Ubuntu?

In a Google+ post, which looks like a follow up to the original post, Shuttleworth highlighted some major points that will continue to be the focus of Ubuntu desktop.

Read more at FOSSBytes