
What Is the NIST Cybersecurity Framework?

Learn what the NIST Cybersecurity Framework is, who it impacts, and how to implement it in Data Protection 101, our series on the fundamentals of information security.

Set forth by the National Institute of Standards and Technology under the United States Commerce Department, the Cybersecurity Framework is a set of guidelines for private sector companies to follow to be better prepared in identifying, detecting, and responding to cyber-attacks. It also includes guidelines on how to prevent and recover from an attack.

Simply put, the NIST Cybersecurity Framework is a set of best practices, standards, and recommendations that help an organization improve its cybersecurity measures. The optional standards were compiled by NIST after former United States President Barack Obama signed an executive order in 2013.

Read more at Digital Guardian

Open Source Guides for the Enterprise Now Available in Chinese

The popular Open Source Guides for the Enterprise, developed by The Linux Foundation in collaboration with the TODO Group, are now available in Chinese. This set of guides provides industry-proven best practices to help organizations successfully leverage open source.

“Making these resources available to Chinese audiences in their native language will encourage even greater adoption of and participation with open source projects,” said Chris Aniszczyk, CTO of Cloud Native Computing Foundation and co-founder of the TODO Group. The guides span various stages of the open source project lifecycle, from initial planning and formation to winding down a project.

The 10 guides now available in Mandarin include topics such as:

  • Creating an Open Source Program by Chris Aniszczyk, Cloud Native Computing Foundation; Jeff McAffer, Microsoft; Will Norris, Google; and Andrew Spyker, Netflix
  • Using Open Source Code by Ibrahim Haddad, Samsung Research America
  • Participating in Open Source Communities by Stormy Peters, Red Hat; and Nithya Ruff, Comcast
  • Recruiting Open Source Developers by Guy Martin, Autodesk; Jeff Osier-Mixon, Intel Corporation; Nithya Ruff; and Gil Yehuda, Oath
  • Measuring Your Open Source Program’s Success by Christine Abernathy, Facebook; Chris Aniszczyk; Joe Beda, Heptio; Sarah Novotny, Google; and Gil Yehuda

The translated guides were launched at the LinuxCon + ContainerCon + CloudOpen China conference in Beijing, where The Linux Foundation also welcomed Chinese Internet giant Tencent as a Platinum Member.

This post originally appeared at The Linux Foundation.

 

Python 3: Sometimes Immutable Is Mutable and Everything Is an Object

What is Python?

Python is an interpreted, interactive, object-oriented programming language; it incorporates modules, classes, exceptions, dynamic typing, and high-level data types, and it is known for its clear syntax. It is a high-level general-purpose programming language that can be applied to many different classes of problems, with a large standard library covering string processing (regular expressions, Unicode, calculating differences between files), Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI programming), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Here are some of Python’s features:

  • An interpreted (as opposed to compiled) language. Contrary to C, for example, Python code does not need to be compiled before executing it. In addition, Python can be used interactively: many Python interpreters are available, from which commands and scripts can be executed.
  • Free software released under an open-source license: Python can be used and distributed free of charge, even for building commercial software.
  • Multi-platform: Python is available for all major operating systems (Windows, Linux/Unix, macOS) and most likely your mobile phone’s OS.
  • A very readable language with clear, non-verbose syntax.
  • A language for which a large variety of high-quality packages are available for various applications, from web frameworks to scientific computing.
  • A language that is very easy to interface with other languages, in particular C and C++.
  • An object-oriented language with dynamic typing (the same variable can refer to objects of different types over the course of a program). Some other features of the language are illustrated just below.

What does it mean to be an object-oriented language?

Python is a multi-paradigm programming language, meaning it supports different programming approaches. One popular approach to solving a programming problem is by creating objects. This is known as Object-Oriented Programming (OOP).

An object has two characteristics:
1) attributes
2) behavior

Let’s take an example:

Dog is an object:
a) name, age, color are data
b) singing, dancing are behavior

In object-oriented programming, we call data attributes and behavior methods. Again:

Data → attributes, behavior → methods

The concept of OOP in Python focuses on creating reusable code. This concept is also known as DRY (Don’t Repeat Yourself). In Python, the concept of OOP follows some basic principles:

Inheritance — the process of reusing details from an existing class in a new class without modifying the existing class.
Encapsulation — hiding the private details of a class from other objects.
Polymorphism — using a common operation in different ways for different data inputs.
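These three principles can be sketched in a few lines. The Dog and Puppy classes below are hypothetical illustrations, not part of any library:

```python
class Dog:
    def __init__(self, name):
        self._name = name   # leading underscore: conventionally "private" (encapsulation)

    def speak(self):
        return "{} says Woof".format(self._name)


class Puppy(Dog):           # inheritance: Puppy reuses Dog's __init__ unchanged
    def speak(self):        # polymorphism: the same operation behaves differently
        return "{} says Yip".format(self._name)


for pet in (Dog("Blu"), Puppy("Woo")):
    print(pet.speak())      # Blu says Woof / Woo says Yip
```

A Puppy is still a Dog, so code written against Dog works unchanged with Puppy; that reuse is what DRY is about.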

Class

A class is a blueprint for the object.

We can think of a class as a sketch of a dog with labels. It contains all the details about the name, color, size, etc. Based on these descriptions, we can describe the dog. Here, the dog is an object.

The example for class of dog can be :

class Dog:
    pass

Here, we use class keyword to define an empty class Dog. From class, we construct instances. An instance is a specific object created from a particular class.

A class is the blueprint from which individual objects are created. In the real world we often find many objects of the same type. Take cars: all cars of the same make and model (with an engine, wheels, doors, …) were built from the same set of blueprints and have the same components.

Object

Think of an object in Python as a block of memory, and a variable is just something that points/references to that block of memory. All the information relevant to your data is stored within the object itself. And the variable stores the address to that object. So it actually doesn’t matter if you reassign a variable pointing to an integer to point to a different data type.

>>> a = 1
>>> a = "I am a string now"
>>> print(a)
I am a string now

Every object has its own identity (ID) that stores its address in memory. Every object has a type. An object can also hold references to other objects. For example, an integer will not hold references to other objects, but a list will contain references to each object within it. We will touch on this when we look at tuples later.

The built-in function id() will return an object’s id and type() will return an object’s type:

>>> list_1 = [1, 2, 3]
# to access this object's value
>>> list_1 
[1, 2, 3]
# to access this object's ID
>>> id(list_1) 
140705683311624
# to access object's data type
>>> type(list_1) 
<class 'list'>

So, an object (instance) is an instantiation of a class. When class is defined, only the description for the object is defined. Therefore, no memory or storage is allocated.

The example for object of class Dog can be:

obj = Dog()

Here, obj is object of class Dog.

Suppose we have details of Dog. Now, we are going to show how to build the class and objects of Dog.

class Dog:
    # class attribute
    species = "animal"

    # instance attributes
    def __init__(self, name, age):
        self.name = name
        self.age = age

# instantiate the Dog class
blu = Dog("Blu", 10)
woo = Dog("Woo", 15)
# access the class attributes
print("Blu is an {}".format(blu.__class__.species))
print("Woo is also an {}".format(woo.__class__.species))
# access the instance attributes
print("{} is {} years old".format( blu.name, blu.age))
print("{} is {} years old".format( woo.name, woo.age))

When we run the program, the output will be:

Blu is an animal
Woo is also an animal
Blu is 10 years old
Woo is 15 years old

In the above program, we create a class named Dog. Then, we define attributes; the attributes are characteristics of an object.

Then, we create instances of the Dog class. Here, blu and woo are references to our new objects.

Then, we access the class attribute using __class__.species. Class attributes are the same for all instances of a class. Similarly, we access the instance attributes using blu.name and blu.age. However, instance attributes are different for every instance of a class.

Let’s try to understand how value and identity are affected by the operators “==” and “is”.

The “==” operator compares values, whereas the “is” operator compares identities. Hence, a is b is equivalent to id(a) == id(b). Two different objects may share the same value, but they will never share the same identity.

Example:

>>> a = ['blu', 'woof']
>>> id(a)
1877152401480
>>> b = a
>>> id(b)
1877152401480
>>> id(a) == id(b)
True
>>> a is b
True
>>> c = ['blu', 'woof']
>>> a == c
True
>>> id(c)
1877152432200
>>> id(a) == id(c)
False

Hashability

What is a hash?

According to the Python glossary, “An object is hashable if it has a hash value which never changes during its lifetime”; in practice, this ties hashability closely to immutability.

A hash is an integer that depends on an object’s value, and objects with the same value always have the same hash. (Objects with different values will occasionally have the same hash too. This is called a hash collision.) While id() will return an integer based on an object’s identity, the hash() function will return an integer (the object’s hash) based on the hashable object’s value:

>>> a = ('cow', 'bull')
>>> b = ('cow', 'bull')
>>> a == b
True
>>> a is b
False
>>> hash(a)
6950940451664727300
>>> hash(b)
6950940451664727300
>>> hash(a) == hash(b)
True

Immutable objects can be hashable; mutable objects can’t be. This is important to know because (for reasons beyond the scope of this post) only hashable objects can be used as keys in a dictionary or as items in a set. Since hashes are based on values, and only immutable objects can be hashable, hashes will never change during an object’s lifetime.

Hashability will be covered further in the section on mutable vs. immutable objects, where we will see that a tuple’s value can sometimes change, and how that affects our understanding of mutable and immutable objects.

To summarize, EVERYTHING is an object in Python; the only difference is that some objects are mutable and some are immutable. But what kinds of objects are possible in Python, and which ones are mutable?

Objects of built-in types like bytes, int, float, bool, str, tuple, complex, and frozenset are immutable. Objects of built-in types like list, set, dict, and bytearray are mutable. Custom classes are mutable. To simulate immutability in a class, one should override attribute setting and deletion to raise exceptions.
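As a rough sketch of that last point, a class can reject attribute changes after construction by overriding __setattr__ and __delattr__. The Frozen class below is made up for illustration:

```python
class Frozen:
    def __init__(self, value):
        # bypass our own __setattr__ for the one initial assignment
        object.__setattr__(self, "value", value)

    def __setattr__(self, name, val):
        raise AttributeError("Frozen instances are read-only")

    def __delattr__(self, name):
        raise AttributeError("Frozen instances are read-only")


f = Frozen(42)
print(f.value)    # 42
try:
    f.value = 99
except AttributeError as err:
    print(err)    # Frozen instances are read-only
```

This only simulates immutability; object.__setattr__ can still sidestep the guard, which is why "custom classes are mutable" remains the rule.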

 

Now, how would a newbie know which variables refer to mutable objects and which do not? For this, we use two very handy built-in functions: id() and type().

What is id() and type()?

Syntax to use id()
id(object)

As we can see, the function accepts a single parameter and returns the identity of an object. This identity is unique and constant for the object during its lifetime; two objects with non-overlapping lifetimes may have the same id() value. In CPython, the identity is actually the memory address; in Python generally, it is simply a unique ID. This function is mostly used internally in Python.

Examples:

The output is the identity of the object passed. It varies between runs, but within a single run each object’s identity is unique and constant.

Input : id(2507)
Output : 140365829447504
(output varies with different runs)
Input : id("Holberton")
Output : 139793848214784

What is an Alias?

>>> a = 1
>>> id(a)
1904391232
>>> b = a  #aliasing a
>>> id(b)
1904391232
>>> b
1

An alias is a second name for a piece of data. Programmers create aliases because it’s often easier and faster to refer to data than to copy it. If the data being created and assigned is immutable, aliasing does not matter, since the data won’t change; with mutable data, however, aliasing can lead to bugs. Consider the following:

>>> a = 1
>>> id(a)
1904391232
>>> b = a  #aliasing a
>>> id(b)
1904391232
>>> b
1
>>> a = 2
>>> id(2)
1904391264
>>> id(b)
1904391232
>>> b
1
>>> a
2

As can be seen, a now points to 2 and has a different id, while b still points to 1. In Python, aliasing happens whenever one variable’s value is assigned to another variable, because variables are just names that store references to values.
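The snippet above used immutable ints, so rebinding a left b untouched. With a mutable object, an alias really does see every change; the shopping list below is a made-up example:

```python
shopping = ["milk", "eggs"]
backup = shopping            # an alias, not a copy: both names reference one list
shopping.append("lemons")
print(backup)                # ['milk', 'eggs', 'lemons'] -- the "backup" changed too

safe_copy = list(shopping)   # a real (shallow) copy avoids the surprise
shopping.append("bread")
print(safe_copy)             # ['milk', 'eggs', 'lemons'] -- unaffected
```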

The type() method returns the class type of the object passed as its argument. The type() function is mostly used for debugging purposes.

type() can be called with either one or three arguments. When a single argument is passed, type(obj) returns the type of the given object.

Syntax :

type(object)

We can find out what class an object belongs to using the built-in type() function:

>>> Blue = [1, 2, 3]
>>> type(Blue)
<class 'list'>
>>> def my_func(x):
...    x = 89
>>> type(my_func)
<class 'function'>

Now that we can compare variables to see their type and id’s, we can dive in deeper to understand how mutable and immutable objects work.

Mutable Objects vs. Immutable Objects

Not all Python objects handle changes the same way. Some objects are mutable, meaning they can be altered. Others are immutable; they cannot be changed but rather return new objects when attempting to update. What does this mean when writing Python code?

The following are some mutable objects:

  • list
  • dict
  • set
  • bytearray
  • user-defined classes (unless specifically made immutable)

The following are some immutable objects:

  • int
  • float
  • decimal
  • complex
  • bool
  • string
  • tuple
  • range
  • frozenset
  • bytes

The distinction is rather simple: mutable objects can change, whereas immutable objects cannot. Immutable literally means not mutable.

A standard example is tuple versus list: a tuple is filled on creation and then frozen; its content cannot change anymore. To a list, one can append, set, and delete elements at any time. Keep the counterparts straight: tuple is roughly an immutable list, whereas frozenset is an immutable set. Quoting a Stack Overflow answer: tuples are indeed an ordered collection of objects, but they can contain duplicates and unhashable objects, and have slice functionality. Frozensets aren’t indexed, but you have the functionality of sets — O(1) element lookups, and operations such as unions and intersections — and, like their mutable counterparts, they can’t contain duplicates.

Let’s create a dictionary with immutable objects for keys —

>>> a = {'blu': 42, True: 'woof', ('x', 'y', 'z'): ['hello']}
>>> a.keys()
dict_keys(['blu', True, ('x', 'y', 'z')])

As seen above, the keys in a are immutable, hashable objects. But if you call hash() on a mutable object (such as a list or set), or try to use a mutable object as a dictionary key, an error will be raised:

>>> spam = {['hello', 'world']: 42}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> d = {'a': 1}
>>> spam = {d: 42}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

So, tuples, being immutable objects, can always be used as dictionary keys? Not quite — watch what happens when a tuple contains a list:

>>> spam = {('a', 'b', [1, 2, 3]): 'hello'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

As seen above, if a tuple contains a mutable object, then, according to the previous explanation of hashability, it cannot be hashed. So, immutable objects can be hashable, but this doesn’t necessarily mean they’re always hashable. And remember, the hash is derived from the object’s value.

This is an interesting corner case: a tuple (which should be immutable) that contains a mutable list cannot be hashed. This is because the hash of the tuple depends on the tuple’s value, but if that list’s value can change, that means the tuple’s value can change and therefore the hash can change during the tuple’s lifetime.

So far we have seen that some tuples are hashable while others are not. The official Python documentation defines immutable as “an object with a fixed value” and notes that “mutable objects can change their value.” Mutability is thus a property of an object’s value, and in that sense some tuples behave as mutable while others do not.

>>> a = ('dogs', 'cats', [1, 2, 3])
>>> b = ('dogs', 'cats', [1, 2, 3])
>>> a == b
True
>>> a is b
False
>>> a[2].append(99)
>>> a
('dogs', 'cats', [1, 2, 3, 99])
>>> a == b
False

In this example, the tuples a and b have equal (==) values but are different objects. When the list inside tuple a is changed, a’s value changes: a is no longer == b, while b is unaffected. This shows that a tuple’s value can effectively change when it contains a mutable object.

While Python tends towards mutability, there are many use-cases for immutability as well. Here are some straightforward ones:

  • Mutable objects are great for efficiently passing around data. Say objects anton and berta have access to the same list: anton adds “lemons” to the list, and berta automatically has access to this information.
    If both used a tuple, anton would have to copy the entries of his shopping tuple, add the new element, create a new tuple, and then send that to berta. Even if both can talk directly, that is a lot of work.
  • Immutable objects are great for working with the data. So berta is going to buy all that stuff: she can read everything, make a plan, and does not have to double-check for changes. If next week she needs to buy more stuff from the same shopping tuple, berta just reuses the old plan. She has the guarantee that anton cannot change anything unnoticed.
    If both used a list, berta could not plan ahead. She has no guarantee that “lemons” are still on the list when she arrives at the shop, and no guarantee that next week she can just repeat what was appropriate last week.

You should generally use mutable objects when having to deal with growing data. For example, when parsing a file, you may append information from each line to a list. Custom objects are usually mutable, buffering data, adjusting to new conditions and so on. In general, whenever something can change, mutable objects are much easier.

Immutable objects are used sparingly in Python; usually, it is implicit, such as using int or other basic, immutable types. Often, you will use mutable types as de facto immutable — many lists are filled at construction and never changed. There is also no immutable dict built in. Enforcing immutability can help optimize algorithms, e.g., by enabling caching.
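Although there is no immutable dict built in, the standard library’s types.MappingProxyType offers a read-only view of a dict, which is often close enough; a small sketch:

```python
from types import MappingProxyType

config = {"debug": False}
readonly = MappingProxyType(config)   # a read-only *view*, not an independent copy

print(readonly["debug"])              # False
try:
    readonly["debug"] = True          # writing through the view is forbidden
except TypeError as err:
    print(err)

config["debug"] = True                # ...but the underlying dict can still change,
print(readonly["debug"])              # True -- and the view reflects it
```

Note the distinction: the view is read-only, but it is not immutable, because the dict behind it can still mutate.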

Interestingly enough, python’s often-used dict requires keys to be immutable. It is a data structure that cannot work with mutable objects, since it relies on some features being guaranteed for its elements.

Mutable example

>>> my_list = [10, 20, 30]
>>> print(my_list)
[10, 20, 30]
>>> my_list = [10, 20, 30]
>>> my_list[0] = 40
>>> print(my_list)
[40, 20, 30]

Immutable example

>>> tuple_ = (10, 20, 30)
>>> print(tuple_)
(10, 20, 30)
>>> tuple_[0] = 40
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

If you want to write the most efficient code, you should know the difference between mutable and immutable in Python. Concatenating strings in a loop wastes lots of memory: because strings are immutable, concatenating two strings actually creates a third string that is the combination of the previous two. If you are iterating a lot and building a large string, you will waste a lot of memory creating and throwing away objects. Instead, collect the pieces in a list and combine them with str.join().
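A minimal sketch of the two approaches (the word list is arbitrary):

```python
words = ["spam"] * 1000

# Wasteful: every += creates a brand-new string object
result = ""
for w in words:
    result += w

# Idiomatic: accumulate parts in a mutable list, join once at the end
joined = "".join(words)

print(result == joined)   # True -- same text, far fewer throwaway objects
```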

Python handles mutable and immutable objects differently. Immutable objects are quicker to access than mutable objects. Immutable objects are also fundamentally expensive to “change”, because doing so involves creating a copy; changing mutable objects is cheap.

Interning, integer caching and everything called: NSMALLPOSINTS & NSMALLNEGINTS

Easy things first —

NSMALLNEGINTS covers the range -5 to -1 and NSMALLPOSINTS the range 0 to 256. These are macros defined in CPython; earlier versions used the ranges -1 to 99, then -5 to 99, and finally -5 to 256. Python keeps an array of integer objects for all integers between -5 and 256, so when you create an int in that range, you actually just get a reference to the existing object in memory.

If x = 42, Python simply looks up the value in this preallocated block and returns a reference to it. For integers outside the range, a new object is created each time and destroyed once it is no longer referenced; repeatedly creating and destroying such objects wastes calculation cycles, which is why Python preallocates the commonly used integers.

There are exceptions among immutable objects, as shown above by making a tuple effectively “mutable”. And although a new object is normally created for each new value, things happen slightly differently for a few cases:

a) Strings with no whitespace and fewer than 20 characters
b) Integers between -5 and 256 (inclusive, as explained above)
c) Empty immutable objects (e.g., the empty tuple)

These objects are always reused, or interned, due to memory optimization in the Python implementation. The rationale behind doing this is as follows:

  1. Since programmers use these objects frequently, interning existing objects saves memory.
  2. Since immutable objects like tuples and strings cannot be modified, there is no risk in interning the same object.

So what does “interning” mean?

Interning allows two variables to refer to the same string object. Python does this automatically, although the exact rules remain fuzzy. One can also forcibly intern strings by calling the intern() function (sys.intern() in Python 3). Guillo’s article provides an in-depth look into string interning.
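A small sketch of forced interning with sys.intern() (the strings themselves are arbitrary):

```python
import sys

# Strings built at runtime are generally not interned automatically:
a = "".join(["py", "thon", " rocks"])
b = "".join(["py", "thon", " rocks"])
print(a is b)        # False -- equal values, but two distinct objects

# sys.intern() returns one canonical object per distinct value:
a = sys.intern(a)
b = sys.intern(b)
print(a is b)        # True -- both names now share a single interned string
```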

For example, strings with more than 20 characters or containing whitespace are not interned and become new objects:

>>> a = "Howdy! How are you?"
>>> b = "Howdy! How are you?"
>>> a is b
False

But if a string is fewer than 20 characters long and contains no whitespace, it will look somewhat like this:

>>> a = "python"
>>> b = "python"
>>> a is b
True

Here, a and b refer to the same object.

Let’s move on to integers now.

As explained above in the macro definitions, integer caching happens because Python preloads the commonly used integers. Hence, variables referring to an integer within the range point to the same object that already exists in memory:

>>> a = 256
>>> b = 256
>>> a is b
True

This is not the case if the object referred to is outside the range:

>>> a = 1024
>>> b = 1024
>>> a is b
False

Lastly, let’s talk about empty immutable objects:

>>> a = ()
>>> b = ()
>>> a is b
True

Here a and b refer to the same object in memory as it is an empty tuple, but this changes if the tuple is not empty.

>>> a = (1, 2)
>>> b = (1, 2)
>>> a == b
True
>>> a is b
False

Passing mutable and immutable objects into functions:

Immutable and mutable objects are handled differently when used as function arguments. A variable points to the memory location where the actual value of its object is stored.

 

Major Concepts of Function Argument Passing in Python

Arguments are always passed to functions by object reference in Python: the caller and the function code block share the same object. Whether a change made inside the function block is visible in the caller’s scope, regardless of the argument’s name, depends on whether the argument is mutable or immutable.

In Python, integer, float, string, and tuple are immutable objects; list, dict, and set fall in the mutable category. This means that if the value of an integer, float, string, or tuple argument is changed inside a function or method block, the change is not visible in the calling block, but changes to a list, dict, or set object are.

Python Immutable Function Arguments

Python immutable objects, such as numbers, tuples, and strings, are passed by reference just like mutable objects such as lists, sets, and dicts. But because immutable objects cannot be changed, rebinding an integer or string inside the function block behaves much like copying: a new local object is created and manipulated inside the function block’s scope, while the caller’s object remains unchanged. Therefore, the caller block will not notice any changes made to the immutable object inside the function block. Let’s take a look at the following example.

Python Immutable Function Argument — Example and Explanation

def foo1(a):
    # function block
    a += 1
    print('id of a:', id(a))  # id of y and a are the same
    return a

# main or caller block
x = 10
y = foo1(x)
# value of x is unchanged
print('x:', x)
# value of y is the return value of foo1,
# i.e. argument 'a' (the value of 'x') plus 1
print('y:', y)
print('id of x:', id(x))  # id of x
print('id of y:', id(y))  # id of y, different from x

Result:

id of a: 1456621360
x: 10
y: 11
id of x: 1456621344
id of y: 1456621360

Explanation:

  • The original integer object x is immutable (unchangeable). A new local object a is created and used inside the function foo1(), because integers are immutable objects and can’t be changed in place. The value of the variable x in the caller’s main block is therefore unaffected.
  • The value of variable y is the value of variable a returned from function foo1() after adding 1.
  • Variables x and y are different, as you can see from their different id values.
  • Variables y and a are the same, as you can see from their identical id values; both point to the same integer object.

Python Mutable Function Arguments

Python mutable objects like dict and list are also passed by reference. If the value of a mutable object is changed inside the function block scope, its value also changes inside the caller or main block scope, regardless of the name of the argument. Let’s take a look at the following code example, with an explanation at the end, where we assign list2 = list1.

 

Python Mutable Function Argument — Example and Explanation

def foo2(func_list):
    # function block
    func_list.append(30)  # append an element

def foo3(func_list):
    # function block
    del func_list[1]  # delete 2nd element

def foo4(func_list):
    # function block
    func_list[0] = 100  # change value of 1st element

# main or caller block
list1 = [10, 20]
list2 = list1  # list1 and list2 point to same list object
print('original list:', list1)
print('list1 id:', id(list1))
print('value of list2:', list2)
print('list2 id:', id(list2))
foo2(list1)
print('\nafter foo2():', list1)
print('list1 id:', id(list1))
print('value of list2:', list2)
print('list2 id:', id(list2))
foo3(list1)
print('\nafter foo3():', list1)
print('list1 id:', id(list1))
print('value of list2:', list2)
print('list2 id:', id(list2))
foo4(list1)
print('\nafter foo4():', list1)
print('list1 id:', id(list1))
print('value of list2:', list2)
print('list2 id:', id(list2))

Result:

original list: [10, 20]
list1 id: 24710360
value of list2: [10, 20]
list2 id: 24710360
after foo2(): [10, 20, 30]
list1 id: 24710360
value of list2: [10, 20, 30]
list2 id: 24710360
after foo3(): [10, 30]
list1 id: 24710360
value of list2: [10, 30]
list2 id: 24710360
after foo4(): [100, 30]
list1 id: 24710360
value of list2: [100, 30]
list2 id: 24710360

Explanation:

  • We created a list object list1 and assigned the same object to a new variable list2. Now both list1 and list2 point to the same memory where the actual list object [10, 20] is stored.
  • We passed the list1 variable into the function argument func_list, then appended, deleted, and modified the list object’s elements in functions foo2(), foo3(), and foo4() through that argument.
  • Notice that the actual object list1 changed in the main block when we changed its value in the function block.
  • Notice also that the value of the list2 variable changed whenever list1 changed. As we have read, this is because both the list1 and list2 variables point to the same list object [10, 20].
  • The ID of the list1 object doesn’t change across the calls to foo2(), foo3(), and foo4(). This is because list1 is mutable and can be modified in place: changing it modifies the original object’s value rather than creating a new object.
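If a function should not disturb the caller’s list, a common defense is to work on a copy. The foo2_safe helper below is a hypothetical variant of foo2(), not part of the example above:

```python
def foo2_safe(func_list):
    local = list(func_list)   # shallow copy: mutations stay inside the function
    local.append(30)
    return local

list1 = [10, 20]
result = foo2_safe(list1)
print(result)   # [10, 20, 30]
print(list1)    # [10, 20] -- the caller's list is untouched
```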

This article was produced in partnership with Holberton School and originally appeared on Medium.

Linux Control Sequence Tricks

There are quite a few control sequences available on Linux systems — many I use routinely, and some I’ve only just recently discovered — and they can be surprisingly useful. In today’s post, we’re going to run through a series of them and take a look at what they do and how they might be useful.

To start, unless you’re brand spanking new to the command line, you are undoubtedly familiar with the ctrl-c sequence that is used to terminate a running command. In print, this same sequence might be expressed as ^c or control-c, and sometimes the “c” will be capitalized, but the expression always means “hold the control key and press the specified key”, with no shift key or hyphen involved.

Read more at Network World

The Best Ways to Test Your Serverless Applications

Serverless is more than a cloud computing execution model. It changes the way we plan, build, and deploy apps. But it also changes the way we test our apps.

Meet Alex. Alex is an ordinary JavaScript developer, focused on Node.js lately.

At some point, Alex and his team got a new project. After some analysis, Alex thought that it would be the perfect fit for serverless. He presented the idea to his team. Some of the team members were excited, one of them didn’t like it, but most of them didn’t have a strong opinion. So, they decided to give it a try — the project wasn’t too big, and the risk was low.

The team read about serverless, and they got an idea how to structure their new app. But no one was sure how they should fit serverless into their common development process. They decided to start step by step, and then solve the problems as they encountered them.

Read more at freeCodeCamp

Tencent Becomes a Linux Foundation Platinum Member to Increase its Focus on Open Source

Tencent, the $500-billion Chinese internet giant, is increasing its focus on open source after it became a platinum member of the Linux Foundation.

The company has long been associated with the foundation and Linux generally; it is a founding member of the Linux Foundation’s deep learning program, which launched earlier this year, and now, as a platinum member (the highest tier), it will take a board of directors seat and work more closely with the organization. That works two ways, with Tencent pledging to offer “further support and resources” to foundation projects and communities, while the Chinese firm itself will also tap into the foundation’s expertise and experience.

Read more at TechCrunch

How to Balance Development Goals with Security and Privacy

Now, as a software security evaluator, I see that sometimes even the simplest data protection is missing from programs, which highlights that the problem with building in security and privacy is not complexity, per se—it’s our habit as engineers to work hard on what is emphasized and visible. We are driven by the immediate business value of features and data, so we build features ASAP and collect as much data as we can. We tend to put our heads in the sand when it comes to the misery of our users whose data may leak from our systems, because after collecting it, we often forget about protecting it.

A balance can exist between development goals and privacy and security concerns. My advice to data-driven engineers is to be careful, think about how much data you really need, and don’t get greedy. 

Read more at O’Reilly

Why Open Source Matters to Alibaba

Alibaba has more than 150 open source projects and is a long-time contributor to many others. Wei Cao, a Senior Staff Engineer at Alibaba, says that sharing knowledge and receiving feedback from the community helps Alibaba refine its projects. We spoke with Wei Cao, who heads the Alibaba Cloud Database Department and leads R&D of the Alibaba RDS and POLARDB products, to learn more about the company’s open source focus and about some of the database-related projects they contribute to.

Linux.com: Why is open source so important for Alibaba?

Wei Cao: At present, Alibaba has more than 150 open source projects. We work on open source projects with the aim of contributing to the industry and solving real-life problems, and we share our experiences with other open source enthusiasts.

Wei Cao, Senior Staff Engineer at Alibaba

As a long-time contributor to various other open source projects, Alibaba and Alibaba Cloud have fostered a culture that encourages our teams to voluntarily contribute to various open source projects, either by sharing experiences or helping others to solve problems. Sharing and contributing to the community altogether is in the DNA of Alibaba’s culture.

When we first started to use open source projects like MySQL, Redis, and PostgreSQL, we received a lot of help from the community. Now we would like to give back to those same communities by sharing our accumulated knowledge and receiving feedback so that we can refine our projects.

We believe this truly represents the essence of open source development, where everyone can build on each other’s knowledge. We are dedicated to making our technology inclusive through continuously contributing to bug-fixing and patch optimization of different open source projects.

Linux.com: Can you tell us about the culture within Alibaba that encourages its developers to consume and contribute to open source projects?

Wei Cao: Alibaba has always had a culture of integrity, partnership, sharing, and mutual assistance. At the same time, we have always believed that broader participation in the community advances the industry and also benefits us commercially. Our staff members are therefore willing to pay close attention to open source projects in the community; they keep using open source projects, accumulating experience to give feedback, and jointly promoting the development of the industry.

Linux.com: Can you tell us what kind of open source projects you are using in your company?

Wei Cao: Our database products use many open source projects, such as MySQL, Redis, and PostgreSQL. Our teams have made feature and performance enhancements and optimizations for various use cases; for example, we have added compression for IoT and security improvements for financial industries.

Linux.com: Can you tell us about the open source projects that you have created?

Wei Cao: We will be releasing a new open source project, called Mongo-Shake, at the LC3 conference. Built on MongoDB’s oplog, Mongo-Shake is a general-purpose platform for services: it reads the oplog operation logs of a MongoDB cluster and replicates MongoDB data, and specific requirements can then be implemented on top of those operation logs, which enable many scenario-based applications.

Through the operation logs, we provide log data subscription with PUB/SUB semantics, which can be flexibly connected to different scenarios (such as log subscription, data center synchronization, and asynchronous cache invalidation) through an SDK, Kafka, MetaQ, and so on. Cluster data synchronization is the core application scenario: synchronization is achieved by capturing oplogs and replaying them. Its application scenarios include:

  • Asynchronous replication of MongoDB data between clusters, eliminating the cost of double writes.

  • Mirror backup of MongoDB cluster data (not supported in this open source version).

  • Offline log analysis.

  • Log subscription.

  • Cache synchronization: the results of log analysis reveal which cache entries can be evicted and which can be preloaded, prompting timely cache updates.

  • Log-based monitoring.

Linux.com: Can you tell us about the major open source projects you contribute to?

Wei Cao: We have contributed to many database-related open source projects. In addition, we have released our own open source projects, like AliSQL and ApsaraCache, which are widely used within Alibaba.

AliSQL: AliSQL is a MySQL branch developed by the Alibaba Cloud database team, and it serves Alibaba’s business and Alibaba Cloud’s RDS. AliSQL has been verified against many Alibaba workloads and is widely used within Alibaba Cloud. The latest AliSQL also merges many useful enhancements from other branches, such as Percona, MariaDB, and WebScaleSQL, and contains a lot of patches drawn from Alibaba’s experience.

AliSQL makes many feature and performance enhancements on top of MySQL. It carries more than 300 patches; we have added many monitoring indicators and features, and have optimized it for different use cases.

In general test cases, AliSQL shows a 70% performance improvement over the official MySQL version, according to the R&D team’s sysbench benchmarks. In comparison with MySQL, AliSQL offers:

  • Better support for TokuDB, more monitoring and performance optimization.

  • CPU time statistics for SQL queries.

  • Sequence support.

  • Dynamic column addition.

  • Thread pool support.

  • Many bug fixes and performance improvements.

Michael “Monty” Widenius, the founder of MySQL and MariaDB, has praised Alibaba for open sourcing AliSQL. We got a lot of help from the open source community in the early development of AliSQL.

Open sourcing AliSQL is the best contribution we have made to this community so far, and we hope to continue our open source journey in the future. Full cooperation with the open source community can make the MySQL/MariaDB ecosystem more robust.

ApsaraCache: ApsaraCache is based on Redis 4.0, with additional features and performance enhancements. Unlike Redis, ApsaraCache’s performance is independent of data size and instead depends on the usage scenario. It also performs better in cases such as short connections, full memory recovery, and time-consuming command execution.

Multi-protocol support

ApsaraCache supports both the Redis and Memcached protocols, with no need to modify client code. Because ApsaraCache supports the Memcached protocol, users can persist data by running ApsaraCache in Memcached mode, just as they would with Redis.

By reusing the Redis architecture, we have developed new Memcached-mode features such as persistence, disaster tolerance, backup and recovery, slow log auditing, and information statistics.

Ready for production

ApsaraCache has proven to be very stable and efficient through four years of technical refinement and tens of thousands of rounds of practical testing in production environments.

The major improvements in ApsaraCache are:

  • Hardened disaster recovery: the kernel synchronization mechanism has been refactored to solve the problem of the native kernel falling back to full synchronization when replication is interrupted under weak network conditions.

  • Memcached protocol compatibility, with support for dual-copy Memcached deployments, offering a more reliable Memcached service.

  • In short-connection scenarios, ApsaraCache delivers a 30% performance increase over the vanilla version.

  • ApsaraCache’s hot upgrade capability can complete the hot update of an instance within 3 ms, sparing users the disruption of frequent kernel upgrades.

  • Hardened AOF handling, solving the host stability problems caused by frequent AOF rewrites.

  • ApsaraCache health detection mechanism.

This article was sponsored by Alibaba and written by Linux.com.

How to Check Disk Space on Linux from the Command Line

Quick question: How much space do you have left on your drives? A little or a lot? Follow up question: Do you know how to find out? If you happen to use a GUI desktop (e.g., GNOME, KDE, Mate, Pantheon, etc.), the task is probably pretty simple. But what if you’re looking at a headless server, with no GUI? Do you need to install tools for the task? The answer is a resounding no. All the necessary bits are already in place to help you find out exactly how much space remains on your drives. In fact, you have two very easy-to-use options at the ready.

In this article, I’ll demonstrate these tools. I’ll be using Elementary OS, which also includes a GUI option, but we’re going to limit ourselves to the command line. The good news is these command-line tools are readily available for every Linux distribution. On my testing system, there are a number of attached drives (both internal and external). The commands used are agnostic to where a drive is plugged in; they only care that the drive is mounted and visible to the operating system.

With that said, let’s take a look at the tools.

df

The df command is the tool I first used to discover drive space on Linux, way back in the 1990s. It’s very simple in both usage and reporting. To this day, df is my go-to command for this task. This command has a few switches but, for basic reporting, you really only need one: df -H. The -H switch produces human-readable output in powers of 1000 (use -h instead for powers of 1024). The output of df -H will report how much space is used, how much is available, the percentage used, and the mount point of every disk attached to your system (Figure 1).

Figure 1: The output of df -H on my Elementary OS system.

What if your list of drives is exceedingly long and you just want to view the space used on a single drive? With df, that is possible. Let’s take a look at how much space has been used up on our primary drive, located at /dev/sda1. To do that, issue the command:

df -H /dev/sda1

The output will be limited to that one drive (Figure 2).

Figure 2: How much space is on one particular drive?

You can also limit the reported fields shown in the df output. Available fields are:

  • source — the file system source

  • size — total number of blocks

  • used — space used on a drive

  • avail — space available on a drive

  • pcent — percentage of space used (used divided by size)

  • target — mount point of a drive

Let’s display the output of all our drives, showing only the size, used, and avail (or availability) fields. The command for this would be:

df -H --output=size,used,avail

The output of this command is quite easy to read (Figure 3).

Figure 3: Specifying what output to display for our drives.

The only caveat here is that we don’t know the source of the output, so we’d want to include source like so:

df -H --output=source,size,used,avail

Now the output makes more sense (Figure 4).

Figure 4: We now know the source of our disk usage.
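Those --output fields also make df easy to script against. Here is a minimal sketch that flags nearly full filesystems; the 90% threshold and the warning wording are arbitrary choices for illustration, not anything built into df:

```shell
#!/bin/sh
# Warn about any mounted filesystem that is more than 90% full.
# tail -n +2 skips the header row that df prints; pseudo-filesystems
# that report "-" instead of a percentage are skipped.
df -H --output=target,pcent | tail -n +2 | while read -r mount pcent; do
    used=${pcent%\%}                        # strip the trailing % sign
    case $used in *[!0-9]*) continue ;; esac
    if [ "$used" -gt 90 ]; then
        echo "WARNING: $mount is ${pcent} full"
    fi
done
```

Dropped into cron, a script like this gives you a poor man’s disk-space monitor with no extra tools installed.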

du

Our next command is du. As you might expect, that stands for disk usage. The du command is quite different from the df command, in that it reports on directories rather than drives. Because of this, you’ll want to know the names of the directories to be checked. Let’s say I have a directory containing virtual machine files on my machine. That directory is /media/jack/HALEY/VIRTUALBOX. If I want to find out how much space is used by that particular directory, I’d issue the command:

du -h /media/jack/HALEY/VIRTUALBOX

The output of the above command will display the size of every file in the directory (Figure 5).

Figure 5: The output of the du command on a specific directory.

So far, this command isn’t all that helpful. What if we want to know the total usage of a particular directory? Fortunately, du can handle that task. On the same directory, the command would be:

du -sh /media/jack/HALEY/VIRTUALBOX/

Now we know how much total space the files are using up in that directory (Figure 6).

Figure 6: My virtual machine files are using 559GB of space.

You can also use this command to see how much space is being used on all child directories of a parent, like so:

du -h /media/jack/HALEY

The output of this command (Figure 7) is a good way to find out which subdirectories are hogging space on a drive.

Figure 7: How much space are my subdirectories using?

The du command is also a great tool to use in order to see a list of directories that are using the most disk space on your system. The way to do this is by piping the output of du to two other commands: sort and head. The command to find out the top 10 directories eating space on a drive would look something like this:

du -a /media/jack | sort -n -r | head -n 10

The output would list those directories, from the largest offender to the smallest (Figure 8).

Figure 8: Our top ten directories using up space on a drive.
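One variant worth knowing: GNU sort understands human-readable sizes via its -h flag, so the same top-ten report can keep du’s friendly units (the directory path here is just an example):

```shell
# Top 10 space consumers, with sizes like 1.5G or 120M instead of
# raw block counts. du -ah emits human-readable sizes for every file
# and directory; sort -rh orders those sizes largest first.
du -ah /media/jack 2>/dev/null | sort -rh | head -n 10
```

The 2>/dev/null simply hides permission-denied noise from directories you can’t read.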

Not as hard as you thought

Finding out how much space is being used on your Linux-attached drives is quite simple. As long as your drives are mounted to the Linux system, both df and du will do an outstanding job of reporting the necessary information. With df you can quickly see an overview of how much space is used on a disk and with du you can discover how much space is being used by specific directories. These two tools in combination should be considered must-know for every Linux administrator.

And, in case you missed it, I recently showed how to determine your memory usage on Linux. Together, these tips will go a long way toward helping you successfully manage your Linux servers.

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.

Bloomberg Eschews Vendors For Direct Kubernetes Involvement

Financial information behemoth Bloomberg is a big fan of Kubernetes, and is using it for everything from serving up Bloomberg.com to complex data processing pipelines.

Rather than use a managed Kubernetes service or employ an outsourced provider, Bloomberg has chosen to invest in deep Kubernetes expertise and keep the skills in-house. Like many enterprise organizations, Bloomberg originally went looking for an off-the-shelf approach before settling on the decision to get involved more deeply with the open source project directly.

“We started looking at Kubernetes a little over two years ago,” said Steven Bower, Data and Infrastructure Lead at Bloomberg. … “It’s a great execution environment for data science,” says Bower. “The real Aha! moment for us was when we realized that not only does it have all these great base primitives like pods and replica sets, but you can also define your own primitives and custom controllers that use them.”
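“Defining your own primitives” refers to Kubernetes custom resources. As a rough illustration only (the group and kind names below are invented, not Bloomberg’s actual resources), a minimal CustomResourceDefinition looks like this:

```yaml
# Hypothetical CRD registering a new "DataPipeline" resource type;
# a custom controller would then watch and act on these objects.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: datapipelines.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: datapipelines
    singular: datapipeline
    kind: DataPipeline
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```

Once applied with kubectl apply -f, the cluster accepts kubectl get datapipelines just like a built-in resource.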

Read more at Forbes