
Fast Track Apache Spark

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team, so I had to set up a Spark environment myself. I was analyzing futures order books from the Chicago Mercantile Exchange (CME) spanning May 2, 2016, to November 18, 2016. The CME data included extended hours trading with the following fields: instrument name, maturity, date, time stamp, price, and quantity. The futures comprised 21 financial instruments spanning six markets—foreign exchange, metal, energy, index, bond, and agriculture. Trades were recorded roughly every half second. In the process of doing this research, I learned a lot of lessons. I want to help you avoid making the mistakes I did so you can start making an immediate impact in your organization with Spark. Here are the six lessons I learned:

Read more at O’Reilly

Using Redir to Alter Network Traffic: Part 1

There are times when, despite your best efforts, you have little choice but to put a quick workaround in place. Reconfiguring network-border firewalls or moving services between machines is simply not an option because the network’s topology is long established and definitely shouldn’t be messed about with.

Picture the scene. You’ve lost an inbound mail server to some unusual application issue that will probably take longer to fix than the few minutes you have. In their wisdom, the architects of your mail server infrastructure didn’t separate the web-based interface from the backend daemons that listen for incoming email, and both services reside on the same box as the failed IMAP (Internet Message Access Protocol) server, which collects inbound mail for your many temperamental users.

This leaves you in a tricky position. Fundamentally, you need both the services up and available. Thankfully, there’s a cold-swap IMAP server with up-to-date user configuration available, but sadly you can’t move the IP address from the email web interface over to that box without breaking the interface’s connectivity with other services.

To save the day, you ultimately rely on a smattering of lateral thinking. After all, it’s only a TCP port receiving the inbound email, and luckily the web interface can refer to other servers so that users can access their emails. Step forward the excellent “redir” daemon.

This clever little daemon has the ability to listen out for inbound traffic on a particular port on a host and then forward that traffic onwards somewhere else. I should warn you in advance that it might struggle with some forms of encryption that require certificates to be presented to it, but otherwise I’ve had some excellent results from the redir utility. In this article, I’ll look at how redirecting traffic might be able to help you out of a tight spot and examine some possible alternatives to the minuscule redir utility.

Installation

You probably won’t be entirely surprised to read that it’s as easy as running this command on Debian derivatives:

# apt-get install redir

On Red Hat derivatives, you will likely need to download it from here: http://pkgs.repoforge.org/redir/

Then you simply use “rpm -i <version>”, where “version” is the download you chose. For example, you could do something like this:

# wget http://pkgs.repoforge.org/redir/redir-2.2.1-1.2.el6.rf.x86_64.rpm

# rpm -i redir-2.2.1-1.2.el6.rf.x86_64.rpm

Now that we have a working binary, let’s look at how the useful redir utility works; thankfully it’s very straightforward indeed. Let’s begin by considering the non-encrypted version of IMAP (simply because I don’t want to promise too much with services encrypted by SSL or TLS).

First, consider the inbound email server listening on TCP port 143 and what would be needed should you wish to forward traffic from that port to another IP address. This is how you could achieve that with the excellent redir utility:

# redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.2 --cport=143

In that example, we see our broken IMAP server (which has the IP address “10.10.10.1”) running on local port 143 (set with “--lport=”) having traffic forwarded to our backup IMAP server (with the IP address “10.10.10.2”) on the same TCP port number.

To run redir as a daemon in the background, you’re possibly safest to add an ampersand as we do in this example. Here, instead of forwarding traffic to a remote server, we simply adjust the port numbers on our local box.

# redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.1 --cport=1234 &

You might also explore the “daemonize” command to assist. I should say that I have had mixed results from this in the past, however. If you want to experiment then there’s a man page here: http://linux.die.net/man/1/daemonize
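As a tentative example, and assuming redir has ended up at /usr/bin/redir on your system (adjust the path to suit your distribution), wrapping the earlier forwarding command with daemonize might look like this:

# daemonize /usr/bin/redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.2 --cport=143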

You can also use the “screen” command to open up a session and leave the command running in the background. There’s a nicely written doc on the excellent “screen” utility here from the slick Arch Linux: https://wiki.archlinux.org/index.php/GNU_Screen
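As a quick sketch (the session name is purely illustrative), you could launch redir detached inside a screen session and then reattach later with “screen -r redir-imap” to check on it:

# screen -dmS redir-imap redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.2 --cport=143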

The above example is also an excellent way of catching visitors to a service whose clients aren’t aware of a port number change. Say, for example, you have a clever daemon that can listen for both encrypted traffic (which, for IMAP, would usually go to TCP port 993) and unencrypted traffic (usually TCP port 143). You could redirect traffic destined for TCP port 143 to TCP port 993 for a short period while you tell your users to update their software. That way, you might be able to close another port on your firewall and keep things simpler.
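Sticking with our example server at “10.10.10.1”, a sketch of that redirection might look like the following (with the usual caveat about SSL/TLS traffic applying to the encrypted port):

# redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.1 --cport=993 &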

Handling IP address changes

Another life-saving use of the magical redir utility is when a DNS or IP address change takes place. Consider that you have a busy website listening on TCP port 80 and TCP port 443. All hell breaks loose with your ISP, and you’re told that you have 10 days to migrate to a new set of IP addresses. Usually this wouldn’t be too bad, but the ISP in question has set your DNS TTL (Time To Live) to a whopping seven days. This means that you need to make the move quickly to allow for cached DNS queries that persist for seven days and beyond. Thankfully, the very slick redir tool can come to the rescue.

Having bound a new IP address to a machine, you simply point back at the old server’s IP address using redir on your HTTP and HTTPS ports.
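As a rough sketch, if the old server sits on “10.10.10.1” and the new machine has been given “10.10.10.5” (both addresses are invented for this example), the new box could forward web traffic back to the old one like so, bearing in mind the earlier caveat about certificates on the HTTPS port:

# redir --laddr=10.10.10.5 --lport=80 --caddr=10.10.10.1 --cport=80 &

# redir --laddr=10.10.10.5 --lport=443 --caddr=10.10.10.1 --cport=443 &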

Then, you change your DNS to reflect the new IP address as soon as possible. The extra three days of grace should be enough to catch the majority of out-of-date DNS answers, but even if it isn’t, you could simply use the superb redir in the opposite direction if your ISP lets you run something on the old IP address. That way, any stray DNS responses which arrive at your old server are simply forwarded to your new server. In theory (I’ve managed this in the past with a government website), you should have zero downtime throughout, and if you do drop any DNS queries to your new IP address, the percentage will be so negligible that your users probably won’t be affected.

In case you’re not aware, the DNS caching would only affect users who had visited in the seven days prior to the change of IP address. In other words, any new users to the website would simply have the new IP address served to them by DNS servers, without any issue whatsoever.

Voting by Proxy

I would be remiss not to mention that, of course, iptables also has a powerful grip on traffic hitting your boxes. We can deploy the mighty iptables to make a client unwittingly push its traffic through a conduit so that a large network can filter which websites its users are allowed to access, for example.

There’s a slightly outdated document on the excellent Linux Documentation Project (TLDP) website.

With the super-natty redir tool, we can create a transparent proxy as follows (incidentally, transparent proxies are also known as intercepting proxies or inline proxies):

# redir --transproxy 10.10.10.10 80 4567

In this example, we are simply forwarding all traffic destined for TCP port 80 to TCP port 4567 so that the proxy server can filter using its rules.

There’s also a potentially useful option called “--connect”, which allows redir to work with HTTP proxies that offer the CONNECT functionality.

To use this option, add the IP address and port of the proxy (using the “--caddr” and “--cport” options, respectively).

Shaping

I’ve expressed my reservations about redir handling encrypted traffic because of certificates sometimes messing things up. The same applies to some other two-way communication protocols, or those which open up another port, such as SFTP (Secure File Transfer Protocol) or SCP (Secure Copy Protocol). However, with some experimentation, and if you’re concerned with how much bandwidth might be forwarded, redir can help. Again, you might have mixed results.

You can alter how much bandwidth is allowed through your redirection with this option:

--max_bandwidth
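To give a rough idea, and assuming your build of redir expects the value in bits per second (do check the man page for your version), capping a forwarded IMAP connection might look something like this:

# redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.2 --cport=143 --max_bandwidth=65536 &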

The manual mentioned above does warn that the algorithm employed is a basic one and can’t be expected to be entirely accurate all the time. Think of these algorithms as working by considering a period of a few seconds, the recorded throughput rate, and the ceiling which you’ve set. When it comes to throttling and shaping bandwidth, it’s not actually that easy to achieve 100 percent accuracy. Even shaping with the powerful “tc” Linux tool, combined with a suitable “qdisc” for the job in hand, is prone to errors, especially when working with very low levels of throughput, despite the fact that it works on an industrial scale.

My Network Is Down

The Traffic Control tool, “tc”, which I’ve just mentioned, is also capable of simulating somewhat unusual network conditions. For example, if you wanted to simulate packets being delayed in transit (you might want to test this with pings), then you can use this “tc” command:

# tc qdisc add dev eth0 root netem delay 250ms

Append another value to the end of that command (e.g., “50ms”) and you then get a plus or minus variation in the delay.
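Building on the earlier command, something like the following should give you a delay of 250ms, plus or minus 50ms:

# tc qdisc change dev eth0 root netem delay 250ms 50ms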

You can also simulate packet loss like this:

# tc qdisc change dev eth0 root netem loss 10%

This should drop 10 percent of packets at random, all going well. If it doesn’t work, then the manual can be found here: http://linux.die.net/man/8/tc and real-life examples here: http://www.admin-magazine.com/Archive/2012/10

I mention the fantastic “tc” at this juncture, because you might want to deploy similar settings using the versatile redir utility. It won’t offer you the packet loss functionality; however, it will add a random delay, which might be enough to make users look at their settings and then fix their client-side config without removing all access to their service.

Note that redir also supports the “--random_wait” option. Apparently, redir will multiply whatever setting you put after that option by either zero, one, or two at random and wait that many milliseconds before sending packets out. This option can also be used with another (the “--bufsize” option). The manual explains that it doesn’t deal directly with packets for its random delays but instead defines them this way:

“A “packet” is a bloc of data read in one time by redir. A “packet” size is always less than the bufsize (see also --bufsize).”
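As a tentative example (the values are picked purely for illustration), the following would have redir wait zero, five, or ten milliseconds between each such “packet” while also shrinking the buffer:

# redir --laddr=10.10.10.1 --lport=143 --caddr=10.10.10.2 --cport=143 --random_wait=5 --bufsize=2048 &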

By default, the buffer size is 4,096 bytes; experiment as you wish if you want to alter the throughput speeds experienced by your redirected traffic. In the next article, we’ll look at using iptables to alter how your traffic is manipulated.

Learn more about essential sysadmin skills: Download the Future Proof Your SysAdmin Career ebook now.

Chris Binnie’s latest book, Linux Server Security: Hack and Defend, shows how hackers launch sophisticated attacks to compromise servers, steal data, and crack complex passwords, so you can learn how to defend against these attacks. In the book, he also talks you through making your servers invisible, performing penetration testing, and mitigating unwelcome attacks. You can find out more about DevSecOps and Linux security via his website (http://www.devsecops.cc).

Diversity Empowerment Summit Features Stories from Individual Persistence to Industry-wide Change

Last week at The Linux Foundation’s first Diversity Empowerment Summit we heard from so many amazing speakers about how they are working to improve diversity in the tech industry.

Leaders from companies including Comcast, DreamWorks, IBM, Rancher Labs, Red Hat and many others recounted their own personal struggles to fit in and advance as women and minorities in tech. And they gave us sage advice and practical tips on what women, minorities, and their allies can do to facilitate inclusion and culture change in open source and the broader tech community.

The stories they told were inspiring. They spoke passionately of individual challenges and perseverance, brave acts that raise awareness, and a broad range of initiatives they are undertaking to inspire and create industry-wide change.

Finding success as a woman in tech

Munira Tayabji, a developer and digital supervisor of technology at DreamWorks, spoke about how she overcame the discrimination and alienation she faced as a woman and a minority studying computer science to find a successful career in film animation.  

Read more at The Linux Foundation

Linux 4.14 ‘Getting Very Core New Functionality’ Says Linus Torvalds

Memory management wonks, this release is for you. And also you Hyper-V admins.

Linus Torvalds has unsentimentally loosed release candidate one of Linux 4.14 a day before the 26th anniversary of the Linux-0.01 release, and told penguinistas to expect a few big changes this time around.

“This has been an ‘interesting’ merge window,” Torvalds wrote on the Linux Kernel Mailing List. “It’s not actually all that unusual in size – I think it’s shaping to be a pretty regular release after 4.13 that was smallish. But unlike 4.13 it also wasn’t a completely smooth merge window, and honestly, I _really_ didn’t want to wait for any possible straggling pull requests.”

Read more at The Register

Microsoft Announces General Availability of Azure App Service on Linux and Web App for Containers

Microsoft recently announced the availability of Azure App Service running on Linux and support for Web App for Containers. With this recent news, Microsoft is expanding its developer reach by providing more options for developers when bringing their apps and technology stacks to Azure. When provisioning web apps, developers now have the ability to choose an underlying Operating System of Windows or Linux. They also have the ability to ingest containerized applications from popular container repositories.

Today, developers can take advantage of Azure App Service features like integrated CI/CD, deployment slots and auto-scaling. Microsoft claims that more than 1 million cloud applications have been deployed on Azure App Service to date.

When provisioning underlying infrastructure, developers now have an option to take advantage of built-in images for ASP.NET Core, Node.js, PHP and Ruby all on Linux.

Read more at InfoQ

How To Deal With A DDoS Attack

You’ve got an irregularly high amount of traffic coming into your server. So much, in fact, that it’s slowing down your server and other clients are timing out trying to access it. Looks like you’re under a DDoS attack. DDoS, or distributed denial of service, is a specific way to attack and destabilize a server by flooding it with traffic from many sources at once.

On a Linux server, you can identify the multiple connections flooding your server using the netstat utility.

$ netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -3

There are generally two kinds of DDoS attacks. The first kind floods your inbound network connection, which interferes with valid clients trying to connect. The other kind is when the attack targets a specific service, like your email server, which eventually either stalls from the increased server load or starts rejecting all incoming requests completely. Usually, DDoS attacks are deployed through botnets – a large number of independent computers and servers that have been compromised and made to operate together to flood target networks.

When you’re under an attack like this, it’s difficult – if not impossible – to connect to your server remotely. Instead, use reserve connections such as IPMI/KVM. You can analyze the traffic and where it’s coming from using tshark, tcpdump, or iftop.
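For example, capturing a quick sample of inbound traffic with tcpdump for later analysis might look like this (the interface name and port are assumptions for illustration):

# tcpdump -nn -i eth0 -c 1000 'tcp port 80' -w ddos-sample.pcap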

Most hosting providers will usually just add the affected servers to a “blackhole” where they drop all incoming packets, while insisting that you add DDoS protection services such as CloudFlare, Akamai, or something comparable. It’s a good idea to have these services ready ahead of time, as well as to contact your hosting provider to discuss DDoS protection.

A common preventative tactic is to use proxy or CDN servers to hide your actual IP address from the public. You can configure your server to accept requests to its IP address only from other trusted addresses, with the rest of your traffic going through the proxies. This serves the dual purpose of also protecting you against threats that try to circumvent your proxy.
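As a rough sketch, assuming your CDN or proxy publishes 203.0.113.0/24 as its address range (a documentation range used here purely as a placeholder), iptables rules along these lines would let only that range reach your web ports:

# iptables -A INPUT -p tcp -m multiport --dports 80,443 -s 203.0.113.0/24 -j ACCEPT

# iptables -A INPUT -p tcp -m multiport --dports 80,443 -j DROP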

You can use utilities like uptime, w, or ps to check for cases when it’s just a single process that’s being targeted.
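A quick check might be as simple as showing the load averages and then the busiest processes by CPU:

$ uptime

$ ps aux --sort=-%cpu | head -5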

Checking log files is a good move, as they can often contain traces of the servers the attacks are coming from, their subnets, and the User-Agents used to make requests to these servers. It’s important, though, to use separate utilities such as head, tail, grep, or less to parse the log files, since opening an entire log file at once can further stall your already struggling system.
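For example, the following pulls the last chunk of an Nginx access log (the path is an assumption; adjust it to your setup) and lists the busiest client IP addresses without loading the whole file:

# tail -n 10000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10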

For instance, if you have an Nginx-based web server that’s receiving a large number of requests with the string “WordPress” appearing within the user agent, you can block all of these requests with just one line of configuration:

if ($http_user_agent ~ WordPress) { return 444; }

You can do the same using iptables, ipset, or Fail2ban:

# iptables -A INPUT -p tcp --dport 80 -m string --algo bm --string "WordPress" -j DROP

For users that are less experienced with tools like iptables, this might be a bit complex. If you’re running an Nginx server, you can use the ngx_http_limit_req_module module (convenient name, I know), which will allow you to restrict the number of requests per second your server will handle from specific IP addresses.

Attacks like this can also exploit vulnerabilities that arise when software is configured improperly, for instance in the case of DNS and NTP amplification. Reinstalling and reconfiguring the software in question would be very pertinent in that case, making sure to apply the latest patches and being extra careful during setup. In some cases, vulnerabilities arise when software and services fall out of use but continue to run, allowing extraneous access paths to your system. Remembering to stop and remove unused software and services is just as important for preventing and stopping attacks.
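On a systemd-based server, for instance, you might list what’s running and then switch off anything that shouldn’t be there (the service name below is just an example):

# systemctl list-units --type=service --state=running

# systemctl disable --now rpcbind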

Hopefully, this article was helpful. As always, catch us on Facebook and Twitter for future articles, product updates, or if you have any questions for us! 

-Until next time!

Future Proof Your SysAdmin Career: Communication and Collaboration

Today’s system administrators are wise to arm themselves with specialized technical skillsets, but sysadmins interact with people at least as much as they deal with systems, software, and security. Strong communication capabilities, problem solving, teamwork, and leadership skills are therefore not to be underestimated.


In fact, a previous article emphasized that these skills are key across all levels of a system administrator’s career:

  • Communicate technical concepts to non-technical people

  • Solve problems quickly

  • Write proposals

  • Communicate with upper management

  • Document processes

Not all people are equally proficient in these areas. In fact, as Lynn Taylor, a national workplace expert, noted in speaking with Forbes: “Having good people radar is harder to teach than technical skills, but is a requisite.”

Effective communication

The good news is that solid training options are available to help you improve communications and people skills, including options specifically focused on IT and technical personnel. According to Allan Hoffman, an expert on tech jobs, taking a seminar or course is a good first choice for workers such as sysadmins who want to improve communications skills. “To excel as a technical professional, you need to learn how to communicate your ideas and work effectively with others,” he writes.

Global Knowledge offers a course called “Customer Communication Skills for IT Professionals,” with curriculum completed in two days. The class covers such topics as clearly communicating technical concepts to non-technical users, active listening, and conflict management strategies. Downloadable course details are available here.

The American Management Association offers a similar course, according to Hoffman. It has a three-day “Communication and Interpersonal Skills: A Seminar for Technical Professionals” course that has received good notices.

In “How Can Sysadmins Foster Better Employee Communication,” Tim Mullahy notes that too much reliance on jargon can undermine a sysadmin’s communication effectiveness. “When discussing the details of a system update or scheduled downtime with non-IT employees, avoid using highly technical language,” he advises.

“Jargon could make you sound like you know what you’re talking about, but it can also teeter on the edge of talking down to people,” writes Fathom’s Caroline Bogart. “If someone doesn’t understand what you’re saying, they’re not going to feel very intelligent.”

Project management

Many of today’s sysadmins are directly involved with supporting the rollout and maintenance of cloud platforms and other complex projects. And, sysadmins with strong project management and collaboration skills are needed to help lead such efforts.

Project management for sysadmins is covered in the Sysadmin Casts series of podcasts. The basic methodology laid out in this podcast series has been used by sysadmins to coordinate complex, multi-month projects.

Many sysadmins also use specific project management and collaboration tools. Trello is an example of a popular collaboration-focused tool, and you can find others here. LibrePlan is a free, web-based project management application that sysadmins can leverage, and it is available in mobile versions.

In the final installment of our series, we’ll look at other open source ways to broaden your skills and examine the connection between open source experience and improved employment outcomes.

Learn more about essential sysadmin skills: Download the Future Proof Your SysAdmin Career ebook now.

Read more:

Future Proof Your SysAdmin Career: An Introduction to Essential Skills 

Future Proof Your SysAdmin Career: New Networking Essentials

Future Proof Your SysAdmin Career: Locking Down Security

Future Proof Your SysAdmin Career: Looking to the Cloud

Future Proof Your SysAdmin Career: Configuration and Automation

Future Proof Your SysAdmin Career: Embracing DevOps

Future Proof Your SysAdmin Career: Getting Certified

Future Proof Your SysAdmin Career: Communication and Collaboration

Future Proof Your SysAdmin Career: Advancing with Open Source

 

Two Open Source Licensing Questions: The AGPL and Facebook

In many settings, open source licensing today is considered a solved problem. Not only has the Open Source Initiative (OSI) largely contained the long feared issue of license proliferation, the industry has essentially consolidated around a few reasonably well understood models.

Copyleft licenses such as the GPL, which require users who would distribute the software to demonstrate reciprocity by making available their changes under the same license (hence the usage of reciprocal to refer to these licenses), exist at one end of the spectrum. So-called permissive licenses, which include the Apache, BSD and MIT licenses, and generally ask very little of users of the code, are at the opposite end. In between are MPL-style licenses, which more selectively apply copyleft-style reciprocity requirements.

Read more at RedMonk

Demand for Open Source Skills on the Rise

Interest in hiring open source workers is on the rise, with 60 percent of companies surveyed looking for full-time hires, compared with 53 percent last year, according to the 2017 Open Source Jobs Report.

Hiring managers from 280 global businesses, along with 1,800 open source professionals, participated in the July study by The Linux Foundation and tech career firm Dice.

That’s good news if you have open source skills; indeed, 86 percent of professionals say open source has advanced their careers. The not-so-good news is that 89 percent of hiring managers are finding it difficult to find this type of talent, which is in line with last year’s finding of 87 percent. The specific areas in which hiring managers say open source talent is in short supply are developers (73 percent), DevOps (60 percent), and sysadmins (53 percent).

It’s no wonder then that 67 percent of managers are eyeing these hires more than other areas of business in the next six months. Fifty-eight percent say they will hire more open source professionals in that timeframe with expertise in cloud (70 percent), web technologies (67 percent) and Linux (65 percent).

Because of the challenges of hiring open source professionals, 47 percent of employers say they are willing to pay for employees’ certifications, which is up from 33 percent in 2016. Meanwhile, 55 percent of hiring managers are making formal training a priority and seeking certification in new open source hires.

Training Opportunities

Additional training and certification are being offered by 33 percent of manager respondents who say these are incentives to retain employees, which is up from 26 percent last year. Among the ways training is being provided: 63 percent of respondents say they offer online/virtual courses, while 49 percent pay for individual training, and 39 percent provide live training instruction onsite.

Most hiring managers surveyed (73 percent) say developers are the main position they are looking to fill. They also need DevOps Engineers (60 percent) and Systems Administrators (53 percent).

Cloud technology such as OpenStack and Cloud Foundry ranked as the most sought-after area of expertise among 70 percent of employers, up from 66 percent last year. Web technologies was next, with 67 percent of hiring managers citing a need for that knowledge, compared with 62 percent last year. Demand for Linux talent remains strong, with 65 percent of hiring managers looking for those skills, down slightly from 71 percent in 2016.

The technologies with the greatest influence over hiring decisions are cloud (62 percent), application platforms (56 percent) and Big Data (53 percent).

This year, cloud/virtualization was cited as the most desirable open source skill among 60 percent of hiring managers, followed by application development (59 percent) and DevOps (57 percent).

Open source professionals weigh in

Open source professional respondents rank five skills closely in demand:

  • Cloud (47 percent)

  • Application development (44 percent)

  • Big Data (43 percent)

  • DevOps (42 percent)

  • Security (42 percent)

Additionally, 77 percent of professional respondents say the ability to architect solutions based on open source is the most valuable skill in their job, followed by 66 percent who say experience with open source development tools and 65 percent who cite knowledge of new tools.

Cloud technology skills will be the most important to have in 2018, according to 69 percent of open source professional respondents, followed by big data/analytics (57 percent), containers (56 percent), and security (55 percent).

In the coming weeks, we’ll be looking at individual skills in more detail, examining specific hiring needs and training opportunities.

You can download the complete 2017 Open Source Jobs Report now.

This Week in Numbers: New Monitoring Methods for Kubernetes

Our new report, The State of the Kubernetes Ecosystem, is based on a survey of 470 container users, 62 percent of whom were at least in the initial production phase with the Kubernetes open source container orchestration engine. After further screening, we were able to get detailed information from 208 people about the storage and monitoring technologies they use with Kubernetes.

Prometheus is by far the most cited tool among our survey respondents for monitoring Kubernetes clusters. Heapster, however, has also gained significant adoption among our group. Traditional monitoring vendors are not faring as well, although usage levels for their tools appear to increase when they are being integrated into a larger, custom monitoring platform.

Read more at The New Stack