Crate.io, the winner of our Disrupt Europe 2014 Battlefield, is launching version 2.0 of its CrateDB database today. The tool, which is available in both an open source and enterprise version, started out as a general-purpose but highly scalable SQL database. Over time, though, the team found that many of its customers were using the service for managing their machine data and, unsurprisingly, decided to focus its efforts on better supporting those clients.
“The main message is that we hit the nail with the machine data focus that we now have for CrateDB,” Crate co-founder and COO Christian Lutz told me, and added that it was basically the customers who educated the team on what they needed. “We took a look at what they were doing and the database market is so crowded — but we have this mix of SQL and NoSQL,” Lutz said, “and IoT is going to be the fastest growing market for databases.”
DNS entry management works fundamentally differently in clouds than in classic setups; OpenStack Designate is a sensible and well-functioning DNS-as-a-Service implementation.
DNS is normally one of the first services set up for new infrastructure in the data center; without it, the other machines are barely usable. Many people only realize how crucial DNS is when it stops working, such as when a website cannot be reached because the responsible DNS server has crashed. In short: DNS is a fundamental prerequisite for virtually every other service an admin sets up.
The topic of DNS has several dimensions. On the one hand, computers must be able to resolve hostnames to communicate with other machines inside and outside the setup. On the other hand, your own DNS entries have to be managed on the appropriate DNS servers: a website that can only be reached via IP address is rarely desirable; a web address with the expected structure (e.g., www.example.com) is preferred. To that end, a corresponding A record (or AAAA [quad-A] record for IPv6) must be stored for the domain, and a matching PTR record (which points back to the A record) must be created in the DNS zone for the respective address space.
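As a quick illustration of the forward and reverse mappings just described, the following sketch uses the Python dnspython library (an assumption; any resolver library will do) to look up the A record for a hostname and the PTR record for an address. The hostname and IP are placeholders, so real lookups would need entries that actually exist:

```python
import dns.resolver      # pip install dnspython
import dns.reversename

# Forward lookup: resolve the A record for a (placeholder) hostname.
for answer in dns.resolver.resolve("www.example.com", "A"):
    print("A record:", answer.address)

# Reverse lookup: build the in-addr.arpa name for a (placeholder) address
# and ask for its PTR record, which should point back at the hostname.
reverse_name = dns.reversename.from_address("203.0.113.10")
for answer in dns.resolver.resolve(reverse_name, "PTR"):
    print("PTR record:", answer.target)
```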
One of the most fundamental challenges of monitoring modern cloud applications is the ability to see all the components and their dependencies. The lack of visibility is a critical problem, and it is worsening as instance lifespans shrink, components span private and public clouds, and dependencies on external services grow. As the pace of software development and the complexity of applications continue to increase, the visibility challenge for operations teams can be summarized as “driving at 100mph with blindfolds!”.
An emerging solution to the visibility problem is application maps. In this post, we describe application maps and their use cases, and cover some popular techniques used to generate them.
What is an Application Map?
An application map is a topology map composed of nodes and edges, where:
The nodes represent groups of processes or compute instances
The edges represent the network or IPC interactions between nodes (i.e., between groups of compute instances).
Several characteristics of application maps are worth highlighting:
Grouping of instances is crucial because a map at the individual server, VM, or container level quickly becomes overwhelming. Grouping of compute instances serves a similar purpose to “resolution” on a Google map. The map examples from Netflix below are good illustrations of the notion of groups and resolution. Figure 1a groups multiple instances together and its “resolution” is low; it is like zooming out on a Google map to see a region at the state or country level. Figure 1b is the zoomed-in view, which shows specific services.
Application-level details should be present in the application map, rather than merely an infrastructure map of hosts, VMs, or containers. That is, the user should be able to visualize services such as databases, DNS, service discovery, and REST/HTTP endpoints on the application map.
Golden signals such as latency, error rate, throughput, and saturation should be captured and displayed for nodes and edges. These metrics enable operations teams to quickly understand the health and performance of application components (a minimal sketch of such a map follows Figure 1).
Figure 1: Application Maps with Nodes Representing Groups of Compute Instances
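To make these characteristics concrete, here is a minimal sketch in plain Python (the service names and metric values are made up for illustration) of an application map as a graph whose nodes are instance groups and whose edges carry golden-signal metrics; regrouping nodes is all it takes to change the map's “resolution”:

```python
from collections import defaultdict

# Nodes: groups of compute instances, keyed by a (hypothetical) service name.
nodes = {
    "frontend": {"instances": ["fe-1", "fe-2", "fe-3"]},
    "user-api": {"instances": ["api-1", "api-2"]},
    "mysql":    {"instances": ["db-1"]},
}

# Edges: interactions between groups, annotated with golden signals.
edges = {
    ("frontend", "user-api"): {"rps": 420.0, "p99_latency_ms": 35.0, "error_rate": 0.002},
    ("user-api", "mysql"):    {"rps": 910.0, "p99_latency_ms": 4.0,  "error_rate": 0.0001},
}

# "Zooming out" is just regrouping: collapse services into coarser tiers.
tier_of = {"frontend": "web", "user-api": "app", "mysql": "data"}
zoomed_out = defaultdict(float)
for (src, dst), metrics in edges.items():
    zoomed_out[(tier_of[src], tier_of[dst])] += metrics["rps"]

print(dict(zoomed_out))  # {('web', 'app'): 420.0, ('app', 'data'): 910.0}
```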
Application Map Benefits
Visibility: Naturally, the biggest value of maps is the ability to see all the components and their dependencies.
Incident Response: Application maps can greatly expedite incident response. The dependency chain on a map shows all the services that participate in fulfilling a transaction. In the absence of maps, incident response is hampered by first having to identify all the services involved in a transaction and then, often manually, correlating metrics to find the root cause.
Monitoring and Alerting: Using application maps, operations teams can easily identify the services on the critical path, such as those serving end-user requests. Operations teams can then define Service-level Objectives (SLOs) for the critical services and set alerts/paging for them. This can greatly reduce the well-known problem of alert fatigue.
Capacity Planning and Forecasting: With the knowledge of critical services, operations teams can ensure appropriate capacity allocation for them. Application maps also highlight potential single points of failure and congestion hotspots.
Auto-documentation: Application maps provide automated documentation of all components, their dependencies and capture changes over time. Without maps, the information is scattered in configuration manifests, CI/CD systems, and very often inside operators’ heads!
Security: Maps are beneficial for identifying security vulnerabilities in an application. For example, a map can be used to identify if two services that are not supposed to talk to each other are doing so.
Application Mapping Techniques
Application mapping approaches can be categorized as static or dynamic. Static maps are essentially modeling exercises built with tools such as CloudCraft, Yipee.io, or Spotify’s System-Z. Figure 2 shows an example of a static application map created with CloudCraft.
Figure 2. Static application map created with CloudCraft
In this post we focus on dynamic application mapping approaches, which fall into two categories: (1) end-to-end tracing and (2) ingress and egress (individual) tracing.
Figure 3. Application Mapping Techniques: APM, Tracing SDKs, and Operating System Level Tracing
End-to-end Tracing Techniques:
APMs: Application performance management (APM) techniques require code-embedded agents on every process to track code execution paths. For some languages, agents can obtain an end-to-end trace by dynamically injecting a trace ID (e.g., via custom HTTP headers, Thrift fields, or gRPC metadata) to piece together requests and responses across services. AppDynamics, New Relic, and Dynatrace are popular products in this category that leverage code profiling and transaction tracing to generate maps. Figure 4 shows an example of an application map generated by AppDynamics. APM techniques find it hard to keep up with newer technologies because they require an N*M support matrix: to support MySQL tracing, for example, they need to track MySQL clients in every language, and every newly released language means supporting all of those combinations again. To release a Node.js APM agent, a vendor needs to support every HTTP framework, MySQL client, PostgreSQL client, and so on. This is one example of the N*M support matrix a vendor such as AppDynamics faces.
Figure 4. Application map generated from AppDynamics
Tracing SDKs and Proxies: These techniques let developers embed tracing SDKs in the application code and use them to track entry points and exit calls. The SDKs don’t look at code execution; they simply inject headers into requests so that requests can be correlated. Some techniques apply sampling to help scale in production. The SDKs emit spans, which contain the unique trace ID and other metadata/tags. Popular products in this category include OpenTracing, Datadog APM, AWS X-Ray, Finagle, linkerd, and Envoy. Figure 5 shows an example of an application map generated by AWS X-Ray; a short sketch of how header injection looks in code follows the figure.
Figure 5. Application map generated from AWS X-Ray
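As a rough illustration of the header-injection mechanism these SDKs rely on, the sketch below uses the OpenTracing Python API to start a span around an outgoing call and inject its trace context into the request headers. It assumes a concrete tracer (for example, Jaeger’s client) has been registered as the global tracer; the default no-op tracer injects nothing. The `http_get` callable and the URL are placeholders.

```python
import opentracing
from opentracing.propagation import Format

def call_user_service(http_get, url):
    """Wrap an outgoing call in a span and propagate the trace context."""
    tracer = opentracing.global_tracer()  # no-op unless a real tracer is installed
    with tracer.start_active_span("call-user-service") as scope:
        scope.span.set_tag("http.url", url)

        headers = {}
        # Inject the trace/span IDs as HTTP headers so the downstream
        # service can attach its own spans to the same trace.
        tracer.inject(scope.span.context, Format.HTTP_HEADERS, headers)

        return http_get(url, headers=headers)
```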
Pros and Cons of End-to-end Tracing Techniques:
Pros:
Help in root cause analysis: A few SDKs help with needle-in-the-haystack (root cause) analysis, e.g., by providing rules to record very specific transactions (say, for a particular user). In practice, sampling is enabled in production and heavy recording rules are avoided unless root cause analysis is being performed.
Trace the exact path of requests: With tracing techniques, we can track requests as they pass through multiple services and gather timing and other metadata along the way. This information can then be reassembled to provide a complete picture of the application’s behavior at runtime. Tracing the exact path also helps in understanding request concurrency and in re-architecting services to make parallel or asynchronous requests where needed.
Cons:
Overheads: Tracing techniques need to store individual traces, which can be challenging in production unless sampling is applied.
SDKs or agents need to be embedded everywhere in the stack to get coverage. This is tricky when calls are made to legacy services or OSS components, and also when different languages are mixed across services, e.g., Node.js, Java, Python, and Go.
Some techniques use tracing proxies (e.g., linkerd) to inject headers, but the application still needs to be aware of them and pass on the context (i.e., the headers) when making further calls to other services for the whole chain to work. For more details, refer to this post.
Individual traces don’t add much value on their own, since no one has time to go through millions of recordings. All tools ultimately aggregate traces to build a more meaningful cloud application map. In the following section, we describe how aggregating traces results in exactly the same map as the one generated by individual (ingress and egress) tracing techniques.
An end-to-end trace is often misleading because it does not capture the load on services (i.e., what other traffic was present) when the trace was recorded, and slow performance is often due to traffic load. Hence, aggregating traces is the only way to see something of value.
Ingress and Egress (individual) Tracing:
Logs: Some practitioners have built maps from logs gathered from application stacks or proxies. Technologies such as the Apache web server and Nginx can provide detailed logs for each request, and Splunk and Elasticsearch have general-purpose graph interfaces to plot all kinds of relationships. However, this technique is impractical: it requires emitting standardized logs for every request from every service and OSS component, and logs carry a huge storage overhead.
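As a rough sketch of the log-based approach, assume every service emits one structured log line per request with (hypothetical) `src` and `dst` fields; simply aggregating those lines already yields the edges of a map:

```python
import json
from collections import Counter

# Hypothetical structured request logs, one JSON object per line.
log_lines = [
    '{"src": "frontend", "dst": "user-api", "status": 200, "latency_ms": 12}',
    '{"src": "user-api", "dst": "mysql",    "status": 200, "latency_ms": 3}',
    '{"src": "frontend", "dst": "user-api", "status": 500, "latency_ms": 80}',
]

edges = Counter()
errors = Counter()
for line in log_lines:
    record = json.loads(line)
    edge = (record["src"], record["dst"])
    edges[edge] += 1
    if record["status"] >= 500:
        errors[edge] += 1

# Each distinct (src, dst) pair is an edge; the counts are its traffic and errors.
for edge, count in edges.items():
    print(edge, "requests:", count, "errors:", errors[edge])
```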
OS Tracing: Operating systems provide various tracers that allow tracing not just syscalls or packets but also any kernel or application software. For example, tcpdump is a network tracer on Linux; other popular tracers are eBPF, DTrace, and Sysdig. Figure 6 shows an example of an application map generated by Netsil’s Application Operations Center (AOC) using packet capture and service-interaction analysis; a rough sketch of this style of collection follows the figure.
Figure 6. Application map generated from Netsil AOC
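The sketch below conveys the flavor of packet-based collection using the Python scapy library (an assumption for illustration; Netsil’s own collector is not public): sniff TCP traffic and aggregate packets by source, destination, and destination port to recover service-interaction edges. Capturing requires root privileges.

```python
from collections import Counter
from scapy.all import sniff, IP, TCP   # pip install scapy; run as root

edges = Counter()

def record(pkt):
    # Aggregate by (src, dst, dst_port); the destination port is a crude
    # hint about the service being called (3306 -> MySQL, 6379 -> Redis, ...).
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        edges[(pkt[IP].src, pkt[IP].dst, pkt[TCP].dport)] += 1

# Capture a small sample of TCP packets, then print the observed edges.
sniff(filter="tcp", prn=record, count=200)
for (src, dst, dport), packets in edges.most_common():
    print(f"{src} -> {dst}:{dport}  ({packets} packets)")
```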
Pros and Cons of Ingress and Egress (individual) Tracing Techniques:
Pros:
Ingress and Egress techniques provide universal coverage as protocols don’t evolve as often as programming languages and frameworks.
Ingress and Egress techniques yield exactly the same map that end-to-end tracing techniques produce after aggregating a large number of traces, without having to inject and carry forward trace IDs.
Ingress and Egress techniques can map anything that talks over the network, even technologies where trace ID injection is impossible, e.g., DNS, MySQL, PostgreSQL, Cassandra, Memcached, and Redis.
Raw data collection (done inside the OS kernel) is lighter weight than code-embedded APM agents, though there is some overhead when the collected data is processed locally (often in user space).
New technologies are relatively easy to support (no need for N*M support matrix), making this approach more pervasive and future proof.
More accurate and representative of real behavior in production. Packets are often said to be the ultimate source of truth.
Cons:
Ingress and Egress techniques need to deal with reconstruction of application context from network data. Some protocols such as MySQL have complex session state machines.
Ingress and Egress techniques don’t work when encryption is employed within the cloud. But this is not a problem when SSL termination happens at the load balancer or when IPsec is used.
Some techniques can have high storage overhead when the reconstructed data is stored in the form of events rather than rolled-up time series, e.g., PacketBeat.
Ingress and Egress techniques can’t tie together related requests or the exact fan-out behavior of entry points. With modern microservice patterns, though, this is less of a problem, as services expose fewer API endpoints than monolithic applications.
It is hard to track specific business transactions end to end without automatic trace ID correlation, though some solutions make it possible by triggering recording through deeper payload analysis (e.g., regexes) or on specific behavior such as a 500 server error.
Conclusion
End-to-end trace maps are hard to gather in real-world apps (across languages, teams, and large codebases), and the only valuable information they uniquely provide is the exact fan-out pattern of very specific calls. The maps that provide actionable insights and are useful for DevOps workflows are the ones that aggregate traces (individual and end-to-end) to build a holistic view. The lowest-friction way to collect individual traces is via logs or OS tracing tools. Netsil takes a unique approach that leverages individual service interactions to build a real-time application map; the beauty of this approach is that you don’t need any code changes or instrumentation. We will describe Netsil maps in more detail in the second part of this blog. Meanwhile, you can easily get started with Netsil and gain complete visibility into your application today. Get started free with Netsil.
“The question of whether Software Defined Networking is a good idea or not is closed. Software Defined Networking is how we do networking,” said Amin Vahdat, Fellow & Technical Lead For Networking at Google, during his Open Networking Summit (ONS) keynote. At Google, they’ve gone head first into the cloud through the Google Cloud Platform, which Vahdat says has expanded their network in new and exciting ways. They built one of the largest networks in the world over the past decade to support Google services, like web search, Gmail, and YouTube. But with the move to the cloud, they are now hosting services for others, which is pushing them into new functionality and capabilities.
Google’s cloud architecture is built around disaggregation with storage and compute spread across the entire data center. It doesn’t matter which server holds a particular piece of data, because they’re replicating the data across the entire data center. The networking challenge with this approach is that the bandwidth and latency requirements for accessing anything anywhere increase substantially, which pushes their requirements for networking within the data center, Vahdat points out.
Software Defined Networking (SDN) has been evolving at Google. Vahdat says that in 2013, they presented B4, a wide area network interconnect for their data centers. In 2014, it was Andromeda, their network virtualization stack that forms the basis of Google Cloud. In 2015, Google had Jupiter for data center networking. Now, they have Espresso, which is SDN for the public Internet with per-metro global views and real-time optimization across many routers and many servers.
“What we need to be doing, and what we will be doing moving forward, is moving to Cloud 3.0. And here the emphasis is on compute, not on servers,” Vahdat says. This allows people to focus on their business, instead of worrying about where their data is placed, load balancing among the different components, or configuration management of operating systems on virtual machines. With networking playing a critical role in Cloud 3.0, there are a few key elements to think about: storage disaggregation, seamless telemetry, transparent live migration, service level objectives, and more.
Vahdat suggests that, “the history of modern network architecture is going to begin at ONS. In other words, this community has been responsible for defining what networking looks like in the modern age and it’s really different from what it has been.”
Watch the video to see more about how networking and the cloud are evolving at Google.
Interested in open source SDN? The “Software Defined Networking Fundamentals” training course from The Linux Foundation provides system and network administrators and engineers with the skills to maintain an SDN deployment in a virtual networking environment. Download the sample chapter today!
The Linux Foundation’s diversity scholarship program provides support to those from traditionally underrepresented or marginalized groups in the technology and open source communities who may not otherwise have the opportunity to attend Linux Foundation events for financial reasons.
We firmly believe the power of collaboration is heightened when many different perspectives are included, so these efforts benefit the community, not just those who participate.
Linux Foundation scholarship recipient Khushbu Parakh
In 2016, The Linux Foundation awarded more than $75,000 in complimentary registration passes for diversity scholarship recipients to attend 12 Linux Foundation events in a variety of industries — from automotive and cloud computing to embedded, IoT, and networking.
Khushbu Parakh was one of last year’s scholarship recipients. She is a junior developer who favors Python and says she’s truly fascinated with it. Parakh is also a Google Summer of Code mentor with the Anita Borg Organization.
“Besides that, I’m a geek. I like new ideas and pushing the envelope of the possible uses for computing,” Parakh said. “However, technology isn’t just about building a nifty new widget. I like being on the cutting edge of what is going to be (the) next hurdle.”
Linux.com asked her for her thoughts on the program and The Linux Foundation event she chose to attend. She named a long list of benefits she reaped afterwards, including landing an internship at Avi Networks, whom she met at the conference, winning a hackathon, and receiving further training and research support. She also said she’s sharing what she learned through her work mentoring young girls.
Here’s what else she said.
Linux.com: Why did you apply for a scholarship?
Khushbu Parakh: I applied because I wanted the opportunity to network with developers who had the same interests but a variety of experiences in computer science and related fields. From this community, I can learn more about their areas of study as well as the problems they encounter in their research and within their professional development. In addition to expanding my technical knowledge through networking and workshops, The Linux Foundation offers the opportunity to meet women in computing through lunches during the conferences. Further, with graduation soon approaching, I was looking for a variety of options open to my current skill set. I also wanted to find a career path that fits my personal goals of creating an environment that fosters and encourages young females to pursue all science, with an emphasis on computing.
Linux.com: Which event did you attend and why?
Parakh: I attended ApacheCon: Big Data Seville 2016. I was looking forward to doing my research on macro connections so the conference gave me nice exposure and a chance to meet the mentors. I ended up spending more time using tools from Kaggle, a Big Data repository.
I have also applied to MesosCon Europe 2017 where I not only want to learn more but also to contribute to open source. I want to have a hands-on session on technology and get to know the ways that it can help me.
Linux.com: How did you benefit from attending? What did you gain from attending the event?
Parakh: Four great things happened to me after attending the conference:
I was shortlisted by my research professor at the University of Zurich to pursue my research in Big Data (macro connections).
My team and I won a hackathon. We contributed to the DC/OS Mesos dashboard by adding new features.
I got an internship at Avi Networks, whom I met at the conference.
Linux.com: Will you be sharing what you learned at these events? If so, how?
Parakh: I participate in Science Immersions, local meetups, and sessions in a program designed to expose students from economically disadvantaged backgrounds to different types of STEM. There I am a role model to young girls, talking to them about my research and other aspects of being a female in STEM. My technical expertise is currently in cloud computing: I am working on load balancing using OpenStack and Google Cloud Platform (GCP), monitoring performance with functional automation testing.
In particular, the young girls learn about different perspectives of STEM in an entertaining and educational manner that instills excitement and love for the field. Learning these skills at the conference helped me become a mentor in Google Summer of Code, which encourages students all over the world to contribute to FOSS. I felt immensely happy to see that I can bring change, at least in the lives of some people who want to do something to better the world.
It’s been a long time in the works, but a memory management feature intended to give machine learning or other GPU-powered applications a major performance boost is close to making it into one of the next revisions of the kernel.
Heterogeneous memory management (HMM) allows a device’s driver to mirror the address space for a process under its own memory management. As Red Hat developer Jérôme Glisse explains, this makes it easier for hardware devices like GPUs to directly access the memory of a process without the extra overhead of copying anything. It also doesn’t violate the memory protection features afforded by modern OSes.
TensorFlow is a beast. It deals with machine learning algorithms, it is built with Bazel, it uses gRPC, etc., but let’s be honest, you are dying to play with machine learning. Come on, you know you want to! Especially in combination with Docker and Kubernetes. At Bitnami, we love apps, so we wanted to!
You just need to add bi-modal in there and you will hit buzzword bingo.
Jokes aside, TensorFlow is an open source library for machine learning. You can train data models and use those models to predict or infer information. I did not dig into TensorFlow itself, but I hear it uses some advanced neural network techniques. Neural networks have been around for a while, but due to computational complexity they were not terribly useful past a couple of layers and a couple dozen neurons (20 years ago at least 🙂 ). Once you have trained a network with some data, you can use that network (aka model) to predict results for data not in the training set. As a side note, I am pretty sure we will soon see a marketplace of TF models.
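For the impatient, training and then predicting really is only a few lines of code. The sketch below (a toy example using the Keras API that ships with current TensorFlow; the data is made up) fits a one-neuron model and then infers a value it never saw during training:

```python
import numpy as np
import tensorflow as tf

# Toy data set: the model should learn y = 2x - 1.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]], dtype=np.float32)
y_train = 2 * x_train - 1

# A single-neuron linear model: about as small as a "network" gets.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
model.fit(x_train, y_train, epochs=200, verbose=0)

# Inference on a value outside the training set (should be close to 19).
print(model.predict(np.array([[10.0]], dtype=np.float32)))
```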
When it comes to scientific computing, few names are better known than Stephen Wolfram. He was the creator of Mathematica, a program that researchers have been using for decades to aid in their computations. Wolfram later expanded Mathematica into a full multi-paradigm programming language called Wolfram Language. The company has also packaged many of Mathematica’s formulas, along with a lot of outside data, into a cloud-based service and API. So at this year’s SXSW Interactive, we spoke with Wolfram about how to use this new cloud service to add computational intelligence to your own programs.
These days, digital grabs a lot of headlines that trumpet how it’s radically changing customer behaviors. This typically means that IT departments have to deliver new features faster even in the face of more demanding requirements for availability (24/7) and security.
DevOps promises to do exactly that by fostering a high degree of collaboration across the full IT value chain (from business through development and operations to IT infrastructure). But there’s a problem.
While many software-development and operations teams have taken steps toward DevOps methods, most enterprise IT-infrastructure organizations still work much as they did in the first decade of this century: They use a “plan-build-run” operating model organized by siloed infrastructure components, such as network, storage, and computing.
A recent federal district court decision denied a motion to dismiss a complaint brought by Artifex Software Inc. (“Artifex”) for breach of contract and copyright infringement claims against Defendant Hancom, Inc. based on breach of an open source software license. The software, referred to as Ghostscript, was dual-licensed under the GPL license and a commercial license.
This case highlights the need to understand and comply with the terms of open source licenses. … It also highlights the validity of certain dual-licensing open source models and the need to understand which of the license options applies to your usage. If your company does not have an open source policy or has questions on these issues, it should seek advice.