Steering Kubernetes Through Uncharted Territory


Taylor Thomas is a Cloud Software Engineer for the Software Defined Infrastructure team at Intel working on Kubernetes, CI/CD, and the Snap open telemetry framework. The team also uses Kubernetes to run a large part of their services, and Thomas will describe this work in his upcoming talk “Off the Beaten Path: An Explorer’s Guide to Kubernetes” at KubeCon. In this article, however, he provides a preview of some challenges that the team has encountered.

Taylor Thomas, Cloud Software Engineer at Intel

What are some examples of uncharted territory that your team has encountered?

Taylor Thomas: My team works in research and development, and we use Kubernetes to run a large part of our team’s services. Because of this work, we have found many examples of uncharted territory, and I’ll cover most of these in my presentation at KubeCon. However, there are two really good examples we encountered recently.  

The first is how to use readiness probes with services that have both a “ready to synchronize with peers” state and a “ready to accept requests” state. Readiness probes are a Kubernetes feature that lets you control when a container is considered ready, based on customizable call-outs. If a container that a service points to is not ready, the service will not route traffic to that container. By default, containers are considered ready as soon as the process is running. In practice, this doesn’t account for services that cannot accept requests immediately on start because some initial configuration has to complete first. Our Cassandra instance is a perfect example of how this can be problematic. When we put in a readiness probe, it created a race condition where each node couldn’t be ready until it synchronized with its peers, but couldn’t synchronize with its peers until they were in a ready state.
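To make the circular wait concrete, here is a minimal Python sketch that simulates it. It is a toy model, not Kubernetes or Cassandra code: the `Node` class, the `start_cluster` function, and the `route_to_unready` flag are all hypothetical names, and the model assumes a node must reach every peer through the service before it can finish synchronizing.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    synced: bool = False  # has this node finished synchronizing with its peers?

    def readiness_probe(self) -> bool:
        # Hypothetical probe: the node reports ready only after peer sync.
        return self.synced


def reachable_peers(node, nodes, route_to_unready):
    """Peers the service would route `node` to in this toy model."""
    return [
        peer for peer in nodes
        if peer is not node and (route_to_unready or peer.readiness_probe())
    ]


def start_cluster(names, route_to_unready, max_rounds=5):
    """Simulate startup rounds and return each node's final readiness."""
    nodes = [Node(name) for name in names]
    for _ in range(max_rounds):
        # Take a snapshot of reachability first, so all nodes act on the
        # same view of the cluster within a round.
        snapshot = {
            n.name: reachable_peers(n, nodes, route_to_unready) for n in nodes
        }
        for n in nodes:
            # Assumption: a node can sync only if it reaches every peer.
            if len(snapshot[n.name]) == len(nodes) - 1:
                n.synced = True
        if all(n.readiness_probe() for n in nodes):
            break
    return {n.name: n.readiness_probe() for n in nodes}


if __name__ == "__main__":
    # Service hides unready nodes: nobody can ever sync.
    print(start_cluster(["cass-0", "cass-1", "cass-2"], route_to_unready=False))
    # Peers may reach each other before they are ready: startup completes.
    print(start_cluster(["cass-0", "cass-1", "cass-2"], route_to_unready=True))
```

With `route_to_unready=False` no node ever becomes ready, no matter how many rounds run, because each one is waiting on peers that are hidden for the same reason. The flag stands in for any mechanism that lets peers find each other before they pass their probes.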

The second involves ConfigMaps. ConfigMaps are Kubernetes objects that let users specify configuration data that can be mounted into a container at runtime. We ran into specific use cases where we wanted to mount multiple ConfigMaps into the same directory, or place ConfigMaps into a directory alongside other config files. When we updated to Kubernetes 1.3, we suddenly found that none of our files were mounted anymore. This prevented Jenkins from starting with the proper configuration.

How did you address these issues?

Taylor: For the readiness probes, after some digging, we found you can add an annotation to the pod like this: “true”

When we created a separate service for the Cassandra inter-node communication used for initialization with this annotation, everything started up again and the readiness probes worked perfectly.

Our workaround for the mounted file involved a little imagination, but it has worked extremely well. In our Jenkins container startup script, we created a dead symlink to a separate path in the container where we will mount the config file. This allowed us to put each config file into its own directory, since the symlink points to the correct location.

How can the documenting of solutions ease future pain points?
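As a rough illustration of the symlink workaround described above, the following Python sketch creates a link whose target does not exist yet and shows that the link resolves once a file appears at the target. The team's actual startup script is not shown in this article, so the `link_config` helper, the directory names, and the `config.xml` filename are assumptions made for the example.

```python
import os
import tempfile


def link_config(config_dir, filename, mount_dir):
    """Create config_dir/filename as a symlink to mount_dir/filename.

    The target does not have to exist when the link is created. The link
    simply dangles until a file shows up at the target path.
    """
    link_path = os.path.join(config_dir, filename)
    target = os.path.join(mount_dir, filename)
    # Replace an existing file or link at that path (not a real directory).
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: the application reads from app_home, while
        # each config file is mounted into its own separate directory.
        app_home = os.path.join(root, "app_home")
        mount_dir = os.path.join(root, "mounts", "main-config")
        os.makedirs(app_home)
        os.makedirs(mount_dir)

        link = link_config(app_home, "config.xml", mount_dir)
        print("dangling before mount:", not os.path.exists(link))

        # Simulate the volume mount by writing the file at the target.
        with open(os.path.join(mount_dir, "config.xml"), "w") as f:
            f.write("<config/>")
        print("resolves after mount:", os.path.exists(link))
```

Because each file gets its own mount directory, several mounted files can appear side by side in one application directory without any mount covering the others.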

Taylor: Neither of the problems I mentioned had any documentation. In the readiness probe example, we had to search around until we found a GitHub issue with the same problem. This then pointed to a PR where the annotation had been merged. In both examples, it took time to diagnose the issue (which can be difficult with all the moving parts of Kubernetes) and then search for a solution to the problem. Had these been documented, both issues would have taken less than 10 minutes to fix. Documentation helps everyone reduce the time spent troubleshooting and searching.

What should companies consider before installing/deploying Kubernetes?

Taylor: Kubernetes abstracts away many of the pain points of having to orchestrate containers. However, it creates new problems to solve (and document!) and forces you to adopt a new paradigm. It comes down to the common question of how much value you gain from it. In our case, it has sped up our ability to iterate and release features without impacting users and has made it easy to spin up something as complex as Cassandra or Jenkins in a matter of seconds. It has come at the expense of a lot of learning and researching. So, if you are just hosting a webapp or other simple application, the added complexity may not be worth it. If things are bigger, though, and you need the ability to scale, I would highly recommend giving Kubernetes a try!

Have the complexities of Kubernetes increased with its popularity?

Taylor: With more users always come more feature requests, so the simple answer to this is yes, it has. Even so, I don’t find that a problem. Adam Jacob at Chef recently gave a great talk in which he talked about embracing complexity as long as it brings ease. So the question I like to ask is “Does Kubernetes make things work with ease?” To that, I would say yes. It is still growing and there are still issues to be solved, but it has given us a scalable system we wouldn’t have had otherwise.

Registration for this event is sold out, but you can still watch the keynotes via livestream and catch the session recordings on CNCF’s YouTube channel.