Using Mesos Quotas to Control Resource Allocation


Did you know that Apache Mesos supports quotas? It has since version 0.27. In an ideal world, we could fine-tune quotas to manage resources for maximum efficiency, reining in hogs and making sure that services get what they need without going overboard. In the real world, it’s a little more challenging. Should quotas be limits or guarantees? Persistent or dynamic? How granular should quotas be? Why hasn’t Quota seen wider adoption? Alex Rukletsov of Mesosphere answers these questions, and more, at MesosCon Asia 2016.

Mesos provides role quotas. A quota reserves resources for a role, and one or more frameworks in the cluster can register with that role. These resources are not tied to any particular agents, cannot be hijacked by other roles, and are guaranteed to be available, assuming the cluster provides adequate resources. Some examples of use cases are:

  • Dividing a cluster between two organizations
  • Ensuring that persistent volumes are available only to frameworks registered with that role
  • Giving some frameworks higher priority than other frameworks
  • Guaranteeing resource allocation
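
As a rough illustration of the guaranteed-allocation case, the sketch below builds the JSON body an operator would POST to the Mesos master's /quota endpoint to set a quota for a role. The role name "prod-web", the resource amounts, and the master address are made-up values for illustration, and the request is only prepared here, not sent.

```python
import json
import urllib.request


def quota_request_body(role, cpus, mem_mb):
    """Build the JSON body of a quota request for a single role."""
    return {
        "role": role,
        "guarantee": [
            {"name": "cpus", "type": "SCALAR", "scalar": {"value": cpus}},
            {"name": "mem", "type": "SCALAR", "scalar": {"value": mem_mb}},
        ],
    }


def build_quota_request(master_url, body):
    """Prepare (but do not send) the POST to the master's /quota endpoint."""
    return urllib.request.Request(
        url=master_url.rstrip("/") + "/quota",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical role and master address, for illustration only.
body = quota_request_body("prod-web", cpus=2, mem_mb=4096)
request = build_quota_request("http://mesos-master.example.com:5050", body)

print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```

Sending the request would be a matter of passing it to urllib.request.urlopen against a real master; that step is left out so the sketch runs without a cluster.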

Rukletsov explains how Quota’s builders expected it to work: “A request comes in, and we check the capacity, whether there are enough resources in the cluster to satisfy the request, and we [persist] these requests in the registry, [which] is necessary for failover, and then we basically exercise the request if we can do it, and everyone is happy.”
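
The flow Rukletsov describes (capacity check, persist, then apply) can be sketched as a toy model. This is not Mesos code: the class, its method names, and the in-memory dictionary standing in for the registry are all assumptions made for illustration, and only CPUs are tracked.

```python
class ToyQuotaMaster:
    """Toy model of the quota request flow; not the Mesos implementation."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        # Stands in for the persisted registry: role -> CPUs laid away.
        self.registry = {}

    def unclaimed_cpus(self):
        """CPUs not yet laid away for any role."""
        return self.total_cpus - sum(self.registry.values())

    def request_quota(self, role, cpus):
        """Return True if the request was accepted and recorded."""
        # Simplification: a role that already has quota is rejected.
        if role in self.registry:
            return False
        # 1. Capacity check: can the cluster satisfy the request at all?
        if cpus > self.unclaimed_cpus():
            return False
        # 2. Persist the request so it would survive a master failover.
        self.registry[role] = cpus
        # 3. From here on, an allocator would honor the recorded quota.
        return True


master = ToyQuotaMaster(total_cpus=4)
print(master.request_quota("prod-web", 2))  # True: 2 of 4 CPUs are laid away
print(master.request_quota("batch", 3))     # False: only 2 CPUs remain unclaimed
```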

But the real world is rarely immediately happy, and Quota has some limitations. “First, resources that we laid away for Quota, they are not offered to other frameworks, which means if you layaway two CPUs in your cluster for future use of that production web application, these resources currently will not be offered to anyone else…Another limitation is that Quota is only on limit, instead of guarantee and delimit.”

When you lay away two CPUs for some future use, it would be nice to let a different framework use them until they are called for, instead of letting them sit idle. But it doesn’t work this way. “This production framework says I now want my two CPUs back,” says Rukletsov. “So you should have the mechanism how to preempt these resources and reuse them and give them back to the production framework. We don’t have this in Mesos now, we’re currently working on that.”
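
To put numbers on this limitation, here is a small arithmetic sketch. The function names, role names, and cluster size are hypothetical; it only models the behavior described above, where the unused part of a quota is held back rather than offered to other frameworks.

```python
def laid_away_cpus(quotas, allocated):
    """CPUs held back for roles that have not yet used their full quota."""
    return sum(
        max(guarantee - allocated.get(role, 0), 0)
        for role, guarantee in quotas.items()
    )


def cpus_offerable_to_other_roles(total_cpus, quotas, allocated):
    """Free CPUs that may still be offered to roles without quota."""
    free = total_cpus - sum(allocated.values())
    return max(free - laid_away_cpus(quotas, allocated), 0)


quotas = {"prod-web": 2}                 # two CPUs laid away for production
allocated = {"prod-web": 0, "batch": 3}  # production has not started yet

print(laid_away_cpus(quotas, allocated))                    # 2
print(cpus_offerable_to_other_roles(8, quotas, allocated))  # 3
```

In this example five CPUs are free, but only three can be offered to the batch framework. The other two sit idle until the production framework claims them.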

Supporting both limits and guarantees is challenging to implement, because it requires distinguishing revocable from non-revocable resources. Currently, resources are not easily revocable, and this probably will not change, since the existing mechanism already provides limit and guarantee in one.

Watch Rukletsov’s talk (below) to learn about common pitfalls, rebalancing, frameworks that hoard resources, how enforcement works, capacity checks, balancing unused resources with leaving enough headroom for transient demands, and much more.

https://www.youtube.com/watch?v=xs6TI_SdL8M&list=PLbzoR-pLrL6pLSHrXSg7IYgzSlkOh132K
