I’ve been working on Kubernetes networking a lot recently. One thing I’ve noticed is, while there’s a reasonable amount written about how to set up your Kubernetes network, I haven’t seen much about how to operate your network and be confident that it won’t create a lot of production incidents for you down the line.
In this post I’m going to try to convince you of three things: (all I think pretty reasonable :))
Avoiding networking outages in production is important
Operating networking software is hard
It’s worth thinking critically about major changes to your networking infrastructure and the impact that will have on your reliability, even if very fancy Googlers say “this is what we do at Google”. (google engineers are doing great work on Kubernetes!! But I think it’s important to still look at the architecture and make sure it makes sense for your organization.)
I’m definitely not a Kubernetes networking expert by any means, but I have run into a few issues while setting things up and definitely know a LOT more about Kubernetes networking than I used to.