This post is a follow-up to Docker on AWS: 1 Week In.

I’ve spent a month or so messing around with Docker on AWS. The goal was to reduce costs and make our sandbox environments more performant. My conclusion is that Docker Universal Control Plane (UCP) is not suitable for our organization, and I recommend you stay away as well. Here’s why.

Previous experiments with Docker Swarm revealed high latency in the internal GET /networks API call. This is especially problematic if you deploy containers on an overlay network (e.g. you use docker-compose). UCP suffers from the same problem. While starting a ~25-container application, a friend and I traced it (beginning with an investigation into I/O wait times) back to the same issue.

I contacted Docker Support about this issue and others. After the usual back-and-forth, the support engineer replied that the UCP team is well aware of the issue and that a fix is planned for version 2.x. There were three suggested workarounds:

  1. Remove all replica controllers
  2. Move all infrastructure to the same availability zone
  3. Tune Docker’s internal key-value store parameters

I tried option 3, but it did not change anything. This means things are at best degraded (or at worst fundamentally broken) out of the box. This was a red flag for me.

Options 1 and 2 are unacceptable for high-availability environments. Truthfully, it surprised me that these were suggested as production workarounds. Support replied that their customers are not using multiple availability zones, so generally it wasn’t an issue. This surprised me as well, and it makes me wonder what the customer base is actually using UCP for.

It’s concerning that Docker Inc. would ship a production product that has issues out of the box on its own reference infrastructure. They’re charging a license fee to boot!

I recommend you avoid UCP until at least version 2, and until it supports the swarm mode features introduced in Docker 1.12. Currently, UCP 1.x uses the legacy Swarm implementation. It does not make sense to adopt the product in its current state.

That said, I’d like to thank the support engineers who helped me debug and assess these issues. Big thanks also to the maintainers of the official AWS reference implementation for merging my PRs and accepting feedback. The product may not have worked out for me, but now there are improvements for everyone to use.

Good luck out there and happy shipping.