Docker containers are a fantastic way to encapsulate complex build processes. Software requires a host of dependencies, and dependency sprawl is especially problematic on polyglot teams. It quickly becomes infeasible to maintain multiple toolchain configurations on every engineer's machine and again on CI systems. Say you need to produce a JAR or WAR, or generate a static site: you can do all of that with Docker containers.

The Easy Way

Most examples (even the official images) do something like this:

docker run --rm -it -v "${PWD}:/data" some_image --output /data

Here some_image writes its output files to /data. I do the same thing to vendor my Ruby dependencies:

docker run --rm -it -v "${PWD}:/data" -w /data ruby:2.3 \
  bundle package --all

This mounts the current directory at /data, then instructs the container to write its output back to /data. Tada! This is how you can use a container to generate artifacts on the Docker host.

A Fix

This is the easiest way to get data in and out of containers. However, it creates a problem: Docker containers often run as root, so this approach may litter the file system with root-owned artifacts, depending on your Docker setup (i.e. whether the Docker daemon runs directly on your host or inside a VM). The problem usually reveals itself on CI. Those machines typically run Linux with a native Docker install, so containers run as root and write root-owned files back through the bind mount. The workaround is to run the container as the current user with -u $(id -u). Here's an example:

docker run --rm -it -u $(id -u) -v "${PWD}:/data" -w /data ruby:2.3 \
  bundle package --all

Now the container runs with your uid, so files it generates are owned by your user. This covers probably 90% of use cases, but there are scenarios where it breaks down. It does not work with remote Docker hosts (e.g. something like Swarm), because the bind-mounted path must exist on the machine running the daemon. docker-machine on OS X sidesteps this by mounting $HOME as a shared directory in the VM, so file system mounts inside $HOME work transparently.
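If you use this invocation a lot, the flags can be wrapped in a small shell function. This is only a sketch under my own naming (run_as_me is not from the post), and it also passes the gid so group ownership matches:

```shell
# Sketch: wrap the uid-mapped invocation in a reusable helper.
# run_as_me is an invented name; any image and command can follow it.
run_as_me() {
  docker run --rm -it \
    -u "$(id -u):$(id -g)" \
    -v "${PWD}:/data" -w /data \
    "$@"
}

# Usage:
# run_as_me ruby:2.3 bundle package --all
```

Drop the -it flags if you call this from a non-interactive script, since there is no TTY to allocate there.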

The Correct Way

docker cp works in every scenario without workarounds. The command copies files into and out of containers, and it handles both individual files and directories: files are copied directly, while directories are streamed as tar archives. This approach requires a few more commands, but it works regardless of where the daemon runs. Let's see some examples.

Here we assume the Docker image contains everything required to build the artifact(s).

docker run --name builder my-image script/build path/to/something.jar
docker cp builder:path/to/something.jar something.jar
docker stop builder
docker rm builder

This example runs the container (path/to/something.jar is a placeholder). Note that --rm is not passed to docker run: the container must stick around after it exits so we can copy files out of it. Next, docker cp takes a container:path source and copies it to something.jar on the host, with no permissions to worry about. Finally the container is stopped and removed.
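The four commands above can be folded into one helper so every build cleans up after itself even when the copy fails. A minimal sketch; extract_artifact and its argument names are my own invention, not from the post:

```shell
# Sketch: run a build container, copy one artifact out, then remove the
# container whether or not the build or the copy succeeded.
extract_artifact() {
  image="$1"; build_cmd="$2"; src="$3"; dest="$4"
  name="builder-$$"   # unique-ish name so parallel builds don't collide
  docker run --name "$name" "$image" $build_cmd || { docker rm "$name" >/dev/null; return 1; }
  docker cp "$name:$src" "$dest"
  status=$?
  # A foreground `docker run` has already exited here, so rm alone suffices.
  docker rm "$name" >/dev/null
  return $status
}

# Usage:
# extract_artifact my-image "script/build path/to/something.jar" \
#   path/to/something.jar something.jar
```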

Here is another example from [Docker & Node boilerplate][node]. It uses docker create and docker cp to install Node.js dependencies from package.json: package.json is copied into the container, the container runs npm install, node_modules is copied out, and finally the container is stopped and removed.

$(PACKAGE): package.json
	mkdir -p $(@D)
	docker create -i --name installer -w /data $(NODE_VERSION) npm install
	docker cp package.json installer:/data/package.json
	docker start -ai installer
	docker cp installer:/data/node_modules - | gzip > $@
	docker stop installer
	docker rm installer

Here docker cp streams node_modules to stdout as a tar archive (the - destination), which is gzipped and saved as the Make target. Extracting that archive later recreates node_modules in the proper directory.
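What that stream looks like can be demonstrated with plain tar and gzip, no Docker required. This is a local simulation with invented demo paths; docker cp with a - destination emits exactly this kind of archive:

```shell
# Simulate the archive `docker cp installer:/data/node_modules -` would emit,
# then restore it the way a later build step would. All paths are invented.
mkdir -p demo/node_modules/left-pad
echo '{}' > demo/node_modules/left-pad/package.json

# Produce the compressed tar stream, as the Makefile rule does:
tar -C demo -c node_modules | gzip > node_modules.tar.gz

# Later, unpack it to recreate node_modules in the proper directory:
mkdir -p restored
gunzip -c node_modules.tar.gz | tar -C restored -x
```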

I hope this post clarifies how to implement the build container pattern. It’s astoundingly useful when done right. tl;dr: Use docker cp as described.

Good luck out there and happy shipping!