The SlashDeploy blog was previously deployed from my local machine. I figured it was time to change that. This post walks through the continuous delivery process step by step. You may be thinking: the blog is just some static content, right? True, and that makes it a perfect test bed for applying continuous delivery to a small (but important) piece of software.

Achieving continuous delivery is no small task. It requires careful engineering choices and sticking to a few key principles. First and foremost: continuous integration. Every change must run through an automated test suite, and the test suite should verify that a particular change is production ready. It is impossible to have continuous delivery without continuous integration. Production bugs must be patched with regression tests to ensure they are not repeated. Second: infrastructure as code. Software requires infrastructure to run (especially web applications/sites/services). The infrastructure may be physical or virtual. Regardless of what the infrastructure is, it must also be managed with code. Consider a change that requires a change to the web server: there must be code to make it happen, and that code (by the first principle) must also have continuous integration. Finally, there must be verification criteria for each change. This signals whether a deployment completed successfully or not. Automation is the common ground. These principles must be applied and baked into the system from the ground up.

10,000 Feet

This blog is a statically generated site. Any web server can serve the generated artifacts, which keeps the infrastructure requirements minimal. A CDN should be used as well to ensure a snappy reading experience. Finally, the web server needs a human readable domain name. OK, so how do we make that happen? Use Jekyll to generate the site. Use CloudFormation to create an S3 website behind a CloudFront CDN and an appropriate Route53 DNS entry. GitLab CI runs everything. Right, those are the tools, but what does the whole pipeline look like?

  1. Generate the release artifact
  2. Run tests against the release artifact
  3. Run tests against the CloudFormation code
  4. On master?
    1. Deploy the CloudFormation stack
    2. Test CloudFormation deploy went as expected
    3. Copy release artifact to S3 bucket
    4. Test release artifact available through public DNS

This whole process can be coordinated with some bash programs and some make targets. Time to dive deeper into each phase.
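
Before diving in, here is a minimal sketch of how that top-level coordination might hang together. This is not the actual CI configuration (the real pipeline runs these as separate GitLab CI jobs); the branch check assumes GitLab CI's CI_COMMIT_REF_NAME variable, and the make targets mirror the phases above.

    #!/usr/bin/env bash
    # Hypothetical top-level driver mirroring the pipeline above.
    set -euo pipefail

    make dist        # 1. generate the release artifact
    make test-dist   # 2. test the release artifact
    make test-blog   # 3. test the CloudFormation code

    # 4. only changes on master are deployed
    if [ "${CI_COMMIT_REF_NAME:-}" = "master" ]; then
      script/ci/deploy
    fi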

The Build Phase

make builds all the content and dependencies (e.g. jekyll plus all other ruby gems) into a docker image [1]. The image can be used to start a dev docker container or to generate the release artifacts. Next, the make dist target generates the release artifact. docker cp is used instead of the -v "${PWD}:/data" mount method [2]. The release artifacts are kept in dist/ for testing.
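
A rough sketch of what these targets might run under the hood. The blog image name and the /data working directory are assumptions, not the exact Makefile.

    # build: bake the site source, jekyll, and all required gems into an image
    docker build -t blog .

    # dist: generate the site in a throwaway container, then pull the result
    # out with docker cp instead of a -v bind mount
    container="$(docker create blog jekyll build --destination /data/_site)"
    docker start -a "$container"
    mkdir -p dist
    docker cp "$container:/data/_site/." dist/
    docker rm "$container"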

The release artifact (a directory of files in this case) is run through the following tests:

  1. The root index.html exists
  2. The defined error.html exists
  3. The sentinel file exists
  4. robots.txt blocks access to error.html
  5. robots.txt blocks access to the sentinel file
  6. Each HTML file has a tracking snippet

You may be wondering about the sentinel file. The sentinel file uniquely identifies each release artifact. The file name includes the git commit that built it, and it lives in _sentinels/GIT_COMMIT.txt. Its sole purpose is to indicate that a release artifact is available via the CDN. The sentinel file name must be unique to bust caches. If it were not (say a simple sentinel.txt with unique content) it would be subject to any cache rules the CDN may apply (such as how long content can live in edge nodes before rechecking the origin). This would wreak havoc on deploy verification.
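
Generating the sentinel during the build is trivial; something along these lines, assuming dist/ is the release artifact directory as above:

    # Write a uniquely named sentinel file into the release artifact so the
    # deploy can later be verified through the CDN without cache trouble.
    GIT_COMMIT="$(git rev-parse HEAD)"
    mkdir -p dist/_sentinels
    echo "$GIT_COMMIT" > "dist/_sentinels/${GIT_COMMIT}.txt"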

Each test focuses on production behavior. The first two assert that the release artifact will function properly behind the CloudFront CDN. The sentinel tests assert that this build stage meets the next stage’s requirements. The robots.txt tests assert that internal files do not end up in search engines. Finally, tracking (page views, browsers, etc.) is important, so it must be included.
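
Sketched as Bash assertions, the artifact tests boil down to something like this. The exact robots.txt rules and the tracking snippet string are placeholders, not the real test suite.

    #!/usr/bin/env bash
    # Hypothetical sketch of the release artifact tests.
    set -euo pipefail

    dist=dist
    commit="$(git rev-parse HEAD)"

    # 1-3: required files exist
    test -f "$dist/index.html"
    test -f "$dist/error.html"
    test -f "$dist/_sentinels/${commit}.txt"

    # 4-5: robots.txt keeps internal files out of search engines
    grep -q 'Disallow: /error.html' "$dist/robots.txt"
    grep -q 'Disallow: /_sentinels/' "$dist/robots.txt"

    # 6: every HTML file contains the tracking snippet (placeholder string)
    while read -r file; do
      grep -q 'google-analytics' "$file"
    done < <(find "$dist" -name '*.html')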

Infrastructure

I have touched on the infrastructure a bit. The infrastructure is an S3 bucket behind a CloudFront CDN with a Route53 DNS entry. CloudFormation manages the whole bit. The bin/blog script coordinates the AWS calls. The deploy command is the heart: it either creates a non-existent stack or updates an existing one. There are also utility commands to get the stack status and outputs, which matters for testing. The validate command validates the CloudFormation template through an API call. This eliminates errors such as invalid resource types, missing keys, syntax errors, and other things a compiler might point out. Unfortunately this does not assert that a template will work; deploying it is the only way to know for sure. This is a key limitation with CloudFormation [3]. However it is enough for this project. Finally, the publish command copies files into the appropriate S3 bucket.
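
A condensed sketch of how bin/blog might be structured. The stack name, template path, and output key are assumptions, and the real script has more error handling.

    #!/usr/bin/env bash
    # Hypothetical condensed sketch of bin/blog.
    set -euo pipefail

    STACK=blog
    TEMPLATE=cloudformation.json

    case "${1:-}" in
      validate)
        aws cloudformation validate-template --template-body "file://${TEMPLATE}"
        ;;
      deploy)
        # create the stack if it does not exist, otherwise update it
        # (update-stack errors when there are no changes; handled upstream)
        if aws cloudformation describe-stacks --stack-name "$STACK" > /dev/null 2>&1; then
          aws cloudformation update-stack --stack-name "$STACK" --template-body "file://${TEMPLATE}"
        else
          aws cloudformation create-stack --stack-name "$STACK" --template-body "file://${TEMPLATE}"
        fi
        ;;
      status)
        aws cloudformation describe-stacks --stack-name "$STACK" \
          --query 'Stacks[0].StackStatus' --output text
        ;;
      publish)
        # BucketName is an assumed stack output holding the S3 website bucket
        bucket="$(aws cloudformation describe-stacks --stack-name "$STACK" \
          --query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue" --output text)"
        aws s3 sync dist/ "s3://${bucket}" --delete
        ;;
      *)
        echo "Usage: bin/blog <validate|deploy|status|publish>" >&2
        exit 1
        ;;
    esac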

The Bash code itself passes through shellcheck to eliminate stupid mistakes and to enforce coding style. This is desperately needed to write sane Bash programs.
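
The make target amounts to little more than running shellcheck over the executables; the paths here are illustrative:

    # Lint bin/blog and every pipeline script.
    shellcheck bin/blog script/ci/*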

Deploying

Deploying has two discrete steps (deploy the infrastructure, then publish the content), each with its own verification criteria. It shakes out like so:

  1. make dist to generate the release artifact
  2. bin/blog deploy to deploy infrastructure changes
  3. Poll the bin/blog status until the state is green
  4. bin/blog publish to copy the release artifacts into S3
  5. Poll the public DNS until the sentinel file is available.

There is a single script (script/ci/deploy) to get the job done. The coolest bit is a simple Bash function that will retry a command up to N times at a T second interval. This is a simple timeout-style function, and it is used to handle the asynchronicity of each step. The deploy script can vary the interval depending on how long a change should take. This matters most for CloudFormation changes, since some components update much more slowly than others; Route53 compared to CloudFront is one example.
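
A sketch of what that poll function might look like, along with hypothetical usage. The helper command names at the bottom are placeholders, not functions from the real script.

    # poll ATTEMPTS INTERVAL CMD...: retry CMD up to ATTEMPTS times, sleeping
    # INTERVAL seconds between attempts. Fails if CMD never succeeds.
    poll() {
      local attempts="$1" interval="$2"
      shift 2

      local i
      for (( i = 0; i < attempts; i++ )); do
        if "$@"; then
          return 0
        fi
        sleep "$interval"
      done

      return 1
    }

    # CloudFormation updates are slow, so poll patiently; the DNS/CDN check
    # for the sentinel file converges much faster.
    poll 60 30 stack_is_green       # hypothetical wrapper around bin/blog status
    poll 30 10 sentinel_available   # hypothetical curl check against public DNS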

The Complete Pipeline

  1. Setup
    1. make check
    2. make clean
  2. Build
    1. make dist
  3. Test
    1. make test-shellcheck - Lint all executable shell programs in the code base (bin/blog + pipeline scripts)
    2. make test-dist - Run release artifact tests mentioned earlier
    3. make test-blog - Validate CloudFormation template
  4. Deploy
    1. Poll for UPDATE_COMPLETE or CREATE_COMPLETE stack status. This ensures the stack is ready to receive a potential update.
    2. bin/blog deploy - Deploy infrastructure changes
    3. Poll for UPDATE_COMPLETE or CREATE_COMPLETE stack status
    4. bin/blog publish - Upload release artifact to S3
    5. Poll with curl for the sentinel file on bin/blog url

Closing Thoughts

The entire pipeline turned out well. This was a great exercise in setting up continuous delivery for a simple system. The practices applied here can be applied to large systems. Here are some other takeaways:

  • GitLab CI is awesome. I have been using Buildkite at work for some time. GitLab CI attracted me with its agent based approach. This enables me to keep my runners under configuration management and deployed with proper AWS InstanceProfiles. GitLab with integrated CI support is immediately better than GitHub. All in all I’m very happy with GitLab and its CI offering. I recommend you check it out as well.
  • CloudFormation testing. It would be nice if a set of changes could be applied in a “dry run” mode. This would increase confidence in each change.
  • Splitting the deploy script. I am uncertain if I would split the deploy script into two parts: one for bin/blog deploy and verification, the other for bin/blog publish and verification. I did not do this because I did not want to move the shared poll/timeout function into a separate script. The script in its current form is about as long as I want it.
  • Regenerating the release artifact in the deploy phase. Generally this is bad practice. The test phase runs against a particular artifact, and that exact artifact should be deployed. This project is simple enough that this is not a problem. The build phase should upload the artifact somewhere (GitLab does seem to support artifacts) and then the next steps should pull it down for whatever needs to happen. I also skipped this because I like to keep the scripts executable on the machine itself. This way, if the CI system is down or unavailable for other reasons, the process can still complete.
  • make check. This is a life saver. make check is a smoke test of the system running the pipeline. It does not need to be exacting; simply testing for the availability of dependencies (e.g. docker, aws, or jq) is enough. This is especially helpful when build steps execute on various hosts and/or the project relies on things outside of GNU/BSD core utils.
  • Sharing Bash functions. I know the poll function will be reused across many projects going forward. It would be nice to solve this without copying and pasting between projects. I considered whether such functions could be distributed between environments with configuration management, but that is much too heavy weight for this problem. Larger teams will definitely encounter this problem if there is a lot of Bash.
  • aws s3 sync with --delete. bin/blog publish uses the sync command under the covers. --delete was not added until doing this work. This option ensures files that are no longer in the source are deleted from the destination, a.k.a. “delete removed files.” A minimal form of the command is sketched below.
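
For reference, the command boils down to something like this; the bucket name is a placeholder:

    # Mirror the release artifact into the bucket and remove anything that no
    # longer exists in dist/.
    aws s3 sync dist/ s3://my-blog-bucket --delete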

Finally enjoy the relevant source files. The files are linked to the versions introduced in the continuous delivery merge request. They may have changed since this post was published.

  1. Curious why? Docker works extremely well for development and build pipelines. It encapsulates project tool chains excellently. This approach especially excels with datastores; I have found these vary more widely than tool chains. I try to run as many dependencies as possible as docker containers, provided there is sufficient complexity to warrant it. I do not dockerize jq or aws since they install on the host easily enough and are not project specific.

  2. It is common to see -v "${PWD}:/data" when encapsulating tools as docker containers. This is the easiest and most obvious solution to get data in/out of the container. It creates a problem though, since docker containers run as root. This approach may litter the filesystem with root owned artifacts depending on your docker setup (e.g. whether the docker daemon runs directly on your host or in a VM). This is solved by running the container as the current user (-u $(id -u)). However, file system mounts do not work on remote docker hosts. docker-machine on OSX solves this by mounting $HOME as a shared directory in the VM, so file system mounts (inside $HOME) work transparently. docker cp is a sure-fire way to get data out of the container regardless of how docker runs. It is more verbose but always works. See make dist in the Makefile for an example.
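
To illustrate the trade-off, a hedged sketch of the two approaches; the blog image name and /data working directory are assumptions, as in the build phase sketch above.

    # Bind mount approach: simple, but output may end up root-owned and it
    # breaks against a remote docker daemon.
    docker run --rm -u "$(id -u)" -v "${PWD}:/data" blog jekyll build

    # docker cp approach: more verbose, but works no matter where the daemon runs.
    container="$(docker create blog jekyll build)"
    docker start -a "$container"
    docker cp "$container:/data/_site/." dist/
    docker rm "$container"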

  3. The validate-template API call is only semantic verification. CloudFormation cannot validate things that may only come up when actually making the change, such as account limits, incorrectly configured permissions, creating things in the wrong regions, potential outages in AWS, or unexpected capacity changes. The only way to know for sure is to deploy the stack and see how it shakes out. Naturally you can have a stack for verification purposes. I opted out because the template is simple and should not change much past the initial revision.