Questions and Discussions related to CI/CD, release engineering, configuration management

C is for consistency

Let's take a look at the Nomad CLI:

    $ nomad --help
    Available commands are:
        agent                 Runs a Nomad agent
        agent-info            Display status information about the local agent
        alloc-status          Display allocation status information and metadata
        client-config         View or modify client configuration details
        deployment            Interact with deployments
        eval-status           Display evaluation status and placement failure reasons
        fs                    Inspect the contents of an allocation directory
        init                  Create an example job file
        inspect               Inspect a submitted job
        job                   Interact with jobs
        keygen                Generates a new encryption key
        keyring               Manages gossip layer encryption keys
        logs                  Streams the logs of a task.
        node-drain            Toggle drain mode on a given node
        node-status           Display status information about nodes
        operator              Provides cluster-level tools for Nomad operators
        plan                  Dry-run a job update to determine its effects
        run                   Run a new job or update an existing job
        server-force-leave    Force a server into the 'left' state
        server-join           Join server nodes together
        server-members        Display a list of known servers and their status
        status                Display the status output for a resource
        stop                  Stop a running job
        validate              Checks if a given job specification is valid
        version               Prints the Nomad version

You are probably not familiar with Nomad concepts, so let's run through them:

There is a job. When a job is submitted, it creates a deployment. Whenever you update the job configuration and submit it to the cluster, a new deployment starts.

A deployment creates a group of allocations; an allocation is an instance of the app you are deploying. A deployment may consist of N allocations.

An evaluation is the process of allocating app instances for a certain deployment.

Now, back to the interface.

Why is there nomad alloc-status and not nomad alloc status? Why nomad node-drain and nomad node-status, and not nomad node [status,drain,..]? Why nomad server-members and not nomad server status? plan and run are for jobs, so why are they on the top level? The same goes for fs and logs, which are for allocations.

Now compare it with kubectl:

kubectl [command] [TYPE] [NAME] [flags]
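The following is a minimal Python sketch of a CLI in which every command follows one grammar, tool <resource> <action> [name], in the spirit of the kubectl layout above. It is meant to make the contrast concrete. The program name mytool and the command tree are hypothetical. They only regroup some of the Nomad commands listed earlier under the object each one acts on. Using argparse sub-parsers is one possible way to enforce such a structure, not the only one.

```python
import argparse

# Hypothetical command tree: every command is "<resource> <action>",
# so all operations on one kind of object live under the same noun.
COMMANDS = {
    "job": ["run", "plan", "stop", "status", "inspect", "validate"],
    "alloc": ["status", "fs", "logs"],
    "node": ["status", "drain"],
    "server": ["members", "join", "force-leave"],
}


def build_parser():
    parser = argparse.ArgumentParser(prog="mytool")
    resources = parser.add_subparsers(dest="resource", required=True)
    for resource, actions in COMMANDS.items():
        resource_parser = resources.add_parser(resource)
        verbs = resource_parser.add_subparsers(dest="action", required=True)
        for action in actions:
            # Like kubectl's [NAME], the object name is optional here
            verbs.add_parser(action).add_argument("name", nargs="?")
    return parser
```

With this layout, mytool alloc status abc123 parses into resource alloc, action status and name abc123. By contrast, mytool alloc-status is rejected as an unknown command. As a result, there is exactly one place to look for every allocation-related operation.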

It is time for refactoring

Refactoring is a process that does not bring any value to the product.

You should consider refactoring when:

- the product is in active development,
- adding new features takes more time than planned,
- you can spend extra time on refactoring.

If all three points apply, you can begin refactoring.

First, do the following:

- describe how the product should be designed,
- define criteria for when your new code is good enough,
- estimate how much time it will take.

If you cannot estimate the time refactoring will take, or the estimate is enormous, you are better off writing a new product.

Visualizing Kubernetes application deployments

The OpenShift team created a nice way to describe application deployments.

The only question is how to generate this automatically from a set of YAML files.
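One possible starting point is to parse the manifests and draw an edge from each Service to every Deployment whose pod template labels match the Service selector. Graphviz can then render the picture. The sketch below is offered only as a sketch. It assumes the YAML files were already loaded into Python dicts (for example with PyYAML's yaml.safe_load_all) and handles only Services and Deployments. The Graphviz DOT output format is this sketch's choice rather than part of the OpenShift notation, and the function names are hypothetical.

```python
# Sketch: turn already-parsed Kubernetes manifests (plain dicts) into
# a Graphviz DOT graph linking Services to the Deployments they select.


def selector_matches(selector, labels):
    """A Service selector matches when all its key/value pairs are in labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def manifests_to_dot(manifests):
    lines = ["digraph app {"]
    deployments = [m for m in manifests if m.get("kind") == "Deployment"]
    services = [m for m in manifests if m.get("kind") == "Service"]
    # One node per object, labelled "<Kind>/<name>"
    for obj in deployments + services:
        name = obj["metadata"]["name"]
        lines.append(f'  "{obj["kind"]}/{name}" [shape=box];')
    # One edge per Service -> Deployment pair whose pod labels match
    for svc in services:
        selector = svc.get("spec", {}).get("selector", {})
        for dep in deployments:
            pod_labels = dep["spec"]["template"]["metadata"].get("labels", {})
            if selector_matches(selector, pod_labels):
                lines.append(
                    f'  "Service/{svc["metadata"]["name"]}" -> '
                    f'"Deployment/{dep["metadata"]["name"]}";'
                )
    lines.append("}")
    return "\n".join(lines)
```

Piping the output to Graphviz (for example, dot -Tsvg) produces the picture. Other object kinds, such as ConfigMaps, Routes or volumes, would need their own edge rules.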

Orthodox Docker and Kubernetes intro for Java devs

Docker and Kubernetes Recipes for Java Developers by Arun Gupta


Articles and How-Tos

Kubernetes demo

In this demo-like tutorial I am not going to explain how Kubernetes works. Instead, I will show how you work with Kubernetes.

For an overview of the key Kubernetes concepts, try this talk.

Note: All shell commands prefixed by $ are executed locally on a dev machine without admin rights.


Things you need to set up to run through this tutorial:

- Kubernetes cluster
- kubectl command-line tool
- Docker registry

Kubernetes cluster

If you do not have a cluster available, download the minikube utility and start a local cluster:

$ ./minikube start

It will fetch a virtual machine image with a preconfigured single-node Kubernetes cluster and run it on your local machine.


kubectl

You need to install kubectl on your local machine and configure it to work with the Kubernetes cluster you got in the previous step.

For example, on Fedora:

$ sudo dnf install kubernetes-client

kubectl reads its configuration from the ~/.kube/config file.

Minikube generates the kubectl configuration file automatically. It should look similar to the following:

    apiVersion: v1
    kind: Config
    preferences: {}
    clusters:
    - cluster:
        certificate-authority: /home/bookwar/.minikube/ca.crt
        server:
      name: minikube
    users:
    - name: minikube
      user:
        client-certificate: /home/bookwar/.minikube/apiserver.crt
        client-key: /home/bookwar/.minikube/apiserver.key
    contexts:
    - context:
        cluster: minikube
        user: minikube
      name: minikube
    current-context: minikube

If you use a remote cluster, you need to create or update this file manually with the relevant credentials.

Verify the configuration by running:

    $ kubectl cluster-info
    Kubernetes master is running at

Access to Docker registry

To work with containerized applications, you need a registry of container images. In this tutorial we need to push images to the registry from the dev machine and pull them from the cluster. One can (and, I believe, should) set up a private registry for this purpose, but that is way out of scope for our simple tutorial.

Thus, in this tutorial we are going to use Docker Hub, which minikube is configured to use by default.

To be able to upload images to Docker Hub, sign up through its web interface. Then log in to the registry from your local machine by running:

    $ docker login

Ready?

Here is what we are going to do:

- write an application and test it,
- package it into a Docker image, test it and publish it to the registry,
- deploy the application to the Kubernetes cluster and test it,
- roll out an update and, you get this by now, test it.

Step 1: Application

Proper containerized applications should work transparently and should never depend on a particular container instance. Thus, in real life you would never rely on the local IP address or hostname of a container. But for the purpose of this demo we will use an application that exposes the container internals.

We create a very simple Python application that listens on port 5000 and replies with the list of the host's IP addresses.


Create file ./ with a Flask application:

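    from flask import Flask
    import subprocess

    app = Flask(__name__)

    @app.route('/')
    def hello():
        # List all IP addresses of the host (inside Docker: of the container).
        # decode() turns the bytes output into text on both Python 2 and 3.
        ip_a = subprocess.check_output(["hostname", "--all-ip-addresses"]).decode().split()
        return "IP information: " + " ".join(ip_a) + "\n"

    if __name__ == '__main__':
        # debug=True matches the "Debugger is active!" output below;
        # host='0.0.0.0' makes the app listen on all interfaces, so it is
        # reachable through the container port mapping used later
        app.run(debug=True, host='0.0.0.0')

Build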

Oh, come on, it is Python.


Run it:

    $ python
     * Running on (Press CTRL+C to quit)
     * Restarting with stat
     * Debugger is active!
     * Debugger pin code: 190-291-556

Access http://localhost:5000 to see it working:

    $ wget -qO- http://localhost:5000
    IP information:

Step 2: Container Image

Create ./Dockerfile

We start from the fedora:latest image, install the runtime dependencies, copy our application from the host into the container, and then define the container entrypoint so that it runs the python command on start.

    FROM fedora:latest
    MAINTAINER bookwar ""
    RUN dnf install -y python-flask hostname && dnf clean all
    COPY ./ /app/
    WORKDIR /app
    ENTRYPOINT ["python"]
    CMD [""]

Build and tag the image:

Let's use the name local/my-ip and version 0.0.1:

    $ docker build -t local/my-ip:0.0.1 .

Test image

Run the container locally:

$ docker run -p 8888:5000 local/my-ip:0.0.1

Note that while the application uses port 5000 inside the container, we map it to port 8888 on the host network.

Now we can access http://localhost:8888 and see the report with the internal IP address of the container:

    $ wget -qO- http://localhost:8888
    IP information:

Push the image to the registry

Tag it as bookwar/my-ip (here bookwar is my Docker Hub user; use yours):

$ docker tag local/my-ip:0.0.1 bookwar/my-ip:0.0.1

Push to the registry

$ docker push bookwar/my-ip:0.0.1

Now the image is available on Docker Hub, and anyone can use it via the name bookwar/my-ip:0.0.1.

Step 3: Deployment

Create a deployment object

We call it my-ip and set the replica count to 5 pods.

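    $ kubectl run --image=bookwar/my-ip:0.0.1 --replicas=5 my-ip

(Newer kubectl releases have dropped the --replicas option of kubectl run, which now creates a single pod. With those releases, use kubectl create deployment my-ip --image=bookwar/my-ip:0.0.1 --replicas=5 instead.)

Check that 5 pods were created:

    $ kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    my-ip-3794442940-1m52n   1/1       Running   0          3m
    my-ip-3794442940-2642d   1/1       Running   0          3m
    my-ip-3794442940-61lqf   1/1       Running   0          3m
    my-ip-3794442940-97nx1   1/1       Running   0          3m
    my-ip-3794442940-fdpvv   1/1       Running   0          3m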

Pods are listening on the local network inside the cluster and are not accessible from the outside.

Create an exposed service:

    $ kubectl expose deployment my-ip --type=NodePort --name=my-ip-service --port 5000

Find out the service node port

Now there is a service that forwards every request it receives to port 5000 of a pod in the my-ip deployment group. This service has type NodePort, which means that it is exposed as a port on every cluster node. To find out the exact value of the NodePort, we can check the service details via the describe subcommand.

    $ kubectl describe service my-ip-service
    Name:                     my-ip-service
    Namespace:                default
    ...
    NodePort:                 <unset>  30346/TCP
    ...

Note the 30346 NodePort assigned to the service.

Check the service

Now it is easy to reach the service from the outside via the NodePort by accessing the

    $ wget -qO-
    IP information:

Here we use the same IP that we got from the kubectl cluster-info command.

Let us also test it with a debugging pod.

Run the pod:

$ kubectl run -i --tty busybox --image=busybox --rm --restart=Never

Using the shell prompt inside the pod, call the service several times:

    # / wget -qO- my-ip-service.default:5000
    IP information:
    # / wget -qO- my-ip-service.default:5000
    IP information:

Note that we are using the DNS name and the internal port 5000, as we work with the internal cluster network.

You also get different IP addresses in response, because requests are balanced across the pods behind the service.
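You can also check the balancing from a script. The small Python sketch below calls the service repeatedly and counts the distinct replies. It assumes a pod with Python 3 inside the cluster, since the busybox image used above does not ship Python. The function names are hypothetical. The get parameter exists only so that the counting logic can be exercised without a network.

```python
import urllib.request
from collections import Counter


def fetch(url):
    """Return the body of a plain HTTP GET as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode()


def sample_backends(url, calls=20, get=fetch):
    """Call the service several times and count identical replies.

    Each my-ip pod answers with its own IP list, so the number of distinct
    replies approximates how many pods received requests.
    """
    return Counter(get(url).strip() for _ in range(calls))
```

Run from such a pod, sample_backends("http://my-ip-service.default:5000") should report up to five distinct replies, one per my-ip pod. The exact number depends on how the requests happen to be spread.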

Step 4: Rolling out an update

Deployment objects in Kubernetes come with an update strategy. By default, it is set to RollingUpdate.

    $ kubectl describe deployment my-ip
    Name:                   my-ip
    ...
    StrategyType:           RollingUpdate
    RollingUpdateStrategy:  1 max unavailable, 1 max surge
    ...
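The RollingUpdateStrategy line bounds what happens while our 5 replicas are replaced. At most 1 pod may be unavailable and at most 1 extra pod may exist. So at least 4 pods keep serving, and no more than 6 exist at any moment. The Python sketch below reproduces that arithmetic. The function names are hypothetical. The rounding rule for percentage values (maxSurge rounds up, maxUnavailable rounds down) follows the Kubernetes Deployment documentation.

```python
import math


def to_pod_count(value, replicas, round_up):
    """Resolve an absolute number or a percentage string like '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rolling_update_bounds(replicas, max_unavailable, max_surge):
    """Return (minimum available pods, maximum total pods) during a rollout."""
    unavailable = to_pod_count(max_unavailable, replicas, round_up=False)
    surge = to_pod_count(max_surge, replicas, round_up=True)
    return replicas - unavailable, replicas + surge
```

For the deployment above, rolling_update_bounds(5, 1, 1) gives (4, 6). With percentage settings of "25%" for both, the same 5 replicas would give (4, 7), because the surge rounds up.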

Let's update the base container image for our my-ip pods.

Edit the

We add the "Hello, world!" string:

    from flask import Flask
    import subprocess

    app = Flask(__name__)

    @app.route('/')
    def hello():
        # List all IP addresses of the container and prepend the greeting
        ip_a = subprocess.check_output(["hostname", "--all-ip-addresses"]).decode().split()
        return "Hello, world! Here is my IP information: " + " ".join(ip_a) + "\n"

    if __name__ == '__main__':
        # Listen on all interfaces so the app is reachable from outside the container
        app.run(debug=True, host='0.0.0.0')

Build, tag and push the new 0.0.2 version of the image:

    $ docker build -t local/my-ip:0.0.2 .
    $ docker tag local/my-ip:0.0.2 bookwar/my-ip:0.0.2
    $ docker push bookwar/my-ip:0.0.2

Bump the version of the image used in the deployment object

The image version is stored in the configuration of our deployment object. There are several ways to change it; let's use interactive editing of the deployment:

$ kubectl edit deployment my-ip

Running the command opens an editor with the YAML configuration of the my-ip deployment. Find the container spec block and edit the image version:

    spec:
      containers:
      - image: bookwar/my-ip:0.0.2

Save and exit. The rollout procedure will start immediately.

Verify the update

Call the service via external port again:

    $ wget -qO-
    Hello, world! Here is my IP information:

The new version is rolled out, and we get a different string.

@channel and @here considered harmful

Let's talk about Slack best practices. Or should I say worst practices?

In particular, about @channel and @here.

For those who don't know: @channel in Slack alerts everyone in the channel, and @here alerts everyone in the channel who is online. Both are widely used, but both are, in fact, harmful.

The Problem

Suppose there is a team, A-Team, and a channel, #a-team-channel. The channel's audience is 10 members of the A-Team itself and 500 members of other teams, who came to the channel to discuss an issue with the A-Team or just to browse what is going on there.

Now you come to the channel and want to get the attention of one of the A-Team members.

If you use the @channel keyword, you alert 510 people at once, distracting them from what they are doing. If you use @here, you alert the roughly 300 people who are online, and maybe no one from the A-Team itself. This is extremely counter-productive and leads to people disabling Slack notifications altogether.

What are the alternatives?

Slack supports two more alerting mechanisms: keywords and groups.

User Groups

Groups provide a more formal way of managing group casts.

Pros: You can create a group and add people to it. Group members themselves don't need to do anything on their side.

Cons: Group management requires admin rights; it is not flexible and cannot be self-managed. If you switch from duty engineer to research engineer for a week, you cannot simply leave the 'on-duty' group and must ask a Slack admin to do that for you.


Keywords

With keywords you can configure per-user alerts and get notified whenever someone mentions a specific word in chat. For example, I have an alert set for whenever someone mentions the word 'jenkins'.

But keywords can also be used in a more organized way. If the A-Team chooses its own keyword, like a-team, and every team member subscribes to it, then this keyword can be used instead of the @channel and @here casts. It is more direct and reaches exactly the people you actually need.

Pros: Fully flexible. Team members decide and manage which keywords they care about.
Cons: Requires self-discipline. If you are the duty engineer, you must go and subscribe to "on-duty" keyword to get alerted.
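The difference in alerting power is easy to quantify. Below is a deliberately simplified Python model of the A-Team channel from the example above. In this model, @channel notifies everyone and @here notifies online members. Otherwise, only members subscribed to a keyword that appears in the message are notified. Real Slack keyword matching has its own rules, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    online: bool = True
    keywords: set = field(default_factory=set)


def alerted(members, message):
    """Return the names of channel members a message would notify (simplified)."""
    words = set(message.lower().replace(",", " ").split())
    if "@channel" in words:
        return {m.name for m in members}
    if "@here" in words:
        return {m.name for m in members if m.online}
    # Keyword mode: only members subscribed to a word in the message
    return {m.name for m in members if m.keywords & words}
```

Suppose 10 A-Team members are subscribed to a-team, there are 500 other members, and 300 people are online. In that case, the three kinds of message reach 510, 300 and exactly 10 people respectively.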

Critical remark

Note that the crucial step here is to promote the keyword or user group to people outside of the team. It needs to be as discoverable as an e-mail address or a Jira project. Add it to every landing page your team owns.

We also found it most effective to announce important keywords in the channel topic. Simply add line “To contact A-Team use keyword a-team” to the #a-team-channel topic and after some time people will learn to use it.


The groups/keywords approach reduces fragmentation, as one doesn't need to create separate channels with fewer participants just to limit the alerting power.

It increases the cross-team presence, as you can join more channels in a browsing mode, without increasing your alerts stream. You can watch and recap on what has happened in the A-Team channel during the day, even if you are not the direct responsible person and don't get alerted.

It improves the overall usage of Slack as alerts become more specific and go directly to the people you want to target.

Example 1: Support Channel

Suppose we have two teams: Datacenter Engineers and Office IT Engineers. Each team has its own responsibilities, but they also share a lot of common knowledge.

The Classical Way says we create #dc-team and #it-team channels. The typical dialog then looks as follows:

The Support Struggle

Act 1. In #dc-team

    user: @here Server ABC is not responding, please check!!   # Alert includes ~50 server users
    dc-eng: logs?
    user: ..
    dc-eng: ask at #it-team

Act 2. In #it-team

    user: @here Can not reach server ABC, please check!!   # Alert includes ~100 office users
    user: dc-team send me here
    it-eng: logs?
    user: .. same logs again..
    it-eng: some more logs?
    user: ...
    it-eng: ask at #dc-team

Act 3. In #dc-team again

    angry user: @here @channel PLEASE HELP!!!
    angry user: it-team sends me back here, because they said.. (and here follows the complete misinterpretation of what it-team has actually said)
    ...

Here the user is bounced between channels and forced to repeat the entire context of the discussion. And I'll leave it up to you to calculate the number of people alerted in this conversation.

Now the Keyword Way says we should have just one channel, #support. DC Engineers respond to the dc-team keyword, and Office IT Engineers to it-team.

The above dialog would look as follows:

The better way

In #support

    user: dc-team, server ABC is not responding, please check!!   # Alert ~5 people
    dc-eng: logs?
    user: ..
    dc-eng: it-team, ^^   # Alert ~7 people
    it-eng: some more logs?
    user: ...
    it-eng: @dc-eng ^^   # Alert 1 person
    <here the it-eng and dc-eng engineers start to dig in together>

Here the two teams can be cast into the conversation independently, but they share the context of the discussion. They can also talk to each other directly, which eliminates the misinterpretation problem.

Example 2: Development Channel

Suppose there is an ABC project with Dev, QA and Ops teams working on it.

The common pattern here is to create #abc-dev, #abc-qa and #abc-ops channels. Issues which come from such a division are well known and I won’t even bother you with the play to show it.

The better way

Create one #abc channel and add abc-qa, abc-dev and abc-ops keywords, which cover different aspects of the project. When the project fails due to network issues in the datacenter, alert abc-ops; when nightly tests start failing, use abc-qa; when a new feature is planned, discuss it with abc-dev. But keep it all in the same channel.

Even when you haven’t gone full DevOps yet, you can add a lot of transparency by reorganizing your communication channels.


When you use keywords and groups in Slack, you do not need to align channels with the team structure. The team structure is now covered by user groups or keywords, while channels can be aligned with the content. This gives you flexibility, reduces fragmentation and increases cross-team collaboration based on projects and topics rather than formal structure.

Example of Git Workflow for BitBucket
Branching strategy

Master branch

There is a master branch that holds the current state of the project. The master branch must contain working code at all times.

If master is broken, development and merges to the master branch are blocked until the situation is resolved.

To keep master in a working state, direct pushes to the master branch are disabled. Every change to the master branch must come from a pull-request that passes tests and gets approval from code reviewers.

Feature branch

Every change is developed in a dedicated feature branch named feature/<TASK-ID>-some-meaningful-description.

    ---- A ---- B ---- C ---- D ---- master
                 \
                  \
                   \
                    E ---- F ---- feature/TASK-123-add-gradle-scripts

A feature branch is merged to master via a pull-request.

Release branch

A release branch is created manually from a certain "good enough" point on the master branch and must be named release/<version>.

The release candidate is built from the release branch.

Direct pushes to a release branch are forbidden. Changes to a release branch come via pull-requests from bugfix branches.

Once the release is deployed to production, the release branch needs to be merged back to master.

Bugfix branch

Bugfix branches are to a release branch what feature branches are to master.

A bugfix branch created from release/X.Y.Z can only be merged into that same release/X.Y.Z branch.

Never merge it into master, a feature branch or any other release branch.
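These rules are mechanical enough to be checked automatically, for example by a pre-merge hook. The Python sketch below encodes the naming and merge-target rules described above. The function name, the exact regular expressions and the bugfix/ prefix are assumptions: the text names bugfix branches but does not fix their format. The sketch does not use any Bitbucket API.

```python
import re

# Hypothetical patterns for the branch names used in this workflow.
FEATURE = re.compile(r"^feature/[A-Z][A-Z0-9]*-\d+-[A-Za-z0-9][A-Za-z0-9._-]*$")
RELEASE = re.compile(r"^release/\d+\.\d+\.\d+$")
BUGFIX = re.compile(r"^bugfix/.+$")  # assumed prefix


def merge_allowed(source, target, source_base):
    """Check a pull-request against the branching rules.

    source_base is the branch the source branch was created from.
    """
    if FEATURE.match(source):
        return target == "master"
    if BUGFIX.match(source):
        # A bugfix branch goes back only to the release branch it came from
        return RELEASE.match(source_base) is not None and target == source_base
    if RELEASE.match(source):
        # A deployed release branch is merged back to master
        return target == "master"
    return False
```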

Tips and Tricks

Always create a feature branch from master

Never create a branch from another feature branch.


Wrong:

    ---- A ---- B ---- C ---- D ---- master
                 \
                  \
                   \
                    E ---- F ---- feature/TASK-123-add-gradle-scripts
                            \
                             \
                              \
                               G ---- feature/TASK-345-add-functionality-XYZ


Right:

                                  G ---- feature/TASK-345-add-functionality-XYZ
                                 /
                                /
                               /
    ---- A ---- B ---- C ---- D ---- master
                 \
                  \
                   \
                    E ---- F ---- feature/TASK-123-add-gradle-scripts

The main goal of this rule is to avoid merges and keep the history as linear as possible. With merges you can no longer see a linear history of the changes, and you cannot navigate through them easily.

Sometimes the first part of a feature is implemented in a branch and ready to merge as it is. But while it is still under review, you want to keep working on top of this new code.

In an ideal world you should wait until the feature branch is merged. The idea is that you start additional improvements and refactoring only when the code is already accepted into the mainline and you can be sure there will be no new changes. While the code is under review, you might need to change it, which would then force you to rewrite all the new code you have written so far.

If that is impossible (the review is pending but you need to keep working on the feature), the other option is:

Step 1. Create a new branch for the next part of the feature

    ---- A ---- B ---- master
                 \
                  \
                   \
                    E ---- F feature/TASK-123-part-1
                            \
                             \
                              \
                               G ---- feature/TASK-123-part-2

Step 2. Create a pull-request for feature/TASK-123-part-1 (the F commit) and keep working in the branch feature/TASK-123-part-2 (commit G).

Step 3. Once feature/TASK-123-part-1 is accepted, rebase feature/TASK-123-part-2 on master

    ---- A ---- B ---(merge-commit)- C ---- D ---- master
                 \            /              \
                  \          /                \
                   \        /                  \
                    E ---- F                    G ---- feature/TASK-123-part-2

Step 4. Keep working on feature/TASK-123-part-2 as an independent feature branch.

The third step is very important, as it eliminates the complexity of merging feature/TASK-123-part-2 into master later on.

Use small independent commits

If you can split the task into a series of independent commits, create a separate feature branch for each of them.

The smaller your feature branch is, the easier it is to review and test. Ideally, every feature branch should contain just one atomic commit, and it should be merged to master as soon as the commit is ready and has passed tests and review.


    ---- A -- B -- C -- merge ---- D ---- master
               \            /
                \          /
                 \        /
                  E ---- F feature/TASK-123-add-gradle-scripts-and-clean-env.yaml

But better:

                              F feature/TASK-123-clean-environment.yml
                             / \
                            /   \
                           /     \
    ---- A - B - merge -- C -- merge -- D -- master
              \     /
               \   /
                \ /
                 E feature/TASK-123-add-gradle-scripts

Do not wait for the end of the sprint or for full feature implementation to merge the working code.

Never merge into a feature branch, rebase instead

Merges bring complexity and increase the amount of work required to track changes and manage branches. Avoid them and use rebase instead.

Rebase reapplies your changes on top of the master branch in the same order you made them, so it keeps the history straightforward.


    ---- A ---- B ---- C ---- D ---- master
                 \
                  \
                   \
                    E ---- F ---- feature/TASK-123-add-gradle-scripts

    $ git checkout feature/TASK-123-add-gradle-scripts
    $ git rebase master

    ---- A ---- B ---- C ---- D ---- master
                               \
                                \
                                 \
                                  E'---- F'---- feature/TASK-123-add-gradle-scripts

Rebase often

The smaller the footprint of your change is, the easier it is to handle. The earlier you spot the merge conflict, the easier it is to resolve.

Rebase your branch onto master at least once per day.

Use push --force

git push --force is a dangerous command, as it rewrites the history of the branch.

It is strictly forbidden to do push --force for master and release branches, as these branches are used for collaboration, and their history is critically important and must be kept consistent.

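But for feature branches git push --force is recommended. In fact, it is required for rebase to work: a rebase rewrites commits you have already pushed, so the updated branch can only be published with a force push.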

A feature branch is an independent, short-lived branch owned by one developer, so its owner can alter its history without affecting anyone else's work. Moreover, it is critically important to keep the history of a feature branch clean and readable, as it is the subject of review.

Thus, while you are working on a not-yet-merged feature branch, use interactive rebase, squashing and amending techniques to clean up the history, edit commit messages and adjust the commit order. Then use git push --force for this branch to publish it to Bitbucket.

If you create a pull-request from a feature branch and then update the branch with push --force, Bitbucket will automatically update the pull-request for you.
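The daily rebase and the force-push rules can be wrapped in a small helper. The sketch below is a hypothetical Python wrapper around plain git commands, not part of the workflow itself. It refuses to touch master and release branches. It rebases onto the freshly fetched origin/master rather than a local master; the remote name origin is an assumption. It also swaps --force for --force-with-lease, a variant that refuses the push if the remote branch has moved since it was last fetched.

```python
import re
import subprocess

# Branches whose history must never be rewritten
PROTECTED = re.compile(r"^(master|release/.+)$")


def is_protected(branch):
    return PROTECTED.match(branch) is not None


def rebase_and_publish(branch, base="master", remote="origin"):
    """Rebase a feature branch onto the latest base and force-push it."""
    if is_protected(branch):
        raise ValueError(f"refusing to rewrite history of {branch}")
    subprocess.run(["git", "fetch", remote], check=True)
    subprocess.run(["git", "checkout", branch], check=True)
    # Stops with CalledProcessError on conflicts; resolve them manually
    subprocess.run(["git", "rebase", f"{remote}/{base}"], check=True)
    subprocess.run(["git", "push", "--force-with-lease", remote, branch], check=True)
```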


Continuous Integration theory

CI System vs CI Pipeline

For any code you write, there are several steps needed to transform it from a set of text files to a certain release artifact or a running service. You go through these steps manually at first, but sooner or later you decide to automate them. And this is how the CI/CD pipeline of a project is born.

But there are different ways to organize the automation.

The classical approach (CI System) is formed around a standalone CI system (for example, Jenkins, Buildbot or Bamboo). In the CI system you configure scenarios (jobs, build factories, plans...) and triggers. You can also add dependencies, manual triggers, parametrization, post-build hooks, dashboards and much more. The CI system usually becomes a world of its own, with a limited number of almighty administrators managing multiple interdependent projects at once.

There is also a new, "postmodern" way of setting up CI, which is essentially inspired by Travis CI and its integration into GitHub. Following the common trend, we'd call it a System-less CI Pipeline.

In this approach the recipes, triggers and actions of the CI/CD pipeline are stored in a configuration file together with the project code. The CI system (yes, there is one) is hidden somewhere behind your repository frontend. You do not interact with the CI system directly; rather, you see it respond to your actions on the codebase: pull-requests, merges, review comments, tags. Every action can trigger a certain recipe configured inside the project. Feedback from the CI system is also provided via the repository frontend in the form of comments, tags and labels.
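The toy Python sketch below illustrates the dispatch such a hidden CI system performs. The project owns a mapping from repository events to recipes. The system only looks up the event and reports back through the repository frontend. The event names, recipe names and feedback text are hypothetical; real services each define their own configuration schema.

```python
# Hypothetical project-level CI configuration, stored next to the code.
# Keys are repository events, values are the recipe steps to run.
PIPELINE_CONFIG = {
    "pull_request": ["lint", "unit-tests"],
    "merge": ["unit-tests", "build-artifact"],
    "tag": ["build-artifact", "publish-release"],
}


def handle_event(event, config=PIPELINE_CONFIG):
    """Return the steps to run for an event and the feedback to post back."""
    steps = config.get(event, [])
    if not steps:
        return [], f"no recipe configured for '{event}'"
    return steps, f"CI: running {', '.join(steps)} for {event}"
```

Everything the pipeline does is decided by the project's own configuration. This is why each project ends up with an isolated lifecycle.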

To highlight the difference between these setups, let us consider multiple projects (see picture).
[Figure: CI system vs pipeline]

As soon as the code enters the CI system, it is no longer owned by the project. It becomes part of a larger infrastructure and follows the lifecycle defined on the system level. This allows complex dependency graphs, common integration gates and cross-project communication.

In the pipeline case, the pipeline carries the project code through the set of steps defined on the project level. Each project gets an isolated, independent lifecycle, which doesn't generally interact with the outside world.

And which one is better?

You probably can guess my answer by now: yes, you need both.

But let's dive into a common discussion point.

CI system silo

Silo is one of the scariest words in the modern IT environment, and also one of the deadliest weapons. The argument goes as follows:

"Standalone CI system creates a silo, effectively cutting the dev team out of the project lifecycle. Pipeline is better because it lets developers keep full control of the project."

As a Continuous Quantum Integration Engineer I would love to dismiss this argument altogether as in the end it compares apples to oranges, but it does have a point. Or, better to say, I can see where it comes from:

Widely known and widely adopted classical CI systems, like Jenkins, were not designed for collaboration and therefore are extremely bad at it. There are no "native" tools for code review, tests, approvals or even readable configuration. Projects like Jenkins Job Builder, which address some of these issues, are still considered alien to the CI ecosystem. High entry barrier, outdated and unusable visualizations, lack of code reuse, no naming policies or common practices... Together with a generic code-centric development culture (in which CI is not worth any effort), this leads to complex legacy CI systems. And none of us wants to be there.

Thus, from a developer's point of view the choice looks as follows: either you work with some unpredictable, unknown third-party system, or you bring every piece of the infrastructure into the project and manage it yourself.

[Figure: dev point of view]

Now, given that I admit the problem, I've got some good news and some bad news:

The bad news is that moving to a CI Pipeline doesn't make you immune to the silo issue; it encourages you to create one. In fact, the pipeline is a silo by the very definition of it:

An information silo, or a group of such silos, is an insular management system in which one information system or subsystem is incapable of reciprocal operation with others that are, or should be, related. - Wikipedia

As a project developer you might feel that the pipeline improves the situation, while the only thing that has improved is your position relative to the silo wall: you are inside now.
[Figure: Pipelines Silo]

The good news is: there is another way!

To solve the collaboration problem, you don't give every collaborator a personal isolated playground; you give collaborators tools and processes to work in a shared environment. And that is exactly what a good CI system is supposed to be.


It is an integration system, after all.

Docker Registry Infrastructure

Once you get exposed to the world of Cloud, DevOps and Everything as a Service, you realize that almost every piece of software you might need has already been written by someone (or is provided as a service). Why isn't it all rainbows and unicorns, then?

Because you do not need a tool, a software or a service. You need an infrastructure.

Start with the right problem

The most common issue in dealing with the infrastructure setup is that you keep forgetting about the task.

Of course, it was there at the beginning, when you gathered the initial list of tools worth looking at. But then, once you get your hands on the list, you ask yourself:

Which one is better - DockerHub, Quay, Artifactory or Nexus?

And you are doomed.

As soon as you ask that question, you start to look into feature lists and marketing materials for the mentioned products. And then you build your entire choice strategy on figuring out how to get more by paying less.

The truth is: you do not need more. Usually you need just the right amount of stuff to do the right task. And you cannot replace one feature with five others you got on a summer sale.

So let's do our homework first and get back to the task.

The task

Again, we start with the question:

We need a Docker Registry, how do we get one?

And again, we are wrong.

No one needs a Docker Registry for the sake of having a Docker Registry alone. It is an implementation detail that slipped into our problem statement. So let's remove it and find out what we are truly looking for:

the infrastructure to build, store and distribute Docker images

Now this is something to talk about.


Docker images are widely known as a delivery mechanism. And every delivery can work in two ways: it might be you who packages and delivers the software to the customer (or target environment), or it might be some third-party which packages the software and delivers it to you.

In the latter case you might get by just fine without building a single Docker image on your own. So you can skip this part and focus on storing and distributing. Even better, you may forget that they are Docker images at all and treat them as any external binary artifact: download the tarball, install, back up and watch for updates.

If building Docker images is your thing, then welcome to the club and keep reading.


Let's say you use Docker images for packaging the apps to deploy them to production environment. You have Dockerfiles stored in your source code and CI/CD pipelines set to build them in your build service. And your release schedule is close to one release per hour.

With that kind of setup your Docker infrastructure is, in fact, stateless. If for some reason you lose all the Docker images currently available, you can rebuild them one by one directly from the sources. In that case you do not really need the storage function and can focus on building and distributing.

If, on the other hand, you need a long-term storage for Docker images, for example those which you have released and delivered to your customers, things get slightly more complicated.

In real life you are most likely to need both. There are base images you update once per year. There are images you cannot rebuild at all (there shouldn't be, but there usually are). And there are images you build daily by the hundreds.

And it totally makes sense to manage them differently. Maybe with different tools?


In the end it all comes down to distributing images to the target environments. A target environment could be a developer workstation, a CI system, production or a customer environment. And these environments are usually distributed all over the globe.

Now the Docker registry should be as close as possible to all of them. Can you find a place for it? I guess not. And this is where caching, mirroring and proxying come into play.

With all that in mind, the natural question to ask is: how many registries do you actually need? As it turns out, more than one.

Build, store and distribute, shaken, not stirred

Let's set up the context and consider the following situation: we develop an application, package it into container images, and then deploy them to several remote production environments (for example, eu and us).

Registries and their users

Users working within the infrastructure can be divided into three groups: contributors (Dev), build and test systems (CI), and the customer or consumer (Prod).

Dev users create and share random containers all day long. They have no rules or restrictions. Dev containers should never go anywhere but the workstations and temporary custom environments. These containers are generally disposable and do not need long-term storage.

CI users are automated and reliable, so they can follow strict rules on naming, tagging and metadata. Like Dev users, CI users generate and exchange huge amounts of disposable images. But as CI provides the data for making decisions (be it the decision to approve a pull request or the decision to release the product), CI images must be verifiable and authoritative.

Prod users do not create containers; they work in read-only mode and must have a strict policy on what is provided to them. Prod traffic is much lower in terms of the variety of images consumed, while it might be high in terms of the amount of data fetched.

We could have tried to implement these use cases in one Docker registry: set up naming rules, configure user and group permissions, enforce a strict policy for adding new repositories, enable per-repository cleanup strategies... But unless you invest a huge amount of time in setting up the workflows, it is going to be complex and error-prone. It is also going to be fragile and hard to maintain. Just imagine an update of that single service blocking your entire infrastructure.

The other way to do it is to set up several registries with separate scopes, concerns and SLAs.

Dev users get the Sandbox registry: no rules, free access, easy to setup, easy to clean, easy to redeploy.

CI users get the CI Pool registry: high traffic, readable by everyone, writable by CI only. It should be located close to the CI workers as it is going to be heavily used by CI builds and test runs.

There should also be a Prod registry: only CI users can publish to it, via the promotion pipeline. This is the registry which you need to "export" to remote locations. It is also the registry you probably want to back up.

Depending on your workflows, you might also want to split the CI Pool registry into CI Tests and CI Snapshots. CI Tests would be used for images you build in pre-merge checks and in test runs, while CI Snapshots would hold the latest snapshot builds of each application you develop, working or not.

In the following example of the infrastructure layout we have also added the remote Archive registry. Its primary purpose is to be a replica of certain important DockerHub images. It gives you flexibility in keeping and overriding upstream images:

[Figure: Docker registry infrastructure layout]
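The scopes described above boil down to a small access table: who may pull from and who may push to each registry. The following is a minimal sketch in Python, not the configuration format of any real registry product. The group names ("dev", "ci", "ci-promotion", "prod") are hypothetical, and so is the choice to make the Prod registry readable by everyone; the text only says that CI publishes to it via the promotion pipeline.

```python
from dataclasses import dataclass

EVERYONE = "*"  # wildcard meaning "any group"


@dataclass(frozen=True)
class RegistryPolicy:
    name: str
    readers: frozenset  # groups allowed to pull
    writers: frozenset  # groups allowed to push

    def can_pull(self, group: str) -> bool:
        return EVERYONE in self.readers or group in self.readers

    def can_push(self, group: str) -> bool:
        return EVERYONE in self.writers or group in self.writers


# Hypothetical layout following the scopes from the article.
REGISTRIES = {
    # Sandbox: no rules, free access.
    "sandbox": RegistryPolicy("sandbox", frozenset({EVERYONE}), frozenset({EVERYONE})),
    # CI Pool: readable by everyone, writable by CI only.
    "ci-pool": RegistryPolicy("ci-pool", frozenset({EVERYONE}), frozenset({"ci"})),
    # Prod: published only by the (hypothetical) CI promotion job.
    # Read access for everyone is an assumption of this sketch.
    "prod": RegistryPolicy("prod", frozenset({EVERYONE}), frozenset({"ci-promotion"})),
}


def pushable(group: str) -> list:
    """Names of the registries a given group may push to, in layout order."""
    return [name for name, policy in REGISTRIES.items() if policy.can_push(group)]
```

Keeping one policy per registry, rather than one large permission matrix inside a single registry, mirrors the argument above: each scope stays small enough to reason about, and a change to one registry's rules cannot affect the others.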


Finally, we come to the tool question again. But now we know how we want to use the tools.

We have a high network throughput requirement for CI registries, so we probably don't want a managed service priced by network usage. For random images in the Sandbox we do not want to pay for the number of repositories we create (in Docker, the image name represents a repository). For caching instances we'd like an easy-to-maintain, easy-to-test open-source solution which we can automate and scale. For the archive we may want an independent remote registry with low network requirements but reliable storage options.

Now we can choose and build a hybrid solution tailored to our needs, rather than go for the largest and most feature-rich service available and then inevitably build up a pile of workarounds and integration connectors around it.


It looks scary at first: you just wanted a place to put an image, and now you get multiple registries, scopes, network connections and interactions. But the truth is: we have not created additional complexity, we have exposed the layout which has been there in your infrastructure from day one. Instead of hiding the complexity under the hood, we are making it explicit and visible. And by doing so we can design infrastructure that is effective, maintainable and solves the problem at hand, rather than just using the nice tools everyone is talking about.

Do not blame the CI

In the previous article I explained the underlying idea behind the Continuous Integration concept. Now let's get a bit more practical and talk about how this idea appears in software development.

CI starts with the code

CI is a development practice. This is the official meaning of the term and you can find it all over hardcore Software Architecture and DevOps conferences, books and high-level discussions. Unfortunately it is rarely known as such among real-world programmers, even though they are its target audience.

Therefore, when introducing CI, the hardest obstacle you face is rarely technical (in the end any tech problem can be solved with the right amount of Python scripting). The true complexity comes from understanding that CI is not something for the infrastructure team to automate, it is something for the development team to follow.

Let me elaborate.

CI needs good code structure

The ability to merge changes into the mainline at least once per day is not a given. If every change you make in the codebase has a footprint of 100500 files, the only thing you get at the end of the day is a pile of merge conflicts.

Tight coupling, huge files, mixed concerns and responsibilities... everyone knows these are signs of bad code quality. CI exposes them and makes them harder to avoid.

If you find that you bump into someone else's conflicting changes all the time, do not blame the CI; consider it a good reason to change the code structure.

CI needs good discipline

CI practice assumes that you keep the mainline in a releasable state. Every time there is a concern about the current state of the codebase, one should stop doing anything else and deal with the issue.

No matter how important it is for you to merge your particular change, it must wait.

What is more, do not try to fix the issue in place by adding more untested code on top of the broken state. Revert to the known working state and unblock others.

If master is failing, do not blame the CI; revert to the previous known good state and deal with the commit which introduced the regression separately.
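Finding "the previous known good state" is mechanical if the build service records a result for every mainline commit. Below is a minimal sketch, assuming a hypothetical build history given as (commit, passed) pairs ordered from oldest to newest; it is not the API of any particular CI server. It lists the commits to revert, newest first, so that the mainline returns to its last green build.

```python
from typing import List, Tuple

History = List[Tuple[str, bool]]  # (commit id, did the mainline build pass?)


def revert_plan(history: History) -> List[str]:
    """Commits to revert, newest first, to get back to the last green build."""
    # Walk backwards to find the newest passing build.
    for i in range(len(history) - 1, -1, -1):
        if history[i][1]:
            broken = history[i + 1:]  # everything merged after the last green build
            break
    else:
        broken = history  # no green build at all: nothing is known to be good
    # Revert newest first, so each revert applies cleanly on top of the previous one.
    return [commit for commit, _ in reversed(broken)]
```

For example, with a history of two green builds followed by two red ones, the plan reverts the two red commits, newest first. The offending commit is then investigated separately, off the mainline.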

CI needs good communication

CI practice needs you to be able to work independently. One part of it is to reduce the footprint of your changes, the other is to make sure that you do not overlap with someone else's work.

You don't want to spend time writing a helper function just to find out that your colleague has just merged it into master.

So when you find yourself implementing a new feature on top of functionality which has just been removed by someone else's refactoring, don't blame the CI; instead, introduce a code and design review process and improve your communication channels.

CI needs good management

CI practice requires you to merge unfinished functionality into the code.

It saves the time spent on integration, as integration is done in small steps in a controlled way. It helps to track the current progress of a feature implementation, as there is no unpredictable integration explosion waiting at the very end. It also gives you the flexibility to release a product on time even if planned features were delayed or canceled.

The downside is that every unfinished feature introduces technical debt. And once you have decided to cancel development of a certain subproject, you cannot simply walk away and move on to the next one. There is a cleanup you must perform.

If your code is full of deprecated feature toggles, do not blame the CI; instead, create a policy to track feature implementations and find time for regular cleanups.
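One simple form of such a policy is to give every feature toggle an owner and an agreed cleanup date when it is introduced, then check regularly for toggles that are past that date. The sketch below shows this idea in Python. The `FeatureToggle` record, its fields and the example toggle names are hypothetical illustrations, not part of any feature-flag library.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class FeatureToggle:
    name: str
    owner: str        # who is responsible for removing the toggle
    remove_by: date   # agreed date by which the toggle must be cleaned up
    enabled: bool = False


def overdue(toggles: List[FeatureToggle], today: date) -> List[str]:
    """Names of toggles past their cleanup date, earliest deadline first."""
    late = [t for t in toggles if t.remove_by < today]
    return [t.name for t in sorted(late, key=lambda t: t.remove_by)]


toggles = [
    FeatureToggle("new-checkout", "alice", date(2024, 1, 31)),
    FeatureToggle("dark-mode", "bob", date(2024, 6, 30)),
    FeatureToggle("legacy-export", "carol", date(2023, 12, 1)),
]
```

Running `overdue(toggles, date.today())` in a scheduled CI job, and failing or flagging the build when the list is not empty, turns "find time for regular cleanups" into something the team sees every day rather than a good intention.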

CI is not magic

CI cannot magically make your life easier, your development faster, your code better, your tests greener and your salary higher. It provides you with tools to do that.

So, generally, just stop blaming the CI :)

What is CI?

The Ultimate Source of Truth, Wikipedia, defines continuous integration as the practice of merging all developer working copies to a shared mainline several times a day.

My version goes a bit deeper:

Definition: Continuous Integration (CI) is a practice of reaching the goal by doing small changes one at a time, while keeping the main artifact in a releasable state at all times.

Why do I need a different definition? Let's take a closer look into it.

First of all, by this definition CI is not limited to programming or software development. Indeed, one can and should consider CI practice applied to any kind of production workflow:

Production starts with an artifact. It can be a software application, but it may as well be a book, a picture or a building. An artifact has a certain current state, for example, the concept of a book written on a napkin. And then there is a goal: the target state of the artifact we plan to reach (e.g. 450 pages in hard cover).

Production workflow is the process of modifying the artifact, carrying it through the sequence of intermediate states to the predefined goal.

To add continuous integration to the picture we need a certain notion of quality: a way to differentiate the releasable state of an artifact from the unreleasable one. For a book we might say, for example, that a book is in a releasable state if all its chapters are complete. The continuous integration of a book would then be printing each new chapter as soon as it is ready.
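The book example maps directly onto the definition: changes are applied one at a time, and each one is accepted only if the artifact stays releasable. The Python sketch below illustrates this under stated assumptions. The mainline is modeled as a list of chapters, and the releasability check is the one from the text, "every chapter is complete". The `Chapter` type and `integrate` function are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chapter:
    title: str
    complete: bool


def releasable(book: List[Chapter]) -> bool:
    # The book is releasable when all of its chapters are complete.
    return all(ch.complete for ch in book)


def integrate(book: List[Chapter],
              changes: List[Chapter]) -> Tuple[List[Chapter], List[Chapter]]:
    """Apply changes one at a time, keeping only those that leave the book releasable."""
    mainline = list(book)
    rejected = []
    for change in changes:
        candidate = mainline + [change]
        if releasable(candidate):
            mainline = candidate      # one small step; the book is still releasable
        else:
            rejected.append(change)   # sent back to the author; mainline is untouched
    return mainline, rejected
```

Each accepted change is a discrete "quant": the mainline moves from one releasable state to the next and is never left half-changed, which is exactly what the remark below points out.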

Remark: As you can see, the nature of continuous integration is actually quantum. As follows from the definition, continuous integration is performed by applying small atomic changes: "quants". And all continuous integration workflows work with discrete chunks of data rather than continuous streams.

Another important note is that the above definition doesn't imply a specific implementation of the production workflow.

In day-to-day conversations "doing CI" often means setting up a build service which runs certain tasks, build scripts and tests, triggered by certain events. But the CI concept itself does not require deployment automation, test coverage or Slack notifications about failed builds.

These technicalities are indeed useful and come naturally from applying the CI approach in software development, but they are helper methods, which should always be measured against and aligned with the generic idea.

In other words: while running automated tests is a good practice for implementing continuous integration in development, the continuous integration approach cannot be reduced to just running automated tests.

The third thing I want to point out is that the definition says nothing about a third party, another actor which you need to integrate with. There is a simple explanation for that: I believe that even when there is only one acting entity in the workflow (one developer working on the codebase, one author writing the book...), there is still room for integration. Generally speaking, whenever you are involved in any kind of long-term process, you should treat "yourself today", "yourself yesterday" and "yourself tomorrow" as independently acting third parties, which might work in parallel on different parts of the project and should exchange their work via documentation, code review and comments, just as regular collaborators do.


Everything offtopic


Q. What is this place

The idea is to have a discussion board dedicated to a wide range of topics related to Continuous Integration and to more generic development workflows and management policies. To share experience, ask questions and build a vendor-independent community for CI, Build, Release and DevOps Engineers.

Many of those topics could fall under the term "DevOps", but we think Continuous (Quantum) Integration suits us better. The conventional term DevOps is currently focused on "automating all the things". In our opinion CI is a wider topic which covers not just the automation tools but also the workflows and policies, for example, branching strategies, release requirements, team duties, some company-wide architecture decisions, and much more.

Q. How to join

Just sign in and start writing. Register an account directly or use GitHub SSO.

Questions and discussions go to the Main track category, longer articles and how-to's to the Knowledge Base.

(Obviously, Knowledge Base articles can be discussed just as well.)

Q. What are the terms

All articles are available under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Q. What is the forum configuration

Virtual server with CentOS 7,
Nginx as a web proxy,
NodeBB as forum engine,
Redis as a database

SSL is provided by Let's Encrypt

