In this episode, Vipul, Rahul and Vishal discuss the deployment lifecycle of Rails on Kubernetes, including rolling restarts with zero downtime.

Key Points From This Episode

  • Image building and image caching
  • Moving from Docker Hub to Jenkins
  • Reduction in build time after moving from Docker Hub to Jenkins
  • Tools involved in managing the cluster
  • Kubernetes namespacing and labelling


[0:00:00.8] VIPUL: Hey everyone, welcome to a new episode of All Things DevOps. Today we have Rahul and Vishal with us again, and we will be discussing Kubernetes and how we have been using it at BigBinary to develop a deployment tool on top of it, so that we can use it for deploying Rails applications.

Hi Vishal, hi Rahul.

[0:00:18.7] RAHUL: Hey.

[0:00:19.5] VISHAL: Hi Vipul.

[0:00:20.7] VIPUL: Hi. So last time we discussed quite a few things about why we chose Kubernetes over Rancher, as well as how it’s best suited for this tool that we are working on. Before we begin, Rahul, would you like to give a brief overview of what the deployment cycle looks like for this application?

What exactly is this Rails app, and how does it get built and deployed? What other things are involved in this?

[0:00:51.2] RAHUL: Yeah, sure. As we mentioned previously, there are four modules in this project and they are interlinked with each other. When containerizing and deploying it on Kubernetes, the first challenge was to couple them all together and set up the deployment lifecycle.

Obviously, we have our code base on GitHub, and whenever we commit, we have to deploy that change on whatever the deployment platform is, maybe EC2 or Heroku. With containers, we have to build an image and deploy it on Kubernetes.

Yeah, the first thing is to build an image, then push it to some Docker registry, maybe Docker Hub or a self-hosted Docker registry. Then trigger your Kubernetes deployment so that your Docker image is pulled and deployed with a rolling restart, meaning with zero downtime.

These are the basic steps of our deployment lifecycle. The important parts we had to work on were image building, automation of image building and caching, and deploying all the apps with zero downtime. We also had to take care of things like configuration changes, how to do that at runtime, and how to take care of all the interlinked services like databases and other things.
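As a rough illustration of the rolling restart described above, the sketch below builds a Deployment manifest whose update strategy never takes an old pod down before a new one is ready. The function name, the `app` label and the replica count are assumptions made for this example; the episode does not show the team’s actual manifest.

```python
def rolling_update_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Build a Deployment manifest that replaces pods gradually.

    maxUnavailable=0 keeps every existing pod serving until its
    replacement is ready; maxSurge=1 allows one extra pod during rollout.
    """
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                # Hypothetical single-container pod; real pods usually also
                # need readiness probes for a rollout to be truly safe.
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }
```

Serialising the returned dictionary as JSON gives something that could be passed to `kubectl apply -f`.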

Yeah, I think we can talk more about how we achieved image building and caching. Vishal, can you brief us on how we started with image building?

[0:02:27.0] VISHAL: Yes. In our application, we have a base image. We used the ruby 2.2 image because our Rails application relies on Ruby 2.2, so we start from a base image like that. After that, we install some apt packages in the Dockerfile. The initial step is to update the list of packages using the update command, and we want to be sure that we are caching things properly.

The standard way is to run the update command as well as the install command on the same line. You can write it as apt-get update, ampersand ampersand, apt-get install, followed by the list of packages.

Because of this, the line becomes a single instruction in the Dockerfile, and this instruction will be cached. When you try to build another image from the same Dockerfile, the cached instruction will be used instead of doing the same thing over again. After that…
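To make the point about combining the two commands concrete, here is a small sketch that renders the first lines of a Dockerfile with the update and install steps in a single RUN instruction. The package names in the usage note and the helper name are placeholders; the episode does not list the real packages.

```python
def render_dockerfile(base_image: str, packages: list[str]) -> str:
    """Render a Dockerfile prefix with update and install in one RUN line.

    Keeping both commands in one instruction means Docker caches them as a
    single layer, so a stale package index is never reused on its own.
    """
    install = " ".join(sorted(packages))
    lines = [
        f"FROM {base_image}",
        f"RUN apt-get update && apt-get install -y {install}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_dockerfile("ruby:2.2", ["nodejs", "libpq-dev"])` yields a `FROM ruby:2.2` line followed by one combined `RUN` line.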

[0:03:56.1] VIPUL: Is this just plain old Ruby, or what else does the image have apart from Ruby and Rails?

[0:04:04.3] RAHUL: For that, we have split our application into a kind of microservices architecture, and we run services like Unicorn for the web side of things. We also have a WebSocket process, which runs on Ruby’s Thin server, and we have Sidekiq as well.

We don’t build different images. We build the same image, and we pass an argument to the container while running it. Let’s say we want to run only the web process; we just pass an argument saying, “Hey, this container needs to be started with pod type web,” and it will start only the Unicorn process. The same applies for Sidekiq and the other processes as well. We are using the same image for running all four services, and that helps us reduce the build time and deploy time. This is where we are achieving caching in image building.
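One way to picture the single-image approach is an entrypoint that maps the pod type argument to the process it should start. The sketch below is only an assumption about how such a dispatcher might look: the pod type names and the exact command lines are invented for illustration, and the fourth service is left out because the episode does not name it.

```python
# Hypothetical mapping from the pod type argument to the process to start.
COMMANDS = {
    "web": ["bundle", "exec", "unicorn", "-c", "config/unicorn.rb"],
    "background": ["bundle", "exec", "sidekiq"],
    "websocket": ["bundle", "exec", "thin", "start"],
}


def command_for(pod_type: str) -> list[str]:
    """Return the command a container of the given pod type should run."""
    try:
        return COMMANDS[pod_type]
    except KeyError:
        known = ", ".join(sorted(COMMANDS))
        raise ValueError(f"unknown pod type {pod_type!r}; expected one of: {known}")
```

An entrypoint script could then hand the result to `os.execvp`, so every service starts from the same image with a different argument.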

There is good support from Docker Hub for image caching. Whenever we push a commit to build an image, it automatically uses the previous build from the cache and our image is built. But while we were building images, I think when Vishal used to do all the builds, he used to get stuck on image building with Docker Hub, because Docker Hub used to take around 10 to 15 minutes to build our image and it did not support parallel builds on the plan we initially had.

We searched for tools which could speed up our deployment process, and I think this is what led us to use something like Jenkins. Jenkins really made our image building process faster; we started to run our builds in parallel and reduced the build time as well. Vishal, how easy was it to move the Docker build from Docker Hub to Jenkins?

[0:06:05.4] VISHAL: Yeah, initially we tried to build images on Docker Hub, and because they allot a virtual machine which has a minimal configuration, it takes a lot of time to build an image.

The same image that I was building on my local Mac machine was taking just around 5 to 10 minutes, but on Docker Hub it was taking around 20 to 30 minutes, and that too was happening sequentially. That means if I submitted two or more builds, they were queued and Docker Hub was picking them up one by one.

Yeah, actually, we were discussing how caching is done. Docker has many ways to reduce the build time and optimize the build size as well. There are some standard ways. The lines which we write in a Dockerfile are called instructions, and in most cases the caching is performed by comparing just the instructions. That helps Docker decide whether to use the previously cached instruction or to invalidate that cache and rebuild. For each instruction, a child image is produced, and that is considered the cache layer for that particular instruction.

Later on, that can be used by Docker as the cache for that particular instruction. Most of the instructions work like this. One such instruction is FROM, where we specify the base image tag. Another such instruction is, let’s say, the EXPOSE command.

These are the kind of instructions which are simply compared directly by Docker to check whether there is a cache layer available and whether to use it or not. There are some other instructions like [inaudible], where a different approach is taken by Docker to check whether to use the cache layer generated for those instructions.

The COPY and ADD instructions are used to copy or add files from the build context to the image that is being generated. So what happens is that Docker looks at metadata such as last modified and last accessed, and it generates a checksum based on this kind of metadata. If this metadata is modified somehow in the build context for the image, the checksum is compared to decide whether to use the previously cached layer for that particular ADD or COPY instruction.
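The two caching rules described above can be mimicked with a toy model. This is not Docker’s implementation: it is a simplified sketch that keys each layer on its parent layer plus the instruction text and, for COPY and ADD, on a checksum of the files in the build context. Which file attributes actually feed that checksum differs between Docker versions, so using file contents here is an assumption.

```python
import hashlib


def layer_key(parent_key, instruction, files=None):
    """Derive a cache key from the parent layer, the instruction and any files."""
    digest = hashlib.sha256()
    digest.update(parent_key.encode())
    digest.update(instruction.encode())
    for path in sorted(files or {}):
        digest.update(path.encode())
        digest.update(files[path])
    return digest.hexdigest()


def build(instructions, context, cache):
    """Simulate a build and report a cache 'hit' or 'miss' per instruction."""
    results = []
    parent = ""
    for instruction in instructions:
        # Only COPY and ADD look at the build context in this model.
        files = context if instruction.startswith(("COPY", "ADD")) else None
        key = layer_key(parent, instruction, files)
        results.append("hit" if key in cache else "miss")
        cache.add(key)
        parent = key
    return results
```

Running `build` twice with the same instructions and files reports hits throughout the second time; changing one file in the context makes the COPY step and every step after it miss, which is why the order of instructions in a Dockerfile matters.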

[0:09:49.9] VIPUL: How significant was moving from Docker Hub to how you are building it right now on Jenkins? What build time changes are you getting compared to doing it on something like the Docker service? How did the caching improve, or how did the time improve?

[0:10:10.0] VISHAL: Okay. As we discussed, Docker Hub wasn’t helping to reduce the build time. We had already tested it on local machines, even ones with small configurations, and we were able to build those images in less time compared to Docker Hub. There is another service released by Docker, which is Docker Cloud.

On both of these services, we were not able to see improvements. We also looked at some other options like Quay and whatnot, but then we thought, let’s try to build it on our own Jenkins server. So we set up Jenkins and installed a few plugins, like the Git plugin to pull the source code from GitHub.

Another one was the CloudBees Docker Build and Publish plugin; we’ll mention the link to it in the notes of this podcast. This CloudBees Docker Build and Publish plugin on Jenkins provides a UI to configure which Dockerfile to build from. We can specify the path to the Dockerfile within the directory that we have cloned using the Git plugin, using the options provided by the plugin.

We can specify that the docker build command should be run on that Dockerfile, along with some options like what the built image should be named and whatnot. Also, we can use this plugin to push the built image to any registry. We still use the Docker platform, I mean Docker Hub, to push our built images, and that actually happens in just a few seconds because Jenkins runs on AWS, where there is no network problem.

The initial improvement we saw came from it being our own machine, so we used a machine which had a high configuration, I mean more resources. Because of that, we were able to run more builds in parallel. The best improvement we saw is that we can manually trigger the builds. On Docker Hub, we weren’t allowed to modify the docker build command which is run on their service. Here on Jenkins, we were able to modify the docker build command as per our need. We can specify the environment, or let’s say build arguments, on Jenkins.
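As a sketch of what a customised build command could look like, the helper below assembles a `docker build` invocation with build arguments. The `-f`, `-t` and `--build-arg` flags are standard docker options; the argument names in the usage note are invented for illustration and are not taken from the team’s setup.

```python
def docker_build_command(dockerfile, tag, build_args, context="."):
    """Assemble a docker build command line as an argument list."""
    command = ["docker", "build", "-f", dockerfile, "-t", tag]
    for name, value in sorted(build_args.items()):
        # Each build argument becomes available to ARG instructions
        # in the Dockerfile.
        command += ["--build-arg", f"{name}={value}"]
    command.append(context)
    return command
```

For instance, `docker_build_command("Dockerfile", "myorg/app:staging", {"RAILS_ENV": "staging"})` produces a list that could be handed to `subprocess.run` on the build server.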

Another thing was that we were able to see what is being cached. It was easy to debug what was really happening, and it was easy to fix any issues while building the image. Another improvement was that now it was our own server with as many resources as we needed, and we could scale it up or down based on the resource usage that we were able to see on that machine. Now the build was taking just around four to five minutes, and that was a big improvement for us.

[0:14:04.3] RAHUL: That reminds me that on Docker Hub we used to wait for a whole 20 minutes while the other builds sat in the queue. That was really a core optimization by Vishal, reducing the build time by setting up our own Jenkins server.

[0:14:22.7] VIPUL: Cool. Which branches was your deployment based on? Are you doing this for every feature branch, or are you doing it just for the final deployment to staging and production? How was the building process?

[0:14:40.0] VISHAL: Yes. That was one of the problems we faced with Docker Hub. Earlier, when we were trying to build the images on Docker Hub, we were not able to pass different parameters because there were restrictions on modifying the docker build command, which is actually run on Docker Hub or Docker Cloud.

On Jenkins, in the beginning we were using a simple build. Now we use a parameterized build. A parameterized build is basically a build triggered using a URL, in which we can specify parameters that are already known to the Jenkins server. We call the Jenkins parameterized build URL from outside. Actually, our use case is to trigger the Jenkins build from our own internal tool.

We prompt the user: “From which branch do you want to build the image?” The user says, I want to build the image for this app from this particular branch and for this particular environment, like production or staging or something like that, and deploy it.

All of this input is submitted to Jenkins as part of the parameterized build, and that parameterized build then runs the docker build command using the CloudBees Docker Build and Publish plugin. That information is then used in our instructions to actually run some commands while building the image. Environment is one thing that we –
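A parameterized Jenkins job can be triggered remotely through its `buildWithParameters` endpoint. The sketch below only builds such a URL; the host, job name and parameter names are hypothetical, and a real call would also need credentials or a trigger token, which are left out here.

```python
from urllib.parse import quote, urlencode


def jenkins_build_url(base_url, job, params):
    """Build the URL that triggers a parameterized Jenkins job."""
    query = urlencode(sorted(params.items()))
    return f"{base_url.rstrip('/')}/job/{quote(job)}/buildWithParameters?{query}"
```

An internal tool could send an authenticated POST request to the resulting URL after asking the user for the branch and the environment.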

[0:16:41.0] VIPUL: So you are asking the user to actually specify which branch, and then you’re deploying? You’re not automatically performing a deployment whenever a user commits to a branch in general?

[0:16:50.0] VISHAL: Yeah, I mean, that is not the use case. The base image is basically built every time a commit is pushed onto the master branch. But for the rest of the branches, we do not have any automated way to build images, because the application where we are using this internal tool has multiple components. As of now there is no need, but we are in the process of automating that as well. We have the Jenkins integration, and there we want some automation too, you know?

Users should be able to build based on ticket transitions; when a ticket transitions between, say, three different states, we would be able to trigger the Jenkins builds and deploy.

[0:17:52.3] VIPUL: The next question I had was for Rahul. I believe both of you built a tool on top of Kubernetes, and I believe this is a kind of UI? Why exactly did you build this tool? Doesn’t Kubernetes have some tool which already meets this need? What other tools are out there which provide this kind of UI-driven deployment on top of Kubernetes?

[0:18:19.9] RAHUL: Yeah, specific to Kubernetes deployment, as we all know, Kubernetes handles everything in terms of JSON manifests. When we want to make any change to a resource, either we have to modify our deployment template and apply it, or we have to edit it using kubectl. But using kubectl for production deployments is obviously not a recommended way.

We obviously checked in the Kubernetes community, and it has a really good tool called Helm. It manages all the deployment releases, and that was really great. We are on a public cloud and we were deploying all our applications onto our Kubernetes cluster.

As per our need, we wanted something early and simple. In this particular project’s case, the client wanted a single-click deployment or single-command deployment. Though Helm was good, we wanted to make our Kubernetes cluster much simpler, even for non-technical people.

Even though Helm was good, we came up with our own small deployment tool. It works just like Helm: it updates all the manifests, and through its UI, which is coupled to our cloud provider, it also modifies the database instances and other things.

That is just our tool. But talking about Helm, there is also Red Hat’s OpenShift, which manages deployments as well, and that is also one of the good tools. I will again say that which tool to use is really a matter of each user’s opinion. There are community tools from Netflix, from Zendesk and others. People are coming up with their own tools and utilities as per their application’s needs.

[0:20:25.7] VIPUL: There is no one way of doing the deployment, is what you…

RAHUL: Yup. This is all about deployment tools. If we want to talk about cluster provisioning, there are various ways to deploy a cluster as well. When we talk about deploying apps on public cloud, I don’t think all the well-known cloud providers provide Kubernetes.

At least AWS doesn’t. There are some community tools to build your own cluster and maintain it with high availability. We explored kubespray, we explored kubeadm, we explored kops, and decided to go with kops. kops is a Kubernetes cluster provisioning tool, using which we can implement HA and disaster recovery on a Kubernetes cluster. We started off with it and it seemed really good.

It also helps with upgrades: if you want to upgrade from one Kubernetes version to a higher Kubernetes version, we can do it without downtime. Also, it supports private networking using Calico, Flannel, Weave and other tools, so we don’t have to use kubenet. The important part that impressed us about kops is that it supports private subnetting out of the box. In our traditional architecture, we had all the production machines behind a VPN, and we had a NAT instance through which our traffic used to go.

With private subnetting in kops, we deployed all our nodes in private subnets, and we had only one jump box or bastion through which traffic flows.

Behind the scenes, the tool uses private subnets of AWS; we won’t go into that much detail. We used Calico as our networking instead of kubenet. kubenet is the default networking of Kubernetes.
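The cluster layout described here (private topology, Calico networking, a single bastion) corresponds to a handful of `kops create cluster` flags. The sketch below assembles such a command; the cluster name, state store bucket and zones in the usage note are placeholders, and the team’s real invocation may well have used additional options.

```python
def kops_create_cluster_command(name, state_store, zones):
    """Assemble a kops command for a private-topology cluster with Calico."""
    return [
        "kops", "create", "cluster",
        f"--name={name}",
        f"--state={state_store}",
        f"--zones={','.join(zones)}",
        "--topology=private",    # nodes are placed in private subnets
        "--networking=calico",   # use Calico instead of the kubenet default
        "--bastion",             # single jump box for SSH access
    ]
```

For example, `kops_create_cluster_command("k8s.example.com", "s3://example-kops-state", ["us-east-1a", "us-east-1b"])` returns a list suitable for `subprocess.run`.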

That really made provisioning the cluster easier, and we have been managing our Kubernetes clusters using kops; that is one of the important tools. I would mention Helm, kops and some tools like OpenShift, which were like –

[0:22:46.9] VIPUL: All of these are, I believe, generic ones. There is no specific tool which is targeted at Rails or a similar kind of application? It’s mostly generic tools that we’re talking about, I guess.

[0:22:55.5] RAHUL: Yes, these are all generic tools. As far as Rails is concerned, Kubernetes works with YAML files. It could just as well be a Django or PHP application rather than Rails. We just have to pull our image. So this is all related to Kubernetes, and these tools are all generic, meaning you can use them for any application and deploy it on Kubernetes clusters.

[0:23:20.6] VIPUL: Why I ask is because you both discussed specific things like migrations, which change the database, or asset compilation. I think these things are not handled out of the box by any tool, so you had to do specific things to perform these actions.

[0:23:40.0] RAHUL: Yes, obviously. When we talk about Rails deployment, we have to run db:migrate, and if we have multiple pods, we have to make sure db:migrate is executed only once.

Let’s say we have three pods; there’s no point running db:migrate on all three pods even though they’re identical. We came across something called Kubernetes Jobs, which help us run custom commands. Let’s say our deployment flow is: push the commit, or trigger manually; we have built the image in Jenkins and pushed our Docker image to the Docker registry.

Now we want to deploy. We have put our latest image in our deployment manifest and we’re deploying. While deploying, after we pull the image but before applying the deployment for the other services, we execute our db:migrate job, which runs only once before the deployment.

That helps us keep the db:migrate task out of the web pods or web containers and execute it only once. This is how we achieve custom tasks.
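A Kubernetes Job for the migration step might look roughly like the manifest built below. The Job name, the exact rake invocation and the choice of `backoffLimit: 0` are assumptions for this sketch; the episode only says that db:migrate runs once, as a Job, before the other deployments are applied.

```python
def migration_job(image, namespace):
    """Build a Job manifest that runs db:migrate once with the new image."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "db-migrate", "namespace": namespace},
        "spec": {
            # Do not retry automatically; a failed migration needs a human.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "db-migrate",
                            "image": image,
                            "command": ["bundle", "exec", "rake", "db:migrate"],
                        }
                    ],
                }
            },
        },
    }
```

A deploy script would apply this manifest, wait for the Job to complete, and only then roll out the web and background deployments with the same image.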

[0:25:00.7] VIPUL: Got it.

[0:25:01.2] RAHUL: So Kubernetes Jobs is one.

[0:25:03.6] VIPUL: So after you have this whole image set up and the different build tasks, like the DB migrations and image building, you actually want to run the deployment. Can you walk me through what your cluster setup is, or where this whole thing is being deployed?

[0:25:26.0] RAHUL: Yeah, sure. We use a public cloud, AWS, and all our clusters are deployed on AWS. We are in a private subnet, we use Calico for networking, and our Kubernetes cluster is only accessible privately. When we deploy our apps, only the application URLs are public, meaning only ports 80 and 443 are open, and we can access our application using those. So when we deployed on AWS, the first thing, as we mentioned, was that we had to deal with service discovery.

The services were really managed beautifully in Kubernetes using namespaces and service labels; that really helped us. And for cloud providers, Kubernetes gives us a service type of LoadBalancer. When we set our web service’s type to LoadBalancer, it automatically creates an ELB, manages the AWS security group, and our application is up and running on the ELB URL. But obviously, we don’t want to give a client…

[0:26:33.0] VIPUL: Does this mean there is one ELB for each running application? You mean this is set up by Kubernetes?

[0:26:38.5] RAHUL: Yeah, this is a Kubernetes feature. If you are on Azure, it will spin up a new load balancer on Azure. If you are on some other cloud provider, let’s say Google Cloud, it will spin up a Google Cloud load balancer. So you have to specify your service type as LoadBalancer. There is also NodePort, which will expose your service on the Kubernetes nodes on some port. Anyway, when we have an ELB URL, obviously we don’t want to give the ELB as an endpoint to our customer or our own client.
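For reference, a Service of type LoadBalancer exposing ports 80 and 443 could be described as in the sketch below. The `app` selector label and the container port 8080 are assumptions made for the example; on AWS, applying such a Service is what causes Kubernetes to provision the ELB.

```python
def web_service(name, namespace, container_port=8080):
    """Build a Service manifest that asks the cloud provider for a load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            # Hypothetical label; it must match the labels on the web pods.
            "selector": {"app": name},
            "ports": [
                {"name": "http", "port": 80, "targetPort": container_port},
                {"name": "https", "port": 443, "targetPort": container_port},
            ],
        },
    }
```

Changing `"type"` to `"NodePort"` would instead expose the service on a port of the cluster nodes, without creating a cloud load balancer.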

So we have to use a meaningful DNS name, and we have to keep the DNS updated, so that even if the service is recreated on Kubernetes it still serves the same application. There are add-ons like the Route 53 add-on for AWS which keep our Route 53 domains configured with the Kubernetes ELB. So this is how, when our app is deployed and our service is up and running, our ELB endpoint is mapped to a Route 53 domain, and the client accesses it using the Route 53 domain.

[0:27:53.9] VIPUL: What other tools are involved in the management that you are doing for the cluster? Apart from making this nice-looking, what do you do for resource management? Do you have any schemes or tools that you employ for this?

[0:28:07.8] RAHUL: Yeah, the interesting part with this project was that it is a platform as a service for our customer, and the customer had multiple clients. Let’s say he had around 20 to 25 clients and we had only a single cluster. So the first requirement was how to segregate all the clients, meaning that when a client’s pods are all running on some node, how do I restrict other clients or other applications from using the same nodes?

In Kubernetes there are namespaces, which help you isolate resources by name, and there is something called labeling. We can label our nodes. Say there are 10 nodes in our cluster; two of the nodes will be labeled like client equals ABC, and when we write our deployment manifest, we assign that label to our pod specification. Meaning that when I mention client equals ABC, all of my ABC client pods are deployed only on nodes which have the label ABC. So labeling really helped us segregate the environments for different clients. Also with namespaces they had –

[0:29:25.5] VIPUL: So what part of Kubernetes does the management of namespacing? Is it built in?

[0:29:32.4] RAHUL: Yeah, namespaces are a built-in feature. We can create namespaces and keep separate territories within the Kubernetes API itself. So when provisioning a new application, we by default create it in a new namespace. We’ll just say ABC production is our new client: create the namespace ABC production, create a web deployment, create a background deployment for him, and create all sorts of service deployments for him.

All the related secrets are in the namespace of the specific client. The labeling feature is also a native feature of Kubernetes. We can label our nodes, and we can use the nodeSelector parameter in our deployment manifest. We can tell our pods, go and run on this node, so that Kubernetes schedules the pod on a node which matches the label.
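The combination of a per-client namespace and a node selector can be sketched as two small manifest builders. The label key `client` follows the example in the conversation; the naming scheme `<client>-<environment>` and the container details are assumptions.

```python
def client_namespace(client, environment):
    """Build a Namespace manifest for one client environment."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": f"{client}-{environment}"},
    }


def pod_spec_for_client(client, image):
    """Build a pod spec that is only scheduled on nodes labeled for the client.

    The nodes must have been labeled beforehand, for example with
    `kubectl label nodes <node-name> client=abc`.
    """
    return {
        "nodeSelector": {"client": client},
        "containers": [{"name": "web", "image": image}],
    }
```

The pod spec would sit under `spec.template.spec` of each Deployment created in that client’s namespace.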

[0:30:25.6] VIPUL: So in your provisioning cycle, is Kubernetes doing all of the provisioning for all of your resources on AWS, or do you also need to bring up some other services, like ElastiCache or a database? Or is everything handled from within Kubernetes?

[0:30:47.4] RAHUL: No, not everything gets handled in Kubernetes. As of now, we are using Kubernetes only for stateless applications. For databases and related applications, we are still relying on the cloud provider, because it is still a matter of debate, and it could take another podcast to discuss whether to run databases on Kubernetes or not. There are StatefulSets, and some people are running them, or at least trying them out, in production.

As we are on AWS, we chose to use RDS, and for our background jobs we use Redis, so ElastiCache also. ElastiCache is the one which is available from AWS. The interesting part is that when we talk about single-click deployment or single-command deployment, if we are provisioning resources for Kubernetes, we can just create them from the existing deployment.

So here it comes: we started off with some declarative tools like Ansible and Chef. Initially, if we were creating a new application, we would create an RDS database for it. We’d retrieve the endpoint of that RDS instance, that endpoint would be created as a secret on the Kubernetes cluster, and the name of that secret would be embedded in our deployment manifest. The same goes for ElastiCache as well. On first provisioning, we have to create all the other related resources. If you are using object storage like S3, create an S3 bucket for it, give it the appropriate permissions, and also store the credentials.
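Storing the database endpoint as a Secret and referring to it by name from the deployment could look like the sketch below. The key name `DATABASE_HOST` and the helper names are invented for this example; Secret values under `data` must be base64-encoded, which the helper takes care of.

```python
import base64


def database_secret(name, namespace, endpoint):
    """Build a Secret manifest holding the database endpoint."""
    encoded = base64.b64encode(endpoint.encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"DATABASE_HOST": encoded},
    }


def env_from_secret(secret_name, key):
    """Build a container env entry that reads one key from a Secret."""
    return {
        "name": key,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }
```

The env entry goes into the container’s `env` list in the deployment manifest, so the manifest carries only the name of the secret and never the endpoint itself.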

There are services where we need to use S3 credentials, and this is where something like kube2iam comes in, which allows us not to feed our AWS keys to any application or put them into our pod or deployment specs. Instead, you create an AWS role and give role-specific permissions through kube2iam. You can just Google it; it is a good add-on. So this is all to get started, but initially, if you are provisioning, you have to come up with your own scripts.
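With kube2iam, a pod asks for an IAM role through an annotation on its metadata rather than through stored keys. The sketch below shows that annotation; the role name in the usage note is hypothetical, and the role itself, together with its trust relationship, has to be created in AWS separately.

```python
def pod_metadata_with_role(app, role_name):
    """Build pod template metadata that requests an IAM role via kube2iam."""
    return {
        "labels": {"app": app},
        # kube2iam reads this annotation and serves temporary credentials
        # for the named role to the pod.
        "annotations": {"iam.amazonaws.com/role": role_name},
    }
```

For example, `pod_metadata_with_role("web", "abc-production-s3-access")` would be placed under `spec.template.metadata` of the deployment.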

Later, when we moved to our internal tool, we chose the cloud provider’s API, like the AWS SDK, to manage all of this database stuff. But I would say that is only a one-time thing, when we are creating the app; after that it is just our Kubernetes deployment templates.

[0:33:18.5] VIPUL: Got it. We’re mostly running out of time, but before we move on I have another question. Rahul, you mentioned a couple of things. First, you mentioned stateless versus stateful, and using RDS for that. So you don’t deal with any kind of state in any of your pods? Or do you have some pods where you do use some kind of state, and if you do, how are you managing the state? Second, you mentioned credentials and you also talked about manifests, so how exactly are those being managed, and what other things do you have in the manifests?

[0:33:53.7] RAHUL: Yeah, for your first question regarding stateful apps: we are running our databases on services like AWS RDS, or we are running databases directly on top of EC2. But yes, we have some persistent storage on our Kubernetes cluster. In Kubernetes terms it is called a persistent volume, with persistent volume claims which claim the storage. That is for some specific apps; we are not storing our relational databases on persistent volume claims, because that has its own problems.

The first thing is that when we are using persistent volume claims, we can’t scale up and down, because the volume is attached to a single node for a single pod, and when we scale up, AWS doesn’t allow us to attach a single volume to multiple instances. That is the first and prime limitation, from AWS. On top of that, when we recreate a pod and reapply the deployment, there are chances that AWS gets your volume stuck. Depending on how fast Kubernetes applies its deployment templates, AWS is sometimes not able to detach the volume from the running node and attach it to the next node, where Kubernetes should be able to run the next pod.

So these are some of the runtime issues, and fixing them takes some time. This is why we don’t use it for critical applications. But we are effectively using it for some applications where we really have to store some GBs of data, like 40 or 50 GB, and that is working well as long as we don’t deploy frequently on those apps. So a persistent volume claim is one of the good choices.
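The single-attachment limitation mentioned above shows up in the claim itself as the `ReadWriteOnce` access mode, which is what EBS-backed volumes support. The following sketch builds such a claim; the claim name and the 50 Gi size in the usage note are illustrative values only.

```python
def uploads_claim(name, namespace, size_gb):
    """Build a PersistentVolumeClaim that can be mounted by one node at a time."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteOnce: the volume is attached to a single node, which
            # is why pods using it cannot simply be scaled out.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gb}Gi"}},
        },
    }
```

A call such as `uploads_claim("uploads", "abc-production", 50)` describes storage for an app that runs as a single pod and is deployed infrequently.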

But I would say if you are really interested in running stateful apps, you can try StatefulSets. There are also things like OpenEBS coming up, and Rook as well. A lot is happening all over the community to get this up and running; even Kubernetes 1.7 and 1.8 say that people should officially be able to use Kubernetes for stateful apps as well.

[0:36:13.9] VIPUL: And what exactly do you use that for? What kind of state would you store? For which app or service would you use it?

[0:36:20.3] RAHUL: Apart from the database, there is a one-off app which stores data that our client uploads. In this application, the client uploads big files, I think those are CSV or Excel files, and those have to be stored. In this case, we use a persistent volume and that is working great. So that is one use case that we are using it in. So far it is working well because we don’t deploy frequently, and for that particular application we don’t need to auto-scale like we do for web and background pods.

The other day when traffic was at its peak, we had to spin up some number of web pods. If we had been using a persistent volume there, we would have had to spin up 10 new persistent volumes at runtime. AWS would have limited us there, since you can’t share a single volume among 10 new pods.

[0:37:20.3] VIPUL: And what about manifests, which we did not pick up?

[0:37:23.8] RAHUL: Yeah, as we did not choose Helm, we create and manage manifests with our internal tool. As of now we are using the Ruby SDK for that, and we are storing them in our database. Whenever we initiate a deployment, the Ruby app creates the YAML templates and applies them with the correlated values. This all happens through the Kubernetes API, and when we apply, we apply to the particular cluster which the given kubeconfig points to.

So the YAML and JSON manifests are under the control of our Ruby app, because those are the really important things when we are trying to deploy. The Ruby app manages them really well. But there are tools like OpenShift and Helm; if you are just starting off, you can choose either Helm or OpenShift, which handle the management of manifests.
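The idea of filling a manifest template with per-deployment values can be shown in a few lines. The team’s tool is a Ruby app; this Python sketch uses the standard library’s `string.Template` purely to illustrate the pattern, and the template fields are assumptions rather than the tool’s real templates.

```python
from string import Template

# Hypothetical, heavily shortened deployment template.
DEPLOYMENT_TEMPLATE = Template(
    "apiVersion: apps/v1\n"
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "  namespace: $namespace\n"
    "spec:\n"
    "  replicas: $replicas\n"
    "  template:\n"
    "    spec:\n"
    "      containers:\n"
    "        - name: $name\n"
    "          image: $image\n"
)


def render_manifest(values):
    """Fill the template; a missing value raises KeyError instead of
    silently producing a broken manifest."""
    return DEPLOYMENT_TEMPLATE.substitute(values)
```

The rendered text would then be sent to the cluster selected by the kubeconfig in use, for example via `kubectl apply` or an API client.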

[0:38:32.7] VIPUL: Makes sense. All right, I think that’s all I have for this podcast. We will be discussing more things later; we will go a little deeper into authorization and a couple of other things that we left out of this discussion. Again, thanks Vishal, thanks Rahul. It was wonderful having you again.

[0:38:51.7] RAHUL: Thank you, Vipul.

[0:38:32.7] VISHAL: Thank you, Vipul.