In this episode, Rahul and Vipul discuss the various strategies they are using to implement application scaling on Kubernetes. Rahul talks about how they deployed HPA for memory-based autoscaling. He also talks about custom metrics autoscaling and how they scale on queries per second (QPS). Further, he explains how the ingress controller implementation helped them configure path-based routing.

Key Points From This Episode

  • Rahul’s experience at DevOpsDays Cape Town 2017.
  • Autoscaling applications using HPA.
  • Custom metrics autoscaling.
  • How an ingress controller can be used for path-based routing.


[0:00:04] Vipul: Hello everyone and welcome to another episode of “All Things DevOps!” Today I have Rahul with me and we will be continuing our discussion on Kubernetes with regard to auto-scaling, as well as some of the things that Rahul is working on to get the most out of auto-scaling. Before we begin, welcome Rahul. I think you’ve just been to DevOpsDays, do you want to give a recap of how that went?

[0:00:38] Rahul: Sure. Hi Vipul, it’s nice to be here one more time on this new episode of “All Things DevOps.” So, yeah, a couple of weeks ago I went to DevOpsDays Cape Town to talk about deploying production-ready Kubernetes clusters. It was a two-day conference and I enjoyed it. My talk went well and I had a lot of questions about how to use Kubernetes in production. Some people are trying to use Kubernetes, and they asked about the issues they are facing. Some enterprise companies are trying to move all of their architecture to the containerized world, mostly on Kubernetes, and they asked about the challenges they are facing.

So after my talk, I met some of the people who came up with their questions like, how do they simplify the deployment pipeline using tools like Jenkins? How can they minimize their build time using base images? So, a couple of attendees approached me about minimizing their image build time. One of them was deploying a similar Rails kind of setup on Kubernetes. They were migrating from a traditional server architecture to a containerized architecture, developing their apps with Minikube and deploying them on Kubernetes clusters.

So over there we found that the commands in their Dockerfile were not segregated or isolated. Everything was in a single Dockerfile, which made their deployments take a little longer, and it also built one single big image that they had to reuse on each deployment. So, I answered some of these kinds of questions, and there was one attendee who asked me something like, “How do I make sure about the Kubernetes software which I am deploying in my own data center, on bare metal, on-premise, with my complete set of hardware?”

So I told him, “Kubernetes is open-source software and you can build your own cluster at any time using tools like kops, kubeadm, kubespray, Tectonic or some simple bash scripts of your own.” He seemed to be a HIPAA-compliant client and he wanted to deploy his software in line with all of those standards. So he was a little curious about where the Kubernetes setup images come from. When we deploy a multi-master Kubernetes cluster we have multiple master nodes, and to start up a cluster there are some necessary images which Kubernetes fetches automatically.

If you are creating a cluster using kops, kops will by default get the hyperkube image, which is a collection of all the necessary components of Kubernetes: Kubernetes etcd, the Kubernetes controller manager, kube-scheduler and all. So his concern was more about where the registry images are coming from. If those are coming from Google Container Registry, how can I specify… like, I don’t want them from a specific vendor, I want them from my own registry. So I pointed out to him that you can do it by using a flag, but by default those come from Google Container Registry.

So these kinds of discussions were really great to have. Then apart from DevOps and Kubernetes, one thing I liked about this conference was that there were some open sessions. A few random topics were selected by the organizers based on suggestions from attendees, and the topics were things like the monolithic app. There was also one open space on remote work culture and another on DevSecOps, like DevOps security and operations.

So I attended all the open spaces in the different rooms and I got really interesting questions when I was attending the remote working open space. We at BigBinary being completely remote, I sketched our workflow, and some of the people were really interested in how to adapt to the remote life.

[0:05:32] Vipul: So, how about some of the other talks? Like, any interesting talks that you went to or heard? I think you mentioned something about storage or something interesting?

[0:05:48] Rahul: Yep, so the conference had almost all kinds of talks. The keynote was from Ken Mugrage and it was really… the topic itself says it, “You can’t buy DevOps.” The talk was really good. He depicted the workflow and how the deployment process changed right from the times of mainframes to today’s era of containers or serverless, and how we should work closely with teams… Earlier it was sysadmins and DevOps, and now it is site reliability engineers and all. That was one of the most enjoyed talks. But we first started with the CAP theorem, and then there was one talk from Jack Oberlin about continuous testing and the final frontier of DevOps. So most of the talks were mainly focused on automation.

[0:07:00] Vipul: Right, right.

[0:07:02] Rahul: Yeah, the other talk, from De Wet Blomerus, was “What I learned from applying for 107 jobs.” That was one of the interesting talks, and apart from technical talks…

[0:07:13] Vipul: 107 jobs?

[0:07:15] Rahul: Yeah, so he was an experienced guy who had worked for a few years, and he depicted what he learned from applying for 107 jobs and how he finally decided to choose GitLab because it was remote work. He found that was it, and he would go for this job. So that was a nice talk. It was not that technical but it was nice. After that, another talk was from Spencer Krum, a speaker from IBM. He talked about what we can learn from eSports.

So using the sports theme, he showed how we can handle the best practices in DevOps and all. So after that, we had open spaces. There were only three talks on that day: the day started with three talks, followed by open spaces, and then we had some workshops. There was a workshop on Azure from Sheriff El Mahdi and there was a workshop on Ansible from the guy from the Linux Professional Institute. So it was a fully packed schedule for two days. One of the talks was from Storm Joubert, who talked about security for everyone; he focused on some security concerns and what measures we should take.

Then we had a Minikube talk on how to start development using Minikube, where Whitney Tennant described how to use Minikube in your development pipeline. So yeah, this was really a fun conference. At the end, Sheriff El Mahdi again took a workshop on OSS-based DevOps on Azure. In the workshop he showed us how Azure is working towards open-source software and how they are coming up with new technologies like Azure Container Instances and all. Yeah, this was all about the conference and…

[0:09:46] Vipul: Quite interesting. How did you find the community in and around South Africa?

[0:09:57] Rahul: Yeah. So this was my first time in South Africa and I hadn’t really been in touch with the communities there. Most of the open-source communities start from Europe or the USA or other countries, so I was really curious about South Africa, and I think the community is good, at least in Cape Town. There are meetups happening continuously on almost everything. I was there for DevOpsDays Cape Town, but they also have their monthly DevOpsDays Cape Town meetup, and they have another conference called “ScaleConf,” in February or March, which is mostly focused on scaling of applications and infrastructure. By my observation, if I’m not wrong, most of the attendees were working on technologies like .NET, Java and Ruby on Rails. I would say 10-15% of the people were trying to use container technologies in production, while 30-40% were trying to get into containers and adopt containers in their workflow, and other people involved in other DevOps automation were also present. So I would say the community is good, and there is one independent Slack where all the people in South Africa gather.

[0:11:49] Vipul: Nice. So let’s move on to the discussion that we are going to have today. We will be discussing a bit about… I think you’ve been facing a couple of issues with auto-scaling and how you’ve been doing auto-scaling in Kubernetes, so I thought that would be an interesting topic to discuss. So can you give a brief idea of what auto-scaling is and how exactly the auto-scaling is set up right now in the app that you’ve worked on?

[0:12:23] Rahul: Yep, sure. So auto-scaling is a really important part of infrastructure provisioning or app deployment, and if we don’t have good practices in place for auto-scaling, we end up doing something called over-provisioning. Let’s say my infrastructure is handling 1 million requests per day and I don’t have auto-scaling. For high availability I’ll over-provision it to handle at least 2 million requests, because I don’t know on which day my 1 million requests will grow to 1.5 million or 2 million. To avoid such over-provisioning, the concept is auto-scaling, or dynamic scaling: we scale as needed. With Kubernetes or with containers we have a couple of scaling strategies. We have Horizontal Pod Autoscaling from Kubernetes, which scales automatically based on CPU usage.

Once we decide on a threshold, we apply it in the deployment manifest. We say that when the CPU usage of the pods reaches 80%, it will automatically auto-scale, and that works great. It is calculated based on the mean across pods. Let’s say a particular deployment is running four pods: it will calculate the mean CPU usage of all four pods and scale accordingly. But as of now, we don’t have a direct provision from Kubernetes to auto-scale based on memory, so we can write our own. It doesn’t take a lot of time to write a memory-based auto-scaler using the Kubernetes API. Just like CPU-based auto-scaling, we wrote our own memory-based auto-scaling, which calculates the memory used by the pods using Heapster.
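
As a rough illustration of the mean-based calculation described here, the following Python sketch applies the replica formula given in the Kubernetes Horizontal Pod Autoscaler documentation, desired = ceil(current replicas × current metric / target metric). The function name and the sample numbers are hypothetical, and the real controller also applies a tolerance and other safeguards that are not modeled here.

```python
import math


def desired_replicas(pod_cpu_percent, target_percent):
    """Suggest a replica count from per-pod CPU utilization readings.

    pod_cpu_percent: one utilization reading (in percent) per running pod.
    target_percent: the threshold configured for the deployment.
    """
    current = len(pod_cpu_percent)
    mean_usage = sum(pod_cpu_percent) / current
    # Scale the current replica count by how far the mean is from the target.
    return math.ceil(current * mean_usage / target_percent)


# Hypothetical readings: four pods averaging 90% CPU against an 80% threshold.
print(desired_replicas([95, 85, 90, 90], 80))
```

The same shape of calculation works for a memory-based scaler: only the source of the per-pod readings changes.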

Heapster is a time series database which monitors the cluster health and gathers metrics such as how many bytes of memory are used and how many CPU cores are used. Using that, we scale based on memory and CPU. That went well, but one day we noticed that even though we had memory- and CPU-based auto-scaling, we did not have a consistent count of requests. Over the weekend the count is really low, and in peak hours or on peak days the request count goes up.

[0:15:13] Vipul: Can you give an example of such numbers? Like, do we have any numbers for how many requests there are?

[0:15:20] Rahul: Yeah, so for one of our production applications we generally serve 100-150 requests (that’s a small app) in one business day, but at peak time it used to go to 600-700 requests. Even though we had memory-based and CPU-based auto-scaling in place, sometimes our load balancer used to give up, because the tasks were not memory consuming or CPU consuming but the requests were timing out. So we saw a need for something like queries-per-second auto-scaling, or request-based auto-scaling. So alongside memory- and CPU-based scaling, we started to research how to get request-based auto-scaling up and running.

So for that, we came across strategies like… I think one of the contributors to Kubernetes, Luxas, has shown how to use custom metrics auto-scaling. In Kubernetes terms, we can say, “how to scale based on custom metrics?” We just get metrics, decide on a threshold and scale on it. So we started our work on how to get queries-per-second auto-scaling, QPS auto-scaling, and ended up writing another tool with the help of the Prometheus operator. Prometheus is an important component used for monitoring and we use that…

[0:16:53] Vipul: I want to circle back on the issue though. Like you said… so initially you already had this setup where… we have scripts, and what they do is monitor the CPU and memory usage, correct?

[0:17:08] Rahul: Yeah.

[0:17:09] Vipul: And based on that CPU and memory usage we scale up and scale down. And what kind of scaling is this? Is this virtual scaling or are you doing hardware scaling?

[0:17:16] Rahul: So this was horizontal scaling.

[0:17:20] Vipul: So it’s actually adding resources?

[0:17:23] Rahul: Yeah. That was just adding resources. For some use cases what was happening is that even though CPU usage was under the 80% threshold and memory usage was under 80%, or whatever threshold was defined, the web server or app server was giving up because a lot of requests were flooding in. So at that time, there was a need to scale our web server or tweak its configuration, whether it is Unicorn, Nginx, Puma or Passenger. So memory- or CPU-based auto-scaling works great, but for such production use cases we need a request count, and based on that we can scale.

So I think a lot had been talked about it around Kubernetes 1.4 itself. Custom metrics auto-scaling was asked for by users and it was introduced in alpha in version 1.4. Further, in version 1.6 it came to beta, I guess, and with version 1.7.2 it is stable and comes with Kubernetes 1.7. So using custom metrics auto-scaling you can gather metrics using Heapster, cAdvisor or Prometheus and use them with your Horizontal Pod Autoscaler, just like CPU or memory, by specifying threshold values. Let’s say your two pods are handling 200 requests per second and you define that as the threshold: if 250 requests are coming in, it just calculates the mean and adds a third pod.
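
The 200-versus-250 example can be worked through in a few lines of Python. This is only a sketch under stated assumptions: it reads “two pods handling 200 requests per second” as a target of 100 requests per second per pod, and the function name and the `min_replicas` parameter are hypothetical rather than part of any Kubernetes API.

```python
import math


def replicas_for_qps(total_qps, target_qps_per_pod, min_replicas=1):
    """Return how many pods are needed to keep each pod at or under its target."""
    needed = math.ceil(total_qps / target_qps_per_pod)
    # Never scale below the configured floor, even when traffic drops to zero.
    return max(min_replicas, needed)


# Assumed target: 100 requests per second per pod (two pods for 200 rps).
print(replicas_for_qps(200, 100))  # steady state
print(replicas_for_qps(250, 100))  # traffic rises to 250 rps
```

With these assumed numbers, 250 requests per second no longer fits in two pods at 100 each, so the calculation asks for a third.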

[0:19:12] Vipul: Actually, let me interrupt. I may be repeating this again and again, but… so assuming you have a service which is using, as you said, 80% of the CPU and, say, 70% of the memory, you are saying that… even if there is a little bit of resource available, the requests were erroring out? Like, how is that exactly? Were your services inadequate in serving them, or what was causing them to…

[0:19:45] Rahul: I mean we had defined some thresholds with our pods so when you…

[0:19:51] Vipul: Oh, ok.

[0:19:53] Rahul: When you deploy your pod on Kubernetes you specify that this particular pod can use one gig or two gigs of memory. In that pod, we were running Nginx. It has some parameters like request body size, temp file size and how many worker processes Nginx can use, and the…

[0:20:12] Vipul: So basically the resource usage was not actually mapping to it. Like, you would have deviations in memory and CPU, and that doesn’t always directly correspond with the number of requests that you get.

[0:20:28] Rahul: Yep, so in this case, in the Nginx config we had allocated some 500 or one gig of memory only for Nginx, even though the entire memory for the pod is two gigs. But all the requests were initially handled by Nginx and then reverse proxied to Unicorn, so what happens when your Nginx gives up?

So, in that case, you want to scale your Nginx config, and… actually there are other reasons as well, based on the application. So Nginx is just one of the pain points, but that led us to research something like QPS or request-based auto-scaling, and we just got started on it. I hope I have answered your question correctly.

[0:21:16] Vipul: Yeah, you did.

[0:21:17] Rahul: Okay, so we started off by researching custom metrics auto-scaling, and let me tell you, even though with version 1.7 they said it is built in, getting it working was not that easy. We actually had to modify a lot of configuration on the Kubernetes master, and as we were running our Kubernetes cluster using kops, we had to do some things manually, like changing API server flags, because kops doesn’t support adding the API aggregation layer flag automatically. So we ended up doing that manually. We created an issue upstream. I think the issue is still open, and the work might be going on. So for that, we also created our own API version. We created an operator, a Prometheus operator, and then we needed a count of our requests.

So the first thing is, how do we scale based on requests? We need our count; we need to tell our deployment to scale because we have this number of requests. So we got a count of our requests from the…

[0:22:39] Vipul: nginx?

[0:22:41] Rahul: Yeah, actually Nginx, but in our case we were deploying our Kubernetes cluster on AWS and our service type was LoadBalancer. So initially we got the count from ELB: depending on whether the HTTP request count is 100 or 200, we set a threshold and auto-scale based on QPS. So I think almost all of the work is done, but we are going one step further, and instead of ELB we are now looking into something like the Nginx ingress controller. I think it’s the most recommended way if you are deploying your web services on Kubernetes, because it offers many features like path-based routing, it operates at layer 7, you can configure multiple domains with it, and it also supports SSL termination on the ingress itself.

[0:23:46] Vipul: I would like to discuss how exactly you have set this up, because I understand there’s ELB and then it is being routed to your pods. Can you explain a bit about how the whole of this routing is set up? So we have our cluster and you said you are using ELB. How are the requests being routed to the pods on the Kubernetes cluster?

[0:24:15] Rahul: Yeah, sure, that isn’t a big deal. I mean, in Kubernetes we have services. When we create a deployment, to use a Kubernetes service we just specify labels, and when we define the service type as LoadBalancer, Kubernetes will automatically create a load balancer in AWS. All the traffic going to that app will be routed through that Kubernetes service. For the external world it is an ELB, but for the Kubernetes cluster it is a Kubernetes service which routes the traffic, and it is intelligent enough. The Kubernetes service component and the service discovery of Kubernetes are really nice.

So Kubernetes itself handles all the load balancing in round-robin, no matter how many pods there are. If you have tens of pods, or any number of pods, and a single service, it will distribute traffic efficiently. So that was handled by Kubernetes, but we needed to route our traffic. When we move to containers we always tend towards a micro-services architecture and try to… as well as more and more things.

So we wanted path-based routing. Let’s say for a /API request or a /ABC request, we wanted the /API request to go only to the pods labeled with API. So how do we do that with a Kubernetes service or ELB? The current ELB doesn’t support path-based routing. There is the Application Load Balancer from AWS, but Kubernetes doesn’t support ALB integration directly. So ingress is the thing where you can use path-based routing: I just give a configuration to route all /api requests to the pods labeled with api and route all backend requests to pods labeled with backend. So this isolation and more and more micro-servicing led us to try ingress, and we are in the process of deploying ingress in production.
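
To make the idea of path-based routing concrete, here is a minimal Python sketch of the decision an L7 router makes for each request: pick the backend whose path prefix matches the request path, preferring the longest match. It is not how the Nginx ingress controller is implemented; the route table, the backend names `api-pods` and `backend-pods`, and the function name are all hypothetical.

```python
# Hypothetical route table: path prefix -> name of the group of pods behind it.
ROUTES = {
    "/api": "api-pods",
    "/": "backend-pods",
}


def pick_backend(path, routes=ROUTES):
    """Return the backend for the longest path prefix that matches, or None."""
    matches = [
        prefix
        for prefix in routes
        # Match the prefix itself or anything below it as a path segment,
        # so "/api/users" matches "/api" but "/apiary" does not.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None
    return routes[max(matches, key=len)]


print(pick_backend("/api/users"))  # handled by the API pods
print(pick_backend("/login"))      # falls through to the default backend
```

Because each prefix maps to its own group of pods, each group can then be scaled independently of the others.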

[0:26:46] Vipul: This is quite interesting because you actually don’t have to create a micro-service, like a whole set of different services and code and deployments. Ultimately you’re getting to a place where you can auto-scale a particular endpoint or part of the application, and keep all of the code base in a monolith, which is ultimately beneficial for development.

[0:27:07] Rahul: Yep, because when we were trying to auto-scale based on requests per second, most of the requests were /API requests, and if we were auto-scaling the complete pod we would unnecessarily be auto-scaling the complete service. So a /API request might be handled by your Unicorn or Puma, but a /foreground request will be handled by your CDN or just by Nginx. By separating these two services into two different pods, when requests are bombarding your /API pod, only your API pod will be auto-scaled and your frontend pod doesn’t need to auto-scale. So that helped us in designing the architecture this way, and I think so far so good: it is efficient and it saves some cost on auto-scaling as well.

[0:28:00] Vipul: Awesome. Nice. How is the ingress integration going now?

[0:28:08] Rahul: I think ingress integration… the beauty of Kubernetes is that there’s a lot of documentation available, but of course with any software you have some limitations. In our case, we are deploying our Ruby on Rails application on Kubernetes, and in most cases we use Unicorn or Puma, and we prefer running those services on a UNIX socket over a port. With ingress, we do have one limitation: ingress doesn’t support routing directly to UNIX sockets, because… I mean, UNIX sockets are for inter-process communication and we can’t access a UNIX socket from outside of the host.

So we came up with a strategy called a sidecar container. In the same pod, we have another Nginx container running which reverse proxies to the application container, which is serving Unicorn on a port, and ingress handles that traffic. Another way is to use socat. Socat is a traffic or port forwarding utility, so traffic is forwarded between our UNIX socket and an arbitrary port, and in turn ingress talks to that port. So yeah, I mean that was a somewhat hacky way, but we got it working as of now. I think this is a unique case; I don’t know how many people deploy Rails on Kubernetes and explicitly use ingress. So yeah, we were…

[0:29:42] Vipul: With ingress on its own, how is the request tracking being done? How’s the count being tracked? Is it also using some other utility for tracking the requests and the request count?

[0:29:58] Rahul: I think we are using Nginx ingress. There are many types of ingress. You can use HAProxy ingress, or you can use your cloud provider’s ingress, like GCP’s if you’re on GCP. So we are running Nginx ingress and…

[0:30:15] Vipul: Sorry, my question was, when you’re doing auto-scaling, where are you getting all these request counts from?

[0:30:20] Rahul: Yep, so there are annotations in Kubernetes, and I haven’t really checked if we have a ready-made annotation for that, but there are various ways in Nginx where I can write a rule and get a request count. So as of now, our approach is to get it from Nginx itself. We’ll modify the Nginx ingress configuration in such a way that we count our requests and route the count to some endpoint, and using that we will auto-scale.
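
One common way to turn such a request count into a QPS figure is to sample a running total at two points in time and divide the difference by the elapsed seconds. The Python sketch below shows only that arithmetic; the function name, the `(timestamp, count)` sample format and the numbers are assumptions for illustration, not the configuration discussed in this episode.

```python
def requests_per_second(earlier, later):
    """Compute the average request rate between two counter samples.

    Each sample is a (timestamp_in_seconds, total_requests_so_far) pair.
    """
    (t0, c0), (t1, c1) = earlier, later
    elapsed = t1 - t0
    if elapsed <= 0 or c1 < c0:
        # No usable interval, or the counter was reset (e.g. Nginx restarted).
        return 0.0
    return (c1 - c0) / elapsed


# Hypothetical samples taken one minute apart.
print(requests_per_second((0, 1000), (60, 16000)))
```

The resulting rate is the kind of value that could be compared against a per-pod threshold when deciding whether to add or remove pods.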

[0:30:53] Vipul: Interesting. If you have any other remarks, I think we can come to a conclusion, because we are running out of time. But this has been an interesting discussion about how you are solving issues related to auto-scaling.

[0:31:10] Rahul: Yes. As we use it day by day, the ecosystem around containers and Kubernetes is growing, and as we face more and more issues we do new research and find solutions. So this case with ingress was exceptional. Even though I dug into the community, reached out to people on Slack and went through the Kubernetes community, there is no such provision. So the best way was that I had to get my thing working myself… I mean, ingress was not going to support it, because ingress operates on L7 while UNIX sockets operate at a lower layer, around L4 or L5.

So this was a unique case, and yeah, auto-scaling based on custom metrics is interesting, and yet there is no support for requests per second; there is not even stable support from Kubernetes for memory-based auto-scaling. So the journey is both fun and challenging and we always keep learning.

So I think auto-scaling, monitoring and logging are the three important parts of any application deployment, whether you are running your application on containers or on any other platform. Once you have deployed your application… of course auto-scaling and monitoring are the parts which we have to keep an eye on, and they have to be really up to the mark and perfect.

[0:32:56] Vipul: So thanks all for your time! It was again quite an interesting discussion. Thanks again!

[0:33:04] Rahul: Thank you.