Links Mentioned in This Episode
- BigBinary on Twitter
- Joe on Twitter
- Rahul on Twitter
Rahul: Hello, and welcome to the new episode of the “All Things DevOps” podcast. Today, we have Joe from Heptio. To tell you a little bit about Joe, he is co-founder and CTO of Heptio. I will let him introduce himself. Hi Joe, welcome to this new episode of the “All Things DevOps” podcast. Can you just introduce yourself?
Joe: Sure. Nice to meet you and nice to be on here. So my name is Joe Beda. I am the CTO and founder of Heptio. We are a company specializing in bringing Kubernetes and cloud native technologies to a wider enterprise audience. Before this, I was at Google for about 10 years where I helped to start the Kubernetes project after being there for quite a while working on cloud stuff.
Rahul: Awesome. Thank you. So recently I have been hearing a lot about Heptio and its work in the Kubernetes ecosystem. I myself have tried some of the tools like Ark and Sonobuoy, and some of those are really helpful in maintaining Kubernetes clusters. So I just wanted to know, as you already said, you have more than 10 years with this container ecosystem. So what was the point where Kubernetes evolved? I don’t know if at that time you were at Google or somewhere else, but where was the start of Kubernetes, or the project from which Kubernetes evolved?
Joe: Yeah, so Google, gosh, probably somewhere around 2003, 2004, started building out this system internally called Borg, and it pioneered a lot of the ideas that we see in Kubernetes today. It’s not just about running containers or running workloads in containers, but figuring out how you assign those containers to individual machines that are running in the network, how you find those things, and how you manage all that stuff at scale. So if you fast forward several years after that, I started this project at Google called Google Compute Engine, which is Google’s virtual machine as a service business. And part of getting that off the ground and making it work well with Google systems was building it on top of Borg, and so my first experience with cloud was building a virtual machine product on top of a container platform. And as cloud and GCE became more critical to Google’s business, the next step was to try and get engineers inside of Google using the exact same platform or the exact same mechanisms that folks outside of Google were using.
So this meant that we either got engineers inside of Google building on top of VMs, which to them would have felt like a big step backward once they are used to something like Borg. Or, we had to go through and bring Borg to the wider world, and so we decided to do the second thing. It wasn’t really practical to bring Borg to the wider world as is, so we started a new project based on all the ideas that were proven out in Borg over that time. And just to be clear, I didn’t start Borg. I was around at Google as it was getting off the ground, but I didn’t start that project. But we did start Kubernetes as a way to bring the ideas that were proven in Borg out to a much wider audience. That’s about 3 or 4 years ago now. We then open sourced it and it’s just been pretty crazy ever since then; it’s really been an exciting thing to be part of.
Rahul: Awesome. That’s really interesting. Any of the like these big upstream projects are being built. So one thing, at what point it was meant like a Kubernetes will be open sourced or like what was the moment when Google thought like this should be an open source project and what made a company like Google to release such a huge project as an open source management project?
Joe: Sure. I mean it didn’t start out being a very large project. I mean that sort of happened over time and it started out pretty modest. It was part of the strategy to actually make it open source from the start and it actually took quite a bit of convincing; we probably spent 3 or 4 months doing slides and talking to Google execs and really making the case to do this as an open source project. The reasoning behind that was historical. Google had talked about its internal systems, things like MapReduce, and GFS, and Chubby, which is their lock server, by writing papers. And then what they found is that other folks would take these papers and make independent implementations of them; I’m thinking of things like Hadoop, based on GFS and MapReduce. And it would ignite a thriving ecosystem, but because those implementations were slightly different, slightly incompatible with the way Google approached the problem, what that meant is that Google could not benefit from the larger ecosystem that formed up around some of the ideas that they had proven out.
So that experience combined with the goal of essentially creating some level of leadership that could be turned into a product at Google with something like Google Kubernetes engine, really was the plan for doing Kubernetes as an open source project. But it really took quite a bit of convincing to really get folks on board with it.
Rahul: Okay. So you just mentioned Google Kubernetes Engine. There is also ACS, sorry, Azure Container Service, and EKS is also coming up. So a few weeks back I came across one of your products, the Kubernetes undistribution, which promises a cloud-native platform for building Kubernetes clusters. So would you just talk briefly about that?
Joe: Sure, yeah. So one of the things that our goal, that we want to do as a company, is that we want to make Kubernetes more accessible to a larger set of enterprises and I think there is a lot of folks who really want to have somebody that they can lean on so that they can get a production-ready Kubernetes cluster up and running. And they want to make sure that they – if they hit problems, they can go ahead and talk to somebody. A lot of these folks are running in environments where there is no supported cloud mechanism, whether it’d be GKE or AKS or EKS. And even if they are running on those, administering a Kubernetes cluster and making it work inside your organization does not end with getting the cluster running. There is still quite a bit of knobs and decisions and policies that you need to make decisions about. So that’s where we came up with HKS, the Heptio Kubernetes subscription which is a support model where we will make sure that you are successful running Kubernetes.
And we call it the undistribution because our goal, out of the gate, was not to create something that’s significantly different from the upstream experience. We really want to support, to some degree, the open source Kubernetes that everybody else is using. And we found that a lot of folks, if they were used to the distribution model, really had a hard time wrapping their head around this. So that’s why we came up with the tagline and called it the undistribution. So the idea is that folks really do want some aspects of the distribution but they don’t want other aspects of the distribution. So we will make sure that you’re successful, and we will answer support tickets. If you hit a problem, we can get you a hotfix. So all that stuff is part of making sure that you will be successful with Kubernetes. What we won’t do is create a super unique, only-available-with-us installation and management experience, and we won’t create a set of tools or a set of customizations to the cluster that are not available elsewhere.
So we want to provide all the good things of the distribution without the experience of diverging from upstream. And I think this is becoming more and more important as companies really do want to stay close to upstream and they really want to make sure that they don’t sort of pin themselves into a corner by taking dependencies on aspects of a distribution without thinking about it.
Rahul: So companies really want more of the upstream product, whatever it may be, not only in Kubernetes; if some company is offering it as a service, people will expect that it should be as compatible as possible with the upstream. So that’s a reality, and this Kubernetes undistribution really sounds interesting. Does it support the hybrid cloud architecture or is it just for bare metal?
Joe: No. We will support folks running on cloud also. Our model is that we can help you get up and running and provide some best practices around getting Kubernetes installed and configured correctly. But we’ll also support you as long as you have a cluster that passes our set of conformance tests plus some other tests that we are continuing to develop. We are happy to support you however and wherever you are going to run a cluster, whether that be on a cloud or whether that be on-prem. So it’s really a matter of, if you have a cluster that’s functioning well, we can help you make sure that it keeps functioning well. If you don’t already have a cluster, we can help you get up and running with that.
Rahul: Yeah, sounds interesting. So I myself have been managing a few Kubernetes clusters and I know it’s really not straightforward. We always have to be on our toes while configuring high availability. And I think this undistribution will really help out. So I think a few months back, you or Heptio open sourced a couple of projects and those are really interesting; one is Ark and another is Sonobuoy. I do use Ark and it’s really helpful for taking a backup of your Kubernetes cluster and restoring it to the same or another Kubernetes cluster. The other tool that I am really excited about is Sonobuoy. The way it scans clusters, also with regard to security aspects, is really nice. So would you just say a line or two about how Sonobuoy started and how you open sourced it? This is really something for which you won’t find a lot of similar projects in the community.
Joe: Yeah, so Sonobuoy started with this idea that as we’re supporting clusters, a lot of times you don’t know off the bat whether there is some problem that you may not be aware of with that cluster. It may be misconfigured, there may be some other type of issue that you might be hitting. And a lot of times the support mechanism for that is a lot of like, hey, can you run this command and then tell me what it says, can you run this other command and tell me what it says. It’s incredibly painful for both the customer and anybody who is offering support. And so the idea behind Sonobuoy is, well, let’s run as many of these tests as we can automatically, let’s go ahead and collect those results so that we can minimize the amount of back and forth when people are doing support. So that’s why we started building it, but at the same time, a lot of talk started coming up around conformance in Kubernetes: how can we come up with a test to make sure that a Kubernetes cluster is conformant and that it’s going to be compatible and sort of act in the way that you expect it to act.
It turns out that there is a lot of overlap between being conformant and being well configured. And so the first application of Sonobuoy was for being able to run the conformance tests for Kubernetes and then show a result saying that I ran these tests on this cluster. And so we launched Sonobuoy around that conformance message as the Kubernetes conformance efforts were starting to gel, and I would say it is now the suggested and supported way of showing that a cluster is Kubernetes conformant. Then we went on and built a tool to help guide people to run this and interpret the results. So that’s called Sonobuoy scanner and that’s at scanner.heptio.com if you want to try that out. But the whole system is extensible, and so basically what Sonobuoy does is it runs a bunch of diagnostics on a cluster, takes the results, and packages those up into a tarball that you can either download or instruct it to upload to some sort of repository.
And so over time, we are going to be adding more and more tests to Sonobuoy in terms of plugins and such. And we’ve seen that other folks have done something similar. So you mentioned the security scanning, folks have extended Sonobuoy to collect those extended results above and beyond the conformance test that we started with.
Rahul: Okay. That sounds nice and this is really helpful. I tried it one time and it really was helpful in checking conformance on my cluster. So moving back again to the Kubernetes ecosystem, it’s now been almost a couple of years since people started moving to Kubernetes, or at least a year since people really shifted their production workloads to Kubernetes. So one thing, being a Kubernetes user and maintaining Kubernetes clusters, is that we are never sure about deploying stateful apps or database services to Kubernetes. There are better ways now, where we have file system support from third-party services like OpenEBS and also persistent volume claims from cloud providers and all. So still this is always a matter of debate, whether we should deploy stateful apps or database services on a Kubernetes cluster. So what would you suggest for a production-grade user who moves from a traditional server-based architecture to the container ecosystem, and more specifically to Kubernetes? How should he or she handle stateful apps?
Joe: Yeah. So the first piece of advice that we give folks is, be very cognizant of where you put your state wherever possible. I think if you can isolate your state into a database, or if you can isolate it into a service that is sort of dedicated to managing state, I think you are going to have the best time with that. And I think this is generally considered sort of good practice as you are doing sort of multi-tier apps, that sort of thing. But it’s easy to lose track of that; a lot of folks have a history where they didn’t always follow that idea. And then it comes down to not stateful apps but stateful services, like your database or what have you. And then my next piece of advice to folks is that if you don’t have to manage a database, don’t. So if you can use something like Amazon RDS or some other type of managed database service, that’s probably going to be your best experience. Managing databases, managing stateful services is hard regardless of whether you are running on Kubernetes or not.
It’s very easy to lose data, it’s very easy to miss backups, it’s very easy to misconfigure these things and get unpredictable performance. So if you can avoid doing that, I would say go ahead and avoid it. The next piece of advice I tell folks is that if you have a stateful service that’s running well off of Kubernetes, there is no huge need to move it on to Kubernetes. You can talk to a database from a Kubernetes cluster even if that database is not running on the cluster. So that’s the next piece of advice: if you have something that’s working, don’t try and make your world more difficult than you have to. Now, I think it’s possible to sort of split this up a little bit. You might have your production database running off the cluster, in the way you have been managing it, but then have some sort of test, development and staging databases that are running on a cluster, where the requirements around durability and management are not quite the same.
And the last piece of advice is that if you are going to run something on top of Kubernetes, you definitely want to understand the fact that Kubernetes is a more dynamic system. Unless you take efforts to be able to change it, it’s more likely that things will be restarted and moved around with Kubernetes than off of Kubernetes. This is especially true if you are running in the cloud where you – one of the ways that folks do upgrades of Kubernetes is by essentially draining and deleting nodes, then spinning up replacement nodes. So anytime you are doing that, you are going to see downtime for any of the workloads that happen to be running on the node that’s being drained. If those things are stateless, you generally have them replicated, it’s relatively easy to replicate. And so users generally don’t see any sort of hiccups. If it’s a stateful application, by and large, most databases, most stateful applications do see some level of unavailability when you end up taking down one of the instances.
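The difference between draining a node under a replicated stateless workload and under a single-instance stateful service can be put into a few lines of arithmetic. This is a toy model, not Kubernetes code: the function name and the placement dictionaries are made up for illustration, and it assumes nodes are drained one at a time with the drained replicas unavailable until they are rescheduled.

```python
def min_available_during_rolling_drain(placement):
    """
    placement maps a (hypothetical) node name to the number of replicas of one
    workload running on that node. Nodes are drained one at a time, and the
    replicas on the drained node are assumed to be unavailable until they come
    back elsewhere. Returns the lowest number of replicas still serving at any
    point in the upgrade.
    """
    total = sum(placement.values())
    return min(total - on_node for on_node in placement.values())


# Three stateless replicas spread across three nodes: two keep serving.
stateless = {"node-a": 1, "node-b": 1, "node-c": 1}

# A single database instance: draining its node leaves nothing serving.
database = {"node-a": 1, "node-b": 0, "node-c": 0}

stateless_floor = min_available_during_rolling_drain(stateless)
database_floor = min_available_during_rolling_drain(database)
```

Under these assumptions the stateless workload never drops below two serving replicas, while the single-instance database drops to zero when its node is drained, which is the unavailability described above.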
So just understand sort of the model there: how tolerant you are, how much you are looking for automatic management versus manual management. There is a trade-off in terms of the type of disruption that you can expect. That being said, I think we have seen huge strides in terms of the support for network block volumes that are being added to Kubernetes. We are seeing some really interesting work that I believe is hitting alpha in 1.10 around local volumes, which are essentially taking a claim on a local disk on a node. I am very excited about that stuff, especially combined with fast SSD and flash devices. So we are definitely seeing things move forward, but people who run databases and run stateful services are paranoid, and they should be paranoid. And I think it’s going to be a while before that stuff is totally worry-free and turnkey.
Rahul: Yeah, that sounds right. As you already mentioned, if we are using any third-party database service, we should continue using that. And this is how most people are doing it as of now, but when it comes to completely shifting from the traditional server architecture to this containerization world, this is the one point that people think about. Another thing you mentioned is that with Kubernetes 1.10, local volume and SSD support is coming, so if that comes and if that works then I am sure it will really solve this problem. But as of now, we are really handling stateful applications in a better way on Kubernetes. So also you mentioned like –
Joe: By the way, I just looked it up and the local volumes are behind a feature gate. So it’s in alpha in 1.9, but the folks are continuing to work on that. So it’s definitely interesting stuff.
Rahul: Yeah, that’s very interesting. Thanks for that information. So you also mentioned, regarding managing a Kubernetes cluster, that a lot of things get restarted. Nodes are restarted. They have to be drained properly. So again, I have been managing some Kubernetes clusters, and this is a known problem: nodes will be restarted, or they will fail with some or other errors regarding resources, though we have applied quotas and I have specified limits. But again, either the Docker daemon or any container runtime daemon will stop responding. Though we don’t have much of a workload on the nodes, this still seems to be a visible problem across all the Kubernetes clusters. There are some tickets open on Kubernetes upstream as well, for “context deadline exceeded” and all. So this seems to be okay with later versions like 1.7.2 or 1.8, but still, we do come across this issue.
So I just wanted to understand, as we have been [inaudible 00:22:58] for Kubernetes funded and contributed: how is this being managed for any of the cloud providers or Kubernetes clusters? Once we have masters, we have all those nodes, and why do we see this sort of scaling issue, I would say? If we have a number of pods scheduled on one of the nodes, then rather than spinning up a new node or giving a human-friendly error message that the node is out of capacity (though these days it does give us that kind of information), it will go into such a state that we have to drain it, restart it and bring back the new cluster. So I just wanted to understand, both in a technical and in an architectural way, how Kubernetes is designed, or how this cluster implementation was designed, and, if people are working to sort it out, what the best practices are to avoid such things for a production-ready Kubernetes cluster.
Joe: Okay, so there is a lot to unpack there. So the first thing is that if you are seeing nodes fail in the underlying container runtime, I think that’s probably an issue or a bug with the runtime or perhaps the file system that the runtime is using. That stuff has been evolving and getting better over time, and I think we are getting to the point where there are alternatives to Docker for the runtime that folks are spending a lot of time and work on. So things like CRI-O, or containerd directly versus using Docker. So I think there is a hope that over time, we are going to see that stuff get more reliable. OverlayFS with overlay2 in Docker is also much more reliable than some of the overlay implementations that are out there. So that’s continuing to be an evolving world and it’s something that I think will dominate sort of whether or not nodes get restarted.
The next thing is there is a question of capacity. By default, if you don’t set any extra options, then pods in Kubernetes will not have any sort of limits or requests, so that means that Kubernetes will continue to try and put more and more work on to the set of nodes without having an idea of when a node is full. So it’s easy at that point to overload the node, especially if you do something like run out of memory and have swapping turned on; if you are not careful, you can quickly put way more on a node than that node can handle. So that’s something to keep in mind: having quotas and resource limits ends up being a critical thing for running a reliable cluster. The next issue then is what happens when your cluster is full, what happens when you can no longer put anything on to a cluster. In that case, you will have pods that won’t schedule because there is no room for them to actually fit anywhere. And you generally want to have some extra capacity in your cluster such that if a node fails, all the work that was running on that node that’s critical can be scheduled on to spare capacity someplace else on the cluster. So you generally want to have a little bit of slack in your cluster, to deal with some level of failure that you can see as you run larger and larger clusters.
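To make the point about requests concrete, here is a deliberately simplified placement model. It is not the Kubernetes scheduler: the `Node` class, the first-fit rule and the millicore numbers are all assumptions chosen for illustration, and the real scheduler also scores and spreads pods. The one property it keeps is that a pod with no CPU request never makes a node look full.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_millicores: int                      # CPU the node can hand out
    pods: list = field(default_factory=list)

    def requested(self) -> int:
        # Sum of the CPU requests of pods already placed on this node.
        return sum(request for _, request in self.pods)

    def fits(self, request: int) -> bool:
        return self.requested() + request <= self.cpu_millicores


def schedule(pods, nodes):
    """First-fit placement by CPU request; returns the names of pods left pending."""
    pending = []
    for pod_name, request in pods:
        target = next((n for n in nodes if n.fits(request)), None)
        if target is None:
            pending.append(pod_name)
        else:
            target.pods.append((pod_name, request))
    return pending


def demo():
    # Pods with no request (0) always "fit", so node-a absorbs all ten of them.
    nodes = [Node("node-a", 2000), Node("node-b", 2000)]
    pending_a = schedule([(f"web-{i}", 0) for i in range(10)], nodes)
    placed_on_a = len(nodes[0].pods)

    # With 500m requests, each 2000m node takes four pods; the rest stay pending.
    nodes2 = [Node("node-a", 2000), Node("node-b", 2000)]
    pending_b = schedule([(f"web-{i}", 500) for i in range(10)], nodes2)
    return pending_a, placed_on_a, pending_b, [len(n.pods) for n in nodes2]
```

In this sketch the pods without requests all land on one node with nothing left pending, while the pods with requests fill both nodes and leave two pods visibly unschedulable, which is the signal that the cluster needs more capacity.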
If you are running in a cloud environment, there are ways to do cloud autoscaling where you can increase the size of your cluster. The way that those things are typically done today creates a relatively tight coupling between the cluster autoscaler and the underlying cloud implementation. There is an effort underway, and I did a live broadcast on it; I do one every Friday called TGI Kubernetes. I did one on this thing called the cluster API, which promises to provide a more generic way to approach doing cluster autoscaling. And then the last thing I want to mention, which especially comes into account when a cluster is full, is a set of features called Priority and Preemption; it’s in alpha in 1.9 but hopefully moving to beta soon. And this is the idea that you can tell Kubernetes which jobs are more critical than other jobs. And if there is no room to schedule a critical job, it will evict or essentially kill a less critical job; that’s called preemption. So it will preempt the less critical job so that the stuff that really needs to run can run.
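The preemption idea can be sketched on a single node. This is a toy illustration, not the Kubernetes implementation: the function, the tuple layout and the rule of evicting the lowest-priority pods first are assumptions made for the example.

```python
def schedule_with_preemption(capacity, running, incoming):
    """
    capacity: total CPU units on one (hypothetical) node
    running:  list of (name, priority, cpu) tuples already on the node
    incoming: (name, priority, cpu) tuple for the pod to place
    Returns (new_running, evicted_names). If the pod cannot be placed even
    after evicting every lower-priority pod, nothing is changed.
    """
    name, priority, cpu = incoming
    free = capacity - sum(c for _, _, c in running)
    if cpu <= free:
        return running + [incoming], []

    # Only strictly lower-priority pods may be evicted, lowest priority first.
    victims = sorted((p for p in running if p[1] < priority), key=lambda p: p[1])
    evicted = []
    for victim in victims:
        if cpu <= free:
            break
        evicted.append(victim)
        free += victim[2]

    if cpu > free:
        return running, []  # still no room: the incoming pod stays pending

    survivors = [p for p in running if p not in evicted]
    return survivors + [incoming], [v[0] for v in evicted]


node_capacity = 4
running = [("batch-1", 10, 1), ("batch-2", 5, 1), ("api", 100, 2)]

# A critical pod arrives on a full node: both batch pods are evicted for it.
after_critical, evicted = schedule_with_preemption(
    node_capacity, running, ("payments", 1000, 2)
)

# A low-priority pod arrives on the same full node: nothing is evicted.
after_low, evicted_low = schedule_with_preemption(
    node_capacity, running, ("report", 1, 1)
)
```

The critical pod displaces the two batch pods and runs alongside `api`, while the low-priority pod changes nothing and would simply wait for capacity.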
Rahul: That probably will help. Yeah, I would say it is both fun and challenging maintaining such clusters, and we have been deploying a few production workloads. And there were a couple of instances, even though we have cloud autoscaling and resources and limits, where at some point it just didn’t work in time and we had to [inaudible 00:28:13] nodes, but I would say that will get better with time and with more and more [religious 00:28:19]. So another part where I usually enjoy working on Kubernetes is all the power it has, and two things make me more and more excited; one is ingress. So I like ingress, and I’d say that I have also tried Contour ingress from Heptio, and the power of ingress with respect to routing, decoupling L7 and L4 related traffic, and all the annotations and configuration parameters related to ingress. That also solves our problem of scaling services on public clouds like AWS; if we have a few more apps running on a single Kubernetes cluster, we might run out of the ELB services limit or the security groups limit. And this is where ingress really comes into the picture and helps out.
So I have tried Contour ingress, and I just wanted to understand how Heptio came to build Contour, because there were a couple of ingress controllers already available in the community.
Joe: Yeah. So Contour is our ingress controller that works with Envoy, which is a C++ based L3, L4 load balancer. And the reason why we wrote Contour is that we wanted to have a completely open source, community driven ingress controller, and we have some plans to be able to extend it for new and essentially larger types of deployments. So we are still just getting started with where we want to take Contour. What we found is that a lot of the existing ingress controllers were either not built for scale, they were sort of a developer focused type of thing, or they were built on top of load balancing engines that were really built for essentially an old way of doing things where you had a configuration file. You wrote the configuration file, the configuration file is not updated that often, and when you do update it, you have to do a fairly intrusive restart or reconfiguration of the load balancers.
We really like Envoy because it’s API driven from the start, and so Envoy really is built to be dynamic. It works well with the dynamic nature of Kubernetes, and so Contour is essentially adapting Envoy to the Kubernetes ingress model.
Rahul: That sounds interesting. And also, I would say this is the beauty of open source community projects. We have a lot of options. So one day we use the NGINX ingress, but another day, for one of our cases, HAProxy is best suited. And we went along with HAProxy and that just worked out of the box. So Contour really looks interesting as the best load balancer for Envoy proxy. Another thing that I come across again and again is that we have applications deployed on our Kubernetes cluster and we have all this autoscaling in place, which works really great. We have application autoscaling based on horizontal autoscaling, with CPU metrics, memory metrics, and other custom metrics. So I have been trying for a few months to get custom metrics autoscaling in place, and I did get it working with the Prometheus adapter and all. So is there something you would suggest, or something coming from the Kubernetes community, like default support for custom metrics autoscaling, maybe based on requests per second? It is available, but it is not out of the box; we still have to create custom metrics and get the metrics from other operators or services. And then this is how it works, and it ships all the data to the HPA. So what are your thoughts about scaling applications on different types of parameters?
Joe: Yeah. So I have, I think, a somewhat controversial opinion of custom metrics autoscaling like that. So first of all, I haven’t done horizontal pod autoscaling with custom metrics; I haven’t deployed that myself. So, I don’t have a lot of expertise there. I think many times people apply autoscaling and oftentimes, they don’t really need it. The first thing is that it assumes that you know where the bottleneck is in your application. And oftentimes, the bottleneck is not at the front door but it’s someplace else in your application, whether that be the database or whether that be some sort of queue or something like that. And so you will see bad feedback loops if you just autoscale the front end. It will actually sort of push the problems around so that you have to deal with them someplace else. And then also most of the time, autoscaling follows a diurnal cycle and so it’s really predictable when you are going to need more capacity and when you are going to need less capacity.
So I think a time-based schedule for increasing, for scaling up and scaling down based on sort of known patterns, often times will get you most of the results that you want to get with autoscaling without a lot of the complexity and uncertainty that comes by putting sort of a control around it. And I think it’s easy with autoscaling to– it’s essentially a control system and so there is the matter of like how fast you react, essentially what the dampening factor is and that can be really tricky to get that right. If you set it to be too low, then you can be spinning things up and down really quickly. If you set the dampening factor too high, then what you’ll find is that you don’t scale up fast enough when you actually need the load. So it’s a relatively tricky thing and I think it gets even trickier when you start dealing with the custom metric stuff. So my take on this stuff is, and I think for most customers and most users is just keep it simple to start with and I think autoscaling, in general, is not needed by most users.
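The contrast between a reactive rule and a time-based schedule can be shown side by side. The reactive function follows the proportional shape the Kubernetes documentation gives for the horizontal pod autoscaler (desired replicas as the ceiling of current replicas times the ratio of observed to target metric, with a tolerance band); everything else here, including the function names, the schedule format and the numbers, is an assumption made for illustration.

```python
import math


def reactive_replicas(current, observed_per_pod, target_per_pod, tolerance=0.1):
    """Proportional rule: move the per-pod metric toward its target."""
    ratio = observed_per_pod / target_per_pod
    if abs(ratio - 1.0) <= tolerance:
        return current  # inside the dead band: leave the replica count alone
    return max(1, math.ceil(current * ratio))


def scheduled_replicas(hour, schedule, default):
    """Time-based rule: pick a replica count from a known daily pattern."""
    for start, end, replicas in schedule:
        if start <= hour < end:
            return replicas
    return default


# Reactive: 4 pods each seeing 150 requests/s against a target of 100.
scale_up = reactive_replicas(4, 150, 100)

# Reactive: a 5% deviation sits inside the tolerance band, so nothing changes.
hold = reactive_replicas(4, 105, 100)

# Time-based: run 10 replicas during a hypothetical 08:00-20:00 busy window.
daily = [(8, 20, 10)]
daytime = scheduled_replicas(9, daily, default=3)
overnight = scheduled_replicas(23, daily, default=3)
```

The tolerance band is one simple form of the dampening described above: widen it and the system reacts late, narrow it and the replica count starts chasing noise. The schedule has no feedback loop at all, which is where its predictability comes from.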
Rahul: Okay. Yeah, these are all problems at scale, so for normal users autoscaling or custom metrics autoscaling is hardly needed, so I agree on that. So –
Joe: It’s great if you are actually able to find success with it, that’s really good to hear.
Rahul: Yeah, I mean I have been doing it and I have implemented it, so we had HPA, which is CPU based autoscaling. We wrote our own memory based autoscaler and now we have written our own queries-per-second autoscaler. So we wrote it with the help of the ingress traffic and also with the help of the Prometheus adapter; Prometheus is a great tool and it helps us a lot. So that is one thing about scale. Another scaling story I would like to hear from you is about scaling Kubernetes itself. So as of now, we have a large number of nodes in particular Kubernetes clusters, but personally, what I am always excited about is, you also know that there is federation of Kubernetes clusters, but it is always challenging and risky. Have you tried something like that for larger and larger workloads? Instead of adding more masters, is it possible to create multiple Kubernetes clusters and club them all together, or just use some federation techniques? Is it possible to scale Kubernetes in that way? I know this is just a rough thought. I don’t have any complaint about that but I would like to hear from you on scaling Kubernetes.
Joe: I mean there is a special interest group in Kubernetes focused on scaling and it’s something that I have been involved in for a while, not as much recently. But early on in Kubernetes, we focused on getting the abstractions right, getting the APIs right, and I think the first versions of Kubernetes went up to like 100 nodes. And we have continued to find bottlenecks, fix those bottlenecks and move things forward, and now, with the recent versions of Kubernetes, people are running it with upwards of 5,000 nodes. It really depends on what your run rate is, how many pods you are going to be putting on those nodes and such. And so we find it’s actually relatively rare that folks have clusters that are larger than that. It’s certainly possible but it’s the exception, not the rule. Most of the time when people are looking to do something like federation, they are looking at being able to control multiple clusters that may be running in different geographic regions. And so you might have one cluster in Europe, one cluster in the US, one cluster in Asia. And you want to coordinate across those different clusters.
And the efforts around federation recently have taken a little bit of a reset and I think folks are learning some lessons around what works and doesn’t work in those types of scenarios. The one thing to keep in mind is that early on, I think one of the things that made Kubernetes successful is that we recognized, and this was based on experience at Google, that the types of APIs and the types of primitives and the experience of managing a cluster of machines was fundamentally different from managing a single node. And I think this is one of the places where Docker Swarm and Docker and Kubernetes diverged relatively early. The Docker folks tried very hard to maintain the exact same API for both the cluster and an individual node, whereas out of the gate, we recognized with Kubernetes that the cluster really needed a different set of abstractions.
So I think there is a similar issue as you go from a single cluster to multiple clusters. I don’t think you can take multiple clusters, stitch them together and have it act like a single large cluster. If they are running right next to each other, perhaps you can do that, but that’s typically not the case. At the end of the day, the speed of light really dominates these things, and workload positioning, where you put your work, really starts to matter with respect to where your database and your state are.
Rahul: Yeah. So I just wanted to understand about the scale and it really makes sense. As of now, it supports 5,000 nodes per cluster, and I think that is more than enough for what we should be dealing with, and it works. But apart from this, I would now like to move from the technical discussion to something like how Kubernetes evolved and what the future is going to be. From where containerization started, there were a lot of orchestration tools available in the market. There was Docker Swarm, there is Apache Mesos, and there are other containerization tools. So at this point, we can assume that Kubernetes is the most adopted production-grade orchestration tool, it is one of the most trusted, and people have started using it as a containerization tool. It’s been a couple of years now and people are happy doing things with Kubernetes; some of them have moved their production, some of them are in the process of moving it. And we can assume that now, Kubernetes is going to be the de facto deployment tool in most cases.
So we have all the processes to be set up, like CI/CD, and that is also a work in progress. I see ksonnet from Heptio is solving that problem. The other thing is that some of the bigger companies are directly moving from bare metal or private cloud to Kubernetes. They are planning to move their legacy apps as well. So this is all going well and the Kubernetes community is growing at a huge pace. And now there is something called serverless. People are also trying to get into that, though I haven’t really heard a good production story of serverless applications other than some tiny apps. And there are efforts being made on how Kubernetes can be ported to serverless, that is Kubeless from Bitnami. There is something else called Fission which also deploys serverless apps with Kubernetes. So with Kubernetes, when it was used, it was clear that people wanted to solve their scaling and cloud problems, and Kubernetes did the work they expected. So where does serverless fit into it, or is it just, again, a buzzword that one should not go for? Because personally, I think not all apps are meant for a serverless kind of architecture. So what are your thoughts about Kubernetes and serverless?
Joe: Well, I think that the first thing to recognize is that serverless systems, and I think when I say serverless here, I am really talking about sort of function as a service, things like Lambda. These systems don’t exist in a vacuum. They really exist in the context of a larger platform. So if you look at something like Lambda, it’s only as useful as the services and the events that actually plug into it. It’s really rare for somebody to have serverless that doesn’t access a database running someplace else, that doesn’t access some sort of queue, whether it be Kafka, Kinesis or what have you, that doesn’t access some other type of service or react to other events. So the value of serverless really lies in the larger ecosystem that it is plugging into. So that’s the first point I want to make about that. The second thing is that it’s a tool in a toolbox. I think what we find is that the more people start building on top of these systems, there are some things that are totally applicable for a serverless model and then there are other things that really require more control and more depth in terms of the platform that they are running on.
And sometimes, different parts of the application fit into those buckets, and so I don’t think it’s altogether unreasonable to have a single application where maybe some of the application is running serverless, some of the application is running in containers and then perhaps some of the application is running on VMs or bare machines. So it’s totally reasonable to actually look across that spectrum. That being said, as I have looked at the various serverless systems that sit on top of Kubernetes, the one thing that I’m looking for is something that plays well with Kubernetes and integrates in a deep way into Kubernetes. And I think that there are some serverless systems that sit on top of Kubernetes and they use Kubernetes, but from the point of view of the user, Kubernetes is really an implementation detail and the serverless system sort of stands on its own. There are other systems, and I think Kubeless is a great example from our friends at Bitnami, where it works to reuse as much of Kubernetes as possible, in a way that’s compatible with the other services in Kubernetes.
So what this means is that when you launch a service using Kubeless, it’s available via a Kubernetes service as if you launched any other type of service in Kubernetes. So really it becomes an alternate management paradigm for containers versus being a totally different system that sits on top of Kubernetes. Hopefully, that makes sense.
Rahul: Yes, I got it. I also had a chance to try some things on Kubeless and it really works well. And because it is on Kubernetes, unlike Lambda, we have a hold on what happens in the background: what is really happening when we deploy a Kubeless or serverless app and how things happen. So this is really interesting, all this serverless and Kubernetes. So one question: as of now, people are in the process of moving their applications from traditional server architecture to Kubernetes. So what would you suggest are the best practices one should follow? If people have deployed their workloads, what are the best practices to maintain them? And, importantly, how should people train themselves in this Kubernetes ecosystem? Because this is getting bigger day by day, and a lot of new people are coming into this ecosystem and teaching themselves. There are also certifications like Kubernetes certified administrators and developers. Also, I think Heptio gives some training. So, what would you suggest for people trying to jump into this Kubernetes ecosystem and maintain their workloads?
Joe: So I mean the first thing I would suggest to folks is decide whether you are going to be learning about using Kubernetes or whether you are going to be learning about administrating Kubernetes. I think one of the things that we wanted to get right off the bat is that Kubernetes really enables, say, one party, whether it be a cloud or a dedicated team inside of a larger organization, to create clusters and then provide those as a service to folks that are using clusters. When it comes down to using the cluster itself, I wrote a book on it, so Kelsey Hightower, Brendan Burns and I wrote this book called “Kubernetes: Up and Running”, that covers some of the core concepts, mostly from the using-the-cluster point of view. But the biggest thing that I would say is understanding the idea, thinking about the stuff that you are running as a programmable, sort of malleable thing. So I think as people start using Kubernetes, you realize how easy it is to actually sort of start things up, shut them down, upgrade, that type of thing. It really starts changing your workflow in a pretty fundamental way. And I think understanding the core concepts and how you can start stringing those things together ends up being a critical piece of the puzzle.
And then after you understand some of those basics, you build on top: things like CI/CD, ways to be able to automatically deploy based on changes, doing things like keeping your application description, your manifests, in version control and then using that as your source of truth. I think all those things come after really internalizing how flexible and how dynamic the system really is.
Rahul: Okay, yeah, sounds good. So I think you have briefly covered almost all the things, from where Kubernetes started to what the best practices for it are. So yeah, we love your work and Heptio’s work contributing to the community and all the tools, and we have been following Heptio’s work. So it was great having you on this “All Things DevOps” podcast, and thanks for your time and for sharing your views about Kubernetes and the ecosystem.
Joe: Well, thank you so much for having me, it’s been great being on here.
Rahul: Yeah, thanks, Joe, thanks a lot. Have a great day.
Joe: You too, thank you.
Rahul: Thank you.