Rahul: Welcome to the “All Things DevOps Podcast” at BigBinary. Today we have Tanmai from Hasura. Tanmai is the founder of Hasura, and I’ll let him introduce himself. Hi Tanmai.
Tanmai: Hi Rahul, hi everybody who’s listening in. I’m Tanmai, I’m one of the co-founders of Hasura, and we’ve been working with Docker and Kubernetes for a long time. We started using Docker and Kubernetes just around the time they were both hitting 1.0. We’ve been using them internally to implement a platform and a toolkit that make life easier for developers who are looking to build out applications. Primarily, our aim is to accelerate the process of backend development and the process of DevOps, and internally we use Docker and Kubernetes.
Rahul: That’s great. I’m just curious to know, why did you choose Kubernetes as a platform? You mentioned that you have been working with Kubernetes since around version 1.1. I still believe 1.1 was not that production ready and Kubernetes was just evolving, so what made you take the call that, “Hey, this is Kubernetes, and this is what we’ll build our product and tools on to simplify life for developers”?
Tanmai: So, I think around that time one of the first pieces that we built out was the data layer, and the data layer is still one of the best pieces of the platform today, in our opinion. What the data layer does is create a data API on top of an existing database like Postgres. So it gives you a high-performance graph API on top of the database. It gives you an API in JSON, and it gives you an API in GraphQL, which we added recently. So we would deliver this layer, and we would have to deploy this binary on top of existing databases. At the time Hasura as a platform hadn’t come together; we were just working on this particular component. We were shipping these binaries and we realised, “Hey, you know, this has to become a little more convenient. This would be painful.” Building this binary and then moving it around, we just needed a better guarantee that it works everywhere. Shipping a binary and making it portable across different systems is not so hard, but we also had an associated piece in Python, and then we were like, “Okay, we need something to solve this problem.” And so then we came across Docker, and we came across containers being used for packaging.
We started using containers and we were pretty excited about it, and because we were so new to this ecosystem we started putting things in production that probably shouldn’t have been put in production, but I guess we learned how to do it right. So, we started putting these components inside containers, and then we started deploying these containers. When we were developing and working with these components for different clients, we realised that it’s just a gigantic pain to do things. At the time Docker didn’t have that flag, right? You would have a container and it would not be supervised; the restart flag itself came later. The single biggest problem at the time was port mapping. If you are developing with containers and deploying containers on a system with a shared staging environment for all the developers, you have to do this trick with port mapping, right? You have to say, “Oh, okay, the database, plus the data layer, plus the API layers, plus the API gateway: these will run on these particular ports.” But now you have a shared staging environment where you want to run multiple containers together, and then you are like, “Client one is on ports 30,000 to 30,100, client two is on ports 30,100 to 30,200.” So, we did a lot of this fairly tedious stuff, right? And we were like, “Hey, you know what? This has to become easier. We need some kind of orchestration to set up these containers, figure out what the right ports are, set up the mapping, build in at least a little bit of supervision so that we don’t have to manually manage so much of it, capture the logs, and keep them somewhere.” And then we started figuring out what kind of tools to use. Around this time, or maybe a little later, Deis was coming out.
They had written their own kind of orchestration system for containers, right? When we wanted to figure out how to do this orchestration we came across Kubernetes, and then we studied the architecture and the abstractions, a few of the pain points that we were looking at solving, and the elegance with which those particular problems were solved by Kubernetes. We were basically sold. As I said, when we started using Kubernetes it was absolutely not production ready; in fact, I would argue that it’s hard to run even today for developers. You need dedicated DevOps people, you need dedicated Kubernetes people to actually be running it. We’ve run some of the largest Kubernetes clusters in the world; we have a huge multi-tenant cluster that is running, quote-unquote, in production. It’s not really production for our users, but it’s production for us. We run into all kinds of weird issues with Kubernetes, and even today there are things that are just impossible to debug, things for which no kind of monitoring has been implemented yet. So, it’s been that kind of long journey with Kubernetes for us, but making that decision at that time was absolutely the right thing to do. Along the way things like Swarm and others came along, and we were like, “Hey, you know what? It’s not going to happen.” I think Kubernetes is basically it, because of the pluggable architecture and whatnot, right? A lot of things make sense. It was clear that Kubernetes had an architecture that would make it easy for the ecosystem to add more tooling to the orchestration system, and that’s what happened over the last few years. That’s why it has become so exciting and so popular.
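The manual port bookkeeping Tanmai describes for a shared staging host can be sketched in a few lines of Python. This is only an illustration, not Hasura’s tooling: the class name `PortRangeAllocator`, the block size of 100 ports and the upper `limit` are assumptions made for the example, and the ranges are half-open so that neighbouring clients never share a port.

```python
from dataclasses import dataclass, field


@dataclass
class PortRangeAllocator:
    """Hands out fixed-size, non-overlapping host port ranges, one per client."""

    base: int = 30000    # first port of the shared pool (taken from the episode's example)
    size: int = 100      # ports reserved per client (assumption)
    limit: int = 32768   # hypothetical upper bound of the pool
    _ranges: dict = field(default_factory=dict)

    def allocate(self, client: str) -> range:
        # Re-use an existing range so repeated deploys keep stable ports.
        if client in self._ranges:
            return self._ranges[client]
        start = self.base + len(self._ranges) * self.size
        if start + self.size > self.limit:
            raise RuntimeError("port pool exhausted")
        self._ranges[client] = range(start, start + self.size)
        return self._ranges[client]


allocator = PortRangeAllocator()
client_one = allocator.allocate("client-one")
client_two = allocator.allocate("client-two")
print(client_one.start, client_one.stop - 1)  # 30000 30099
print(client_two.start, client_two.stop - 1)  # 30100 30199
```

Keeping this table by hand for every client and every component is the tedium that an orchestrator with its own service and port abstractions takes away.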
Rahul: Yes, great. So Docker was there, but we needed orchestration tools, and the problems were basically things like port mapping and supervision, and then we have deployment and rollout with Kubernetes. I think that’s the reason most people choose Kubernetes over Docker Swarm: because of its simplicity, and because it did what people expect from a production-ready orchestration tool. That sounds interesting. As you mentioned, you are running your tools on top of PostgreSQL, and Postgres is a stateful database. In the container ecosystem it is often said that containers, at least in the early stages, are for stateless applications. So, when it comes to services like databases that need to retain persistent data, what is the strategy for containerization, whether it is Docker Swarm, Kubernetes, Apache Mesos, ECS or something else? Stateful applications are always going to be a pain, but you have been using Kubernetes. So, I’m just curious: how are you managing the database, how do you roll it out and deploy it with containers on Kubernetes, how is your Kubernetes hosted, and how is it scaled? Just a brief high-level overview, and also whether you are still struggling with running legacy or database services on containers.
Tanmai: Sure. I mean, it’s a very broad question, so what I’ll do is zoom in specifically on Postgres and on databases. More than a year ago, the first time around, we used to run Postgres as a replication controller backed by a volume, which we used to manage with some ad hoc tooling around it. Then over the years we moved most of the instances of our platform to have a single instance of Postgres. So, as a developer, if you create an account on Hasura and you have an instance of the Hasura platform, that instance comes with its own instance of Postgres. This is a single-node version of Postgres which runs as a deployment backed by a PV, which is absolutely okay. There is almost zero management overhead, because Postgres itself is very resilient to arbitrary restarts and to being shuffled around, so that’s generally not a problem. We end up having to do a little bit of, not cattle things, but pet things. For example, you may need to freeze Postgres to run on a particular node, and then you have the PVs also being mounted on that particular node. I mean, you don’t actually have to mount the PVs on that particular node, but you want to run the Postgres deployment or stateful set on certain nodes. So, we want to freeze that to make the decision making faster for the Kubernetes control loop, so that it doesn’t have to shuffle Postgres around to different places. We do a few small things like that to make it easier for us to run Postgres, but honestly we are not in the business of running managed databases.
And so that means that whenever we come across tooling which makes it easier for us to run a cluster of Postgres databases, we offload work to that. Depending on the client, they have different versions of managed Postgres that they are comfortable with. Some of our clients run on the single-node version of Postgres that we provide, because they don’t need that much scale. That has a regular backup, like a PITR backup; I’m not sure if you’ve heard of the Postgres tool wal-e. There is enough tooling in the Postgres community to ensure there’s this kind of minutely backup happening; it actually streams the WAL.
So that kind of backup plus single-node setup works for some clients; other clients have offloaded it to RDS or to Google’s version of Postgres. So that’s what the system looks like. Internally our team has been playing around with things like Patroni, just to get a feel for it, and as soon as it’s mature enough we’d have a Patroni controller that runs inside our Kubernetes cluster to make life easy for us. So, that’s how we currently handle Postgres. There’s another stateful component that we need to handle: we provide Redis as part of the Hasura platform as well, and we’ve been making do with node selectors, persistent volumes, and using deployments intelligently.
The problem with running these things as a stateful set right now is that there are certain edge cases where managing stateful sets is as painful as managing independent deployments backed by persistent volumes, especially in cases of network failures or when a leader election needs to happen. It becomes a little painful: the management overhead is the same, because you have to go in and debug the stateful set, and working with that is equivalent to having multiple deployments, each with its own persistent volume. So, the jump from a deployment to a stateful set for Postgres is not advantageous enough. It doesn’t actually help, because the tooling around it, especially for leader election in case of failure, is not easy. The ecosystem is still maturing, because almost all of these databases need controllers: you need dedicated Kubernetes operators that will manage these databases. The databases will not run by themselves as a stateful set. That is probably never going to happen. So, that’s our take on databases at this stage.
Internally, this is especially not a problem for us at Hasura because each Hasura user gets an independent instance of Kubernetes: their own Kubernetes cluster with the Hasura platform installed on it. So, we don’t have the problem of doing multi-tenancy for 50,000 users on a single instance of Postgres or on a shared distributed Postgres. That’s not something we have to worry about too much. So that’s our take on the stateful component of data. Coming to the second part of your question, what our Kubernetes deployments look like: we have our own CNCF version of Kubernetes, which runs on CoreOS and has a few Hasura-specific agents for monitoring certain kinds of failures that other tools don’t cover out of the box, especially failures of the Docker daemon itself. There are some cases where a node runs out of memory or CPU and the Docker engine itself fails and becomes unresponsive. Occasionally there are cases where you run into iptables corruption, or cases where you run out of TCP memory because one of the pods that was running had a socket leak: it was opening sockets and not closing them, so after some time it would corrupt the node. For monitoring these kinds of failures we have these homegrown agents that we deploy on the nodes. So, we have that particular distribution of Kubernetes that we use, but again, we are not in the business of doing managed Kubernetes at the moment. This is not something that we want to do, so whenever possible we want to offload that work to a managed Kubernetes vendor.
So, as of today, we run using our version of Kubernetes on all the cloud providers, we run on GKE, and we just went live with providing a single-node Kubernetes cluster on DigitalOcean, which is good for early production or staging environments. We are going to be launching Hasura as a platform that allows users to create Kubernetes clusters on any of the cloud vendors: on GKE, on Google Cloud, on Amazon. That’s what our internal Kubernetes architecture looks like.
The kind of automation that we do on top of that is to make it easy to migrate from any particular cloud vendor to another. So, we have an infra spec, which is a high-level specification of what provider you are on; it is basically infrastructure as code. It says what provider, what nodes, and what resources you have. Then you apply the configuration and we will lift and shift the entire Kubernetes cluster within or across cloud providers. Maybe you want to move from Singapore to New York, or maybe you want to move from AWS New York to GCP US East, too. So, that lift and shift of the Kubernetes cluster is also something that we have. It is also tooling that we built. So, that’s what the setup looks like. Does that make sense?
Rahul: Yes, thanks for giving the detailed overview of your infrastructure, and I really agree with you. Dealing with persistent volumes is never going to be that straightforward, and outsourcing the managed Kubernetes part to someone else when you are busy developing an application is the right way to go. We have been talking a lot about Kubernetes and about the Postgres APIs that you provide on top of Postgres databases, whether they are on Azure, RDS and so on. Before going to Gitkube, the second thing I’d like to ask you about is authentication. It is often said that Kubernetes doesn’t have great authentication or authorisation. RBAC is good, but people still have mixed feelings about RBAC. So, how do you handle authentication with Hasura?
Tanmai: So, I think again there are multiple different topics here. It’s a very valid pain point. This is a very good question because this is one of the big pain points that we have in any Kubernetes setup. So, let me break it up into two different things. As a user, when you create a Hasura account and you get a Kubernetes cluster, if you are on the free tier you are actually on a multi-tenant Kubernetes cluster. So we have gigantic Kubernetes clusters; well, not gigantic, but between 50 and 100. We have noticed our sweet spot is around 60, and after 60 we just create another new Kubernetes cluster. So, we have users as multi-tenant users on this one Kubernetes cluster, and for that, RBAC and PSPs are super handy. We have been able to solve almost all of our problems using Kubernetes RBAC, along with Calico for enforcing certain network authorisation rules. So, using Calico along with RBAC and PSPs, we’ve been able to create a good multi-tenant setup. We also have a multi-tenant operator that manages the creation of new projects on this multi-tenant cluster, and the deletion of those: reaping them when they are stale, or powering them down when they are not active. So we have a multi-tenant Kubernetes operator that sits inside the multi-tenant cluster and handles this. So that’s how RBAC helps us internally in that multi-tenant environment. It’s just very useful. I don’t have any complaints about Kubernetes there. The other side of it is providing authorisation or authentication. Let’s say you signed up on Hasura today; you will get your own Kubernetes cluster.
You want to add a collaborator to this Kubernetes cluster, or multiple collaborators. You are one developer, and you want to add another person who can help and who can also do things on this Kubernetes cluster with a particular RBAC role.
So, what we are working on right now is making that possible by extending Kubernetes’ authentication and authorization mechanism. Kubernetes has a webhook method for extending both authentication and authorization requests, and we extend that for every Kubernetes cluster that we create. We extend Kubernetes authentication by adding Hasura authentication to it, and the Hasura authentication system comes with a dashboard which allows you to create, remove and assign several kinds of controls to users dynamically. So, your dashboard becomes that Hasura dashboard. Whenever somebody is using kubectl or the Kubernetes API server with one of the tokens that we provide through our dashboard, the request gets authenticated by the Kubernetes API server through the hooks that we have. So that’s the setup that we’ve created, and it seems to work well. It would be ideal if there were more sophisticated methods of doing this, but it’s not a big problem for us. It’s something that we’ve developed and something that we might also look at open sourcing sometime soon.
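To make the webhook mechanism concrete: with webhook token authentication, the Kubernetes API server posts a `TokenReview` object containing the bearer token to the configured service and expects a `TokenReview` back whose `status` says whether the token is valid and who the user is. The Python below is a minimal sketch of that request/response handling only. It is not Hasura’s implementation; the in-memory `TOKENS` table and the function name `review_token` are hypothetical stand-ins for tokens issued from a dashboard.

```python
import json

# Hypothetical token table standing in for tokens issued from a dashboard.
TOKENS = {
    "token-abc": {"username": "alice", "uid": "1001", "groups": ["developers"]},
}


def review_token(body: str) -> str:
    """Answer a TokenReview request (JSON text) with a TokenReview response."""
    request = json.loads(body)
    token = request.get("spec", {}).get("token", "")
    user = TOKENS.get(token)
    status = {"authenticated": user is not None}
    if user is not None:
        # Only an authenticated response carries user information.
        status["user"] = user
    response = {
        "apiVersion": request.get("apiVersion", "authentication.k8s.io/v1"),
        "kind": "TokenReview",
        "status": status,
    }
    return json.dumps(response)


request = json.dumps({
    "apiVersion": "authentication.k8s.io/v1",
    "kind": "TokenReview",
    "spec": {"token": "token-abc"},
})
reply = json.loads(review_token(request))
print(reply["status"]["authenticated"], reply["status"]["user"]["username"])
```

The groups returned here are what RBAC role bindings would then match against, which is how a dashboard could map users to roles without touching the cluster’s own credentials.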
Rahul: Nice, nice. So this really looks good, managing the authentication system from a UI. Proceeding, you also have a Filestore API, so first just some basic questions. Is it an object-storage type of file system? Is it stored on the cloud, and is it replicated? Does it use caching when users upload and download frequently? And what are the scalability limits and the pricing for the Filestore API?
Tanmai: So let’s step back a little bit; I didn’t introduce the way Hasura itself works. Hasura is basically a platform that sits on top of Kubernetes and accelerates two portions of development. One of them is giving you a bunch of backend APIs with integrated auth that you can directly start using in your apps: in your client-side code, in your frontend apps, in your web apps and whatnot. The other portion is setting up GitOps for you, so that you can use git-based automation to deploy everything, from database migrations to changing your API configuration. Maybe you want to change the subdomain that a particular service is exposed on, you want to change the domain names or add multiple domain names, or you just want to do basic CI/CD for deploying your code, right? All of that is powered through just git. The only thing that you need to know is git. There’s no specific tooling that you need for database migrations, and there’s no specific tooling that you need for updating config maps, a custom resource definition, or a custom resource. You don’t have to do any of that. You just use git as a developer, and a chunk of this DevOps portion is taken care of for you. That’s what the platform does. The way we’ve designed this platform is that we’ve created a bunch of stateless backend APIs which make it easy for you to do the hard bits of your application. There are three in particular: data APIs, Filestore APIs and authentication APIs. These APIs are basically microservices that are running on your Hasura cluster. So, today if you create an account on Hasura you will get a Hasura cluster, that is, a Kubernetes cluster which will be running independent instances of each of these microservices.
So, it’s not an S3-like API which is multi-tenant and which has to handle multi-tenancy for different users or different clients. That’s not the case. It is a microservice that is instantiated for you. You can scale it up and down horizontally just as you want; these services are designed to be horizontally scalable, and they are designed to be vertically scalable as well. Each of them is multicore aware, for example. They are supremely lightweight components: all the Hasura microservices will take barely a few hundred MB combined. The data API itself takes about 20 MB, and the auth API will take another 20-25 MB. So these are absolutely tiny, lightweight services. If you vertically scale your nodes, they are multicore aware; they scale to exploit concurrency by just using multiple cores. They are also horizontally scalable because they don’t store state, and they will scale as you scale the Kubernetes cluster as well.
So, that’s what the Hasura platform is. Now, the layer which stores the state is separate; it’s outside the scope of the Hasura platform. Even internally, when we think about it, that work is done by a different team. We offload data to Postgres. In the case of the Filestore API we offload to either some kind of object storage that you want to use or the disk itself, and then whatever method you use for replicating the disk is an independent concern. So tomorrow if you want to use Ceph, or Minio, or S3, all of these are possible. Our Filestore API supports all of them. Our Filestore API is designed to have streaming, it’s designed to be integrated with auth (I’ll explain what that means in a minute), and it is designed to be vertically and horizontally scalable, because we don’t want to deal with the state. When I say integrated with auth, it’s because the problem that we want to solve with the Filestore API is not the problem that Minio wants to solve. Minio is solving a very different problem. The problem that we want to solve with the Filestore API is giving application developers a simple file upload-download API that they can use in the app without thinking about how security works, or how collaboration and sharing work. For example, if you have to use S3, a lot of developers who are new to the ecosystem would end up just using AWS IAM tokens and embedding them in their apps. This has caused countless breaches where these tokens leak, because developers are not sure how to do this. A common way that people handle this is to upload to their own API server; their API server then takes the file and uploads it to S3, which is inefficient because you are not using streaming when handling large file objects. It’s kind of hard to do this proxy to S3.
So then you have to create short-lived S3 tokens. But suppose you want to implement sharing, suppose you want to implement a Google Docs-like setup where, if I upload a file, I want it to be shared with my friends, or according to some arbitrary definition of sharing that I define.
So, that’s the portion that we want to make easy. When I say Filestore is integrated with auth, it’s integrated with user auth. And so Filestore, data and auth are the pieces that we provide to application developers. This is not on the ops side of things but on the dev side of things. So that’s how the Hasura platform works. From a pricing point of view, we charge the user based on the underlying cloud cost. At the moment we don’t actually mark it up; we just give it to you at the base cost and correct for certain transaction fees, etc. Let’s say you want to create a DigitalOcean droplet and you want 2 GB of RAM and 1 core, or you want a 64 GB RAM, 16-core droplet; then that’s the charge that you will be paying to Hasura. However you want to use the APIs, you would pay for the DigitalOcean infrastructure, or for the Amazon or the Google infrastructure. So that’s how the pricing works. It’s not based on the usage of the API, the gigabytes or whatever; it’s all based on the infrastructure cost that you would have been paying anyway. That’s how the Hasura model works. Along with the data layer, the object storage layer and the authentication layer, we provide the API gateway as well. So there’s an API gateway on top of that, with authentication integrated into it. Those are the layers that we provide; these are the building blocks for an application. Does that make sense?
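The short-lived token idea mentioned in this answer can be illustrated generically. The sketch below signs a user, a file and an expiry time with an HMAC so that a client can hold a download token that stops working after a few minutes, instead of embedding long-lived cloud credentials in the app. It is not how Hasura’s Filestore or S3 works: `SECRET`, `issue_token` and `verify_token` are hypothetical names, and the `:` separated token format assumes that user and file identifiers contain no colons.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical; a real service would load this from secure config


def _sign(user_id: str, file_id: str, expires_at: int) -> str:
    message = f"{user_id}:{file_id}:{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_token(user_id: str, file_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Create a token that lets user_id fetch file_id until it expires."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    return f"{user_id}:{file_id}:{expires_at}:{_sign(user_id, file_id, expires_at)}"


def verify_token(token: str, file_id: str, now=None) -> bool:
    """Accept only an untampered, unexpired token issued for this file."""
    now = int(time.time()) if now is None else now
    try:
        user_id, token_file, expires, signature = token.split(":")
        expires_at = int(expires)
    except ValueError:
        return False
    expected = _sign(user_id, token_file, expires_at)
    return (
        hmac.compare_digest(signature, expected)
        and token_file == file_id
        and now < expires_at
    )


token = issue_token("alice", "report.pdf", ttl_seconds=60, now=1000)
print(verify_token(token, "report.pdf", now=1030))  # True: still valid
print(verify_token(token, "report.pdf", now=1061))  # False: expired
```

Sharing rules, such as “my friends can read this file”, would sit in front of `issue_token` as an application-defined permission check, which is the part the speaker says the platform aims to make easy.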
Rahul: Yeah, it makes sense. I’m really happy to know that all of your tooling and architecture is truly going towards cloud native, the CNCF way, and this is where the world is moving, that’s for sure. So that’s really interesting, and the way you are handling authentication and Filestore is really efficient, I think. So moving on to microservices and the top product from Hasura, Gitkube. I will again say it’s the Heroku for your Kubernetes application. After you open sourced it I went through the architecture, and it’s fairly simple, but though it’s simple we can say that you’ve done a lot of hard work, and it seems to be really working. Again, congratulations for building such a nice community tool. I’m sure this will help the community and users, and it will reduce the pain of integrating a CI/CD pipeline. Before I ask anything, I would like to hear from you about Gitkube.
Tanmai: So, the way you described Gitkube is pretty perfect. The idea is to get Heroku on your own Kubernetes cluster without any external dependencies. You don’t need to sign up for another tool, which matters especially in an enterprise, where adding more services is always a pain. You have to go through some control process to add another service, and you don’t want to do any of this. You know that your typical workflow just involves doing some kind of container build and kubectl apply, and you just want that portion automated, and that’s exactly what Gitkube does. It sits on your server, it has a very simple, neat architecture, and all you do is git push to this remote. It uses the standard method of authentication that you would expect if you are a git user, which is basically public-private keys. So you upload your public SSH keys to the git remote, which is basically Gitkube, and then you just do a git push and the Docker build and rollout are done for you.
What is extremely exciting for us is that it gained a bit of traction in a short period of time. What is also super exciting is that we want to turn Gitkube into a layer for doing more and more GitOps work. Our opinion on the world is that if developers know how to use Docker and Kubernetes, then they pretty much don’t need any other tooling. You actually don’t need most kinds of CI/CD tools. You need pipelines, but your constructs can just be Docker and Kubernetes, and you can implement every single thing you want, from testing to jobs to migrations, to whatever you want. In the next version of Gitkube, which is what we are working on, we are going to allow our users to write their own automation. Let’s say, for example, that right after a git push and a deploy you need to update a CDN cache, or you need to run a migration job, or you need to start some kind of job that does a stateful activity, or you need to update a particular config map or a CRD. Any of these actions that you need to do from inside the cluster are things that you can add as your own hooks into Gitkube, so that after a developer does a git push you can start automating more of your workflow. This automation substrate is what we want to create with Gitkube.
Gitkube came out of the work that we were doing at Hasura, because on Hasura also we give you a git-push-to-deploy workflow. Gitkube is a tiny portion of what we do at Hasura; when you do a git push on Hasura, a lot of things happen: from updating a Kubernetes configuration, to waiting for that Kubernetes configuration to get updated (I’ll come back to that in a second), to applying database migrations, to finally doing the equivalent of what Gitkube does, which is just a Docker build and rollout. All of these things are done by Hasura, and Hasura uses Gitkube internally. What we now want to do is make that into an automation platform, so that every DevOps person can easily write their own automation to do whatever is required. I’m sure you can understand this: you write a script in your favorite language, whether that is Python or bash (probably not bash, but whatever you want to use), inside the environment of your Kubernetes cluster, where you know that all the security and access control is worked out, and you have that script execute certain CI/CD or DevOps actions or tasks for you inside the Kubernetes cluster. It’s just much easier to reason about than an external tool that needs to access your Kubernetes cluster, update a few things, and then wait for those updates to happen, because everything in Kubernetes is asynchronous. You can’t update a config map and then expect that the API gateway configuration has been updated. You can’t. You have to observe or wait for that to happen. And writing all that logic is just much easier if you are in control of writing your own script in your own favorite language inside the Kubernetes cluster. So that’s where we want Gitkube to go: to power this movement of GitOps that I think is going to become really popular this year.
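The “observe or wait” point can be sketched as a small polling helper of the kind such an in-cluster script might contain. This is an illustration under assumptions, not Gitkube or Hasura code: `wait_until` is a hypothetical name, and `observe` stands for whatever call reads the currently applied state (for example, the configuration version a gateway reports), which is faked here with an iterator.

```python
import time
from typing import Callable


def wait_until(observe: Callable[[], str], desired: str,
               timeout: float = 30.0, interval: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll observe() until it returns desired; give up once timeout has passed."""
    deadline = time.monotonic() + timeout
    while True:
        if observe() == desired:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Fake observer: the new version only shows up on the third poll,
# mimicking a change that is applied asynchronously.
observed = iter(["v1", "v1", "v2"])
ok = wait_until(lambda: next(observed), "v2", timeout=5.0, interval=0.0)
print(ok)  # True
```

A later pipeline step, such as a migration, would only run after `wait_until` returns `True`, rather than assuming the update took effect as soon as it was submitted.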
Rahul: Yes. So, it’s really interesting, the way you have adopted and simplified all of these things. A couple of things regarding Gitkube: I’ve been looking at how to provide user authentication keys or Docker registry keys while using Gitkube. So are you using any kind of key storage management?
Tanmai: No, not for Gitkube. Gitkube is not intended to store application-level secrets the way Vault is. For Gitkube, the only quote-unquote secret is not really a secret, it’s a configuration. The only configuration that Gitkube needs is basically your SSH public keys. The other thing that it needs to store, the secret that it needs to store, is, like you said, the Docker registry credentials. In my opinion this is also something that is okay to provide right now as a Kubernetes secret and configuration. You don’t really need to use Vault for this right now, because most Kubernetes vendors are going to have deep integration with their own container registry. So the need for adding a registry secret is going to decrease, and you won’t actually need to provide a Docker registry secret like you do now. This problem will get alleviated as Kubernetes vendors mature, and our focus is on reducing complexity.
So, we don’t want to add more complexity by showing you, “Hey, here’s how you can integrate three or four different tools to handle your secret configuration.” We want to make it as drop-in and as easy to use as possible, which basically means that you add your git remote, add the SSH keys, git push, and you are done. Even the process around the registries will get simplified as the ecosystem matures. The current suggested method is to specify your own registry secrets. Does that make sense?
Rahul: Yes. So, the push is the…I think we wanted that. This might be the wrong question, but have you thought of coming up with a UI for Gitkube?
Tanmai: That’s needed, yeah. We haven’t thought about implementing a UI for Gitkube. So, what do you think the UI would look like? Would it basically tell you the success or failure of the things that happened when you did a git push? Is that what you mean? That kind of a UI?
Rahul: Yes, and the pipeline work we can see…
Tanmai: Also the pipeline, yeah, true. So, step one of that process is to make sure that the automation that you do, which currently is basically just a docker build and kubectl apply, or tomorrow your own events or your own scripts when you add them to that pipeline, outputs the pipeline events: success or failure, what events are happening, which are succeeding, which are failing, and which are still going on. We want this output to be available in a textual format and in JSON format. So we are thinking about outputting them as logs right now, or writing them to an external ConfigMap or a custom resource, and then a UI can be built on top of it that would basically read from that ConfigMap or custom resource. So, Gitkube itself is executing certain pipelines and writing the information on those pipelines into another Kubernetes object, and the UI is basically stateless. It is just reading from that Kubernetes object. It’s not like you can execute things from the UI, because the philosophy is that the triggering of actions on Gitkube should happen through git events. So, either it should happen through a git push or it should happen asynchronously through a hook.
So maybe you do a git push on your GitHub, a hook is executed, and something happens on Gitkube. So maybe that’s one bit. Those are the two ways that we want to trigger something. We don’t really want to have a triggering mechanism in the UI, because that again makes this a fairly complicated tool, and there are already tools that you can use for doing this, for example in the GitHub pipeline. So that’s the take that we have on it right now.
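The idea of emitting pipeline events in both text and JSON, with a stateless reader on top, could look roughly like the sketch below. Everything here is an assumption made for illustration: the `PipelineEvent` fields, the step names, and the helper functions are hypothetical and do not describe Gitkube's actual output format. The `stored` list stands in for whatever ConfigMap or custom resource would hold the events.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class PipelineEvent:
    step: str          # illustrative step name, e.g. "docker-build"
    status: str        # "running", "succeeded" or "failed"
    message: str = ""


def to_json_line(event: PipelineEvent) -> str:
    """Machine-readable form, one JSON object per line."""
    return json.dumps(asdict(event), sort_keys=True)


def to_text_line(event: PipelineEvent) -> str:
    """Human-readable form for plain log output."""
    suffix = f": {event.message}" if event.message else ""
    return f"[{event.status}] {event.step}{suffix}"


def summarize(json_lines: List[str]) -> Dict[str, str]:
    """Stateless reader: latest status per step, derived only from stored events."""
    latest: Dict[str, str] = {}
    for line in json_lines:
        record = json.loads(line)
        latest[record["step"]] = record["status"]
    return latest


events = [
    PipelineEvent("docker-build", "running"),
    PipelineEvent("docker-build", "succeeded", "image built"),
    PipelineEvent("kubectl-apply", "running"),
]

# Stands in for the contents of the ConfigMap or custom resource.
stored = [to_json_line(e) for e in events]
text_log = [to_text_line(e) for e in events]

# A UI would only ever call something like this; it triggers nothing.
summary = summarize(stored)
```

Because `summarize` keeps no state of its own, any number of readers can be built on the same stored events, which matches the idea that triggering stays with git events.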
Rahul: Yeah, nice. Congratulations, that’s really a perfect thing that came out at the right time, and the community will be really happy to integrate Gitkube with their own deployment and automation…
Tanmai: Yeah, we are looking…it’s super for us and we are really looking forward to seeing what people say as they use it and what kind of features they want. We’ve already started getting some feature requests, and the work has already started on the next release.
Rahul: So some time back you mentioned that you wanted better monitoring and handling of processes on a Kubernetes cluster, and as we all know, maintaining a Kubernetes cluster isn’t that straightforward. You mentioned that you had your own agents for monitoring and catching those things and making it easier to manage the Kubernetes cluster. So, is it possible for you to talk about what kind of monitoring agents you have? Because I think there’s Prometheus, and something like InfluxDB and Grafana. We use Prometheus, and if we are dealing with Prometheus queries it’ll give us what we want, but the valid point is things like, “The Docker daemon will go down.” You mentioned iptables, performance issues and all. Those are still valid issues if you are managing a large-scale Kubernetes cluster. We all face it day to day: a node reports a “context deadline exceeded” error, and there’s no alternative other than restarting or removing the node. So, those are all the issues. I’m just curious to know, how have you overcome these issues in your clusters?
Tanmai: I’m unfortunately not the right person to answer that, but I can probably answer some of these questions at a surface level. My short answer when people ask me is, we have a team. We have a team of three people in fact, a tiny team, which has been doing this for a long time. What we have is a homegrown collection of scripts, deployment scripts, and agents that are loaded and run as binaries on CoreOS itself, so they are not a part of the Kubernetes ecosystem. They are independent, because if you are inside Kubernetes then it’s hard to monitor some things about Kubernetes itself, or you might not have access to lower-level details. Especially things around storage are hard. You need to directly query the kubelet API that is running on each node to figure out what is happening at the node level.
It’s a lot of homegrown stuff. It’s unfortunately not something that is very formal. It’s something that I think we should probably have worked on releasing as a formal open source component, which we unfortunately have not had time for. Basically, it’s a bunch of different scripts that do a lot of different things in different ways. We integrated them directly with the email notification system that we have internally for our developers and our ops folks. So it integrates with that email or alerting system directly. We do not have a layer of abstraction on top of it, like Prometheus or any other kind of alerting system, because we did not want to configure all of that. For us it was just faster to write, as we’ve been writing a lot of these Kubernetes agents for a long time. So, it’s just simpler for us to write the components that we need, because we know that the issue we are facing is probably going to be an issue from 1.6 to 1.7, and 1.7 will have its own issues. So a lot of the work that we’ve done is just stopgap. We used to have certain issues that were present from 1.2 to 1.5, and then they went away. Now we have certain issues on 1.9 and 1.10, but again those fixes are almost 100% going to be stopgaps till the next Kubernetes version comes out.
A lot of this is very informal. A lot of these agents that we’ve written are really informal and we just deploy them. We hook that into our Kubernetes cluster creation process. When we create a Kubernetes cluster, on a managed provider or on our own, one step of that provisioning process is installing our agents on it as well, depending on the Kubernetes version that is running. That’s at a high level what the system looks like. I know of a few specific things that we monitor, but I’m sure the team internally monitors a bunch more things than that. So yeah, that’s the vague answer to that question.
Rahul: Okay, thanks. I understand that you are now into CI/CD pipelines and simplifying things with your tools using Gitkube and all. Have you thought about how these things would go in the serverless world, where people have started working on serverless tools like Kubeless and Fission that run on Kubernetes…?
Tanmai: I have some issues as a developer with the notion of serverless. The way I think about serverless is that there’s one major pro and one major con. The major pro of serverless is that the economics are very good. The con is that the developer experience is sh*t, it’s utterly sh*t, it’s really bad. As a developer, if I had to write my code as serverless functions I would hate my life, because I do not have access to my favourite tools, my favourite framework, my own deployment tools, or my testing tools. All my concepts would break: testability is reduced, iterative development is harder, and replicating those environments, especially if you can’t run on the public cloud and you have to run on… then it’s just a pain. So, that’s what I don’t like.
The pro is that the economics are very nice: you pay only when your function executes. I think serverless itself, and the way I think this will evolve, is becoming of two types. I have met a large number of people who conflate serverless with lambda. Serverless and lambda are different things. If you look at lambda, lambda is the idea that you have a function, you specify the body of that function, and the function executes. This is a dirty hack. It is a dirty hack because you are relying on function signatures to do input-output for the function, or you are relying on stdin and stdout to do input-output for the function. There’s a lot of pain around that. The other aspect is the definition of serverless. I think the definition of serverless from the CNCF is that you pay for what you use. This is the core idea behind serverless: you are only paying for a resource when you are using that resource. That is what serverless is. If you take that aspect of serverless, then you actually don’t need the lambda portion of it. Lambda works really well for some specialised use cases. Lambda works brilliantly for AWS, because AWS has so many internal systems that benefit from lambda, like SNS and SQS. You have a tremendous ecosystem of AWS tools that you can hook into using lambda, and that’s great. That’s a benefit of lambda, but the serverless benefit, paying for what you use, is what you basically want.
Another way that you can reach pay-for-use is by auto-scaling your Kubernetes: auto-scaling Kubernetes objects and auto-scaling the infrastructure underneath those Kubernetes objects. The auto-scaling tech is not very mature right now, but as it becomes more mature, maybe it goes beyond just CPU and RAM, and maybe you can control that auto-scaling in a better way by looking at different metrics. Some tools for that are already there. As that improves, your entire Kubernetes cluster itself will become serverless, except for the stateful portions. Ignore those, but the stateless portions of it are pretty much almost serverless today. If you have simple workloads, especially where people deploy jobs on Kubernetes and the clusters are just scaling up and down automatically, that’s exactly what serverless is, unless you have a situation where you have zero traffic, which people at scale will probably not have. If you have zero traffic and no Kubernetes cluster, and from zero you are going to a Kubernetes cluster which has a pool of size five, okay, that’s different, that’s hard to do right now. That’s kind of what we do in our multi-tenant setup, but that is harder to do. But let’s say you have finite traffic, going from finite traffic to a spike and back to lower traffic: you are actually doing pay-as-you-use. You are actually paying for what you use, because you will be using CPU and RAM. And so that evolution is going to happen in the Kubernetes world itself, and that kind of evolution will actually preserve the developer experience, because your developer experience is still with containers, and with containers you have stability, you have your environment, you have control, you have everything. You have joy. Yeah, you couldn’t have said this a few years back, because with containers you didn’t have joy, but now with containers you have joy, right? And so this will become serverless.
So the evolution of serverless now has to use containers as the substrate instead of using anonymous functions or lambda as that substrate. So that’s kind of my take on serverless. It’s very interesting to see this evolve, but one of the major benefits of serverless right now is the economics. There are some different discussions around serverless which deal with the edge cloud, which is the stuff that Cloudflare and a few other people are working on, with the idea of executing your functions on the edge. That’s different, but those will also probably mature from executing lambdas to executing containers. So that’s kind of my take.
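The auto-scaling argument above can be made concrete with the scaling rule that the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric). The sketch below is a simplified version that leaves out the tolerance band and stabilisation behaviour of the real autoscaler; the function name, the replica bounds, and the CPU readings are made up for the example.

```python
import math
from typing import List


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA-style rule: scale in proportion to metric / target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds; note the floor is not zero.
    return max(min_replicas, min(max_replicas, raw))


# Average CPU utilisation (percent) seen at successive checks; values are made up.
replicas = 2
history: List[int] = []
for cpu in [90.0, 60.0, 20.0, 0.0]:
    replicas = desired_replicas(replicas, cpu, target_metric=60.0)
    history.append(replicas)
```

With these numbers the replica count follows the load up and back down, which is the pay-for-what-you-use behaviour, but with `min_replicas=1` it never reaches zero. That is the zero-traffic case described above as the hard part.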
Rahul: Thanks, that’s really interesting, and I agree with some of the points, like serverless or AWS Lambda is not for all use cases, it is for certain use cases. I still think there’s a long way to go, but serverless is also moving along with the whole container and Kubernetes ecosystem. Thanks Tanmai for your time and all the information you’ve given throughout the episode. It was really nice having you, and thanks a lot.
Tanmai: Absolutely, yes, I loved having this chat, Rahul. Thank you for putting this together. If people want to reach out for anything, I’m sure you will have ways for them to do that. I’m usually hanging out on Twitter or on our website.
Rahul: Yeah, of course. Thanks a lot and good luck for your talk at Kubecon.
Tanmai: Thanks, thanks Rahul, bye.
Rahul: Have a nice day.
Tanmai: Good bye.