Future XMiDT with Kubernetes

While caduceus, petasos, and scytale seem like perfect candidates for Kubernetes, I am curious how to get around the routing problem from device to talaria and from scytale to talaria. It seems like talaria could register twice with consul: once for scytale and once for external devices. This might require some work to support.
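A rough sketch of what that double registration could look like: one pod builds two payloads for Consul's `/v1/agent/service/register` endpoint, one per audience. The service names, tags, and port here are illustrative, not XMiDT's actual configuration.

```python
# Sketch: one talaria pod registers twice with Consul, once under an
# internal name (for scytale) and once under an external routing name
# (for devices). Names/tags/port are hypothetical, not XMiDT's real config.

def talaria_registrations(pod_name, pod_ip, port=6200):
    """Return the two payloads a pod would POST to /v1/agent/service/register."""
    base = {"Address": pod_ip, "Port": port}
    internal = dict(base, ID=f"{pod_name}-internal",
                    Name="talaria-internal", Tags=["internal"])
    external = dict(base, ID=f"{pod_name}-external",
                    Name="talaria-external", Tags=["external"])
    return internal, external

internal, external = talaria_registrations("talaria-0", "10.0.0.12")
```

Scytale would then discover `talaria-internal`, while whatever fronts external devices would resolve `talaria-external`.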

In addition to the routing problem, how can we make it easier to deploy XMiDT to Kubernetes? Right now there is a PR for adding Helm charts to xmidt. Should each repo have its own Helm chart? Should the main repo have all the charts? Or maybe we should just have raw Kubernetes YAML files.

I’m curious what the community thinks our path forward should be with kubernetes.

Talaria should be deployed to k8s as a StatefulSet instead of a regular Deployment. We use an nginx reverse proxy so that the talaria FQDN can be resolved both by external devices connecting to webpa and by the k8s DNS for internal services. This has the major drawback of connection duplication due to the use of a reverse proxy. It would be useful to use consul to resolve the internal and external talaria names, though that probably requires a reimplementation in some services. Petasos could detect whether a request comes from the k8s ingress or from an internal service (e.g. scytale) and forward it to the correct external/internal talaria endpoint.
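The petasos idea in that last sentence could be sketched like this. Detecting the ingress via the presence of an `X-Forwarded-For` header is an assumption (an ingress typically sets it), and the public domain is made up:

```python
# Sketch: petasos picks the external or internal talaria FQDN depending on
# where the request came from. Using X-Forwarded-For to detect the k8s
# ingress is an assumption; "talaria.example.com" is a hypothetical domain.

def pick_talaria(headers, pod="talaria-0"):
    internal_fqdn = f"{pod}.talaria.webpa.svc.cluster.local"
    external_fqdn = f"{pod}.talaria.example.com"  # hypothetical public name
    if "X-Forwarded-For" in headers:  # request came through the ingress
        return external_fqdn
    return internal_fqdn              # internal caller such as scytale

pick_talaria({"X-Forwarded-For": "203.0.113.7"})  # external device
pick_talaria({})                                  # internal service
```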

Okay, cool so for future work to make this better:

  • Have all services be able to get the hostname of their container for routing.
  • Have talaria register with consul twice: once with the internal service name and once with the external routing name.
  • Have scytale listen to consul and forward all requests directly to talaria.
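For the first bullet: inside a StatefulSet pod, the container hostname is the stable pod name (e.g. `talaria-0`), so each service can derive its own routable names from it. A minimal sketch, where both domain suffixes are assumptions:

```python
# Sketch: derive a pod's internal and external routable names from its
# container hostname. In a StatefulSet pod, socket.gethostname() returns
# the pod name (e.g. "talaria-0"). Both domain defaults are hypothetical.

import socket

def routable_names(pod=None,
                   domain="talaria.webpa.svc.cluster.local",
                   public_domain="talaria.example.com"):
    pod = pod or socket.gethostname()
    return f"{pod}.{domain}", f"{pod}.{public_domain}"
```

These two names are exactly what the pod would put into its two consul registrations from the second bullet.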

Anything else I missed?

Regarding deployment, @dneves do you have any opinions on how this should move forward?

We have two options:

  • Petasos checks consul to resolve the internal and external requests
  • Petasos checks consul for external endpoints and Scytale checks consul for the internal endpoints
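For the second option, scytale would watch consul and fan requests out to the healthy internal talarias. A sketch of turning a Consul `/v1/health/service/<name>` response into endpoint URLs (the field names follow Consul's documented JSON shape; the service name and port are assumed):

```python
# Sketch: map Consul /v1/health/service entries to internal talaria URLs.
# Consul returns a list of {"Node": {...}, "Service": {...}} entries; the
# Service.Address may be empty, in which case the Node.Address applies.

def talaria_endpoints(health_entries):
    urls = []
    for entry in health_entries:
        svc = entry["Service"]
        addr = svc.get("Address") or entry["Node"]["Address"]
        urls.append(f"http://{addr}:{svc['Port']}")
    return urls

sample = [{"Node": {"Address": "10.0.0.5"},
           "Service": {"Address": "10.0.0.12", "Port": 6200}}]
talaria_endpoints(sample)  # ["http://10.0.0.12:6200"]
```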

As for deployment, we set up an internal Jenkins + Azure DevOps pipeline. Jenkins builds the docker images and Azure DevOps deploys to k8s. But this can be done in so many ways that I think it needs a more in-depth discussion :smiley:


@kcajmagic we have been exploring this tool https://www.spinnaker.io/ for CI/CD. What are your thoughts on it?

I’m not with @dneves on this one. There’s no real reason why talaria should be reached directly from the outside; the industry best practice is to have reverse proxies handle things like mTLS, role-based API auth, and the like. You wouldn’t off-load that to talaria itself, would you?

So we won’t get rid of reverse proxies, and we’ll have to handle the tuning and monitoring of those ingress pieces of software anyway. It’s only a question of having a nice way for petasos to send a Redirect with the correct Location header, i.e. with the public FQDN ingress address of the selected talaria instance instead of the internal k8s FQDN.
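That Location rewrite could be as simple as a suffix swap on the FQDN petasos already selected. A minimal sketch, where the public suffix is a made-up example:

```python
# Sketch: rewrite an internal headless-service FQDN into the public ingress
# FQDN for the Redirect's Location header. The public suffix is hypothetical.

def to_public_location(internal_fqdn,
                       cluster_suffix=".talaria.webpa.svc.cluster.local",
                       public_suffix=".talaria.example.com"):
    if internal_fqdn.endswith(cluster_suffix):
        pod = internal_fqdn[: -len(cluster_suffix)]
        return pod + public_suffix
    return internal_fqdn  # already public, or not a talaria name

to_public_location("talaria-1.talaria.webpa.svc.cluster.local")
```

The ingress would then need a rule per talaria pod (or a wildcard host) mapping each public name back to the matching pod.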

In practical terms: say you have two talaria instances in your cluster, with a headless service that assigns FQDNs like these:

  • talaria-0.talaria.webpa.svc.cluster.local
  • talaria-1.talaria.webpa.svc.cluster.local
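Those names follow the Kubernetes headless-service convention `<pod>.<service>.<namespace>.svc.cluster.local`, with the pod ordinal coming from the StatefulSet. Generating them for N replicas:

```python
# Sketch: the headless-service FQDNs a StatefulSet named "talaria" with a
# service "talaria" in namespace "webpa" produces for its replicas.

def headless_fqdns(replicas, statefulset="talaria", service="talaria",
                   namespace="webpa"):
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
            for i in range(replicas)]

headless_fqdns(2)
```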

Those FQDNs are the ones the whole internal ecosystem should use when it needs to talk to talaria, but that’s only for workloads running inside the cluster [1]. If a client wants to reach those talarias from the public cloud, they’ll have to use a public FQDN exposed via the cluster’s ingress.

And, assuming my very own [1], workloads running outside the cluster BUT on the local network would need a third way of reaching these talarias, as the .svc.cluster.local names wouldn’t be resolvable for them. If you have a DNS server you can manage on that network, you can assign a DNS zone that reaches all the ingress nodes; then it’s a matter of deciding on an FQDN that exposes those talarias via the cluster’s ingress. If you have no DNS you can manage, you’d have to get creative, e.g. using node ports in a predictable way or even - god forbid - using host ports.

[1]: this assumes a standard Kubernetes setup with an overlay network like canal or flannel; all of this changes with any other CNI and would have to be adapted to that specific case.


@suvl I think we should continue to use reverse proxies for external communication with the cluster, but we should design a better solution for talaria. Double proxying seems like overkill, but we can look for other approaches that don’t involve changing code :slight_smile:

@suvl my worry with using a reverse proxy is scalability.

@dneves I think spinnaker is a good tool, especially when doing a canary for production.