The video for the talk I gave at the 2018 API Conference is now available online.
I have talked about it a bit before, as well as sharing the slides, but one of my main takeaways is that we are all (mostly) in the business of building products. On a daily basis, whether we are writing code, docs, tests, change requests, specifications or designs, there is almost always an end product of our work, and the product decisions we make whilst building it have a direct impact on the end users (people will have to read/amend your code, read your specifications, translate your designs, consume your APIs, etc.). With that in mind, it seems sensible to look at what lessons we can take from the discipline of Product Management to help us make smart decisions in our day-to-day work.
I have recently started working on a migration process to move our company deployments over to Kubernetes (from Fleet, if you were interested, which was a pretty cutting-edge technology at the time we deployed it, but it is pretty low level and you have to provide things like load balancing and DNS yourself).
A colleague of mine had already done the hard work of actually spinning up a Kubernetes cluster on AWS (using EKS) and generally most of the boilerplate around service deployment, so having had a general intro and deployed my first service (a single microservice deployed as a Kubernetes “service” running inside a “pod”), which mostly just involved copying and pasting from my colleague’s examples, my next goal was to deploy our monitoring setup. We currently use Prometheus & Grafana, and those still seem to be best-in-class monitoring systems, especially with Kubernetes.
The setup process is actually pretty simple to get up and running (at least if you are using Helm), but it did catch me out a couple of times, so here are some notes.
Pre-requisites:
A cluster running Kubernetes (as mentioned, we are using an AWS cluster on EKS)
Kubectl & Helm installed locally and connecting correctly to your Kubernetes cluster (kubectl version should display both the client and server version details OK)
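To sanity check the local setup before starting, the following standard kubectl commands (nothing specific to this setup) should both succeed against your cluster:

$ kubectl version        # prints both the client and the server version if the connection is good
$ kubectl get nodes      # lists the worker nodes, confirming kubectl can reach the API server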
Let’s get started by getting our cluster ready to use Helm (Helm is a Kubernetes package manager that can be used to install pre-packaged “charts”). To do this we need to install the server-side element of Helm, called Tiller, onto our Kubernetes cluster:
$ kubectl -n kube-system create serviceaccount tiller
$ kubectl apply -f tiller-role-binding.yml
$ helm init --service-account tiller
First we create a Service Account in the cluster called “tiller”, in the standard Kubernetes namespace “kube-system” - this will be used as the account for all the Tiller operations.
Next we apply the role binding for the cluster - the tiller-role-binding.yml file defines a ClusterRoleBinding for the new Service Account (a sketch of this file is shown below).
Finally we initialise Helm/Tiller, referencing our new Service Account. This step effectively installs Tiller on our Kubernetes cluster.
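For reference, a minimal tiller-role-binding.yml looks something like the following - this is just a sketch that binds the “tiller” Service Account to the built-in cluster-admin ClusterRole (your own file may well use a more restrictive role):

# tiller-role-binding.yml - minimal sketch: grant the "tiller" Service Account
# cluster-admin so Tiller can create/update resources across the cluster
# (a more restrictive role is advisable in production).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system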
As I found myself repeatedly updating the config for the install, I preferred to use the Helm “upgrade” method rather than “install” (with the --install flag, upgrade works even if the release has never been installed):
$ helm upgrade -f prometheus-config.yml \
prometheus-operator stable/prometheus-operator \
--namespace monitoring --install
The above command upgrades/installs the stable/prometheus-operator package (provided by CoreOS) into the “monitoring” namespace and names the install release as “prometheus-operator”.
At the start, the config was simply:
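Something along these lines - a minimal sketch, and note that the exact value names (particularly the namespaces one) are assumptions based on the stable/prometheus-operator chart at the time and have changed between chart versions, so cross-check against your chart’s values.yaml:

# prometheus-config.yml (sketch)
prometheusOperator:
  # Namespaces the operator should watch for resources such as ServiceMonitors -
  # our application namespace "dev-apps" is added alongside the defaults.
  # (Key name is an assumption - it differs between chart versions.)
  namespaces:
    - kube-system
    - monitoring
    - dev-apps

kubelet:
  serviceMonitor:
    # Scrape the kubelet over the authenticated HTTPS port (10250) rather than
    # the read-only port 10255 - see the explanation below.
    https: true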
This config (which could have been passed as CLI arguments using “--set”, but was moved to a dedicated file simply because later on we will add a bunch more config) addresses two challenges we faced:
Adding Namespaces:
We use a dedicated namespace for our actual applications (in the example above, “dev-apps”), and as well as the core Kubernetes health we wanted to monitor our applications themselves, so we had to add that namespace to the config so Prometheus could monitor it as well.
Monitoring the right port:
The next one was more of a head-scratcher, and took a lot more time to figure out. With the stable/prometheus-operator Helm install, we noticed on the Prometheus targets page that monitoring/prometheus-operator-kubelet/0 and monitoring/prometheus-operator-kubelet/1 were both showing as 0/3 up.
Our targets were showing:
monitoring/prometheus-operator-kubelet/0 (0/3 up)
monitoring/prometheus-operator-kubelet/1 (0/3 up)
They all looked correct, much like the other targets that were reported as up, and were hitting the endpoints http://127.0.0.1:10255/metrics and /metrics/cadvisor, but all of them were showing the error "Connect: connection refused".
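Digging into the kubelet Service that the operator maintains shows the ports on offer - roughly the following, paraphrasing the Prometheus Operator source at the time:

# Named ports on the kubelet Service managed by the Prometheus Operator
ports:
  - name: https-metrics   # the kubelet's authenticated/secured port
    port: 10250
  - name: http-metrics    # the kubelet's read-only port
    port: 10255
  - name: cadvisor        # standalone cAdvisor metrics port
    port: 4194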
Of those three ports defined in the Prometheus Operator source code (shown above), the Helm chart is currently set to default to “http-metrics”, i.e. to use port 10255.
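In other words, the relevant default in the chart’s values.yaml was effectively this (paraphrased):

kubelet:
  serviceMonitor:
    # false = scrape the kubelet's read-only http-metrics port (10255)
    https: false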
However, more recently that read-only port, 10255, has been disabled, so there is nothing listening there to scrape. This meant we had conflicting default behaviour across the software, so we had to explicitly override the default behaviour of the prometheus-operator chart by setting the kubelet.serviceMonitor.https flag to true.
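The switch in the chart’s kubelet ServiceMonitor template looks roughly like this (a simplified sketch - the real template also sets the TLS config and bearer token for the https case):

endpoints:
{{- if .Values.kubelet.serviceMonitor.https }}
  - port: https-metrics
    scheme: https
{{- else }}
  - port: http-metrics
{{- end }}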
As you can see in the above defaulting, the chart switches between the http-metrics and https-metrics ports based on the kubelet.serviceMonitor.https flag. Explicitly including that flag in our config overrode the default, Prometheus switched to monitoring on port 10250, and all was fine then.
I expect the default behaviour will be changed soon, so this gotcha will hopefully be short lived, but in case it helps anyone else I will leave it here.
Next up I will attempt to explain some of the magic behind the Prometheus configuration and how it can be set up to easily monitor all (or any) of your Kubernetes services.