Kubernetes & Prometheus: Getting started

I have recently started working on a migration to move our company deployments over to Kubernetes (from Fleet, if you were interested - which was pretty cutting edge at the time we adopted it, but it is quite low level, and you have to provide things like load balancing and DNS yourself).

A colleague of mine had already done the hard work of actually spinning up a Kubernetes cluster on AWS (using EKS), and generally most of the boilerplate around service deployment. So, having had a general intro and deployed my first service (a single microservice deployed as a Kubernetes “service” running inside a “pod”) - which mostly just involved copying and pasting from my colleague's examples - my next goal was to deploy our monitoring setup. We currently use Prometheus & Grafana, and those still seem to be best-in-class monitoring systems, especially with Kubernetes.

The setup process is actually pretty simple to get up and running (at least if you are using Helm), but it did catch me out a couple of times, so here are some notes.

Prerequisites:
  1. A cluster running Kubernetes (as mentioned, we are using an AWS cluster on EKS)
  2. Kubectl & Helm running locally and connecting correctly to your Kubernetes cluster (kubectl version should display the client and server version details ok)

Let’s get started by getting our cluster ready to use Helm (Helm is a Kubernetes package manager that can be used to install pre-packaged "charts"). To do this we need to install the server-side element of Helm, called Tiller, onto our Kubernetes cluster:

$ kubectl create serviceaccount --namespace kube-system tiller
$ kubectl apply -f tiller-role-binding.yml
$ helm init --service-account tiller


The above does three things:
  1. It creates a Service Account in your cluster called “tiller” in the standard kubernetes namespace “kube-system” - this will be used as the account for all the tiller operations
  2. Next we apply the role binding for the cluster - here we define a new ClusterRoleBinding for the new Service Account
  3. Finally we initialise Helm/Tiller, referencing our new Service Account. This step effectively installs Tiller on our kubernetes cluster
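
The tiller-role-binding.yml itself isn't reproduced in this post; a typical version (this is an assumed, standard cluster-admin binding, not necessarily the original file) looks like:

```yaml
# tiller-role-binding.yml - grants the tiller service account cluster-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```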

Straightforward enough. For Prometheus we will be using a Helm-packaged Prometheus Operator. A Kubernetes Operator is an approach that allows packaging an application so that it can be deployed on Kubernetes and also managed via the Kubernetes API - you can read more about operators here, and there are lots of operators already created for a range of applications.

As I found myself repeatedly updating the config for the install, I preferred to use the Helm “upgrade” method rather than install (upgrade works even if it has never been installed):

helm upgrade -f prometheus-config.yml \
      prometheus-operator stable/prometheus-operator \
      --namespace monitoring --install


The above command upgrades/installs the stable/prometheus-operator package (provided by CoreOS) into the “monitoring” namespace and names the install release as “prometheus-operator”.

At the start, the config was simply:
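
(The original file was embedded in the post; this is a sketch of roughly what it contained - the exact chart keys are assumptions on my part, though the two values reflect the changes described below:)

```yaml
# prometheus-config.yml - sketch, exact keys assumed
# Also watch our application namespace, not just the defaults:
prometheusOperator:
  namespaces:
    additional:
      - dev-apps
# Scrape the kubelet over https on port 10250:
kubelet:
  serviceMonitor:
    https: true
```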


This config (which could have been passed as CLI arguments using “--set”, but was moved to a dedicated file simply because later on we will add a bunch more config) addresses two challenges we faced:

Adding Namespaces:

We use a dedicated namespace for our actual applications (in the case above “dev-apps”), and we wanted to monitor our applications themselves as well as core Kubernetes health, so we had to add that namespace to the config so Prometheus could monitor it as well.

Monitoring the right port:

The next one was more of a head-scratcher, and used up a lot more time to figure out. With the stable/prometheus-operator Helm install, we noticed on the Prometheus targets page that both kubelet targets were showing as down:

monitoring/prometheus-operator-kubelet/0 (0/3 up)
monitoring/prometheus-operator-kubelet/1 (0/3 up)



They all looked correct, much like the other targets that were reported as being up, and were hitting the endpoints http://127.0.0.1:10255/metrics and /metrics/cadvisor, but all of them were showing the error "connect: connection refused".

Initial googling revealed this to be a fairly common symptom; however, rather misleadingly, all the documented issues were around particular flags that needed to be set and problems with auth (the errors listed were 401/403 rather than our "connect: connection refused") - this is also covered in the Prometheus Operator troubleshooting section.

After much digging, what actually seemed to have caught us out was some conflicting default behaviour.

The Prometheus Operator defines three different ports to monitor on:
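
The kubelet Service created by the operator exposes roughly the following ports (a sketch based on the operator source of the time, not a verbatim copy):

```yaml
# Ports on the kubelet service created by the Prometheus Operator (sketch)
ports:
  - name: https-metrics   # kubelet metrics over https
    port: 10250
  - name: http-metrics    # kubelet read-only port
    port: 10255
  - name: cadvisor        # cadvisor metrics
    port: 4194
```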


And of the three ports defined by the source Prometheus Operator code, the Helm chart is currently set to default to “http-metrics”, i.e. to use port 10255.
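
In the chart's values.yaml, the relevant default looks something like this (a sketch, not a verbatim copy):

```yaml
# Default in the chart's values.yaml (sketch)
kubelet:
  serviceMonitor:
    https: false   # scrape http-metrics (port 10255) by default
```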


However, more recently that read-only port, 10255, has been disabled in the kubelet and is no longer open to monitor against. This meant we had conflicting defaults across the two pieces of software - so we had to explicitly override the default behaviour in the Prometheus Operator chart by setting the kubelet.serviceMonitor.https flag to true.
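
The switching logic in the chart's kubelet ServiceMonitor template is roughly the following (paraphrased, not a verbatim copy of the chart):

```yaml
# Paraphrased from the chart's kubelet ServiceMonitor template
endpoints:
{{- if .Values.kubelet.serviceMonitor.https }}
  - port: https-metrics   # scrape the kubelet on 10250 over https
    scheme: https
{{- else }}
  - port: http-metrics    # scrape the read-only port 10255
{{- end }}
```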


As you can see in the above defaulting, the chart switches between the http-metrics and https-metrics ports based on the serviceMonitor.https flag. Explicitly including that flag in our config overrode the default value, the scrape switched to port 10250, and all was fine.


I expect the default behaviour will be changed soon, so this gotcha will hopefully be short lived, but in case it helps anyone else I will leave it here.

Next up I will attempt to explain some of the magic behind the Prometheus configuration and how it can be setup to easily monitor all (or any) of your Kubernetes services.


Machine Learning with AWS & Scala

Recently, in an attempt to start learning React, I started building an akka-http backend API as a starting point. I quickly got distracted building the backend and ended up integrating with both the Twitter streaming API and AWS' Comprehend sentiment analysis API - which is what this post will be about.

Similar to an old idea, where I built an app consuming tweets about the 2015 Rugby world cup, this time my app was consuming tweets about the FIFA world cup in Russia - splitting tweets by country and recording sentiment for each one (and so a rolling average sentiment for each team).


Overview

The premise was simple:

  1. Connect to the Twitter streaming API (aka the firehose) filtering on world cup related key words
  2. Pass the body of the tweet to AWS Comprehend to get the sentiment score
  3. Update the in memory store of stats (count and average sentiment) for each country

In terms of technology used:
  1. Scala & Akka-Http
  2. Twitter4s Scala client
  3. AWS Java SDK

As always, all the code is on Github - to run it locally you will need a Twitter dev API key (add an application.conf as per the readme on the Twitter4s github), and you will also need an AWS key/secret - the code will look for credentials stored locally, but you can also just set them in environment variables before starting. The free tier supports up to 50,000 Comprehend API requests in the first 12 months - and as you can imagine, plugging this directly into Twitter can result in lots of calls, so make sure you restrict it (or at least monitor it) before you leave it running!


Consuming Tweets

Consuming tweets is really simple with the Twitter4s client - we just define a partial function that will handle each incoming tweet.
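
The handler itself was embedded as a gist; a simplified sketch of the idea looks something like this (sentimentService, statsService, countriesMentioned and worldCupTerms are illustrative names, not the real code):

```scala
import com.danielasfregola.twitter4s.TwitterStreamingClient
import com.danielasfregola.twitter4s.entities.Tweet
import com.danielasfregola.twitter4s.entities.streaming.CommonStreamingMessage

val streamingClient = TwitterStreamingClient()

// Partial function invoked for each inbound streaming message
def processTweet: PartialFunction[CommonStreamingMessage, Unit] = {
  case tweet: Tweet =>
    // illustrative helpers: score the text, then store it against the teams mentioned
    val sentiment = sentimentService.getSentiment(tweet.text)
    statsService.update(countriesMentioned(tweet.text), sentiment)
}

// Start the stream, filtering by our world cup related key words
streamingClient.filterStatuses(tracks = worldCupTerms)(processTweet)
```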

The other functions for parsing countries/teams are excluded for brevity, but you can see it's quite simple: for each inbound tweet we make a call to the Sentiment Service (we will look at that later), then pass it with the additional data to our update service, which stores it in memory. You will also see it is ridiculously easy to start the Twitter streaming client filtering by key words.


Detecting Sentiment

Because I wanted to be able to stub out the sentiment analysis without being tied to AWS, you will notice I am using the self-type annotation on my twitter class above, which requires a SentimentModule to be passed in at construction - I am using a simple cake pattern to manage all my dependencies here. In the Github repo, there is also a Dummy implementation, that will just pick a random number for the score, so you can still see the rest of the API working - but the interesting part is the AWS integration:
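The integration code was embedded as a gist; a cut-down sketch of the AWS Comprehend call (simplified from the real code, which maps the result into the cruder rating described below) might look like:

```scala
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder
import com.amazonaws.services.comprehend.model.DetectSentimentRequest

// The builder resolves credentials from the usual default provider chain
val comprehendClient = AmazonComprehendClientBuilder.defaultClient()

def detectSentiment(text: String): (String, Float) = {
  val request = new DetectSentimentRequest()
    .withText(text)
    .withLanguageCode("en")
  val result = comprehendClient.detectSentiment(request)
  // e.g. ("POSITIVE", confidence that the text is positive)
  (result.getSentiment, result.getSentimentScore.getPositive)
}
```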
Once again, the SDK makes the integration really painless - in my code I am simplifying the actual results down to a much cruder Positive/Neutral/Negative rating (plus a numeric score of -100..100).

The AWSCredentials class is the bit that is going to look in the normal places for an AWS key.


Storing and updating our stats

So now we have our inbound tweets and a way to assess their sentiment score - I then set up a very simple Akka actor to manage the state, and just stored the API data in memory (if you restart the app, the store gets reset and the API stops serving data).
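
The actor was embedded as a gist; a minimal sketch of the idea (message and field names here are illustrative, not the real code) could be:

```scala
import akka.actor.Actor

// Messages and state for the stats actor (illustrative names)
case class TweetSentiment(countries: List[String], score: Double)
case object GetStats
case class CountryStats(count: Long, averageSentiment: Double)

class StatsActor extends Actor {
  // In-memory store: country -> (count, rolling average sentiment)
  private var stats =
    Map.empty[String, CountryStats].withDefaultValue(CountryStats(0, 0.0))

  def receive: Receive = {
    case TweetSentiment(countries, score) =>
      countries.foreach { country =>
        val current  = stats(country)
        val newCount = current.count + 1
        // incremental update of the rolling average
        val newAvg = current.averageSentiment +
          (score - current.averageSentiment) / newCount
        stats += country -> CountryStats(newCount, newAvg)
      }
    case GetStats =>
      sender() ! stats
  }
}
```

All mutation happens inside receive, so the in-memory map is only ever touched by one message at a time - which is where the thread safety comes from.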

Again, very simple out of the box stuff for akka, but it allows easy and thread safe management of the in-memory data store. I also track a rolling list of the last twenty tweets processed, which is managed by a second, almost identical, actor.


The results

I ran the app during several games; below are some sample outputs from the API. The response from the stats API is fairly boring reading (just numbers), but the example tweets show a positive and a neutral tweet correctly identified (apologies for the expletives in the tweet about Poland - I guess that fan wasn't too happy about being beaten by the Senegalese!). You will also notice the app captures the countries being mentioned, which exposes one flaw of the design: in the negative tweet from the Polish fan losing two goals to Senegal, it correctly identifies the sentiment as negative, but we have no way to determine the subject - as both teams are mentioned, the app naively assigns it as a negative tweet for both teams, whereas on reading it is clearly negative with regards to Poland (I wasn't too concerned for my experiment, of course - just an observation worth noting).

Sample tweet from the latest API:

Sample response from the stats API:

When I finally did get around to starting to learn React, I just plugged in the APIs and paid no attention to styling, which is a roundabout way of apologising for the horrible appearance of the screenshot below (I'm really sorry about the CSS gradient)!





API Conf 2018 - Product Management for Engineers

Last week I attended and spoke at the 2018 API Conference in Berlin.

Having written about the topic before, my talk was titled: Your API as a Product - Thinking like a Product Manager (really aimed at engineers/architects/technologists).

It was recorded, so hopefully I will be able to share the video in the future, but the thrust of the talk was the idea that we are all building products on some level. Even if we don't have direct input into a commercial product that we might be writing code for, we have some output: code, bug reports, designs etc. If we are all building products, and those products all have users and therefore a user experience (your code, bug reports and design docs will be used by others, or maybe you yourself will be the user), then it makes sense to try and learn from the discipline of Product Management, as it is focussed on building better products and better user experiences for the end users.

Slides are below, and really your best bet is to scroll down to the references section at the end and start watching the real Product Managers' talks.


Generic Programming with Scala & Shapeless part 2

Last year I spent some time playing with, and writing about, Scala & Shapeless - walking through the simple example of generating random test data for a case class.

Recently, I have played some more with Shapeless, this time with the goal of generating React (javascript) components for case classes. It was a very similar exercise, but this time I made use of the LabelledGeneric object, so I could access the field names - so I thought I'd re-visit here and talk a bit about some of the internals of what is going on.


Getting started

As before, I had to define implicits for the simple types I wanted to be able to handle, and the starting point is of course accepting a case class as input.
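
The method itself was embedded as a gist; reconstructed from the discussion that follows, its shape is roughly this (ComponentGenerator is my type class for producing React components; the argument and return types here are illustrative, and the third implicit mentioned below is omitted):

```scala
import shapeless.{HList, LabelledGeneric}

def caseClassToGenerator[A, Repr <: HList](a: A)(
    implicit generic: LabelledGeneric.Aux[A, Repr],
             gen: ComponentGenerator[Repr]
  ): ReactComponent =
  gen.generate(generic.to(a))
```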

So there are a few interesting things going on here:

First of all, the method is parameterised with two types: caseClassToGenerator[A, Repr <: HList]. A is simply going to be our case class type, and Repr is going to be a Shapeless HList.

Next up, we are expecting several implicit method arguments (we will ignore the third implicit for now - that is one I am using purely for the React side of things, and it can be skipped if the method handles everything itself):

implicit generic: LabelledGeneric.Aux[A, Repr], gen: ComponentGenerator[Repr],

Now, as this method's purpose is to handle the input of a case class, and as we are using Shapeless, we want to make sure that from that starting input we can transform it into an HList, so we can then start dealing with the fields one by one (in other words, this is the first step in converting a case class to a generic list that we can then handle element by element). In this setting, the second implicit argument is asking the compiler to check that we have also defined an appropriate ComponentGenerator (this is my custom type for generating React components) that can handle the generic HList representation (it's no good being able to convert the case class to its generic representation if we then have no means to actually process a generic HList).

Straightforward so far?

The first implicit argument is a bit more interesting. Functionally, all LabelledGeneric.Aux[A, Repr] is doing is asking the compiler to make sure we have an implicit LabelledGeneric instance that can handle converting between our parameter A (the case class input type) and Repr (the HList representation). This implicit means that if we try to pass some type A to this method, the compiler will check that we have a shapeless LabelledGeneric that can handle it - if not, we will get a compile error.

But things get more interesting if we look more at what the .Aux is doing!


Path dependent types & the Aux pattern

The best way to work out what is going on is to just jump into the Shapeless code and have a dig. I will use Generic as an example, as it's a simpler case, but it's the same for LabelledGeneric:
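
The (slightly abbreviated) definition in the shapeless source looks like this:

```scala
trait Generic[T] {
  // the generic representation of T, e.g. an HList for a case class
  type Repr
  def to(t: T): Repr
  def from(r: Repr): T
}
```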

That's a lot simpler than I expected to find, to be honest, but as you can see from the above example, there are two types involved, there is the trait parameter T and the inner type Repr, and the Generic trait is just concerned with converting between these two types.

The inner type, Repr, is what is called a path dependent type in Scala. That is, the type is dependent on the actual instance of the enclosing trait or class. This is a powerful mechanism in Scala (but one that can also catch you out if you are in the habit of defining classes within other classes or traits). It is an important detail for our Generic here: the trait could be given any parameter T, so the corresponding HList could be anything, but this makes sure it must match the given case class T - that is, the Repr is dependent on what T is.

To try and get our head around it, let's take a look at an example:
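
Something like the following (the values shown in comments are what you would expect to see in the REPL):

```scala
import shapeless.Generic

case class Person(name: String, age: Int)

val gen = Generic[Person]
// gen.Repr is the path dependent type String :: Int :: HNil

val hlist = gen.to(Person("Rob", 30))   // "Rob" :: 30 :: HNil
val person = gen.from(hlist)            // Person("Rob", 30)
```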

Cool - so, as we expected, we can see in our Generic example that the type Repr has been defined matching the HList representation of our case class. It makes sense that we want the transformed output HList to have its own specific type (based on whatever input it was transforming), but it would be a real pain to have to actually define that as a type parameter on the class along with our case class type, so it uses this path-dependent type approach.

So, we still haven't got any closer to what this Aux is doing, so let's dig into that...


The Aux Pattern

We can see from our code that Aux takes two type parameters: firstly A, which is the parameter that our Generic will take, but also Repr - which we know (or at least pretend we can guess) corresponds to the path dependent type defined nested inside the Generic trait.

The best way to work out what is going on is to take a look at the shapeless code!
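
The relevant part of the shapeless source (slightly abbreviated) is:

```scala
object Generic {
  // Aux lifts the path dependent type Repr up to a type parameter,
  // so callers can constrain and reference it
  type Aux[T, Repr0] = Generic[T] { type Repr = Repr0 }

  def apply[T](implicit gen: Generic[T]): Aux[T, gen.Repr] = gen
}
```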

As we can see, the Aux type (this is defined within the Generic object) is just an alias for a Generic[T], where the inner path-dependent type is defined as Repr - they have a pretty decent explanation of what is going on in the comments, so I will reproduce that here:

(that is abbreviated for the more relevant bits - they have even more detail in the comments that can be read).

That pretty nicely sums up the Aux pattern - it allows us to essentially promote the result of a type-level computation up to a type parameter. It can be used for a variety of cases where we want to reason about path dependent types, but this is a common use of the pattern.



So that's all I wanted to get into for now - you can see the code here, and hopefully with this overview, and the earlier Shapeless overview, you can get an understanding of what the LabelledGeneric stuff is doing and how Shapeless is helping me generate React components.

JAXLondon 2017: Agile Machine Learning [VIDEO]

Last October a colleague and I gave a talk at the JAXLondon Conference about Machine Learning in an agile, commercial environment (I then also gave the talk again in November in Munich at the W-JAX Conference).




The video of the talk is now available - the first half (and end section) is mostly softer stuff, where I talk about lessons learnt from doing ML research in a commercial environment, and the middle section is my colleague, Sumanas, talking about how Word2Vec works, with some demos of using it in an interesting application!




In related conference speaking news, I will be in Berlin next month for the API Conference to talk about Product Management and API Design: https://apiconference.net/api-design-documentation/your-api-as-a-product-thinking-like-a-product-manager/



Monster dash - Making an android game with my son

At the start of last year, 2017, I set a resolution, of sorts, to build a mobile app with my older boy. He was just getting into playing games on mobile phones and tablets, and lots of them were just simple side scrolling platform games, where your character just had to run and avoid minor obstacles and perils.

Needless to say, whilst I managed to start it, it wasn't until after Easter this year that I actually managed to complete it. This did appear to pose one problem: over a year on from the original idea, my boy was playing far more sophisticated games, so when I pitched the idea of making a mobile game together, his plans went far beyond the simple side scrolling platform game I had in mind! I was a bit unsure if he was going to be totally underwhelmed by the finished product, but in the end, having seen his creation come to life, he was thoroughly pleased.



When we finally had something working (albeit fairly primitive) he took it into school as a show-and-tell and, surprisingly, the other children were all very impressed - I assume just because of the feat of making a game, despite it paling in comparison to the actual games they undoubtedly all played.





Anyway, the source code for the game is all on Github (maybe one day we will make a game and publish it, so he can see that process and actually have other people play his game) and can be found here: https://github.com/robhinds/monster-dash. I can't take much credit for the legwork on this one though - knowing that the game concept I was after was very simple, I figured there would be a how-to lurking somewhere on the internet, and sure enough we stumbled upon this: http://williammora.com/a-running-game-with-libgdx-part-1 - if you want to have a go I would recommend following William's series of articles explaining the hows and whats. I modified (read: mangled) his source code a bit, simplified parts and added flourishes here and there, but it's very similar in theory.



To be honest, the hardest part was cleaning up the images - he drew them, then I just snapped them with my phone, used a selection of online tools to remove the backgrounds and chop them up for the animation, then loaded them into the app.

My boy really enjoyed working on it, and still asks if we can make another app, with even grander ideas, so I highly recommend it!