An opinionated guide to building APIs with Akka-Http

Akka-Http is my preferred framework for building APIs, but there are some things I have picked up along the way. For one thing, Akka-Http is deliberately un-opinionated in its approach: there are often many ways to do the same thing, and there isn't much guidance about which to choose.

I have been writing Akka-Http APIs for about 18 months now (not long, I know), having previously worked predominantly with libraries like Spring, and I have seen some pretty nasty code result from this. By that I mean I have written nasty code - not intentionally, of course, but from good intentions: starting off trying to write clean, idiomatic Akka-Http code and ending up with huge, sprawling routing classes that are unreadable and generally not very nice.

The routing DSL in Akka-Http is pretty nice, but can quickly become unwieldy. For example, let's imagine you start off with something like this:
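A minimal sketch of a first route in the standard DSL (the "HelloWorld" naming is illustrative):

```scala
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

// A first endpoint: GET /hello returning a greeting
val helloWorldRoute: Route =
  pathPrefix("hello") {
    pathEndOrSingleSlash {
      get {
        complete("Hello, world!")
      }
    }
  }
```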

This looks nice, right? A simple nested approach to the routing structure that reflects the URL hierarchy, the HTTP method and so on. However, as you can probably imagine, if you try to scale this up to a full application it can easily become messy. The nested directives make it nice to group routes under similar root URLs, but as you do that you end up with very long, arrow-shaped code that isn't easy to follow - if you have several endpoints nested within the structure, it becomes quite hard to work out what endpoints there are and what is handling what.

Another problem is where the handling code lives: with the first one or two endpoints you might put it directly in the routing structure, which is fine for very small numbers, but it needs to be managed sensibly as the endpoints grow and your routing structure starts to sprawl.

It is of course personal preference, but even with the simple example above, I don't like the level of nesting needed simply to map the GET HTTP method to a given URL - and if you add more endpoints and start to break down the URL with additional directives per URL section, the nesting only increases.

To simplify the code, and keep it clean from the start, I go for the following approach:

  1. Make sure your Routing classes are sensibly separated - probably by the URL root (e.g. have a single UserRoutes class that handles all URLs under /users) to avoid them growing too much
  2. Hand off all business logic (well, within reason) to a service class - I use Scala’s Self-Type notation to handle this and keep it nicely de-coupled
  3. Use custom directives & non-nested routings to make the DSL more concise

Most of these steps are simple and self-explanatory, so it's probably just step 3 that needs more explanation. To start with, here is a simple example:
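A sketch of the shape this takes (all names here - UserRoutes, UserService, etc. - are illustrative; getPath/postPath and respond are the custom helpers discussed in the sections that follow, and Response is the Either-based wrapper described later):

```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

case class User(id: String, name: String)

// Business logic lives here, decoupled from the routing
trait UserService {
  def getUser(id: String): Response[User]
  def createUser(user: User): Response[User]
}

// Self-type: these routes can only be mixed into something providing a UserService
trait UserRoutes { self: UserService =>

  val userRoutes: Route = getUserById ~ createNewUser

  // GET /users/:id
  private def getUserById: Route =
    getPath("users" / Segment) { id =>
      respond(getUser(id))
    }

  // POST /users (assumes a JSON unmarshaller for User is in scope)
  private def createNewUser: Route =
    postPath("users") {
      entity(as[User]) { user =>
        respond(createUser(user), StatusCodes.Created)
      }
    }
}
```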

You can see points 1 and 2 simply enough, but you will also notice that my endpoints are simple functions, without multiple levels of nesting (we may need some additional nesting at some point, as some endpoints will likely need other akka-http directives, but we can strive to keep it minimal). 

You might notice I have duplicated the URL section “users” rather than nesting it - some people might not like this duplication (and I guess risk of error/divergence of URLs - but that can be mitigated with having predefined constants instead of explicit strings), but I prefer the readability and simplicity of this over extensive nesting.

Custom Directives

First off, I have simply combined a couple of existing directives to make things more concise. Normally you might have several levels of nested directives: one or more pathPrefix("path") sections, the HTTP method such as get {}, and another to match pathEndOrSingleSlash {}. To avoid this, I have combined some of these into convenient single directives.

getPath, postPath, putPath, etc. simply combine the HTTP method with the URL path-matcher, and also include the existing Akka-Http directive redirectToTrailingSlashIfMissing, which avoids having to match on either a slash or path end and instead lets you always match exact paths - it basically squashes the three directives in the original HelloWorld example above down to one simple, readable directive.
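A rough sketch of how such directives can be composed from the built-in ones (my library's actual signatures differ slightly; this version only handles simple paths - variants that extract path elements, e.g. getPath("users" / Segment), need a slightly more general signature, omitted here for brevity):

```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{Directive0, PathMatcher0}

trait ApiDirectives {

  // Redirect e.g. /users to /users/ so every route has one canonical form
  private val trailingSlash: Directive0 =
    redirectToTrailingSlashIfMissing(StatusCodes.MovedPermanently)

  // getPath("users") collapses pathPrefix + pathEndOrSingleSlash + get into one directive
  def getPath(pm: PathMatcher0): Directive0 =
    get & trailingSlash & path(pm ~ Slash)

  def postPath(pm: PathMatcher0): Directive0 =
    post & trailingSlash & path(pm ~ Slash)

  def putPath(pm: PathMatcher0): Directive0 =
    put & trailingSlash & path(pm ~ Slash)
}
```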

Custom Serialisation

You may also notice I have implemented a custom method called respond - I use this to handle serialisation of the response to a common JSON shape and to handle errors. With this approach, I define a custom Response wrapper type that is essentially an Either of our internal custom error type and a valid response type T (implementation details below) - this means that throughout our code we have a consistent type for handling errors and ensuring consistent responses.

This respond method simply expects a Response type to be passed to it (along with an optional success status code - defaulting to 200 OK, but can be provided to support alternative success codes). The method then uses Circe and Shapeless to convert the Response to a common JSON object. 

Let’s have a look at some of the details, first the custom types I have defined for errors and custom Response type:
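The shapes involved look roughly like this (a sketch - the exact fields on the error type, and the predefined error values, are illustrative):

```scala
import akka.http.scaladsl.model.{StatusCode, StatusCodes}

// Internal error type: carries the HTTP status we want to respond with
case class ApiError(statusCode: StatusCode, message: String)

object ApiError {
  val NotFound: ApiError   = ApiError(StatusCodes.NotFound, "Not found")
  val Unexpected: ApiError = ApiError(StatusCodes.InternalServerError, "Something went wrong")
}

// A Response is either our error type or a valid result of type T
object ResponseSupport {
  type Response[T] = Either[ApiError, T]
}
```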

Simple, now let’s take a look at the implementation of the respond method:
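A simplified sketch of respond (the real version uses Circe's Shapeless-powered generic derivation and has two implicit encoder arguments on the signature; here it is collapsed to a single Encoder for brevity, and ApiError/Response are as in the previous section):

```scala
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import io.circe.{Encoder, Json}
import io.circe.syntax._

case class ApiError(statusCode: StatusCode, message: String) // as defined above
type Response[T] = Either[ApiError, T]

// Controls the common response shape - override wrap to change what responses look like
trait ResponseWrapperEncoder {
  def wrap(data: Json): Json = Json.obj("data" -> data)
}

trait RespondSupport extends ResponseWrapperEncoder {

  // Serialise any Response[T] (for which a circe Encoder exists) to the common
  // JSON shape, completing with the given (or default 200 OK) status code
  def respond[T](response: Response[T], status: StatusCode = StatusCodes.OK)(
      implicit encoder: Encoder[T]): Route =
    response match {
      case Right(value) =>
        complete(HttpResponse(status,
          entity = HttpEntity(ContentTypes.`application/json`, wrap(value.asJson).noSpaces)))
      case Left(err) =>
        complete(HttpResponse(err.statusCode,
          entity = HttpEntity(ContentTypes.`application/json`,
            Json.obj("error" -> Json.fromString(err.message)).noSpaces)))
    }
}
```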

It might look daunting (or not, depending on your familiarity with Scala and Shapeless), but it's relatively simple. The two implicit Encoder arguments on the method signature simply ensure that, whatever type A is in the provided Response[A], Circe and Shapeless are able to serialise it - if you try to pass a response to this method that can't be serialised, you get a compile error. After that, all it does is wrap the response A in a common message and return that along with an appropriate (or provided) HTTP status code.

You might also notice the final result is built using the wrap method in the ResponseWrapperEncoder trait - this allows easy extension/overriding of what the common response message looks like.


All of this machinery is of course abstracted away into a common library that can be used across different projects, so in reality we get a consistent, clean API with routing classes as simple and neat as below, whilst also handing off our business logic to neater, testable services.
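For example, a finished routing class can stay as flat as this sketch (the Todo names are illustrative, and getPath/postPath/respond are the helpers from earlier):

```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

case class Todo(id: String, text: String)

trait TodoService {
  def allTodos(): Response[Seq[Todo]]
  def create(todo: Todo): Response[Todo]
}

trait TodoRoutes { self: TodoService =>

  val todoRoutes: Route =
    getPath("todos") {
      respond(allTodos())
    } ~
    postPath("todos") {
      // assumes a JSON unmarshaller for Todo is in scope
      entity(as[Todo]) { todo =>
        respond(create(todo), StatusCodes.Created)
      }
    }
}
```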

All the code for my opinionated library and an example API is all on GitHub, and it is currently in progress with more ideas underway!


Conference updates: JAXLondon and W-JAX 2017

I have had the pleasure of closing out this year by speaking at two conferences. The first in October was JAXLondon, the second in November was W-JAX in Munich - the talk was titled "Agile Machine Learning: From theory to production".

As the title might suggest, the talk was about some considerations and challenges of doing Machine Learning(ML) work in a commercial environment - so a lot of softer aspects of ML: when you should adopt it, how you can work as a team on ML.

The talk at JAXLondon was recorded, so hopefully at some point that will be available for me to share as well, but for the time being, here is a brief interview that my co-speaker and I gave ahead of our JAXLondon talk.

See more details and write up here


On Education: Let us play


Education is something that has been an interest to me for some time, and with a child currently going through primary school in the UK, it's something that I think about quite a lot, so this is something of an open letter on the state of the education system in the UK.

First off, I should say that I am a strong believer in the importance of the role of free play in learning, and of instilling curiosity in children as a route to future success, rather than a more structured, didactic, teacher-led and test-driven approach. This belief is largely grounded in various things I have read on the topic, though I appreciate there is undoubtedly a lot more depth to the subject matter than I know about.

Positive Examples

If we are thinking about the education system, it seems prudent to look elsewhere for success stories to see what we can learn from and improve on in the UK system, and a glaringly obvious example would be the Finnish education system. Finland's often cited education system has consistently been the top ranked system in Europe for the last 16 years, so what are they doing right?

The first notable difference is the age at which children start school: children attend preschool from an early age, but primary school doesn't start until age seven - in other words, formal teacher-led instruction in what we would consider the core topics (maths, reading and writing) does not start until the age of seven. Before that, the education system is entirely focused on free, creative play.

This model also ties into research from some neuroscientists who believe that before the age of seven or eight, "[children] are better suited for active exploration than didactic explanation", claiming that "the trouble with over-structuring is that it discourages exploration". Having witnessed this behaviour first hand, it very much matches my anecdotal data on the subject: trying to explain a moderately complex process to a 6-year-old can be a challenge, but let them watch you perform it a few times and they will often pick it up with much greater ease (case in point: using a tech device, playing a video game, etc.). Furthermore, it seems to me that encouraging exploration and independent discovery should surely be a key part of any process aiming to instil curiosity in children.

These results were also mirrored in the research by the Lego Foundation, who claimed children should learn through play until at least the age of eight (despite possible cynicism based on a report from a toy manufacturer recommending more play, that article is really spot on).

To put this schooling approach into perspective, in the UK, children will already have had up to three years of five-days-a-week, full-day, classroom-based teaching by the age of 7. When my eldest son turns 7 he will be finishing his third year in school and will already have faced the prospect of national standardised testing in the form of the SATs (thankfully the government has decided to scrap these, but they will only stop being compulsory in 2023).

That is heartbreaking.

Thinking of this child, so full of joy and enthusiasm for playing, whether it be running around outside lost in an imaginative world of play or sitting down playing with lego, having to spend something like 25 hours a week in a classroom seems unthinkable.

But it's not just the later start, either: even once more formal education begins, they make sure to keep play an integral part of the school day, and children are required to have 15-minute play breaks every hour. Aside from the potential educational benefits of regular play, research has also found that outdoor play is linked to healthier and happier children. (Aside: have you ever tried to get a 6-year-old to concentrate for 45 minutes? If so, you will probably see the futility of anything other than regular play breaks.)

Once again, for some perspective, whilst visiting UK primary schools for my eldest son, one school headmistress casually boasted that the traditional afternoon playtime break had been dropped in favour of more classroom time.

Unsurprisingly, we didn’t apply for a place at that school.

But why?

The common thinking behind starting formal schooling earlier is that the earlier they start learning, the better prepared they'll be and the greater the head start they'll have. But even if the Finnish school system didn't appear to disprove this theory, it's worth considering the difference in benefits between learning by rote/testing vs independent learning (via play or other means), and the independent curiosity needed for the latter. I'd suggest that, in the modern knowledge economy in which we live, with the quickening rate at which information and understanding are changed by advancing technology, the most beneficial things someone can leave school with are curiosity and the ability, along with the desire, to learn independently - that is, to leave school as a lifelong learner. I think it says a lot that a key topic on the Finnish national curriculum is simply "learning to learn".

Beyond looking at the success stories of other education systems, we can look at history. Current incarnations of the education and school systems are relatively modern things, so how did we learn before them? Families and communities have long recognised the importance of amassing and passing on information to younger generations, if not through formal education then by other means: in many cultures, children learn through imitation and experimentation (which, as I mentioned previously, is easy to believe if you have a young child that has grown up around adults using mobile devices and have witnessed the speed at which they become proficient through imitation and experimentation).

It might be tempting to think that humankind was able to learn through such basic play techniques in times gone by purely because what we needed to learn was simpler, and that as society has progressed it has demanded that people have a greater understanding and depth of knowledge in order to keep up with industrial and technological advances, so the education system has evolved out of necessity.

However, I would argue that the opposite is true - firstly, as pointed out earlier, the speed at which science and technology is advancing means whatever level you leave the schooling system, within a couple of years understanding and techniques will likely have moved on, and your ability to learn independently and keep up with the fast paced changes is going to be essential to success.

Secondly, as I have written before, I would argue that play provides the essential understanding and building blocks for going on to study and understand computer science, engineering and maths.  As a computer scientist, I personally think the education that stood me in best stead for going on to learn - and be successful professionally - was playing with toys like Lego and puzzle solving games.

For example, let’s take a look at one of the Key Stage One goals for computing in the current UK National Curriculum:

“use logical reasoning to predict the behaviour of simple programs”

To be clear, Key Stage 1 covers ages 5 to 7 - children who, some neuroscientists think, are too young to be in formal taught education, who in Finland would still be enjoying creative play, and who may not all be capable of reading unaided. I don’t know about you, but I wouldn’t like to have to teach that in any medium other than play.

However, if we just re-frame the problem and consider this goal in the context of playing with train sets, it becomes simpler: “use logical reasoning to predict the behaviour of trains” - give the kids train sets, let them build tracks and think about what happens when a train is added: what happens if we add two trains? What happens if we change the direction of trains? Or change the behaviour of a junction piece? Being able to reason logically about such behaviours and changes is a very transferable skill that is useful for thinking about a range of problem solving disciplines, including computer science.

And it's not just computing - there is a growing group of mathematicians who posit that preschoolers are actually capable of understanding calculus and algebra. Not only that: they argue that by attempting to teach children maths the way we currently do, we crush almost all appetite for, and future interest in, a subject that is actually an amazing world of wonder and surprise - taking all the playful fun out of maths and making it a boring case of memorising numbers and patterns, which has the end result of killing the curiosity and interest needed to go out and learn independently.

Ultimately, I would love it if UK schools embraced free play more, if they embraced teaching STEM subjects through play, but I understand that it’s a huge shift that needs to come from the government. Recognising that the SATs test is not a positive thing for 6 and 7 year olds is a good first step, but the UK primary schools are still so governed by the national curriculum and expectations around performance that it seems impossible for any individual school to start to move the dial.


  1. The Atlantic: The underrated gift of curiosity
  2. The Guardian: The secret of Europe's top education system
  3. The New York Times: Let the kids learn through play
  4. The Guardian: Children should learn mainly through play until the age of 8
  5. Gov.UK: SATs practice material
  6. The Atlantic: How Finland keeps kids focused through free play
  7. The Play Return: An assessment of play initiatives
  8. Wikipedia: Knowledge Economy
  9. Gov.UK: Computing National Curriculum
  10. Fostering mathematical thinking through playful learning (paper)
  11. The New York Times: What babies know about physics
  12. The Atlantic: Five year olds can learn calculus


Generic programming in Scala with Shapeless

In my last post about evolutionary computing, I mentioned I started building the project partly just so I could have a play with Shapeless. Shapeless is a generic programming library for Scala, which is growing in popularity - but can be fairly complicated to get started with.

I had used Shapeless from a distance up until this point - using libraries like Circe or Spray-Json-Shapeless that use Shapeless under the hood to do stuff like JSON de/serialisation without boilerplate overhead of having to define all the different case classes/sealed traits that we want to serialise.

You can read the previous post for more details (and the code) if you want to understand exactly what the goal was, but to simplify, the first thing I needed to achieve was for a given case class, generate a brand new instance.

Now, I wanted the client of the library to be able to pass in any case class they so wished, so I needed to be able to generically inspect all the attributes of any given case class and then decide how to generate a new instance of it.

Thinking about this problem in traditional JVM terms, you might reach for reflection - which is certainly one option, although some people still have a strong dislike of reflection on the JVM, and it has the downside of not being type safe - that is, if someone passes in some attribute type that you don't want to support (a DateTime, UUID, etc.) then it has to be handled at runtime (which also means more exception-handling code, as well as losing compile-time safety). Likewise, you could just ask clients of the code to pass in a Map of the attributes, which dodges the need for reflection but has no type safety either.

Enter Shapeless

Shapeless allows us to represent case classes and sealed traits in a generic structure, that can be easily inspected, transformed and put into new classes.

A case class, product and HList walk into a bar

A case class can be thought of as a product (in the functional programming, algebraic data type sense): case class Person(name: String, age: Int, living: Boolean) is the product of name AND age AND living - that is, every time you have an instance of that case class you will always have an instance of name AND age AND living. As well as being a product, the case class in this example also carries semantic meaning in the type itself - for any given instance, as well as having these three attributes we also know the additional piece of information that this is a Person. This is of course super helpful, and kinda central to the idea of a type system!

But maybe sometimes we want to be able to say, generically (but still type-safely!): I have some code that doesn't care about the semantics of what something is, but if it has three attributes of String, Int, Boolean, then I can handle it - and that's what Shapeless provides. It provides a structure called a HList (a Heterogeneous List) that allows you to define a type as a list of different types, but in a compile-time checked way.

Rather than a standard List in Scala (or Java), whereby we define the single type that the list will contain, with a HList we can define the list with specific types per element, for example:
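A concrete HList, with the element types tracked in the type itself:

```scala
import shapeless.{::, HNil}

val hlist: String :: Int :: Boolean :: HNil = "hello" :: 42 :: true :: HNil

// The compiler knows the exact type at each position
val first: String = hlist.head
val second: Int   = hlist.tail.head
```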
The above example allows us to define our HList with specific types, which provides our compile-time safety - the :: notation is syntactic sugar provided by Shapeless to mirror normal Scala list behaviour, and HNil is the type for an empty HList.

Case class to HList and back again

Ok, cool - we have a way to explicitly define very specific heterogeneous lists of types - how can this help us? We might not want to be explicitly defining these lists everywhere. Once again Shapeless steps in and provides a type class called Generic.

The Generic type class allows conversion of a case class to a HList, and back again. Through compile-time macros, Shapeless provides these Generic type classes for any case class, and we can use them to convert to and from case classes and HLists:
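For example (Person here is just an arbitrary case class for illustration):

```scala
import shapeless.{::, Generic, HNil}

case class Person(name: String, age: Int, living: Boolean)

val gen = Generic[Person]

// Case class -> HList
val repr: String :: Int :: Boolean :: HNil = gen.to(Person("Ada", 36, living = true))

// HList -> case class
val person: Person = gen.from("Ada" :: 36 :: true :: HNil)
```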

Back to the problem at hand

So, we now have the means to convert any given case class to a HList (that can be inspected, traversed etc) - which is great, as this means we can allow our client to pass in any case class and we just need to be able to handle the generation of/from a HList.

To start with, we will define our type class for generation:
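The type class itself is minimal - just evidence that we know how to produce an instance of A:

```scala
// Type class: evidence that we can generate a (new) instance of A
trait Generator[A] {
  def generate: A
}
```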

Now, in the companion object we will define a helper method that can be called without needing anything explicitly passed into scope - the necessary Generator instance will be provided implicitly (and if there isn't an appropriate implicit, then we get our compile-time error, which is our desired behaviour):
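The helper just summons and uses the implicit instance (the trait is repeated here so the snippet stands alone):

```scala
trait Generator[A] {
  def generate: A
}

object Generator {
  // Resolves the implicit Generator for A - a compile-time error if none exists
  def generate[A](implicit gen: Generator[A]): A = gen.generate
}
```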

As you can see, we can just call the helper method with an appropriate type argument and as long as there is an implicit Generator in scope for that type, then it will invoke that generate method and *hopefully* generate a new instance of that type.

Next, we need to define some specific generators, for now, I will just add the implicits to support some basic types:
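In my code these implicits sit on the Generator companion object so they are always in implicit scope; they are shown standalone here:

```scala
import scala.util.Random

trait Generator[A] { def generate: A } // the type class from above

implicit val intGenerator: Generator[Int] = new Generator[Int] {
  override def generate: Int = Random.nextInt(1000)
}
implicit val doubleGenerator: Generator[Double] = new Generator[Double] {
  override def generate: Double = Random.nextDouble()
}
implicit val booleanGenerator: Generator[Boolean] = new Generator[Boolean] {
  override def generate: Boolean = Random.nextBoolean()
}
```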
All simple so far - if we call Generator.generate[Int] then it gets the implicit Generator[Int] and generates a random int (not that useful, perhaps, but it works!)

Now comes the part where our generate method can handle a case class being passed in, and subsequently the Shapeless HList representation of a case class (we will recursively traverse the HList structure using the appropriate implicit generators). First up, let's look at how we can implicitly handle a case class being passed in - at first this may look daunting, but it's really not that bad... honestly:
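The case class handling (again repeating the trait so the snippet stands alone):

```scala
import shapeless.{Generic, HList}

trait Generator[A] { def generate: A } // the type class from above

// For any T with a Generic giving HList representation L, and a Generator
// for that L: generate the HList, then convert it back into a T
implicit def caseClassGenerator[T, L <: HList](
    implicit generic: Generic.Aux[T, L],
    lGen: Generator[L]): Generator[T] =
  new Generator[T] {
    override def generate: T = generic.from(lGen.generate)
  }
```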
So, what is going on there? The implicit is bound by two types: [T, L <: HList] - which are then used in the implicit arguments generic: Generic.Aux[T, L] and lGen: Generator[L]. This matches any type T for which there exists in scope a Shapeless Generic instance that can convert it into some HList L, and for which an implicit Generator[L] exists. Now, we know from our earlier look at Shapeless that it will provide a Generic instance for any case class (converting it into a HList), so this implicit will be used whenever we pass in a case class instance, as long as we have an implicit Generator in scope that can handle a HList - so let's add that next!

Let's start with the base case: as we saw earlier, HNil is the type for an empty HList, so let's start with that one:
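The base case:

```scala
import shapeless.HNil

trait Generator[A] { def generate: A } // the type class from above

// Base case: "generating" an empty HList just yields HNil
implicit val hnilGenerator: Generator[HNil] = new Generator[HNil] {
  override def generate: HNil = HNil
}
```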
Remember, the goal is, from a given case class (and therefore a given HList), to generate a new one - so for an empty HList, we just want to return the same.

Next we need to handle a generator for a non-empty HList:
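The recursive case:

```scala
import shapeless.{::, HList}

trait Generator[A] { def generate: A } // the type class from above

// Recursive case: generate the head, then recurse on the tail
implicit def hconsGenerator[H, T <: HList](
    implicit hGen: Generator[H],
    tGen: Generator[T]): Generator[H :: T] =
  new Generator[H :: T] {
    override def generate: H :: T = hGen.generate :: tGen.generate
  }
```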
So this case is also relatively simple: because our HList is essentially like a linked list, we will handle it recursively - the type parameters [H, T <: HList] represent the type of the head of the HList (probably a simple type, in which case it will try to resolve the implicit using our original Int/Boolean/Double generators) and the tail of the HList, which will be another HList (possibly a HNil if we are at the end of the list, otherwise this will resolve recursively to this generator again). We grab the implicit Generator for the head and tail, then simply call generate on both, and that's really all there is to it! We have defined the implicit for the top-level case class (which converts it to the generic HList and then calls generate on that), we have the implicit to recursively traverse the HList structure, and finally we have the implicits to handle the simple types that might be in our case class.

Now we can simply define a case class, pass it to our helper method and it will generate a brand new instance for us:
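With all the implicits from the previous snippets in scope, usage looks like this (Dimensions is an arbitrary example):

```scala
case class Dimensions(width: Int, height: Int, solid: Boolean)

// All the implicits above resolve at compile time to build this instance
val generated: Dimensions = Generator.generate[Dimensions]
```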

I was using this approach to generate candidates for my evolutionary algorithm, however the exact same approach could be used easily to generate test data (or much more). I used the same approach documented here to later transform instances of case classes as well.

Here is the completed code put together:
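The complete listing (the Dimensions case class at the end is just an arbitrary example to exercise it):

```scala
import scala.util.Random
import shapeless.{::, Generic, HList, HNil}

trait Generator[A] {
  def generate: A
}

object Generator {

  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // Simple types
  implicit val intGenerator: Generator[Int] = new Generator[Int] {
    override def generate: Int = Random.nextInt(1000)
  }
  implicit val doubleGenerator: Generator[Double] = new Generator[Double] {
    override def generate: Double = Random.nextDouble()
  }
  implicit val booleanGenerator: Generator[Boolean] = new Generator[Boolean] {
    override def generate: Boolean = Random.nextBoolean()
  }

  // HList base case and recursive case
  implicit val hnilGenerator: Generator[HNil] = new Generator[HNil] {
    override def generate: HNil = HNil
  }
  implicit def hconsGenerator[H, T <: HList](
      implicit hGen: Generator[H],
      tGen: Generator[T]): Generator[H :: T] =
    new Generator[H :: T] {
      override def generate: H :: T = hGen.generate :: tGen.generate
    }

  // Any case class with a Generic representation whose HList we can generate
  implicit def caseClassGenerator[T, L <: HList](
      implicit generic: Generic.Aux[T, L],
      lGen: Generator[L]): Generator[T] =
    new Generator[T] {
      override def generate: T = generic.from(lGen.generate)
    }
}

case class Dimensions(width: Int, height: Int, solid: Boolean)

println(Generator.generate[Dimensions])
```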


I have mentioned so far that Shapeless provides the means to transform case classes and sealed traits to generic structures, but have only really talked about case classes. The sealed trait is a slightly different shape to a case class (it is analogous to the algebraic data type co-product) and accordingly, Shapeless has a different structure to handle it, rather than a HList it has a type called Coproduct. The above code can be extended to also handle sealed traits, to do so you just need to add the implicit to handle the empty coproduct (CNil) and recursively handle the Coproduct.


Intro to Genetic Algorithms in Scala

This is intended to be quite a simple high level approach to using some of these ideas and techniques for searching solution spaces. As part of my dissertation back when I was in university, I used this approach to find optimal configurations for Neural Networks. More recently, I have applied this approach to finding good configurations for training an OpenNLP model, with pretty good results - which is really what made me think to write a little about the topic.

So what is it?

Genetic algorithms are a subset of evolutionary computing that borrow techniques from classic evolution theories to help find solutions to problems within a large search space.

So, what does that mean? It sounds a little like a bunch of academic rhetoric that makes nothing any clearer, right? Actually, the concept is quite simple, and works like the evolutionary theory of "survival of the fittest": you generate a large set of random possible solutions to the problem and see how they all perform at solving it, then you "evolve" those solutions, allowing the best ones to progress and evolve while cutting out the worst performers.

How it works

In practical terms, at a high level, it involves the following steps:
  1. Create an encoded representation for the problem solution - This might sound complicated, but this just means capture all the possible inputs or configuration for a solution to the problem.

  2. Randomly generate a set of possible solutions - that is, a set of configurations/inputs (this is called a "population")

  3. Assess the performance of these solutions - to do this, you need a concept of "fitness", i.e. how well a solution performs - and rank the population

  4. Evolve the population of solutions - this can involve a range of techniques of varying complexity, often keeping the top n candidates, dropping the bottom m candidates and then evolving/mutating the rest of the new population

  5. Repeat the rank & evolve cycle (depending on how it performs)

To understand this, let's first consider a simple, hypothetical numeric problem: say you have some function f(x, y, z) that returns an integer, and we want to find a combination of x, y and z that gives a value closest to 10,000 (e.g. you want a cuboid with a volume of 10,000 and you want to find dimensions for the width, depth and height that achieve this - like I said, this is just hypothetical). The first step would be to encode the solution, which is pretty simple - we have three inputs, so our candidate looks like this:

case class Candidate(x: Int, y: Int, z: Int)

We will also need a fitness function that will assess how a candidate performs (really, this will just be evaluation of the inputs against our function f):

def fitness(candidate: Candidate): Int = Math.abs(10000 - (candidate.x * candidate.y * candidate.z))

Now we have a representation of the solution, in generic terms, and we have a function that can measure how well they perform, we can randomly generate a population - the size of the population can be chosen as suits, the bigger the population the longer an evolution cycle will take, but of course gives you a wider gene pool for potential solutions which improves your chances of a better solution.

Once we have our initial population, we can evaluate them all against our fitness function and rank them - from there, we can evolve!

Survival of the fit, only the strong survive

So how do we evolve our candidates? There are a few techniques, we will look at the simplest here:

  • Survival of the fittest - the simple and sensible approach whereby the top performing candidate in your population always survives as-is (in the scenario where we have stumbled upon the optimal solution in the first random generation, we want to make sure it is preserved)

  • Mutation - much like you might imagine genes get mutated in normal evolution, we can take a number of candidates and slightly adjust the values in our candidate - for numeric "genes" it's easy: we can just adjust the number. A simple, popular technique for doing this is Gaussian mutation, which ensures that in most cases the value will be close to the original, but with a chance of a larger deviation (it's a bell-curve distribution, whereby the peak of the bell curve is the original value).

  • Cross breeding - again, like normal reproduction, randomly select two parent candidates (probably from the top x% of the candidate pool) and take some attributes from each parent to make up a child

Between mutation and cross breeding you can create a new pool of candidates and continue the search for a solution.
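The whole cycle for the cuboid example can be sketched in a few lines of plain Scala (a toy implementation: the population size, survivor ratio, mutation spread and generation count are all arbitrary choices here):

```scala
import scala.util.Random

case class Candidate(x: Int, y: Int, z: Int)

val target = 10000
val rand = new Random(42) // fixed seed so the run is repeatable

def randomCandidate(): Candidate =
  Candidate(rand.nextInt(50) + 1, rand.nextInt(50) + 1, rand.nextInt(50) + 1)

// Lower is better: distance of the candidate's volume from the target
def fitness(c: Candidate): Int = Math.abs(target - (c.x * c.y * c.z))

// Gaussian mutation: usually a small tweak, occasionally a larger one
def mutateValue(v: Int): Int = Math.max(1, v + Math.round(rand.nextGaussian() * 3).toInt)

def mutate(c: Candidate): Candidate =
  Candidate(mutateValue(c.x), mutateValue(c.y), mutateValue(c.z))

// Cross breeding: each dimension comes from a randomly chosen parent
def crossover(a: Candidate, b: Candidate): Candidate =
  Candidate(
    if (rand.nextBoolean()) a.x else b.x,
    if (rand.nextBoolean()) a.y else b.y,
    if (rand.nextBoolean()) a.z else b.z)

def evolve(population: Seq[Candidate]): Seq[Candidate] = {
  val ranked = population.sortBy(fitness)
  val survivors = ranked.take(population.size / 5) // top 20% survive as-is
  val children = Seq.fill(population.size - survivors.size) {
    val p1 = survivors(rand.nextInt(survivors.size))
    val p2 = survivors(rand.nextInt(survivors.size))
    mutate(crossover(p1, p2))
  }
  survivors ++ children
}

val initial = Seq.fill(100)(randomCandidate())
val finalPopulation = (1 to 50).foldLeft(initial)((pop, _) => evolve(pop))
val best = finalPopulation.minBy(fitness)
println(s"best candidate: $best, error: ${fitness(best)}")
```

Because the best candidate always survives each generation, the error can only ever decrease (or stay the same) over the run.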

Some code

Partly because I wanted to learn a bit more about the Shapeless library (more of that in another post) and partly for fun, I created a simple GitHub project with some code that has very simple evolutionary search machinery that allows you to try and use these techniques.

The shapeless stuff was to support generic encoding of our solution - so the library can easily be used to define a solution representation and a fitness function.

Following on from the example above to find three dimensions that have a volume of 10,000 I have the following configuration:
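The configuration looks roughly like this - a loose sketch reconstructed from the description below, so Gene and EvolutionarySearch are hypothetical stand-ins for the library's actual types, not its real API:

```scala
// Hypothetical shapes - illustrative only, not the library's exact API
case class ThreeNumbers(a: Gene[Int], b: Gene[Int], c: Gene[Int])

val fitness: ThreeNumbers => Double =
  t => Math.abs(10000.0 - (t.a.value * t.b.value * t.c.value))

EvolutionarySearch.run(
  populationSize = 100,
  generations    = 10,
  seedCandidate  = ThreeNumbers(Gene(min = 1, max = 100), Gene(min = 1, max = 100), Gene(min = 1, max = 100)),
  fitness        = fitness
)
```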
In the above, we first define the case class ThreeNumbers - this can be any case class you like, but it must have Gene[A] attributes - custom types I created that have defined generation and mutation methods, plus additional configuration (such as max and min values for numbers, etc.). We then pass all the configuration in, along with a fitness function of the form B => Double, and run it.

It varies on each run, as the initial pool of candidates is generated at random, but in most cases this example solves the problem (finds an error rate of zero) within 10 generations:

As mentioned, I have used this approach to train OpenNLP NER models - the different training-data generation parameters and the training configuration were all encoded as a candidate class, and I then used this method to evolve the best training configuration for named entity recognition - but it can be applied to all sorts of problems!

Feel free to try it out and let me know if it works well for any problem domains - you can also check out the code and make it more sophisticated in terms of evolution/mutation techniques and introducing a learning rate.


Agile Machine Learning: From Theory to Production

Later this year, Sumanas and I will be co-presenting a talk about researching Machine Learning in an agile development environment at the JAXLondon conference. This is a high level overview of some of the topics we will be presenting (we will also try to get some cool ML demos in there too, just to make things a bit more interesting).

So what’s the problem?

Artificial Intelligence (AI) and Machine Learning (ML) are all the rage right now - at Google’s recent I/O 2017 event they really put ML front and center with plans to bake it into all their products, and lots of other large companies are aligning themselves as Machine Learning companies as a natural progression from Big Data.

According to a recent Narrative Science survey, 38% of enterprises surveyed were already using AI, with 62% expecting to be using it by 2018. So it is understandable that many companies feel pressure to invest in an AI strategy before fully understanding what they are aiming to achieve, let alone how it might fit into a traditional engineering delivery team.

We have spent the last 12 months taking a new product to market, trying to go from a simple idea to a production ML system. Along the way we have had to integrate open-ended academic research tasks with our existing agile development process and project planning, as well as work out how to deliver the ML system to production in a repeatable, robust way, with all the considerations expected of a normal software project.

Here are a few things you might consider if you are planning your ML roadmap (and topics we will cover in more detail in the JAXLondon session in October).

Machine Learning != your product

Machine Learning is a powerful tool to enhance a product - whether by reducing the cost of human curation or by powering a better voice/natural language interface. However, Machine Learning shouldn’t be considered the selling point of the product. Think of the end result Product First: is there a market for the product regardless of whether it is powered by ML or by human oversight? Consider whether it makes sense to build a fully non-ML version of the product first, to start proving the market fit and delivering value to customers.

Start small and build

The Lean Startup principles of MVP and fast iterations still apply here. Following on from the point above, start with a non-ML product; then, as you apply ML techniques, even a small increase in performance (better recommendations, reduced human effort/cost, improved user experience - replacing a human process with ML for just 5% of cases can start to realise cost benefits) can add value straight away. From a small start you can prove the value being added whilst also getting the ML infrastructure tested and proven.

Tie into development sprint cycles

You may be hiring a new R&D team, or you may be using members of your existing engineering team. Either way, it is helpful to have them working in similar development sprint cycles (if you work in sprints). It allows both sides to understand what is happening and how work is progressing - product and engineering changes and issues might usefully inform the direction of R&D, and likewise there may be data features or feedback from the R&D team that could be easily engineered and would make research simpler. Whilst research is ongoing, and often time consuming, having fortnightly (or whatever the sprint length is) checkpoints where ideas can be discussed and demoed is good for the whole team’s understanding, as well as being a positive motivator.

Don’t forget Clean Code!

Whilst experimenting and researching different ideas it can be pretty easy to fall into hacking mode, rattling out rough scripts to prove an initial concept or idea - and there is definitely a place for this - but as your team progresses it is worth investing in good coding principles. One-off scripts can reasonably be hacked out, but as the team works across several ideas, code that is re-usable and organised sensibly, with proper separation of concerns, makes future research easier and reduces the cost of productionisation. Machinery that makes experiments easily testable (and that benchmarks different solutions) is worth investing in from the start.

Recommended Reading

While the interwebs are awash with Machine Learning articles, tutorials and click-baitey links guaranteed to reduce the error of your model in 3 quick steps, the following is a small list of resources that we think are worth browsing.

For the academically inclined, the following is a list of papers, both recent and not so recent:

During the session in JAXLondon later this year, we will go into more detail with these ideas, as well as others including technical and architectural considerations for building and deploying an ML stack.


Building a RESTful API like a Product Manager

Something I have been thinking about for a while is the idea of APIs as your Product - the idea that you should design, build and test your API like it's a public facing product (as it might actually be, if it's a public API), and what lessons we should be learning from Product Managers about how we achieve this.

In the latest instalment of their "Technology Radar", ThoughtWorks recommended the concept "API as a Product" as ready to trial (although I think it's far beyond trial stage - I can see no real risk of adopting the lessons from Product Management for your API).

I have been interested in Product Management for several years - I think it's hard not to get interested in the science of Product Management when you are building side projects, as you naturally have to think about the product you are building. Sure, side projects can easily turn into vanity projects, where you just build what you think is cool, but I think it's still fun to think about real users and the good Product Management techniques that can be applied even to side projects.

Having tried to gain some understanding, I have turned to some classic material on the subject of Product Management: The Lean Startup, Don't Make Me Think, and How to Build a Billion Dollar App. I have also been lucky enough to attend the Mind The Product conference twice (through volunteering - but I managed to watch lots of the talks), and to have worked with several great Product Managers.

So what lessons can we start applying? I think, if one statement sums it up, it's an idea from Kathy Sierra:

Does your product make your users awesome?

This is a great way to think about APIs (I'm mostly talking about REST/GraphQL type APIs here - but this thinking is a great way of approaching design for normal class/interface/library APIs too).

As engineers, I'm sure we have all used APIs that don't achieve this - APIs that make us have to think, that make us feel stupid, that seem overly complicated (for example, the old Java Date library - the number of times I had to google just to find out how to simply create a specific date for something simple like a unit test is ridiculous). And hopefully, we have all used APIs that make us feel awesome - APIs that make us productive, that let us do what we expect to be able to do, and can (with the help of the IDE's auto-complete) let us guess what the method calls are.
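To make that contrast concrete, here is the simple unit-test task mentioned above - constructing a specific date - with the old java.util API versus the java.time API introduced in Java 8:

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

public class DateExample {
    public static void main(String[] args) {
        // The old API: months are zero-indexed, so May is 4 (hence the
        // Calendar.MAY constant) - an easy trap when all you want is a
        // fixed date for a unit test.
        Date legacy = new GregorianCalendar(2017, Calendar.MAY, 1).getTime();

        // java.time: reads the way you would guess it should.
        LocalDate modern = LocalDate.of(2017, Month.MAY, 1);

        System.out.println(legacy + " vs " + modern);
    }
}
```

The java.time version is the kind of API you can discover through auto-complete alone, which is exactly the "don't make me think" quality being described here.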

Another idea Kathy Sierra mentions in that talk, and is also made famous from the title of the Product Management classic, is:

Don't make me think

I have written before about the StackOverflow and GitHub APIs, but they deserve a mention again - if you have ever written an integration against them, you will know. The APIs are designed in a way that sensibly models the domain.

For example, take StackOverflow, if we were going to describe their domain model, what first class objects would we have? I guess Questions, Answers, Comments etc - and sure enough, their API models these:

/questions - will get you all the questions on the site

/answers - will get you all the answers.

Now, how do you think you might get all answers for a given question? Well, a list of answers should probably be a child of a question, so something that represents that, something like:

/questions/{Question ID}/answers

Sure enough, that's the URL - and the behaviour is consistent throughout the design; I'm sure you can guess how you might retrieve all comments for a given answer.
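That guessability shows up directly in client code, because endpoint paths compose straight from the domain model. A sketch of what that looks like (the base URL follows the public StackExchange API, but treat the exact paths and version as illustrative):

```java
// Because the API mirrors the domain model (questions contain answers,
// answers contain comments), URL construction is mechanical and predictable.
public class StackOverflowUrls {
    static final String BASE = "https://api.stackexchange.com/2.3";

    static String questions() {
        return BASE + "/questions";
    }

    static String answersFor(long questionId) {
        return BASE + "/questions/" + questionId + "/answers";
    }

    // The same parent/child pattern one level down - guessable by design.
    static String commentsFor(long answerId) {
        return BASE + "/answers/" + answerId + "/comments";
    }
}
```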

I'll be honest, developing against an API where I can guess the endpoints makes me feel pretty awesome.

MVP and the Lean Product

The idea of a Minimum Viable Product is a core concept in Lean Startup - the idea that you build the minimum version of the product that can get you feedback as to whether you are heading in the right direction, before investing more effort in building out a fully fledged product.

I think an appreciation of the MVP is good for any engineer to have - I have generally found it useful in facilitating conversation when building a new product: if, as an engineer, you recognise that some new feature is going to be considerable effort, it's helpful to think about what hypothesis the feature is trying to test, and whether there is a simpler engineering solution to test that hypothesis.

However, the principle of not over-engineering the API is also a good one - once you have a well designed domain model, it's quite tempting to get carried away with REST design best practices and start building endpoints for all the different variations of the model. Unless you need them, it's better to start with less (if you are building a public REST API from the start it might be harder to know what isn't needed, but you can still build an MVP version of the API to test the hypothesis - e.g. whether people actually want to use your API).

User Testing

Another important part of building a product is collecting user feedback. Whilst a REST API may not be well suited to A/B testing (your consumers won't be very happy if you switch the API design on them after they have implemented against a specific variation!), listening to user feedback is still important. Because the consumers will likely be engineers (writing the client code), they will hopefully expect REST standards and good API design principles - and another way to think about this is the importance of not making breaking changes to your API.

A technical way of thinking about this is Consumer Driven Contract Testing - as RESTful APIs are de-coupled from the client code, the API has no idea what its clients are using or doing. This approach lets consumers define what they expect from the API, and the API can then always test against these contracts, ensuring that any change is still in keeping with its existing consumers, or users.
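As a hand-rolled sketch of the idea (in practice you would more likely reach for a dedicated tool such as Pact): each consumer declares the response fields it depends on, and the API's own test suite verifies every change against those declarations. All the names here are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConsumerContracts {
    // Hypothetical contract: the fields one consumer (say, a mobile app)
    // reads from GET /questions/{id}. If an API change drops one of these,
    // the API's build should fail before any consumer finds out in production.
    static final Set<String> MOBILE_APP = Set.of("id", "title", "answer_count");

    // Return the contract fields missing from an actual response body,
    // sorted so the report is deterministic.
    static List<String> violations(Set<String> contract, Map<String, ?> response) {
        return contract.stream()
                .filter(field -> !response.containsKey(field))
                .sorted()
                .toList();
    }
}
```

The key design point is that the contracts live with (or are published by) the consumers, not the API - so the API team learns exactly which parts of a response are actually relied upon, and which are safe to change.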


I think, in general, all engineers can benefit from thinking about how good products are built and which concepts are important, and from treating whatever they are building as a product that someone else will use - externally if it's a RESTful API, but also internally if it's consumed cross-team, or even just by whoever takes over your code once you have moved on.


First of the year


Ignoring global events, 2016 was an interesting year for me. Personally, I managed to achieve, and more importantly learn, quite a bit: I switched jobs, started learning Scala (full time), started learning and working with machine learning, and entered my first BBQ competition and my first chilli cook-off. I also submitted my first proposal for a conference talk - it was not selected in the end, but I was pleased to have at least submitted it, and one of my goals for 2017 is to start public speaking (conferences/meetups), so I will hopefully continue to submit talk proposals in the hope that one gets selected by the end of the year.

There are probably lots of things that I had in mind to do at the start of the year that I didn't manage - such as finishing my education side project, but never mind.

The new tech stuff I have been able to learn has been a product of moving jobs, so that's cool, and I'm very happy with what we are doing there at the moment. As for the two cooking competitions: I signed up and paid to enter both of them early in the year on a whim, and that provided the motivation and deadline to force me to practice, prepare and actually complete them.


So here are a few goals I am aiming for in this coming year:

Machine learning - let's start with a realistic one. I want to become a lot more competent and experienced in the ML field. It's an exciting area right now, and something that is only going to get more prevalent, so I am really happy to be somewhere I can be involved in it from the start of a project and get to know it better.

Writing - a little different this time. As a goal to aim for, and also just to see the wider process, I want to write a book. About meat. I have lots of ideas already; it will be relatively short and, at first at least, an e-book - a whistlestop tour of the hows and whys of cooking meat.

Kickstarter - I am currently working on an Android app for my boy (5 years old), and it made me think about how I could make it more accessible for other kids to make their own apps. So I might put it on Kickstarter once I have made some progress - again, mostly for the experience of running a Kickstarter project rather than actually expecting to get the money to do it.

Public speaking - As mentioned, I would like to get into this. So if I manage to get a proposal accepted before the end of the year, that would be cool.

On to the next one!


Managing your personal finance (a Java webapp)

"Cash Rules Everything Around Me" - Method Man

A few years ago I was fed up with how useless all the banks were at helping me understand my transactions/spending patterns through their online banking. Basically, they all just seemed to let you log on and then see a list of transactions (often just referencing codes) of payments in and out, by date. Nothing else. It seemed like lots of simple things were missing that could make it a lot more useful, to name a few:

  • Smarter tagging/categorisation - where some direct debit was just listed by the code of the recipient rather than meaningful info, or being able to group all the spending in coffee shops
  • Alerting on changes/unexpected behaviour - for example when a fixed-price term contract comes to an end and the price changes - detecting a change in a regular fixed amount/recipient should be easy

So this is what I started building. A lot of the functionality has since been built into banks such as Monzo; the aim of this webapp was simply to let you import transaction histories (most online banks I use can export transactions as CSV) so you could view/filter/tag any existing bank data. It only supports Santander at the moment, and the alerting stuff never actually got built, but I thought I would chuck it on GitHub rather than have it sit around gathering dust in a private repo.

I wrote about the project a while ago, when it was first underway, but as it has not been worked on for a while, I thought I would move it to GitHub - hence the post here today. The app and all the code can be downloaded here:

Building & Running

It's a simple Java/Maven app (it was written several years ago, so it hasn't been migrated to Spring Boot, Gradle, Groovy etc) and it builds a WAR file. It also uses LESS for styling, which is also built by Maven (although watch out for silent failures if you make changes!). If you are using an Eclipse-based IDE you can follow the guide here: to get incremental build support for LESS (e.g. just change the LESS source, refresh in the browser, and it will reload).

You can run the WAR file in any Tomcat/IDE server, and the backend currently expects a MySQL DB (but this can easily be switched for a different DB driver if required).


  • Migrate to Spring Boot, Groovy, Gradle
  • Add further bank transaction importers
  • Add alerting framework


(I loaded in a generated transaction history - which is why some of the charts look a little weird)