Sentiment Analysis of tweets with Scala and AWS Machine Learning

Recently, in an attempt to start learning React, I began building an akka-http backend API as a starting point. I quickly got distracted building the backend and ended up integrating with both the Twitter streaming API and AWS' Comprehend sentiment analysis API - which is what this post will be about.

Similar to an old idea, where I built an app consuming tweets about the 2015 Rugby World Cup, this time my app was consuming tweets about the FIFA World Cup in Russia - splitting tweets by country and recording the sentiment for each one (and so a rolling average sentiment for each team).


The premise was simple:

  1. Connect to the Twitter streaming API (aka the firehose), filtering on World Cup related keywords
  2. Pass the body of the tweet to AWS Comprehend to get the sentiment score
  3. Update the in-memory store of stats (count and average sentiment) for each country

In terms of technology used:
  1. Scala & Akka-Http
  2. Twitter4s Scala client
  3. AWS Java SDK

As always, all the code is on GitHub. To run it locally you will need a Twitter dev API key (add an application.conf as per the readme on the Twitter4s GitHub) and an AWS key/secret - the code will look for credentials stored locally, but you can also just set them in environment variables before starting. The free tier supports up to 50,000 Comprehend API requests in the first 12 months and, as you can imagine, plugging this directly into Twitter can result in lots of calls, so make sure you restrict it (or at least monitor it) before you leave it running!

Consuming Tweets

Consuming tweets is really simple with the Twitter4s client - we just define a partial function that will handle the incoming tweet. 

The other functions for parsing countries/teams are excluded for brevity, and you can see it's quite simple: for each inbound tweet we make a call to the Sentiment Service (we will look at that later), then pass the tweet along with the additional data to our update service, which stores it in memory. You will also see it is ridiculously easy to start the Twitter streaming client filtering by keywords.
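The shape of such a handler can be sketched in plain Scala. The types below (StreamMessage, Tweet, Warning) and the country list are stand-ins invented for illustration, not the Twitter4s types or the parsing code from the repo; the point is only that a partial function lets the handler pick out the messages it cares about and ignore the rest.

```scala
object TweetHandlerSketch {
  // Hypothetical stand-ins for the messages a streaming client delivers.
  sealed trait StreamMessage
  final case class Tweet(text: String) extends StreamMessage
  final case class Warning(message: String) extends StreamMessage

  // Assumed keyword list; the real app tracks the World Cup teams.
  val countries: Set[String] = Set("poland", "senegal", "england")

  // Naive country detection: any tracked name that appears as a word in the text.
  def countriesIn(text: String): Set[String] =
    text.toLowerCase.split("\\W+").toSet.intersect(countries)

  // Defined only for tweets; other stream messages are simply not matched.
  def handler(onTweet: (Tweet, Set[String]) => Unit): PartialFunction[StreamMessage, Unit] = {
    case tweet: Tweet => onTweet(tweet, countriesIn(tweet.text))
  }
}
```

In the real app the callback would call the sentiment service and then the update service; here it is left as a parameter so the sketch stays free of external dependencies.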

Detecting Sentiment

Because I wanted to be able to stub out the sentiment analysis without being tied to AWS, you will notice I am using a self-type annotation on my Twitter class above, which requires a SentimentModule to be mixed in at construction - I am using a simple cake pattern to manage all my dependencies here. In the GitHub repo there is also a Dummy implementation that just picks a random number for the score, so you can still see the rest of the API working - but the interesting part is the AWS integration.

Once again, the SDK makes the integration really painless - in my code I am simplifying the actual results a lot, to a much cruder Positive/Neutral/Negative rating (plus a numeric score -100..100).

The AWSCredentials class is the bit that is going to look in the normal places for an AWS key.
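The self-type wiring and the crude rating described in this section might look roughly like the sketch below. Only the SentimentModule name and the idea of a dummy implementation come from the post; the method names, the thresholds of plus or minus 20 and the way two confidence values are collapsed into one score are assumptions made for illustration, and no AWS SDK call is shown.

```scala
import scala.util.Random

object SentimentSketch {
  sealed trait Rating
  case object Positive extends Rating
  case object Neutral extends Rating
  case object Negative extends Rating

  // Crude result: a label plus a score in the range -100..100.
  final case class Sentiment(rating: Rating, score: Int)

  // The dependency the tweet consumer needs, expressed as a trait.
  trait SentimentModule {
    def sentimentOf(text: String): Sentiment
  }

  // Assumed thresholds for turning a numeric score into a label.
  def ratingFor(score: Int): Rating =
    if (score > 20) Positive else if (score < -20) Negative else Neutral

  // Collapse positive/negative confidences (each 0.0..1.0) into one score.
  def crudeScore(positive: Double, negative: Double): Int =
    math.round((positive - negative) * 100).toInt

  // Stub implementation: a random score, so the rest of the app still runs.
  trait DummySentimentModule extends SentimentModule {
    private val random = new Random()
    def sentimentOf(text: String): Sentiment = {
      val score = random.nextInt(201) - 100
      Sentiment(ratingFor(score), score)
    }
  }

  // Self-type: TweetConsumer can only be created with a SentimentModule mixed in.
  class TweetConsumer { this: SentimentModule =>
    def process(text: String): Sentiment = sentimentOf(text)
  }
}
```

A consumer is then built with `new TweetConsumer with DummySentimentModule`, and an AWS-backed module can be swapped in without touching the consumer itself.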

Storing and updating our stats

So now we have our inbound tweets and a way to assess their sentiment score. I then set up a very simple akka actor to manage the state, and just stored the API data in memory (if you restart the app, the store gets reset and the API stops serving data).

Again, very simple out-of-the-box stuff for akka, but it allows easy and thread-safe management of the in-memory data store. I also track a rolling list of the last twenty tweets processed, which is managed by a second, almost identical, actor.
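The state those actors look after can be sketched as an immutable value that is replaced on every message. This is an illustration in plain Scala rather than the repo's code: the CountryStats and Store names are invented, Akka is left out entirely, and the stats and the last-twenty list are folded into one value for brevity even though the app uses two actors.

```scala
object StatsSketch {
  // Running stats for one country: tweet count and mean sentiment score.
  final case class CountryStats(count: Long, average: Double) {
    // Incremental mean: newAverage = average + (score - average) / newCount
    def add(score: Int): CountryStats = {
      val newCount = count + 1
      CountryStats(newCount, average + (score - average) / newCount)
    }
  }

  final case class Store(stats: Map[String, CountryStats], latest: Vector[String]) {
    // A tweet mentioning several countries is counted against each of them.
    def record(text: String, countries: Set[String], score: Int): Store = {
      val updated = countries.foldLeft(stats) { (acc, country) =>
        acc.updated(country, acc.getOrElse(country, CountryStats(0, 0.0)).add(score))
      }
      // Keep only the twenty most recent tweets, newest first.
      Store(updated, (text +: latest).take(20))
    }
  }

  val empty: Store = Store(Map.empty, Vector.empty)
}
```

An actor would hold a single Store and swap it for the result of record on each incoming message, which is what makes the updates safe without explicit locking.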

The results

I ran the app during several games; below are some sample outputs from the API. The response from the stats API is fairly boring reading (just numbers), but the example tweets show tweets being correctly classified (apologies for the expletives in the tweet about Poland - I guess that fan wasn't too happy about being beaten by the Senegalese!).

You will also notice that the app captures the countries being mentioned, which exposes one flaw of the design. In the negative tweet from the Polish fan about conceding two goals to Senegal, the app correctly identifies the sentiment as negative, but it has no way to determine the subject. As both teams are mentioned, it naively records a negative tweet against both of them, whereas on reading it is clearly negative only with regard to Poland (I wasn't too concerned for my experiment, of course - just an observation worth noting).

Sample tweet from the latest API:

Sample response from the stats API:

When I finally did get around to starting to learn React, I just plugged in the APIs and paid no attention to styling, which is a roundabout way of apologising for the horrible appearance of the screenshot below (I'm really sorry about the CSS gradient)!

We need to talk about AI

Ethical and Regulatory questions facing AI

At work, as part of a reading group, I had the chance to spend some time reading about some challenges facing AI regarding ethics and regulation. Despite having worked with and thought about AI and Machine Learning before, I hadn't spent much time thinking about the ethical and legal challenges facing the field, so it was a lot of fun to do some reading in the area. I was mostly interested in the more short term challenges of the space in the coming years, so didn't get into the more long term concern about things like Artificial General (Super) Intelligence or weaponised AI.

Regardless of area of expertise, most of us are probably already aware of the momentum around Artificial Intelligence (AI). Between self driving cars, home assistants (Alexa, Google Home, et al) and the growing capabilities of our mobile devices there is no escaping the ever looming presence of AI in our lives.

Furthermore, it seems unlikely that this will slow down anytime soon. A recent Narrative Science study found that AI adoption grew by 60% in the last year, with 61% of organisations reporting that they have implemented AI within their business, and a Gartner report predicted that by 2020 85% of customer interactions will be managed without human intervention.

But despite this growth, there is still a question mark over whether, and if so, how, the field should be regulated. Having been brought up on decades of sci-fi about AI going rogue and robots enslaving the human race, it feels like there is both the fear of this possible future, whilst also scepticism that these fears are only the stuff of movies. Elon Musk has famously warned of the future risks of AI: “I think we should be very careful about artificial intelligence. If I had to guess at what our biggest existential threat is, it’s probably that” whilst others, including Mark Zuckerberg, have downplayed the claims of doomsday scenarios as irresponsible.

So what's the big deal? AI already permeates so many aspects of life and business, but considering for a moment that these technologies could be being used to control autonomous cars on public roads, determine people’s credit score or suitability for a job, to detect illness or even in policing and judicial decision making - it is pretty clear that we should have a good understanding of these technologies and clear systems of accountability and control in place. In all these examples getting a decision wrong has the potential to ruin lives, yet there is still limited regulation, control or even understanding of the algorithms, the data and their usage.

A common analogy is with other heavily regulated industries: big pharma companies can’t release drugs without thorough testing and approval, yet several big tech companies have already started testing autonomous vehicles on public roads with limited regulatory controls (that’s not to say that they have had a completely free pass, there are varying levels of regulation, depending on the region. Arizona has long been promoting itself as an AI friendly state to try to attract business from big tech, making it as easy as possible for companies to test self driving cars with minimal regulatory friction, and they recently saw the first fatality from a self-driving car).

In its 2017 report, the AI Now Institute recommended that AI be banned outright from use in any high risk areas, such as criminal justice, healthcare, welfare and education, with further measures for other domains - which, given the potential impact of errors in these domains, seems like a fairly sensible starting point.

Uncertainty and the unknown

One key aspect that is especially troubling is the lack of understanding of both the data and the underlying technology. This isn’t necessarily a surprise - we have computers being trained on millions of data points, to the point of being able to outperform humans at their tasks, so it should come as no surprise that both the inner workings and the end results could be beyond easy comprehension.

This problem has been demonstrated by several high profile mishaps from large tech companies, showing that even companies that have a wealth of resources and technical expertise in the domain can be caught out - such as Microsoft’s AI chatbot Tay, which quickly became racist when released into the wild. Clearly Microsoft had neither intended nor envisaged that end result. Similarly, when Google Translate revealed gender bias by pairing “he” with “hardworking” and “she” with “lazy”, it clearly wasn’t an intentional or foreseen behaviour, but it eventually revealed itself with wider usage.

Understanding where bias in AI comes from

To get a better understanding of where these biases and blind spots come from, let’s take a look at how AI learns. Broadly speaking, there are three primary approaches to training AI: Supervised, Unsupervised and Reinforcement.

Unsupervised learning is where the AI is fed very large amounts of raw data - for example an entire corpus of fictional texts - and it is left to work out patterns or groupings. That is, it doesn’t know a right or wrong answer, but can identify related things from the dataset and group them together (for example, AI reading popular fiction might group together terms such as “batman” and “wonder woman”, but it would have no knowledge of what these terms actually mean).

Supervised learning is where the AI is fed very large amounts of marked up data - that is, for each input, it also gets passed the expected output. An example of this is if you had a large set of photos (say Google Photos) which are pre-tagged with descriptions of what is in the photo, the dataset could be used to train an AI to identify contents of a photo.

Reinforcement learning is similar to supervised learning inasmuch as the algorithm gets information as to whether or not it is performing well (like knowing the answer for a given input), but that information comes in the form of a feedback loop, and learning works more like trial and error: it might have a general fitness score function that the algorithm can use to determine whether its response to a given input has been successful, and so adjust its response for the next cycle. The simplest example of this is something like AlphaGo/AlphaZero, where an algorithm learns to play a game like Go or chess by trial and error and gets feedback on its attempted response from the game itself.

Both Supervised and Unsupervised learning cases require vast amounts of data to accurately train AI, which really leads us to one of the primary challenges for building fair and ethical AI: sourcing the data to train on. AI is dependent on these huge datasets, and finely tuned to all the details and subtle underlying patterns, regardless of whether we are aware of them or not, and as we will see, getting objective, raw data sets of sufficient magnitude is rife with challenges.

Institutional bias

Similar to the concept of Conway’s Law, which states “any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure”, the data we naturally generate in action, conversation and interactions as a society or organisation will naturally reflect the values, beliefs and structure of the society (or organisation). There is an intrinsic and inescapable subjectivity in all big data, best described by Lisa Gitelman in her book Raw Data is an Oxymoron:

Objectivity is situated and historically specific; it comes from somewhere and is the result of ongoing changes to the conditions of inquiry, conditions that are at once material, social, and ethical

A simple example of this could be in criminal statistics: if a police force stops and searches a particular demographic more heavily than others, then that will be reflected in the numbers, and therefore that cultural subjectivity influences the data set. This subjectivity will then naturally carry over to, and likely be amplified by, the trained AI as it becomes finely tuned to the data (an example of this was seen where some software used to inform sentencing decisions relied on data that had institutional bias, which resulted in a racial bias in the risk assessment - strengthening the AI Now report’s proposal of banning AI use in these areas).

Finding complete & representative data

Compounding this problem is the fact that researchers working in AI face the challenge of finding datasets that are both big enough and permitted for such use. These can be hard to come by, meaning researchers often make do with incomplete or skewed datasets. For example, the popular community discussion web site Reddit makes its vast historic dataset publicly available, which is a rich source of natural text and conversation and makes for a very tempting dataset for engineers and researchers to take advantage of. However, Reddit represents a very specific subset of the internet, and of the real world demographic, meaning that whilst there is undoubtedly a lot that can be learnt from that wealth of data, any AI trained on it will be heavily subjective.

There have been several reports finding that these incomplete or skewed data sets just further add to the bias. The 2017 AI Now report said:

data can easily privilege socioeconomically advantaged populations, those with greater access to connected devices and online services

Which is to be expected when you think about it really - always connected people with mobile devices will naturally be generating a lot more data than those without easy access to computers. On a very simple level, the core regular users of Reddit, for example, will likely have access to mobile devices, or at the very least have available access to computers and the internet - which rules out large parts of the population - not to mention the inclination to partake in the online community.

There are also other challenges that are intrinsic to the way AI currently works: if we have a dataset where a particular demographic is only reflected by 1% of the data, then the AI could claim to achieve 99% accuracy whilst being completely inaccurate for all of that 1% minority. Furthermore, we know that there is a strong relationship between the amount of training data and the accuracy of AI, so even in the scenario where we have a perfect representation of the population, all minority groups will, by definition, have a smaller selection of data points to train on, and so inevitably the AI will perform worse for those groups.
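The arithmetic behind that 99% claim is worth spelling out. The sketch below uses invented numbers (10,000 examples, 1% of them from the minority group) purely to show how a pooled accuracy figure can hide a complete failure for one group.

```scala
object AccuracySketch {
  // Group size and number of correct predictions; the numbers are invented.
  final case class Group(size: Int, correct: Int) {
    def accuracy: Double = correct.toDouble / size
  }

  val majority: Group = Group(size = 9900, correct = 9900)
  val minority: Group = Group(size = 100, correct = 0)

  // Overall accuracy pools every prediction, so a small group barely moves it.
  def overall(groups: Seq[Group]): Double =
    groups.map(_.correct).sum.toDouble / groups.map(_.size).sum
}
```

Here overall(Seq(majority, minority)) is 0.99 while minority.accuracy is 0.0, which is why reporting accuracy per group rather than only in aggregate matters.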

Finally, let’s consider again that we have a huge, rich dataset (the ideal scenario), and we try to intentionally exclude sensitive features that might explicitly encode bias: race, gender, age, etc. There are still loads of data points that may act as an indirect proxy for these features, so even without including race, gender and age in the input data, it is easy to see how these features can get encoded in other data points such as names, location, interests and communication style. This makes it even harder to detect and prevent bias in our datasets.

There is no objectivity in big data.

How can we address the problem?

Some of these examples might have clearer cases of existing bias that we need to address in training our AI, but a tougher challenge is how we can address the more subtle biases hidden in the cultural subjectivity that we might not even be aware of. We all carry our own opinions and biases that subconsciously affect our attitudes toward things - but if we are not consciously aware of those, we need to think about how we can ensure that developers training AI have the foresight to engineer around these biases.

This issue highlights one often-recommended approach to tackling the problem: placing a greater emphasis on diversity in the teams building AI, both in terms of individual identities and in terms of cross-functional teams. Statistically and broadly speaking, AI is often developed by teams of engineers with limited diversity, which results in a limited range of views when thinking about the dataset and about which goals are optimised for in the training process. The 2017 AI Now report recommended:

“stakeholders in the AI field should release data on the participation of women, minorities and other marginalised groups within AI research and development.”

Aside from trying to recognise subtle bias in the data, we also need to consider that the objective norm, and what we consider to be ok at the moment, is changing. Going back to Lisa Gitelman’s quote: “Objectivity is situated and historically specific”. If you could get a dataset from even just two decades ago, it’s not hard to imagine that AI trained on it would have unacceptable biases, because societal norms and general attitudes to race, gender, identity etc. have changed significantly since then.

As a simple example, take the motor insurance industry. For decades, insurance companies identified young male drivers as a particularly high accident risk, so traditionally charged much higher premiums for that demographic - previously a widely accepted approach, and one based in statistics: young male drivers were statistically more likely to have an accident behind the wheel. But then, in 2012, EU gender discrimination regulation came into effect that prevented companies charging men more than women, so insurers have now stopped using that categorisation for pricing despite the data being available. If that was AI, it would need to be re-trained with a modified dataset, with gender probably removed from the data and thought put into other data points that would also need to be removed (names, for example, might very easily be a broad proxy for gender). Whilst this is a simpler example, as it's a binary change in legislation with clear requirements, there are also the more gradual shifts in attitude where it becomes a lot fuzzier - like the changes in attitudes on race, gender and sexuality over the last thirty years.

We previously discussed the idea that even if we exclude socially salient data points, such as gender, those features can still get encoded via other proxies in the data, and this example of the change in EU regulation and its effect on the insurance industry provides an interesting case study in exactly that phenomenon. There was an article written in the Guardian following the EU ruling, explaining that, despite the ruling meaning insurers couldn’t charge more because a driver was male, male premiums have actually increased in comparison to female premiums since. The reasoning they provide is that rather than classifying on the crude data point of gender, the system instead places greater importance on a wider set of data points, and it turns out that these other data points are really just acting as encoded proxies (they list car size, occupation and vehicle modifications). The article makes the observation that MoneySupermarket released a study showing that 8 out of the worst 10 occupations for drink/drug drive incidents were in the building trade, with midwives being the least likely to have a drink/drug drive offence - the suggestion being that the building trade is predominantly male, and midwives predominantly female.

It certainly seems to me like there are still lots of challenges as to how we can foresee potential problems and how to tackle them. A key starting point will be ensuring teams working in the area have a good understanding of the dataset they are working with: where it comes from, any inherent bias or blind spots and which of the data points might need modifying or weighting due to their contextual/social salience. This will need to be driven through agreed best practices and AI development standards from organisations like AI Now and from academia, as well as a need for appropriate regulatory controls (although these face their own challenges, which I will discuss in a later article).

I also believe that these challenges mean an even greater need for diversity in the teams - both in terms of the race, background, gender etc. of the team, and also in terms of cross-functional members: not just engineers, but also people working closely with the specific domain experts for the field.

Photo credits:

Heading Photo by Alex Knight on Unsplash
Anonymous person Photo by Andrew Worley on Unsplash

Serverless with AWS Lambda & Scala

About a year ago, I started looking at AWS's serverless offering, AWS Lambda. The premise is relatively simple: rather than running a full server that you manage and deploy your Docker/web servers to, you just define a single function endpoint, map that to the API gateway, and you have an infinitely* scalable endpoint.

The appeal is fairly obvious - no maintenance or upgrading of servers, fully scalable and pay per second of usage (so no cost for AWS Lambda functions that you have defined but that are not being called). I haven't looked into the performance of JVM based Lambda functions, but my assumption is that there will be potential performance costs if your function isn't frequently used, as AWS will have to start up the function, load its dependencies etc. Depending on your use case, it would be advisable to look at performance benchmarking before putting it into production use.

When I first looked into AWS Lambda a year ago, it was less mature than it is today, and dealing with input/output JSON objects required annotating POJOs, so I decided to start putting together a small library to make it easier to work with AWS Lambda in a more idiomatic Scala way - using Circe and its automatic encoder/decoder generation with Shapeless. The code is all available on GitHub.

Getting Started

To deploy on AWS I used a framework called Serverless - this makes it really easy to set up serverless functions on a range of cloud providers. Once you have followed the pre-requisite install steps, you can simply run:

serverless create --template aws-java-gradle 

This will generate a Java (JVM) based Gradle project template, with a YML configuration file in the root that defines your endpoints and function call. If you look in the src folder, you will also see the classes for a very simple function that you can deploy to check your Lambda works as expected (you should also take the time at this point to log in to your AWS console and have a look at what has been created in the Lambda and API Gateway sections). You should now be able to curl your API endpoint (or use the serverless CLI with a command like: serverless invoke -f YOUR_FUNCTION_NAME -l).

ScaLambda - AWS Lambda with idiomatic Scala

Ok, so we have a nice simple Java based AWS Lambda function deployed and working; let's look at moving it to Scala. As you try to build an API in this way, you will need to be able to define endpoints that can receive inbound JSON being posted as well as return fixed JSON structures. AWS provides its inbuilt de/serialisation support, but inevitably you will have a type that needs further customisation of how it is de/serialised (UUIDs maybe, custom date formats etc). There are a few nice libraries that can handle this stuff, and Scala has some nice ways to simplify it.

We can simply upgrade our new Java project to a Scala one (either convert the build.gradle to an sbt file, or just add Scala dependency/plugins to the build file as is) and then add the dependency:

We can now update the input/output classes so they are just normal Scala case classes:

Not a huge change from the POJOs we had, but it is more idiomatic and also means you can use case classes that you have in other existing Scala projects/libraries elsewhere in your tech stack.

Next we can update the request handler - this will also result in quite similar looking code to the original generated Java code, but will be in Scala and will be backed by Circe and its automatic JSON encoder/decoder derivation.

You will see that, similar to the AWS Java class, we define generic type parameters on the class that represent the input case class and the output case class. You then simply implement the handleRequest method, which expects the input class and returns the output response.

You might notice the return type is wrapped in the ApiResponse class - this is simply an alias for a Scala Either[Exception, T] - which means if you need to respond with an error from your function you can just return an exception rather than the TestOutput. To simplify this, there is an ApiResponse companion object that provides a success and failure method:
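As a rough illustration of that idea, and not the library's actual source, an Either alias with a small companion might look like the sketch below. The success and failure method names and TestOutput come from the description above; TestInput, the field names and the handler body are assumptions for the example.

```scala
object ApiResponseSketch {
  // Left carries the failure, Right the successful payload.
  type ApiResponse[T] = Either[Exception, T]

  object ApiResponse {
    def success[T](value: T): ApiResponse[T] = Right(value)
    def failure[T](error: Exception): ApiResponse[T] = Left(error)
  }

  // Hypothetical input/output case classes.
  final case class TestInput(value: String)
  final case class TestOutput(value: String)

  // A handler can return an exception instead of an output when something is wrong.
  def handleRequest(input: TestInput): ApiResponse[TestOutput] =
    if (input.value.isEmpty)
      ApiResponse.failure(new IllegalArgumentException("value must not be empty"))
    else
      ApiResponse.success(TestOutput(input.value.toUpperCase))
}
```

Returning the exception as a value, rather than throwing it, keeps the failure path visible in the method signature.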

All the JSON serialisation/de-serialisation will use Circe's auto derived code which relies on Shapeless - if you use custom types that cannot be automatically derived, then you can just define implicit encoder/decoders for your type and they will be used.

Error handling

The library also has support for error handling - as the ApiResponse class supports returning exceptions, we need to map those exceptions back to something that can be returned by our API. To support this, the Controller class that we have implemented for our Lambda function expects (via self type annotations) to be provided an implementation of the ExceptionHandlerComponent trait and of the ResponseSerializerComponent trait.

Out of the box, the library provides a default implementation of each of these that can be used, but they can easily be replaced with custom implementations to handle any custom exception handling required:

Custom response envelopes

We mentioned above that we also need to provide an implementation of the ResponseSerializerComponent trait. A common pattern in building APIs is the need to wrap all response messages in a custom envelope or response wrapper - we might want to include status codes or additional metadata (paging, rate limiting etc) - and this is the job of the ResponseSerializerComponent. The default implementation simply wraps the response inside a basic response message with a status code included, but this could easily be extended/changed as needed.
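To make the wiring of these two components concrete, here is a minimal sketch in plain Scala. The trait names ExceptionHandlerComponent and ResponseSerializerComponent and the Controller class are taken from the description above, but every method signature, the status code mapping and the hand-built JSON are assumptions for illustration; the real library uses Circe for serialisation.

```scala
object ComponentsSketch {
  type ApiResponse[T] = Either[Exception, T]

  // Maps an exception to an HTTP-style status code and message.
  trait ExceptionHandlerComponent {
    def handle(error: Exception): (Int, String)
  }

  // Wraps every outgoing payload in a common envelope.
  trait ResponseSerializerComponent {
    def envelope(status: Int, body: String): String
  }

  trait DefaultExceptionHandler extends ExceptionHandlerComponent {
    def handle(error: Exception): (Int, String) = error match {
      case e: IllegalArgumentException => (400, e.getMessage)
      case e                           => (500, e.getMessage)
    }
  }

  trait DefaultResponseSerializer extends ResponseSerializerComponent {
    // Hand-built JSON for illustration; body is assumed to be valid JSON already.
    def envelope(status: Int, body: String): String =
      s"""{"status":$status,"data":$body}"""
  }

  // The self-type forces both components to be mixed in at construction.
  abstract class Controller[I, O] {
    this: ExceptionHandlerComponent with ResponseSerializerComponent =>

    def handleRequest(input: I): ApiResponse[O]
    def render(output: O): String

    def run(input: I): String = handleRequest(input) match {
      case Right(output) => envelope(200, render(output))
      case Left(error) =>
        val (status, message) = handle(error)
        // Naive quoting: the message is not JSON-escaped in this sketch.
        envelope(status, "\"" + message + "\"")
    }
  }
}
```

Swapping in a custom envelope or error mapping is then a matter of mixing a different trait into the controller, with no change to the business logic.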


The project is still in early stages of exploring the AWS Lambda stuff, but hopefully is starting to provide a useful approach to idiomatic Scala with AWS Lambda functions, allowing re-use of error handling and serialisation so you can just focus on the business logic required for the function.

An opinionated guide to building APIs with Akka-Http

Akka-Http is my preferred framework for building APIs, but there are some things I have picked up along the way. For one thing, Akka-Http is very un-opinionated in its approach, there are often lots of ways to do the same thing, and there isn't a lot of opinionated guidance about how to do things.

I have been writing Akka-Http APIs for I guess about 18 months now (not long, I know), having previously worked predominantly with libraries like Spring, and I have seen some pretty nasty code resulting from this (by this I mean, I have written nasty code - not intentionally, of course, but from good intentions, starting off trying to write clean, idiomatic Akka-Http code and ending up with huge sprawling routing classes which are unreadable and generally not very nice).

The routing DSL in Akka-Http is pretty nice, but can quickly become unwieldy. For example, let's imagine you start off with something like this:

This looks nice, right? A simple nested approach to the routing structure that reflects the URL hierarchy, the HTTP method etc. However, as you can probably imagine, if you try to scale this up to a full application it can very easily become fairly messy. The nested directives make it nice to group routes under similar root URLs, but as you do that you end up with very long, arrow-shaped code that actually isn’t that easy to follow - if you have several endpoints nested within the structure, it becomes quite hard to work out what endpoints there are and what is handling what.

Another problem that needs to be managed is that with the first one or two endpoints you might put the handling code directly in the routing structure, which is ok for very small numbers, but it needs to be managed sensibly as the endpoints grow and your routing structure starts to look more and more sprawling.

It is of course personal preference, but even with the simple example above, I don’t like the level of nesting that already exists there to simply define the mapping of the GET HTTP method and a given URL - and if you add more endpoints and start to break down the URL with additional directives per URL section then the nesting increases.

To simplify the code, and keep it clean from the start I go for the following approach:

  1. Make sure your Routing classes are sensibly separated - probably by the URL root (e.g. have a single UserRoutes class that handles all URLs under /users) to avoid them growing too much
  2. Hand off all business logic (well, within reason) to a service class - I use Scala’s Self-Type notation to handle this and keep it nicely de-coupled
  3. Use custom directives & non-nested routings to make the DSL more concise

Most of these steps are simple and self-explanatory, so it's probably just step 3 that needs some more explanation. To start with, here is a simple example:

You can see points 1 and 2 simply enough, but you will also notice that my endpoints are simple functions, without multiple levels of nesting (we may need some additional nesting at some point, as some endpoints will likely need other akka-http directives, but we can strive to keep it minimal). 

You might notice I have duplicated the URL section “users” rather than nesting it - some people might not like this duplication (and I guess risk of error/divergence of URLs - but that can be mitigated with having predefined constants instead of explicit strings), but I prefer the readability and simplicity of this over extensive nesting.

Custom Directives

First off, I have simply combined a couple of existing directives to make things more concise. Normally, you might have several levels of nested directives, such as one or more pathPrefix("path") sections, an HTTP method directive such as get { } and another to match pathEndOrSingleSlash { } - to avoid this I have combined some of these into convenient single directives.

getPath, postPath, putPath, etc. simply combine the HTTP method with the URL path matcher, and also include the existing Akka-Http directive “redirectToTrailingSlashIfMissing”, which avoids having to specify matching on either a slash or path end and instead allows you to always match exact paths. It basically squashes the three directives in the original HelloWorld example above down to one simple, readable directive.
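A minimal sketch of how such combined directives could be put together with the standard Akka-Http DSL follows. It assumes akka-http is on the classpath, takes a plain String segment rather than a general path matcher, and is a guess at the shape of the idea rather than the library's implementation; the getPath and postPath names come from the text, while exactPath and the route bodies are invented.

```scala
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.{Directive0, Route}
import akka.http.scaladsl.server.Directives._

object PathDirectivesSketch {
  // After the redirect every path ends in a slash, so we match the segment
  // followed by exactly one remaining slash.
  private def exactPath(segment: String): Directive0 =
    redirectToTrailingSlashIfMissing(StatusCodes.MovedPermanently) &
      pathPrefix(segment) &
      pathSingleSlash

  // HTTP method plus exact path, combined into a single directive.
  def getPath(segment: String): Directive0 = get & exactPath(segment)
  def postPath(segment: String): Directive0 = post & exactPath(segment)

  // Flat, non-nested route definitions built from the combined directives.
  val routes: Route =
    getPath("users") { complete("list users") } ~
      postPath("users") { complete("create user") }
}
```

One caveat of this ordering is that the redirect fires for any request with the matching HTTP method that lacks a trailing slash, whether or not the path would go on to match, so it works best when every route in the application follows the same convention.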

Custom Serialisation

You may also notice, I have implemented a custom method called “respond” - I use this to handle the serialisation of the response to a common JSON shape and to handle errors. Using this approach, I define a custom Response wrapper type that is essentially an Either of our internal custom error type and a valid response type T (implementation details below) - this means in all our code we have a consistent type that can be used to handle errors and ensure consistent responses.

This respond method simply expects a Response type to be passed to it (along with an optional success status code - defaulting to 200 OK, but can be provided to support alternative success codes). The method then uses Circe and Shapeless to convert the Response to a common JSON object. 

Let’s have a look at some of the details, first the custom types I have defined for errors and custom Response type:
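As a rough illustration of the shape described above, the types might look something like the sketch below. The names (ApiError, NotFoundError, ValidationError, statusOf) are hypothetical rather than taken from the library, and HTTP status codes are plain Ints so the example has no dependencies.

```scala
// Hypothetical names throughout: a sketch, not the library's actual types.
object ResponseTypes {
  // Internal error type: each error knows the HTTP status it maps to.
  sealed trait ApiError {
    def status: Int
    def message: String
  }
  final case class NotFoundError(message: String) extends ApiError { val status = 404 }
  final case class ValidationError(message: String) extends ApiError { val status = 400 }

  // A Response is either an error (Left) or a valid value of type A (Right).
  type Response[A] = Either[ApiError, A]

  def success[A](value: A): Response[A] = Right(value)
  def failure[A](error: ApiError): Response[A] = Left(error)

  // Pick the HTTP status: the error's own code, or the supplied success code.
  def statusOf[A](response: Response[A], successCode: Int = 200): Int =
    response.fold(_.status, _ => successCode)
}
```

Because every service returns the same Response[A] shape, error handling can live in one place instead of being repeated in each route.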

Simple, now let’s take a look at the implementation of the respond method:

It might look daunting (or not, depending on your familiarity with Scala and Shapeless), but it's relatively simple. The two implicit Encoder arguments on the method signature simply ensure that, whatever type A is in the provided Response[A], Circe and Shapeless are able to serialise it. If you try to pass a response to this method that can’t be serialised, you get a compile error. After that, all it does is wrap the response A in a common message and return that along with an appropriate (or provided) HTTP status code.
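The compile-time guarantee is easier to see in a stripped-down form. The sketch below is dependency-free and only mimics the pattern: JsonEncoder is a hand-rolled stand-in for Circe's Encoder, the result is a plain (status, body) pair rather than an akka-http route, and apart from respond, wrap and ResponseWrapperEncoder every name is an assumption for illustration.

```scala
// Simplified stand-in for the Circe-based version; illustration only.
object RespondSketch {
  final case class ApiError(status: Int, message: String)
  type Response[A] = Either[ApiError, A]

  // Plays the role of Circe's Encoder[A]. No string escaping is done here.
  trait JsonEncoder[A] { def encode(a: A): String }

  object JsonEncoder {
    implicit val stringEncoder: JsonEncoder[String] = new JsonEncoder[String] {
      def encode(a: String): String = "\"" + a + "\""
    }
    implicit val errorEncoder: JsonEncoder[ApiError] = new JsonEncoder[ApiError] {
      def encode(e: ApiError): String =
        "{\"message\":" + stringEncoder.encode(e.message) + "}"
    }
  }

  // Override wrap to change the common envelope for every response.
  trait ResponseWrapperEncoder {
    def wrap(field: String, json: String): String =
      "{\"" + field + "\":" + json + "}"
  }

  object Responder extends ResponseWrapperEncoder {
    // The implicit encoders are the compile-time proof that A can be serialised.
    def respond[A](response: Response[A], successCode: Int = 200)(
        implicit errorEncoder: JsonEncoder[ApiError],
        encoder: JsonEncoder[A]): (Int, String) =
      response match {
        case Right(value) => (successCode, wrap("data", encoder.encode(value)))
        case Left(error)  => (error.status, wrap("error", errorEncoder.encode(error)))
      }
  }
}
```

Calling Responder.respond[String](Right("hi")) yields status 200 with the body {"data":"hi"}, whereas Responder.respond[Int](Right(1)) does not compile in this sketch because no JsonEncoder[Int] is in scope.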

You might also notice the final result is built using the wrap method in the ResponseWrapperEncoder trait - this allows easy extension/overriding of what the common response message looks like.


All of this machinery is of course abstracted away to a common library that can be used across different projects, and so, in reality it means we have a consistent, clean API with simple routing classes as simple and neat as below, whilst also handing off our business logic to neater, testable services.

All the code for my opinionated library and an example API is on GitHub, and it is currently in progress with more ideas underway!

Conference updates: JAXLondon and W-JAX 2017

I have had the pleasure of closing out this year by speaking at two conferences. The first in October was JAXLondon, the second in November was W-JAX in Munich - the talk was titled "Agile Machine Learning: From theory to production".

As the title might suggest, the talk was about some considerations and challenges of doing Machine Learning (ML) work in a commercial environment - so a lot of the softer aspects of ML: when you should adopt it, and how you can work as a team on ML.

The talk at JAXLondon was recorded, so hopefully at some point that will be available for me to share as well, but for the time being, here is a brief interview that my co-speaker and I gave ahead of our JAXLondon talk.

See more details and write up here

On Education: Let us play

Education is something that has been an interest to me for some time, and with a child currently going through primary school in the UK, it's something that I think about quite a lot, so this is something of an open letter on the state of the education system in the UK.

First off, I should say that I am a strong believer in the importance of both the role of free-play in learning, and also of instilling a curiosity in children as an approach to ensure future success, rather than a more structured, didactic, teacher- and test-driven approach. This belief is largely grounded in various things I have read on the topic, but I appreciate there is undoubtedly a lot more depth to the subject matter than I know about.

Positive Examples

If we are thinking about the education system, it seems prudent to look elsewhere for success stories to see what we can learn from and improve on in the UK system, and a glaringly obvious example would be the Finnish education system. Finland's often cited education system has consistently been the top ranked system in Europe for the last 16 years, so what are they doing right?

The first notable difference is the age at which children start school: children will attend preschool from an early age, but primary school doesn't start until 7 years old - in other words, formal teacher-led instruction on what we would consider core topics (maths, reading and writing) does not start until the age of seven. Before that, the education system is entirely focused on free, creative play.

This model also ties into research from some neuroscientists who believe that before the age of seven or eight, "[children] are better suited for active exploration than didactic explanation" - claiming that "the trouble with over-structuring is that it discourages exploration". Having witnessed this behaviour first hand, this very much supports my anecdotal data on the subject: trying to explain a moderately complex process to a 6 year old can be a challenge, but let them watch you perform it a few times and they will often pick it up with much greater ease (case in point: using a tech device, playing a video game etc). Furthermore, it seems to me that encouraging exploration and independent discovery should surely be a key part of any process aiming to instil curiosity in children.

These results were also mirrored in the research by the Lego Foundation, who claimed children should learn through play until at least the age of eight (despite possible cynicism based on a report from a toy manufacturer recommending more play, that article is really spot on).

To put this schooling approach into perspective, in the UK, children will already have had up to three years of five days a week, full day, classroom based teaching by the age of 7. When my eldest son turns 7 he will be finishing his third year in school and will already have faced the prospect of national standardised testing in the form of the SATs (thankfully the government have decided to scrap these, but they will only stop being compulsory in 2023).

That is heartbreaking.

Thinking of this child, so full of joy and enthusiasm for playing, whether it be running around outside lost in an imaginative world of play or sitting down playing with Lego, having to spend something like 25 hours a week in a classroom seems unthinkable.

But it's not just about starting late: even once more formal education starts, Finnish schools make sure to keep play an integral part of the school day, and children are required to have 15 minute play breaks every hour. Aside from the potential educational benefits of regular playing, research has also found that outdoor play is linked to healthier and happier children (aside: have you ever tried to get a 6 year old to concentrate for 45 minutes? If so, you will probably see the futility in anything other than regular play breaks).

Once again, for some perspective, whilst visiting UK primary schools for my eldest son, one school headmistress casually boasted that the traditional afternoon playtime break had been dropped in favour of more classroom time.

Unsurprisingly, we didn’t apply for a place at that school.

But why?

The common thinking behind starting formal schooling earlier is that the earlier they start learning, the better prepared they'll be, and the greater the head start they'll have. But even if the Finnish school system didn't appear to disprove this theory, it's worth considering the difference in benefits between learning by rote and testing, and independent learning (via play or other means) along with the independent curiosity needed for the latter. I'd suggest that, in the modern knowledge economy in which we live, and with the quickening rate at which information and understanding are being changed by advancing technology, the most beneficial skill that someone can leave school with is curiosity, together with the ability and the desire to learn independently. That is, to leave school as lifelong learners. I think it says a lot that a key topic on the Finnish national curriculum is simply “learning to learn”.

Beyond looking at success stories of other education systems, we can look at history. Current incarnations of the education and school systems are a relatively modern thing, so what did we do to learn before then? Of course, families and communities have long recognised the importance of amassing and passing on information to younger generations, even without formal education, and in many cultures children learn through imitation and experimentation (which, as I mentioned previously, is easy to believe if you have a young child that has grown up around adults using mobile devices and have witnessed the speed at which they become proficient through imitation and experimentation).

It might be tempting to think that humankind was able to learn through such basic play techniques in times gone by purely because what we needed to learn was simpler, and that as we have progressed as a society, people have needed a greater understanding and depth of knowledge in order to keep up with industrial and technological advances, so the education system has evolved out of necessity.

However, I would argue that the opposite is true - firstly, as pointed out earlier, the speed at which science and technology are advancing means that at whatever level you leave the schooling system, within a couple of years understanding and techniques will likely have moved on, and your ability to learn independently and keep up with the fast paced changes is going to be essential to success.

Secondly, as I have written before, I would argue that play provides the essential understanding and building blocks for going on to study and understand computer science, engineering and maths.  As a computer scientist, I personally think the education that stood me in best stead for going on to learn - and be successful professionally - was playing with toys like Lego and puzzle solving games.

For example, let’s take a look at one of the Key Stage One goals for computing in the current UK National Curriculum:

“use logical reasoning to predict the behaviour of simple programs”

To be clear, Key Stage 1 is 5 to 7 years old - this is children who, some neuroscientists think, are of an age that is too young to be in formal taught education, who, in Finland, would still be enjoying creative play, and who may not all be capable of reading un-aided. I don’t know about you, but I wouldn’t like to have to teach that in any medium other than play.

However, if we just re-frame the problem and consider this goal in the context of playing with train sets, it becomes simpler: “use logical reasoning to predict the behaviour of trains” - give the kids train sets, let them build tracks and think about what happens when a train is added: what happens if we add two trains? What happens if we change the direction of trains? Or change the behaviour of a junction piece? Being able to reason logically about such behaviours and changes is a very transferable skill that is useful for thinking about a range of problem solving disciplines, including computer science.

And it's not just computing - there is a growing group of mathematicians who posit that preschoolers are actually capable of understanding calculus and algebra. More than that, they argue that teaching maths the way we currently do crushes almost all appetite for, or future interest in, a subject that is actually an amazing world of wonder and surprise. It takes all the playful fun out of maths and makes it a boring case of memorising numbers and patterns - which obviously has the end result of killing children's curiosity or interest in going out and independently learning.

Ultimately, I would love it if UK schools embraced free play more, if they embraced teaching STEM subjects through play, but I understand that it’s a huge shift that needs to come from the government. Recognising that the SATs test is not a positive thing for 6 and 7 year olds is a good first step, but the UK primary schools are still so governed by the national curriculum and expectations around performance that it seems impossible for any individual school to start to move the dial.


  1. The Atlantic: The underrated gift of curiosity
  2. The Guardian: The secret of Europe's top education system
  3. The New York Times: Let the kids learn through play
  4. The Guardian: Children should learn mainly through play until the age of 8
  5. Gov.UK: SATs practice material
  6. The Atlantic: How Finland keeps kids focused through free play
  7. The Play Return: An assessment of play initiatives
  8. Wikipedia: Knowledge Economy
  9. Gov.UK: Computing National Curriculum
  10. Fostering mathematical thinking through playful learning (paper)
  11. The New York Times: What babies know about physics
  12. The Atlantic: Five year olds can learn calculus